Counting duplicates in big files

I have about 14 million rows of data with 6 columns in CSV format.

I created a working solution in Power BI that does the job within 30 minutes. However, the program limits how many rows can be exported for further processing, and it can only handle 2 files (and is sometimes buggy), whereas I need to process 6 files a day.


- A program, data-manipulation software, or SQL code that returns, for each row, the number of rows or entries whose content is similar to it, at every level from 1 matching entry up to all 6 columns/entries.

- The position of the column does not matter in the check. For example, for a count of 5 similar entries, each of the following 2 rows (representative entries, not actual data) would have a result of 1, because both contain 2, 3, 4, 5, 6:

1,2,3,4,5,6 - 1

2,3,4,5,6,7 - 1

- It should return the result fast: no more than 30 minutes (negotiable), or a maximum of 4 hours for all 6 files.
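One way to read the requirement list above is: for each row and each k from 1 to 6, count the other rows that share at least k values with it, ignoring which column each value sits in. The sketch below is a minimal Python illustration under that assumption, not a confirmed spec. It counts every value subset across all rows, then uses binomial inversion to turn "subset counts" into "rows overlapping in exactly j values", so a row that shares all 6 values is not counted 6 times at k = 5. The function names `row_key` and `similarity_counts` are hypothetical. Performance is a real concern here: with 6 distinct values per row there are up to 63 subsets per row, i.e. up to 14,000,000 × 63 = 882,000,000 counter entries, which a plain Python dict will not hold comfortably. Treat this as a correctness reference to run on a sample; at full scale the subset counts would need to be computed in a database or a compiled language, and whether that meets the 30-minute target is untested.

```python
from collections import Counter
from itertools import combinations
from math import comb


def row_key(row):
    # Column position is ignored: treat the row as a set of values and sort it
    # so every subset has one canonical tuple form. Duplicate values inside a
    # row collapse to one (an assumption about the intended semantics).
    return tuple(sorted(set(row)))


def similarity_counts(rows, max_k=6):
    """For each row return [N_1, ..., N_max_k], where N_k is the number of
    OTHER rows sharing at least k values with it, regardless of column."""
    keys = [row_key(r) for r in rows]

    # Pass 1: c[T] = number of rows whose value set contains the subset T.
    c = Counter()
    for key in keys:
        for k in range(1, len(key) + 1):
            c.update(combinations(key, k))

    # Pass 2: per row, turn subset counts into "overlap >= k" counts.
    results = []
    for key in keys:
        n = len(key)
        if n == 0:
            results.append([0] * max_k)
            continue
        # A[k] = sum of c[T] over the k-subsets T of this row, which equals
        # the sum over all rows s of comb(overlap(row, s), k).
        A = [0] * (n + 1)
        for k in range(1, n + 1):
            A[k] = sum(c[t] for t in combinations(key, k))
        # Binomial inversion: E[j] = rows overlapping in exactly j values.
        E = [0] * (n + 1)
        for j in range(1, n + 1):
            E[j] = sum((-1) ** (k - j) * comb(k, j) * A[k] for k in range(j, n + 1))
        E[n] -= 1  # every row overlaps itself fully; don't count it
        # N_k = rows overlapping in at least k values (0 when k exceeds n).
        results.append([sum(E[k:]) if k <= n else 0 for k in range(1, max_k + 1)])
    return results


if __name__ == "__main__":
    example = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]]
    for row, counts in zip(example, similarity_counts(example)):
        print(row, counts)
```

For the two example rows from the posting, each row gets `[1, 1, 1, 1, 1, 0]`: one other row shares at least 5 values (2, 3, 4, 5, 6), and none shares all 6. Rows read with `csv.reader` work the same way, since the values are only compared and sorted as strings.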

Note: Unfortunately, I cannot make milestone payments for a program/solution that does not meet the processing-time requirement.

Skills: Power BI, Python, SQL, MySQL


About the Employer:
(1 review) Singapore, Singapore

Project ID: #20307066

27 freelancers are bidding on average $47 for this job


Hi, I'm a data engineer with over 5 years of industry experience across a wide array of tech stacks, including databases, data warehouses, machine learning, and big data/Hadoop. I'm currently pursuing my Master's in Data Scien...

$50 USD in 3 days
(44 Reviews)

Hi, my name is Ali and I am available to start on the task immediately. I can do the duplicate check in SQL Server. Let's have a quick discussion so I can get started.

$30 USD in 3 days
(24 Reviews)

I can load the file into a SQL database using an SSIS ETL package that removes duplicate records efficiently. There will be no restriction on the number of files; you can load N files in one go. Let me kno...

$30 USD in 1 day
(7 Reviews)

Hi. I can make a program that solves your problem. I have enough experience to tackle it. Message me to discuss.

$25 USD in 7 days
(13 Reviews)

Hi. I can write this program in a native language (not C# or Python), and it will run very fast. See my reviews and completion rate on this site. Regards, Alex.

$250 USD in 3 days
(1 Review)

Okay, the program will finish processing within your given time, but we need to discuss the job further over chat. Thanks.

$30 USD in 2 days
(2 Reviews)

Hey, I understand your requirement and can deliver a SQL script that computes the results within 10 minutes at most. You can message me to get the query and check whether it gives you the result in time, and then you can a...

$35 USD in 1 day
(1 Review)

I will meet the time limit and the requirements after we discuss a few points. Best regards, Ahmed Samir

$55 USD in 3 days
(2 Reviews)

Hi, I can process your CSV file with Python in 1 day. Please send me a message so that we can discuss it further. To make sure the engagement truly serves your requirements, you can evaluate my skills by giving pa...

$30 USD in 1 day
(3 Reviews)

Have you decided which freelancer to pick? I have the code ready and will test it on the 14 million rows of data if you can send me a sample CSV. It’s written in Python and is fairly looks for a...

$10 USD in 2 days
(1 Review)

I see what you want; however, it's not completely clear, so I would like to ask a few things first if we decide to work on it. It won't take more than 2 days to complete such a program, so the 7 days I am proposing is...

$25 USD in 7 days
(1 Review)

Hi there! I am a developer with 4+ years of experience in Python, Django, RoR & ReactJS. Please open the chat box for further discussion. Regards,

$25 USD in 10 days
(1 Review)

Hello, thanks for posting this job and giving us the opportunity to apply for it. I have read the project description and can assure you that I can handle this job. Please reply so we can go into more detail over chat.

$20 USD in 7 days
(0 Reviews)

Dear sir, I have read your project details carefully. I am a full-stack web developer and can do your project perfectly. My goal is the best service to the customer: credit first, high-quality results. I have 5+ years expe...

$30 USD in 3 days
(1 Review)

Hi, I am an expert in Java and Python and can complete this job within a day. I have read your requirements and look forward to working with you. Let's continue this in the freelancer chat.

$40 USD in 1 day
(0 Reviews)

Hi! I can build an application for you in C#. It will be as fast as possible and process the files in minimal time. I can do that in 1-2 hours. Write to me to discuss the details. Thanks!

$30 USD in 1 day
(0 Reviews)

I can load the file into the free SQL Express edition using Openquery/Openrowset. The CSV file is dumped to a location on the machine where SQL Express runs, and then a T-SQL script produces the desired result. The whole pro...
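The bid above describes loading the CSV into a database and then letting a SQL script do the counting. A minimal sketch of that idea, with Python's built-in `sqlite3` swapped in for SQL Server Express (the bid's actual T-SQL is not shown, so this is not it): each row's values are sorted before loading so column order does not matter, and a `GROUP BY` over all six columns counts exact full-row duplicates. It only covers the all-6-columns case, assumes every row has exactly 6 fields, and the function name `full_row_duplicate_counts` is hypothetical. For the real 14-million-row files, a file path would replace `:memory:`; how long that takes is untested.

```python
import csv
import io
import sqlite3


def full_row_duplicate_counts(csv_file, db_path=":memory:"):
    """Return {sorted_row_tuple: count} for rows that appear more than once,
    ignoring column order. SQLite stands in for SQL Server Express here."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE rows (c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT, c5 TEXT, c6 TEXT)"
    )
    # Sort each row's values before inserting so column position doesn't
    # matter. Rows without exactly 6 fields make sqlite3 raise an error.
    con.executemany(
        "INSERT INTO rows VALUES (?, ?, ?, ?, ?, ?)",
        (sorted(r) for r in csv.reader(csv_file)),
    )
    query = """
        SELECT c1, c2, c3, c4, c5, c6, COUNT(*) AS n
        FROM rows
        GROUP BY c1, c2, c3, c4, c5, c6
        HAVING COUNT(*) > 1
    """
    result = {tuple(row[:6]): row[6] for row in con.execute(query)}
    con.close()
    return result


if __name__ == "__main__":
    sample = io.StringIO("1,2,3,4,5,6\n6,5,4,3,2,1\n2,3,4,5,6,7\n")
    print(full_row_duplicate_counts(sample))
```

Sorting on load means the database never has to compare permutations; the partial-match counts (1 to 5 shared values) would still need a separate approach.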

$70 USD in 2 days
(0 Reviews)

I am Anil, with more than 10 years of experience in SQL Server database development and administration. I have worked with big databases for clients such as Match.com and Nationstar Mortgage, and with TCS. I am also good f...

$35 USD in 1 day
(0 Reviews)

Hi, I understand your problem well: processing a large amount of CSV data efficiently and quickly. Python is the right tool for this task. This is the type of problem (data processing) Python solves the be...

$20 USD in 2 days
(0 Reviews)

I am a software engineer and ready to do this work, as I already have experience with database queries and optimization. Relevant Skills and Experience: 6 years of experience in Microsoft technologies, Azure Cloud, c...

$19 USD in 3 days
(0 Reviews)