Python script to automate data combination of CSVs
This project was successfully completed by networls for $150 USD in 3 days.Get free quotes for a project like this
Project Budget$30-$250 USD
Completed In3 days
This project is to write a python script and turn it into a .exe that can run on windows 7 or later, and a .app that can run on mac OSX 10.4 or later. The python script, the .exe, and the .app are all deliverables.
We are a renewable energy company that works in buildings. In every building we work in, we get dozens of .CSV, .XLS, and .XLSX files that need to be combined into a single CSV. Each file has differing header information in the top in multiple rows (not necessarily just 2, it could be 20 rows of header info), then multiple columns of data beneath the header information. Some files may only have two columns of data, whereas other files may have 20 or more columns of data.
Each file also has a column (or two) of date/time stamps, showing the date and time each measurement in a row was taken. The time stamps from different files do not necessarily start at the same time – e.g. one may start at 3pm on July 24th, whereas another may start at 7pm on July 27th. Furthermore, the time stamps do not increment in the same step size – e.g. one may increase by five minutes per row, one may increase by 15 minutes per row, and a third may increase randomly per row.
Here is what the program must do:
• Read every CSV, XLS, and XLSX file in whatever folder the .app or .exe is placed. This way we can copy the app to a new folder full of csvs to combine.
• Combine all the CSVs, XLS, and XLSX files into one CSV, with header information from each individual file and the name of that file preserved above the columns from that file.
• Make a master time stamp column on the left side of the document:
o Find the smallest increment of any date/time stamp column – e.g. if there are time stamps incrementing at 1 minute, 5 minutes, and random minutes, make the master column of date/time stamps on the left increment at 1 minute per row.
o The master date/time stamp should start with the date/time of the earliest data point from any file, and end at the date/time of the latest data point from any column.
• Line up and space out all data columns so they correspond correctly to the master time stamp on the left.
• Delete redundant date/time stamp columns so there is only the master date/time stamp column on the left and no other date/time columns
• Make sure all header information is preserved over the proper columns
o Add in the name of the original file above the columns!
• Write everything to a new CSV called “[url removed, login to view]”
I’ve uploaded examples of data files of the type that will need to be combined, as well as an example of the final output file needed so you can see what I’m talking about. The examples are "14836...", "18102", and "condensate pump". The example output file is "combined_data". (Please note, in “[url removed, login to view]", columns AD through the end are NOT empty. There is data in row 6615, for example.)
Thank you for your help, and I look forward to working with you!
Looking to make some money?
- Set your budget and the timeframe
- Outline your proposal
- Get paid for your work
Hire Freelancers who also bid on this project
Looking for work?
Work on projects like this and make money from home!Sign Up Now
- The New York Times
- Wall Street Journal
- Times Online