Python script to automate data combination of CSVs

IN PROGRESS
Bids
8
Avg Bid (USD)
$164
Project Budget (USD)
$30 - $250

Project Description:
This project is to write a python script and turn it into a .exe that can run on windows 7 or later, and a .app that can run on mac OSX 10.4 or later. The python script, the .exe, and the .app are all deliverables.

OVERVIEW
We are a renewable energy company that works in buildings. In every building we work in, we get dozens of .CSV, .XLS, and .XLSX files that need to be combined into a single CSV. Each file has differing header information in the top in multiple rows (not necessarily just 2, it could be 20 rows of header info), then multiple columns of data beneath the header information. Some files may only have two columns of data, whereas other files may have 20 or more columns of data.

Each file also has a column (or two) of date/time stamps, showing the date and time each measurement in a row was taken. The time stamps from different files do not necessarily start at the same time – e.g. one may start at 3pm on July 24th, whereas another may start at 7pm on July 27th. Furthermore, the time stamps do not increment in the same step size – e.g. one may increase by five minutes per row, one may increase by 15 minutes per row, and a third may increase randomly per row.

Here is what the program must do:

• Read every CSV, XLS, and XLSX file in whatever folder the .app or .exe is placed. This way we can copy the app to a new folder full of csvs to combine.
• Combine all the CSVs, XLS, and XLSX files into one CSV, with header information from each individual file and the name of that file preserved above the columns from that file.
• Make a master time stamp column on the left side of the document:
o Find the smallest increment of any date/time stamp column – e.g. if there are time stamps incrementing at 1 minute, 5 minutes, and random minutes, make the master column of date/time stamps on the left increment at 1 minute per row.
o The master date/time stamp should start with the date/time of the earliest data point from any file, and end at the date/time of the latest data point from any column.
• Line up and space out all data columns so they correspond correctly to the master time stamp on the left.
• Delete redundant date/time stamp columns so there is only the master date/time stamp column on the left and no other date/time columns
• Make sure all header information is preserved over the proper columns
o Add in the name of the original file above the columns!
• Write everything to a new CSV called “combined_data.csv”

I’ve uploaded examples of data files of the type that will need to be combined, as well as an example of the final output file needed so you can see what I’m talking about. The examples are "14836...", "18102", and "condensate pump". The example output file is "combined_data". (Please note, in “combined_data.csv", columns AD through the end are NOT empty. There is data in row 6615, for example.)

Thank you for your help, and I look forward to working with you!

Best,
Brenden

Skills required:
Data Processing, Excel, Python, Software Architecture
Additional Files: 14764.xlsx 18102.csv combined_data.csv Condensate_Pump.csv
About the employer:
Verified
Public Clarification Board
Bids are hidden by the project creator. Log in as the employer to view bids or to bid on this project.
You will not be able to bid on this project if you are not qualified in one of the job categories. To see your qualifications click here.


$ 200
in 8 days
$ 250
in 2 days
Hire networls
$ 150
in 3 days
Hire sofina2006
$ 150
in 4 days
$ 230
in 3 days
Hire euripedesrocha
$ 100
in 5 days
Hire mrezam
$ 200
in 2 days
$ 30
in 1 days