Hello!
I have a huge txt file with about 2 mio rows. The structure is like this:
Datum Uhrzeit Kurs Volume
09.08.2013 09:03:45 120,4 -148
09.08.2013 09:03:45 120,4 -85
09.08.2013 09:06:00 120,6 163
09.08.2013 09:06:02 120,6 612
09.08.2013 09:07:38 120,1 -58
09.08.2013 09:07:47 120,25 51
09.08.2013 09:07:59 120,25 63
09.08.2013 09:08:04 120,25 -58
09.08.2013 09:08:30 120,4 80
[...]
I need a tool (can run as PHP on my server or coded in any programming language like C#, .NET, Java or else which let the tool run on my Windows 7 computer) which must
1. convert the daytime to timestamp for every row and
2. cluster all entries concerning the same minute to one row. Means: Finally there shall be one row for one minute each.
Let's start with the description for that last point (clustering) first:
Clustering:
As you can see in the example above there are 2 rows with the time value 09:03:xx . If there is more than one row with the same minute value these rows must be aggregated. So all "Volume" values (last value in each row) must be added up and as the "Kurs" (the second last value in each row) the latest saved value from that certain minute is the right one. That means for the example:
-148
- 85
= - 233 (as Volume for the timepoint 09:03:xx)
and
120.4 as the Kurs. By the way: Your script must convert the volume values first because 120,4 for example is german formatting for 120.4 . So to make the mySQL database unterstand the volumes correctly please just replace all ',' with '.' in the very first step for the whole txt file.
The result for the next available minute (09:06:xx) would be
120.6 775
The result for minute 09:07:xx would be
120.25 56
and so on.
Some exeptions:
Exeption 1:
If there is a minute with not even one row then just copy the value of the "Kurs" from the row of the minute before. The value "Volumen" please set to 0 then. Example:
[...]
09.08.2013 11:07:11 118,25 51
09.08.2013 11:07:47 120,35 -22
09.08.2013 11:09:54 121,05 13
[...]
the missing minute 11:08:xx would be
09.08.2013 11:08:00 120,35 0
Exeption 2:
The time frame of the recorded data is always from 09:00 am to 05:35 pm. That means that your script must not care about gaps when the time is later than 05:35:xx pm or earlier than 09:00:00 am the day.
Normally there is a gap between the last dataset from minute 05:30 pm and 05:35 pm. Just handle it as described before and copy the values from 5:30 am to 5:31 5:32 5:33 and 5:34. Finally 5:35 comes with new values. The next available row will not be found before 09:00 am of the next day. Your script shall ignore all timeframes between 5:36 pm and 8:59 am, both included. Normally the very first row of a day starts not in the first minute (9:00:xx) but a couple of minutes later. If so, please use again the copy step but this time the other way round. Look for the first available row of that day and copy the value of the "Kurs" to all minutes before where no values are available. The "Value" must be set to 0.
Example:
[...]
10.08.2013 09:03:45 121,2 122
[...]
will copy back like this:
10.08.2013 09:00:00 121,2 0
10.08.2013 09:01:00 121,2 0
10.08.2013 09:02:00 121,2 0
Convert date/time into timestamp
This is the second preparing job which your script/tool must be able to do. It needs only 5% of the coding effort from the step above. I only excluded it as a second step because I think it is senseful to do converting to timestamp not before your script did all above first. Because it is easier to detect single minutes if there is kind of "9:03:42" instead of 1376038962 :-)
The daytime format you find in every row is german style. It is [login to view URL]
(please check the attached file "[login to view URL]" to read the rest of my description)
Hi Marc,
I had a chance to work with you last year and today I received notification about your new project.
Your requirement is very clear and I am confident that I can do it and give you expected result.
Look forward to work with you again :)
Regards,
Dat
€53 EUR in 2 days
5.0 (80 reviews)
6.4
6.4
7 freelancers are bidding on average €105 EUR for this job