Hi, we are looking to hire someone to convert existing data files (a web link will be provided) from a standard .txt format, containing numeric and text entries, into a format suitable for computing.
1) Please start by taking 100 entries, selected at random with a random number generator, from one of the 30 files we will give you.
2) Transform these 100 entries into a matrix in .csv form, based on pre-specified categories that we will provide. Two of the columns are word and word count; another is entry ID.
3) We would also like a sparse representation of the word and word count columns as a new matrix (rows are entry numbers, columns are word labels, and each cell holds the count). Whether this is practical depends on the size of the file; we can talk about this.
4) The deliverable should be in manageable .csv file sizes, which won't be a problem for this data.
However, we will definitely have more work if this is done successfully (covering all files and more entries), so scalable routines are highly encouraged. We are thinking of about a million entries, with a higher budget, if this goes well.
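To make steps 1 to 3 concrete, here is a rough sketch in Python of the kind of routine we have in mind. It is only an illustration under stated assumptions: it assumes each entry is one line of the form entry ID, a tab, then the text, and that words are runs of letters. The real file layout and categories will be specified by us, and all function names here are hypothetical.

```python
import csv
import random
import re
from collections import Counter

def sample_entries(lines, k=100, seed=0):
    """Pick k entries at random without replacement; the seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(lines, min(k, len(lines)))

def word_counts(text):
    """Count words in one entry (assumption: a word is a run of letters, lowercased)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def to_long_rows(entries):
    """Return (entry_id, word, count) rows: one row per word per entry."""
    rows = []
    for line in entries:
        # Assumed layout: "<entry_id>\t<text>"; the real layout may differ.
        entry_id, _, text = line.partition("\t")
        for word, count in sorted(word_counts(text).items()):
            rows.append((entry_id, word, count))
    return rows

def to_matrix(long_rows):
    """Pivot the long rows into an entry-by-word table filled with counts."""
    vocab = sorted({word for _, word, _ in long_rows})
    column = {word: i for i, word in enumerate(vocab)}
    table = {}
    for entry_id, word, count in long_rows:
        table.setdefault(entry_id, [0] * len(vocab))[column[word]] = count
    header = ["entry_id"] + vocab
    return header, [[entry_id] + counts for entry_id, counts in table.items()]

def write_csv(header, rows, path):
    """Write a header and rows to a .csv file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

The long rows from to_long_rows are already a compact (entry, word, count) listing that stores only nonzero counts, while to_matrix produces the full entry-by-word table, which grows with the vocabulary and is the part that may need to be split for larger files. Note that in this sketch an entry containing no words does not appear in the matrix.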
Thank you very much.
Additional Project Description:
10/08/2013 at 6:25 AKDT
Please note that we will only hire someone who has the ability to do this automatically since we are looking for FUTURE work primarily. This is just a pilot.
Once we go from 100 entries to 1 million, manual typing will not work. We realize that file size will be an issue depending on the matrix, so if the output eventually needs to be split into, say, 1000 files of 1000 entries each, we will use those files with parallel computing routines for our computations. Thank you so much, and we look forward to working with you.
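As an illustration of the file splitting mentioned above, a sketch along these lines would be acceptable. It assumes each row passed in is one entry (for example, one row of the entry-by-word matrix); the function names and the part_00000.csv naming scheme are hypothetical, not a requirement.

```python
import csv
import os
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items without loading everything."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def write_chunks(rows, header, out_dir, size=1000, prefix="part"):
    """Write rows into numbered .csv files of `size` entries each; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunked(rows, size)):
        path = os.path.join(out_dir, f"{prefix}_{i:05d}.csv")
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # repeat the header so each file stands alone
            writer.writerows(chunk)
        paths.append(path)
    return paths
```

Because every file carries its own header, each one can be handed to a separate worker without reference to the others.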