Closed

Data Processing/Scraping from Standard Format txt Files

This project was awarded to lafor for $250 USD.

Project Budget
$30 - $250 USD
Total Bids
42
Project Description

Hi, we are looking to hire someone to convert existing data files (we will provide a web link) from a standard .txt format, containing numeric and text entries, into a format suitable for computing.

1) We would like you to start by taking 100 entries, selected at random with a random number generator, from one of the 30 files we will give you.

2) We would like you to transform these 100 entries into a matrix in .csv form based on pre-specified categories that we will provide. Two of the columns are word and word count; another is entry ID.

3) We would also like a sparse representation of the word and word count columns as a new matrix, where rows are entry numbers, columns are word labels, and each cell holds the count. Whether this is practical depends on the size of the file, so we can discuss it.

4) The deliverable should be in manageable .csv file sizes, which won't be a problem for this data.
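
As a rough illustration of steps 1 to 3 above, here is a minimal Python sketch. The actual file layout is not described in this posting, so the sketch assumes one entry per line in the form `entry_id<TAB>text`; that layout, the tokenizing rule, and all function names are hypothetical. It draws a random sample, builds the long table (entry ID, word, word count), and pivots it into an entry-by-word count matrix. Note that the long table is already a compact, coordinate-style listing of the non-zero counts, while the pivoted matrix is mostly zeros.

```python
import csv
import io
import random
import re
from collections import Counter


def sample_entries(lines, k=100, seed=0):
    """Step 1: pick up to k non-empty entries uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    entries = [ln.rstrip("\n") for ln in lines if ln.strip()]
    return rng.sample(entries, min(k, len(entries)))


def long_rows(entries):
    """Step 2: yield (entry_id, word, count) rows.

    ASSUMPTION: each entry looks like 'entry_id<TAB>text'.
    """
    for entry in entries:
        entry_id, _, text = entry.partition("\t")
        counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
        for word, count in sorted(counts.items()):
            yield entry_id, word, count


def wide_matrix(rows):
    """Step 3: pivot long rows into an entry-by-word matrix of counts."""
    rows = list(rows)
    vocab = sorted({word for _, word, _ in rows})
    ids = list(dict.fromkeys(entry_id for entry_id, _, _ in rows))  # keep first-seen order
    table = {entry_id: dict.fromkeys(vocab, 0) for entry_id in ids}
    for entry_id, word, count in rows:
        table[entry_id][word] = count
    header = ["entry_id"] + vocab
    matrix = [[entry_id] + [table[entry_id][w] for w in vocab] for entry_id in ids]
    return header, matrix


def to_csv_text(header, rows):
    """Render a header and rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Tiny made-up example (not real project data).
demo = ["e1\tred apple red", "e2\tgreen apple"]
rows = list(long_rows(demo))
header, matrix = wide_matrix(rows)
csv_text = to_csv_text(header, matrix)
```

In this sketch an entry with no words produces no long rows and so does not appear in the pivoted matrix; whether such entries should be kept as all-zero rows would need to be agreed with the employer.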

We will definitely have more work if this is done successfully (all files, and more entries per file), so scalable routines are highly encouraged. We are thinking of about a million entries, with a higher budget, if this goes well.
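
One possible way to keep the random selection scalable is reservoir sampling, which draws a uniform sample in a single pass without loading a whole file into memory. This is only a sketch of one option, not something the posting asks for by name; the function name and parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length.

    Uses the classic single-pass reservoir method: the first k items fill the
    reservoir, then item number n (0-based) replaces a random slot with
    probability k / (n + 1).
    """
    rng = random.Random(seed)
    reservoir = []
    for n, item in enumerate(stream):
        if n < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Works on any iterable, including an open file object read line by line.
sample = reservoir_sample(iter(range(10_000)), k=100, seed=1)
```

Memory use depends only on the sample size k, so the same routine could in principle be run over a file with a million entries.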

Thank you very much.
