Perl/Python Programmer for Text Parsing

This project received 16 bids from talented freelancers with an average bid price of $412 USD.

Project Budget
$50 - $500 USD
Project Description

We are looking for an experienced programmer for long-term freelance work. A background in natural language processing (NLP) and/or computational linguistics would be an asset, but is not required. Pay is commensurate with experience and can be either project-based or hourly. As part of our hiring process, we ask that interested candidates complete the following tasks to demonstrate basic competency:

1. The SEC stores the text files it receives from companies on its EDGAR website. These files are available for public download via FTP. A listing of all files sent to the SEC is stored in a quarterly index file. Go to the SEC’s EDGAR website and download any 4 consecutive Company Index Files for PC here: [url removed, login to view]. Do not download any index files prior to 2007. Each index points to the physical location of the filed documents, across various file types.

2. Using the index files, download the full .txt file for every filing with a file type of “10-K” only, for the 4 consecutive quarters you have chosen.

3. The downloaded .txt files embed HTML, SGML, or XBRL code, along with tables, special characters, images, and other embedded files such as PDFs. Flatten each .txt file to its raw text: remove all code, tables, images, and embedded files so that only raw text remains. Write the raw text to a .txt file named after its parent file with the suffix “_raw” added.

4. Count the number of words and sentences in each flat text file. Also count the number of words that match any word in the following list: {growth, sales, billion, forecast}. Write the output to an Excel or CSV file.

5. Identify any outstanding issues, questions, or concerns regarding the steps above.
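
As a rough illustration of steps 2 through 4, the Python sketch below filters index rows, flattens a filing, and counts words. It is a minimal sketch under several assumptions, not a reference solution. It assumes each index row ends with a CIK, a filing date, and a .txt path, and that the company name and form type are separated by two or more spaces. It assumes “_raw” goes before the file extension. It strips markup with regular expressions and splits sentences naively on ., !, and ?. The function names (filter_form_rows, raw_name, flatten, count_text, write_counts_csv) are hypothetical, and downloading is left out because the index URL is not given here.

```python
import csv
import html
import io
import os
import re

KEYWORDS = {"growth", "sales", "billion", "forecast"}


def filter_form_rows(index_text, form_type="10-K"):
    """Return (company, cik, date_filed, path) for rows of one form type."""
    rows = []
    for line in index_text.splitlines():
        parts = line.rsplit(None, 3)  # CIK, date, and path contain no spaces
        if len(parts) != 4 or not parts[3].endswith(".txt"):
            continue  # header, separator, or blank line
        left, cik, date_filed, path = parts
        # Assumption: company name and form type are 2+ spaces apart.
        fields = re.split(r"\s{2,}", left.strip())
        if len(fields) >= 2 and fields[-1] == form_type:
            rows.append((" ".join(fields[:-1]), cik, date_filed, path))
    return rows


def raw_name(filename):
    """Add the "_raw" suffix before the extension (an assumed reading)."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}_raw{ext}"


def flatten(raw):
    """Crude flattening: drop tables, then tags, then decode entities."""
    text = re.sub(r"(?is)<table\b.*?</table>", " ", raw)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def count_text(text):
    """Count words, sentences, and keyword matches in flattened text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive rule: a sentence ends at ., !, or ? followed by space or end.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    counts = {"words": len(words), "sentences": len(sentences)}
    for keyword in sorted(KEYWORDS):
        counts[keyword] = words.count(keyword)
    return counts


def write_counts_csv(rows, out):
    """Write one CSV row per file to an open text stream."""
    fieldnames = ["file", "words", "sentences"] + sorted(KEYWORDS)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up input, only to show how the pieces fit together.
    sample = "<p>Sales growth &amp; more.</p><TABLE><tr><td>1</td></tr></TABLE> Done!"
    row = {"file": raw_name("sample.txt"), **count_text(flatten(sample))}
    buffer = io.StringIO()
    write_counts_csv([row], buffer)
    print(buffer.getvalue())
```

Real filings would likely need more care than this: embedded binary attachments, abbreviations such as "U.S." that break the sentence rule, and form-type variants such as "10-K/A" are natural items to raise under step 5.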

For full consideration, please send your resume, a random sample of 100 raw/flat text files, the output files (counts and matches), and your response to #5 above by June 30th, 2013. We are an equal opportunity employer. Work permits or visas are not required.
