Perl programming for a simple text parsing script

  • Status Closed
  • Budget $10 - $30 USD
  • Total Bids 24

Project Description

We are mining a database of English articles and need a Perl programmer to assist us in generating some simple Perl scripts for this purpose.

The following task is one of many upcoming projects, and we are looking to hire somebody with a longer-term working relationship in mind.

The task:

We have a large tab-delimited file, the 6th column of which contains the data of interest. The 6th column of each row holds a set of comma-separated English words that we have already reduced to their dictionary form. We require a script that loops through all the sets of words and generates the following statistics as output.
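As a rough illustration of the input handling described above, a minimal Perl sketch might look like the following. The helper name words_in_sixth_column is hypothetical, and the code assumes that fields are separated by single tab characters and that words may carry stray whitespace around the commas; neither detail is confirmed by the task description.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words found in the 6th column of one line.
# Assumes single-tab field separators and comma-separated words.
sub words_in_sixth_column {
    my ($line) = @_;
    chomp $line;

    # A limit of -1 keeps trailing empty fields, so column positions stay stable.
    my @fields = split /\t/, $line, -1;
    return () if @fields < 6;

    # Trim surrounding whitespace (including a stray carriage return).
    (my $cell = $fields[5]) =~ s/^\s+|\s+$//g;

    # Split on commas, tolerating whitespace around them, and drop empty entries.
    return grep { length } split /\s*,\s*/, $cell;
}

my $sample = "1\ta\tb\tc\td\tcat,dog,cat\textra\n";
print join(' ', words_in_sixth_column($sample)), "\n";   # prints "cat dog cat"
```

For a large file, the helper would typically be called once per line inside a while (my $line = <$fh>) loop, so that the whole file never has to be held in memory.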

1. A list of all unique words present over the entire column. Each unique word should be listed with its term frequency (number of occurrences in the file), term rate (term frequency of the word divided by the total number of words in the file), document frequency (number of rows the word appears in) and document rate (document frequency of the word divided by the total number of rows).

2. All of the above statistics, but this time for bi-grams: bi-gram frequency, bi-gram rate, and document frequency and rate for bi-grams. Bi-grams are neighboring pairs of words. For example, in the first sentence of this item, the bi-grams would be (all,of), (of,the), (the,above), etc., but NOT (of,all), because word order matters.
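The two sets of statistics above could be computed by one routine that treats single words as n-grams of size 1 and bi-grams as n-grams of size 2. The sketch below is only one possible reading of the requirements: the function name ngram_stats and the result keys are hypothetical, the bi-gram rate is assumed to be divided by the total number of bi-grams (the description defines the denominator only for single words), and the document rate is assumed to be divided by the number of all rows, including rows too short to contain a bi-gram.

```perl
use strict;
use warnings;

# Hypothetical helper: compute frequency, rate, document frequency and
# document rate for n-grams of size $n (1 = single words, 2 = bi-grams).
# $rows is a reference to an array of array references, one word list per row.
sub ngram_stats {
    my ($rows, $n) = @_;
    my (%freq, %doc_freq);
    my $total = 0;    # total number of n-grams seen in the file (assumed denominator)

    for my $words (@$rows) {
        my %seen;     # n-grams already counted for this row
        for my $i (0 .. @$words - $n) {
            # Join in order, so (of,the) and (the,of) stay distinct.
            my $gram = join ',', @{$words}[$i .. $i + $n - 1];
            $freq{$gram}++;
            $total++;
            $doc_freq{$gram}++ unless $seen{$gram}++;
        }
    }

    my $row_count = @$rows;
    my %stats;
    for my $gram (keys %freq) {
        $stats{$gram} = {
            freq     => $freq{$gram},
            rate     => $freq{$gram} / $total,
            doc_freq => $doc_freq{$gram},
            doc_rate => $doc_freq{$gram} / $row_count,
        };
    }
    return \%stats;
}

# Two toy rows standing in for the 6th column of two lines.
my $rows = [ [qw(cat dog cat)], [qw(dog bird)] ];
my $uni  = ngram_stats($rows, 1);
my $bi   = ngram_stats($rows, 2);

# "cat" occurs 2 times out of 5 words, in 1 of 2 rows.
printf "%s\t%d\t%.2f\t%d\t%.2f\n", 'cat',
    @{ $uni->{cat} }{qw(freq rate doc_freq doc_rate)};
# "cat,dog" occurs once out of 3 bi-grams, in 1 of 2 rows.
printf "%s\t%d\t%.2f\t%d\t%.2f\n", 'cat,dog',
    @{ $bi->{'cat,dog'} }{qw(freq rate doc_freq doc_rate)};
```

Keeping the per-row %seen hash separate from the global counts is what distinguishes document frequency from term frequency: a word repeated within one row raises the latter but not the former.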

Please include both cost and duration estimates in your application. Please also include a brief sample of your previous Perl code.
