We are mining a database of English articles and need a Perl programmer to assist us in generating some simple Perl scripts for this purpose.
The following task is one of many upcoming projects, and we are looking to hire somebody with a longer-term working relationship in mind.
We have a large tab-delimited file, the 6th column of which contains the data of interest. The 6th column of each row holds a set of comma-separated English words that we have already reduced to their dictionary form. We require a script that loops through all the sets of words and generates the following statistics as output.
1. A list of all unique words present over the entire column. Associated with each unique word should be term frequency, term rate (term frequency of current word divided by total number of all words in the file), document frequency (number of rows the word appears in) and document rate (document frequency of current word divided by number of rows).
2. All of the above statistics, but this time for bi-grams: bi-gram frequency, bi-gram rate, and document frequency and rate for bi-grams. Bi-grams are neighboring pairs of words. For example, in the first sentence of this item, the bi-grams would be (all,of), (of,the), (the,above), etc., but NOT (of,all).
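To make the two requirements above concrete, here is a minimal sketch of how the four statistics could be computed for both single words and bi-grams. The helper name `ngram_stats` and the hash keys are hypothetical. The sketch also makes two assumptions the description does not settle: bi-grams do not span row boundaries, and the bi-gram rate is divided by the total number of bi-grams rather than the total number of words.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original description).
# $rows: reference to a list of rows, each row a reference to a list of words.
# $n:    1 for single words, 2 for bi-grams.
# Returns a hash reference: n-gram => { tf, term_rate, df, doc_rate }.
sub ngram_stats {
    my ($rows, $n) = @_;
    my (%tf, %df);
    my $total = 0;    # total number of n-grams seen in all rows

    for my $words (@$rows) {
        my %seen;     # n-grams already counted for this row's document frequency
        for my $i (0 .. $#$words - $n + 1) {
            # Words never contain commas, so a comma-joined key is unambiguous.
            my $gram = join ',', @{$words}[$i .. $i + $n - 1];
            $tf{$gram}++;
            $total++;
            $df{$gram}++ unless $seen{$gram}++;
        }
    }

    my $num_rows = @$rows;
    my %stats;
    for my $gram (keys %tf) {
        $stats{$gram} = {
            tf        => $tf{$gram},
            term_rate => $tf{$gram} / $total,      # assumption: denominator is all n-grams
            df        => $df{$gram},
            doc_rate  => $df{$gram} / $num_rows,
        };
    }
    return \%stats;
}

# Small usage example with made-up rows.
my $rows = [ [qw(cat sit mat)], [qw(cat eat)] ];
my $uni  = ngram_stats($rows, 1);
my $bi   = ngram_stats($rows, 2);

printf "%s\t%d\t%.3f\t%d\t%.3f\n",
    'cat', @{ $uni->{cat} }{qw(tf term_rate df doc_rate)};
printf "%s\t%d\t%.3f\t%d\t%.3f\n",
    'cat,sit', @{ $bi->{'cat,sit'} }{qw(tf term_rate df doc_rate)};
```

Because only forward-adjacent pairs are joined, (cat,sit) is counted while (sit,cat) is not, which matches the ordering rule in the bi-gram example.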
Please include both cost and duration estimates in your application. Please also include a brief sample of your previous Perl code.
Update Feb 26, 2014:
We just need a simple, barebones script that reads the file, splits the columns, splits the words, counts them, and outputs the results to STDOUT. For a competent Perl programmer, this job should take no more than 30 minutes. Please bid accordingly, taking into account both the working hours needed and your hourly rate.
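The read, split, count, and print steps described above might look roughly like the sketch below. The helper name `count_words` is hypothetical, and the sketch assumes the file has no header row and that stray whitespace around words should be trimmed. It reads from an in-memory sample so it runs as-is; a real run would open the actual data file instead.

```perl
use strict;
use warnings;

# Hypothetical helper: read tab-delimited lines from a filehandle and
# return a hash reference of word => count for the 6th column.
sub count_words {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty columns
        # Skip rows that have no 6th column or an empty one (index 5).
        next unless defined $cols[5] && length $cols[5];
        for my $word (split /,/, $cols[5]) {
            $word =~ s/^\s+|\s+$//g;         # assumption: trim stray whitespace
            next unless length $word;
            $count{$word}++;
        }
    }
    return \%count;
}

# Made-up sample data standing in for the real file.
my $sample = join "\n",
    "1\ta\tb\tc\td\tcat,sit,mat",
    "2\ta\tb\tc\td\tcat,eat",
    "";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $counts = count_words($fh);
close $fh;

# Print to STDOUT, most frequent first, ties broken alphabetically.
for my $word (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
    print "$word\t$counts->{$word}\n";
}
```

Reading line by line keeps memory use proportional to the number of distinct words rather than the size of the file, which matters for a large input.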
22 freelancers are bidding an average of $60 for this job.
Well-versed Perl programmer with experience in CSV parsing. Send the sample file and I can share what the output would look like. If it looks OK, we can move forward.