Closed

Craigslist scraper and parser

This project received 1 bid, with an average bid price of $500 USD.

Employer working
Project Budget
$250 - $750 USD
Total Bids
1
Project Description

We need a Craigslist scraper and parser (with source code, preferably Python) that automatically archives multiple RSS feeds from Craigslist.
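The archiving step could be sketched roughly as follows. This is only an illustration under stated assumptions: the directory layout (root/city/category/timestamp.xml) and the function names are hypothetical, and the feed URL is passed in by the caller rather than assumed here.

```python
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def archive_path(root, city, category, fetched_at):
    """Build the archive file path (hypothetical layout: root/city/category/<UTC stamp>.xml)."""
    stamp = fetched_at.strftime("%Y-%m-%dT%H%M%SZ")
    return Path(root) / city / category / f"{stamp}.xml"


def save_feed(root, city, category, xml_bytes, fetched_at=None):
    """Write one raw feed snapshot to disk unchanged and return its path."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    path = archive_path(root, city, category, fetched_at)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(xml_bytes)
    return path


def fetch_feed(url, timeout=30):
    """Download the raw RSS bytes from a feed URL supplied by the caller."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

A scheduler (cron, for example) could call fetch_feed and save_feed once per feed at a fixed interval, so that each city and category accumulates its own series of XML snapshots.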

Running the parser on the scraped logfile should provide word usage frequency by gender of poster (extracted from the w4m or m4w header), city, and day of posting. The program should allow the user to choose a range of dates (extracted from the timestamps) from which to pull the statistics.
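A minimal sketch of the gender extraction and date filtering, assuming each post has already been parsed into a dict with a "title" string and a "date" value (both field names are hypothetical). It reads the first letter of the w4m/m4w tag as the poster's gender; the pattern also matches tags such as m4m or w4w, which the description does not mention.

```python
import re
from datetime import date

# Matches tags such as "w4m" or "m4w"; the first letter identifies the poster.
_TAG = re.compile(r"\b([mw])4[mw]\b", re.IGNORECASE)


def poster_gender(title):
    """Return "female", "male", or None if no tag is found in the title."""
    match = _TAG.search(title)
    if match is None:
        return None
    return "female" if match.group(1).lower() == "w" else "male"


def in_date_range(posts, start, end):
    """Keep posts whose "date" falls within [start, end], inclusive."""
    return [p for p in posts if start <= p["date"] <= end]
```

For example, poster_gender("Looking for love - w4m - 24") returns "female", and in_date_range(posts, date(2009, 7, 1), date(2009, 7, 7)) keeps only the first week of July 2009.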

Outputs: (1) XML files with the archived feeds for each city and craigslist category
(2) Daily CSV files listing the 100 most frequent words for each gender, city, and age group (excluding articles and common modifiers like "a", "an", "the", "for", etc.).
An example file would look like this:

header: 07-01-2009,female, atlanta, 20-25
love,112
passion,93
independent,56
caring,46
...
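One way to produce a file in this shape is sketched below. The stop-word set is a small illustrative placeholder, not a complete list, and the tokenizing regex, function names, and exact header spacing are assumptions rather than part of the specification.

```python
import csv
import re
from collections import Counter

# Illustrative placeholder; a real run would need a fuller stop-word list.
STOP_WORDS = {"a", "an", "the", "for", "and", "of", "to", "in"}


def top_words(texts, n=100):
    """Count lowercase words across texts, skipping stop words, and return the top n."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(n)


def write_report(out, day, gender, city, age_group, rows):
    """Write the header line followed by one word,count row per entry."""
    out.write(f"header: {day},{gender},{city},{age_group}\n")
    csv.writer(out, lineterminator="\n").writerows(rows)
```

Passing an open text file (or io.StringIO for testing) as out, together with the rows from top_words, yields the header line followed by the word,count pairs in descending order of frequency.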

CSV files should also be generated with totals across all cities and across all age groups.
Headers for these files would look like this:
header: 07-01-2009,female, all, 20-25
header: 07-01-2009, female, miami, all
header: 07-01-2009, female, all, all
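The "all" variants can be built in the same pass as the per-city, per-age-group counts. The sketch below assumes each post has already been reduced to a (day, gender, city, age_group, words) tuple; that tuple shape and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate(posts):
    """Accumulate word counts under the specific key and each "all" roll-up key."""
    totals = defaultdict(Counter)
    for day, gender, city, age_group, words in posts:
        for city_key in (city, "all"):
            for age_key in (age_group, "all"):
                totals[(day, gender, city_key, age_key)].update(words)
    return totals
```

Each key in the result corresponds to one output file, so totals[("07-01-2009", "female", "all", "all")] holds the counts for the last header shown above.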
