Craigslist scraper and parser

Cancelled

We need a Craigslist scraper and parser, delivered with source code (preferably Python), that automatically archives multiple RSS feeds from Craigslist.
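A minimal sketch of the archiving step, to make the requirement concrete. The feed list, the URL pattern, the directory layout, and the names archive_path, fetch, and archive_feeds are all assumptions for illustration, not part of the brief; the real feed URLs would need to be checked against Craigslist.

```python
import datetime
import pathlib
import urllib.request

# Hypothetical feed list of (city, category) pairs.
FEEDS = [("atlanta", "w4m"), ("atlanta", "m4w"), ("miami", "w4m")]
# Assumed URL pattern; verify against the live site before relying on it.
URL_TEMPLATE = "https://{city}.craigslist.org/search/{category}?format=rss"


def archive_path(root, city, category, when):
    """Build a path such as root/atlanta/w4m/2009-07-01T120000.xml."""
    stamp = when.strftime("%Y-%m-%dT%H%M%S")
    return pathlib.Path(root) / city / category / f"{stamp}.xml"


def fetch(url):
    """Download one feed and return the raw XML bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def archive_feeds(root, feeds=FEEDS, fetch=fetch, now=None):
    """Save one timestamped XML file per (city, category) feed."""
    now = now or datetime.datetime.now()
    saved = []
    for city, category in feeds:
        url = URL_TEMPLATE.format(city=city, category=category)
        target = archive_path(root, city, category, now)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

Running archive_feeds("archive") from a scheduler such as cron would build up one XML snapshot per feed per run. The fetch argument is injectable so the function can be exercised without network access.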

Running the parser on the scraped logfile should provide word usage frequency broken down by gender of poster (extracted from the w4m or m4w header), city, and day of posting. The program should allow the user to choose a range of dates (extracted from the timestamps) to pull the statistics from.
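A rough sketch of the parsing step, under stated assumptions: each post is already a dict with title, body, city, and date keys; the title looks like "... - w4m - 24 (Midtown)"; age groups are five-year buckets labelled like "20-25"; and the stopword list is a small placeholder. None of these details are fixed by the brief.

```python
import collections
import datetime
import re

# Placeholder stopword list; a real one would be longer.
STOPWORDS = {"a", "an", "the", "for", "and", "or", "of", "to", "in", "i"}
GENDER_BY_TAG = {"w4m": "female", "m4w": "male"}
# Assumed title shape: the tag, optionally followed by " - <two-digit age>".
TITLE_RE = re.compile(r"\b(w4m|m4w)\b(?:\s*-\s*(\d{2}))?", re.IGNORECASE)


def age_group(age, width=5):
    """Map an age to an assumed bucket label, e.g. 24 -> '20-25'."""
    low = age // width * width
    return f"{low}-{low + width}"


def parse_title(title):
    """Return (gender, age group); either may be None if not found."""
    match = TITLE_RE.search(title)
    if not match:
        return None, None
    gender = GENDER_BY_TAG[match.group(1).lower()]
    age = match.group(2)
    return gender, (age_group(int(age)) if age else None)


def word_counts(posts, start, end):
    """Count words per (date, gender, city, age group) within [start, end]."""
    counts = collections.defaultdict(collections.Counter)
    for post in posts:
        if not (start <= post["date"] <= end):
            continue  # outside the user-selected date range
        gender, group = parse_title(post["title"])
        if gender is None:
            continue  # no w4m/m4w tag in the title
        words = re.findall(r"[a-z']+", post["body"].lower())
        key = (post["date"], gender, post["city"], group)
        counts[key].update(w for w in words if w not in STOPWORDS)
    return counts
```

The date range is passed as two datetime.date values, so the same function serves a single day (start equal to end) or a longer window.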

Outputs: (1) XML files with the archived feeds for each city and Craigslist category.

(2) Daily CSV files listing the 100 most frequent words for each gender, city, and age group (excluding articles and common function words such as "a", "an", "the", "for", etc.).

An example CSV file would look like this:

header: 07-01-2009, female, atlanta, 20-25
love,112
passion,93
independent,56
caring,46
......
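One possible way to emit a file in this shape, sketched with the standard csv module. The function name write_frequency_csv is hypothetical, the date is assumed to be month-day-year, and the "header:" label in the example is treated as descriptive, so the first row holds only the four fields.

```python
import collections
import csv
import datetime
import io


def write_frequency_csv(stream, date, gender, city, group, counts, limit=100):
    """Write a header row, then up to `limit` word,count rows by frequency."""
    writer = csv.writer(stream, lineterminator="\n")
    # Assumed date format MM-DD-YYYY, matching 07-01-2009 in the example.
    writer.writerow([date.strftime("%m-%d-%Y"), gender, city, group])
    for word, count in counts.most_common(limit):
        writer.writerow([word, count])


# Usage with the sample figures from the example above.
sample = collections.Counter(
    {"love": 112, "passion": 93, "independent": 56, "caring": 46}
)
buffer = io.StringIO()
write_frequency_csv(
    buffer, datetime.date(2009, 7, 1), "female", "atlanta", "20-25", sample
)
print(buffer.getvalue())
```

Counter.most_common handles the top-100 cut, so shorter word lists simply produce fewer rows.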

CSV files should also be generated that aggregate across all cities and across all age groups.

Headers for these files would look like this:

header: 07-01-2009, female, all, 20-25
header: 07-01-2009, female, miami, all
header: 07-01-2009, female, all, all

Skills: C Programming, Data Processing, Python


Project ID: #460955

1 freelancer is bidding on average $500 for this job

akshayfastin

Please check my PM for details.

$500 USD in 7 days