I am a researcher at a university and I need someone who is experienced in crawling and data collection to help me with the following:
1. Crawl a newspaper website (I will tell you which site) and scrape (1) the text of the articles that appeared on the site in the past 1-2 years and (2) the user-written comments on each article.
2. Collect the information that is available on the news website for each user.
3. From the congressional database (open to the public), collect the texts of congressional speeches from the past 1-2 years.
If we can manage the above, I have follow-up projects that we could work on together, provided we both agree. I am looking for someone I can work with long term if I am happy with the work.
Additional Project Description:
12/10/2011 at 10:20 SGT
SPECIFIC PROJECT DETAILS
Here is what I need in detail:
1. Go to NY Times
2. From the most popular list, find the 10 most viewed articles for that day.
3. For each link, collect the data
- Author name
- Text of the article itself
- Comments on the article (commenter name and comment text)
Repeat this for the past 365 days (1 year)
4. Then, for each commenter in the overall list whose name is clickable, collect the number of previous comments, the date of the earliest comment, and the number of people following and followed by this person.
For half the articles there will be no comments, and for another half the commenters' links will not be clickable, so the actual final dataset is likely to be smaller.
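To make the requested output concrete, here is a rough Python sketch of steps 1-4 as a data layout and a collection loop. It is only an illustration under assumptions: the record fields mirror the list above, while the names `Article`, `Comment`, `past_days`, `collect_articles`, `clickable_commenters` and the two callables `top_links_for_day` and `fetch_article` are hypothetical placeholders. The actual page fetching and parsing is deliberately left out because it depends on the site. With 10 articles per day over 365 days, the loop visits at most 3,650 articles.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Comment:
    commenter_name: str
    text: str
    # None when the commenter's name is not clickable (no profile page).
    profile_url: Optional[str] = None


@dataclass
class Article:
    day: date
    url: str
    author: str
    text: str
    comments: List[Comment] = field(default_factory=list)


def past_days(end: date, n_days: int = 365) -> Iterator[date]:
    """Yield n_days consecutive dates ending at `end`, oldest first."""
    for offset in range(n_days - 1, -1, -1):
        yield end - timedelta(days=offset)


def collect_articles(
    end: date,
    top_links_for_day: Callable[[date], List[str]],  # hypothetical: most viewed links for a day
    fetch_article: Callable[[date, str], Article],   # hypothetical: downloads and parses one article
    n_days: int = 365,
    top_n: int = 10,
) -> List[Article]:
    """Steps 1-3: for each day, take the top_n most viewed links and fetch each one."""
    articles: List[Article] = []
    for day in past_days(end, n_days):
        for url in top_links_for_day(day)[:top_n]:
            articles.append(fetch_article(day, url))
    return articles


def clickable_commenters(articles: List[Article]) -> Dict[str, str]:
    """Step 4 input: unique commenters with a profile link, as profile_url -> name."""
    profiles: Dict[str, str] = {}
    for article in articles:
        for comment in article.comments:
            if comment.profile_url is not None:
                profiles.setdefault(comment.profile_url, comment.commenter_name)
    return profiles
```

Each profile URL returned by `clickable_commenters` would then be visited once to record the number of previous comments, the date of the earliest comment, and the follower and following counts.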
Can you conduct this data collection for the past year?
Please let me know whether you think this is feasible before I agree.