Closed

Screen-scraping comments for academic research purposes

This project received 18 bids from talented freelancers with an average bid price of $156 USD.

Project Budget: $30 - $250 USD
Total Bids: 18
Project Description

I am looking for Java developers who can write Java code* that screen-scrapes a specific site. I am only interested in collecting the comments found on that site, for academic research purposes. It is an Arabic news site with major sections such as Political News, Financial News, Sports News, and Technology News. Each major section has subsections; for example, Political News has Middle East News, Global News, and so on. Each subsection contains news items, and each news item may or may not have comments. The comments are paged.
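Because the comments are paged, the scraper has to keep requesting pages for a news item until none are left. The following is only a minimal sketch of that loop. It assumes pages are numbered from 1 and that an empty page marks the end; the real site may signal the last page differently. The names `PagedComments`, `collectAll`, `fetchPage`, and `maxPages` are hypothetical, and the actual HTTP request and HTML parsing are replaced by a stand-in function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

public class PagedComments {

    /**
     * Collects the comments of one news item across all of its comment pages.
     *
     * fetchPage is a hypothetical hook: given a 1-based page number, it returns
     * the comments found on that page (in a real scraper it would download and
     * parse the page). An empty or null result is ASSUMED to mean "no more pages".
     * maxPages is a safety cap so that a misbehaving site cannot cause an endless loop.
     */
    public static List<String> collectAll(IntFunction<List<String>> fetchPage, int maxPages) {
        List<String> all = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            List<String> batch = fetchPage.apply(page);
            if (batch == null || batch.isEmpty()) {
                break; // assumed end-of-comments signal
            }
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        // Stand-in for the real site: two pages of comments, then nothing.
        IntFunction<List<String>> fakeSite = page -> {
            if (page == 1) {
                return Arrays.asList("first comment", "second comment");
            }
            if (page == 2) {
                return Arrays.asList("third comment");
            }
            return Collections.<String>emptyList();
        };
        System.out.println(collectAll(fakeSite, 50));
    }
}
```

The same loop shape could be reused one level up, for walking sections, subsections, and news items, if those listings turn out to be paged as well.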

I need the scraping code to collect the comments and put them in the following structure:
ID | Section Title | Subsection Title | News Item Title | URL | Comment | Comment Author | Timestamp

Example:
1 | Political News | Global News | Mission impossible diplomacy in Beijing | [url removed, login to view] | this is a comment | someone | 2012-03-01 12:00:00
2 | Political News | Global News | Mission impossible diplomacy in Beijing | [url removed, login to view] | this is 2nd comment | som2 | 2012-03-01 12:00:00
3 | Financial News | Banking | IMF to support Some country | [url removed, login to view] | this is a comment | someone | 2012-05-11 12:00:00

I will run the script daily, and the output should be a CSV file.
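One possible way to produce the CSV file is sketched below, using only the Java standard library. It is an illustration under several assumptions, not a required design: the columns are comma-separated in the order listed above, fields containing commas, quotes, or line breaks are quoted in the usual CSV manner, and the file is written as UTF-8 so that Arabic text survives. The class name `CommentCsv`, the file name `comments.csv`, and the `example.com` URL are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class CommentCsv {

    // Matches the timestamp style used in the example rows, e.g. 2012-03-01 12:00:00.
    public static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Quotes a field if it contains a comma, a double quote, or a line break. */
    public static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\""); // double any embedded quotes
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Builds one CSV line (without a trailing newline) in the requested column order. */
    public static String toCsvLine(long id, String section, String subsection, String title,
                                   String url, String comment, String author, LocalDateTime timestamp) {
        return String.join(",",
                Long.toString(id), escape(section), escape(subsection), escape(title),
                escape(url), escape(comment), escape(author), TS.format(timestamp));
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("comments.csv"); // hypothetical output file name
        try (Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            w.write("ID,Section Title,Subsection Title,News Item Title,URL,Comment,Comment Author,Timestamp\n");
            // Sample row; the URL is a placeholder, not the real site.
            w.write(toCsvLine(1, "Political News", "Global News",
                    "Mission impossible diplomacy in Beijing", "http://example.com/item/1",
                    "this is a comment, with a comma", "someone",
                    LocalDateTime.of(2012, 3, 1, 12, 0, 0)));
            w.write("\n");
        }
    }
}
```

Since the script runs daily, the file name could include the run date so that each run writes a separate file; that choice is left open here.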

The code must be provided.

* If you prefer to code in a different language, such as Python, you may still bid on this project, but keep in mind that you must then deliver annotated, fully explained code that can be used and run by someone who only knows Java.
