Web Scraping Needed - JavaScript or Node.JS Solutions Only

  • Status Closed
  • Budget $30 - $250 USD
  • Total Bids 3

Project Description

Initial project: You will be given a live web forum and asked to collect three data fields for each topic: Topic, Absolute URL, and Total Number of Posts.

Use a JavaScript solution like [url removed, login to view] to scrape every single Topic from this forum, including all subforums and paginated history. The forum does have an RSS feed, but the feed alone does not give you the Total Number of Posts.
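As a rough illustration of walking "all subforums and paginated history", here is a minimal Node.js sketch. It assumes a caller-supplied function, here called `fetchPage` (a hypothetical name, not part of any library), that downloads and parses one listing page and returns its rows, or an empty array once the history runs out. The row shape `{ topic, href, posts }` and the `maxPages` safety cap are also assumptions.

```javascript
// Sketch only: collect topic rows from every page of every subforum.
// `fetchPage(subforum, pageNumber)` is a hypothetical, caller-supplied async
// function that resolves to an array of { topic, href, posts } rows for one
// listing page, or to an empty array past the last page.
async function collectTopics(subforums, fetchPage, maxPages = 1000) {
  const all = [];
  for (const subforum of subforums) {
    // maxPages is a safety cap so a misbehaving parser cannot loop forever.
    for (let page = 1; page <= maxPages; page++) {
      const rows = await fetchPage(subforum, page);
      if (rows.length === 0) break; // no more history in this subforum
      for (const row of rows) {
        all.push({ ...row, subforum }); // tag each row with its subforum
      }
    }
  }
  return all;
}
```

Keeping the network and HTML parsing behind `fetchPage` means the pagination logic can be exercised with a fake fetcher before it is pointed at the live forum.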

Each Topic is to be grouped with its Absolute URL and Total Number of Posts. Data should be further tagged with Subforum, of which there are fewer than three, so you could conceivably tag them manually if needed.
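One way to build such a grouped record is sketched below. It relies on the standard `URL` constructor to turn a relative link into an absolute one. The function name `toRecord`, the input row shape, and the example addresses are illustrative assumptions, not details of the actual forum.

```javascript
// Sketch only: normalize one scraped row into the record described above.
// `row` is assumed to look like { topic, href, posts }, where `href` may be
// relative and `posts` may be a string such as "1,204".
function toRecord(row, pageUrl, subforum) {
  return {
    topic: row.topic.trim(),
    // Resolve the link against the page it was found on to get an absolute URL.
    url: new URL(row.href, pageUrl).href,
    // Strip thousands separators before parsing the post count.
    posts: Number.parseInt(String(row.posts).replace(/,/g, ''), 10),
    subforum,
  };
}

// Usage with made-up values:
const example = toRecord(
  { topic: '  Welcome thread ', href: '/viewtopic?id=7', posts: '1,204' },
  'https://forum.example.com/general/page2',
  'General'
);
console.log(example.url); // https://forum.example.com/viewtopic?id=7
```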

The final deliverable of this project is twofold:

1) Provide a file with a list of all forum topics, sorted by Total Number of Posts. A CSV file will suffice for this purpose, but I am happy to let you propose a more creative solution. So long as, at the end of the day, I can sort all of this forum's topics in descending order by Total Number of Posts, I will be satisfied.

2) Provide the source code in the form of a Git or Mercurial repository.
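For the first deliverable, the sort and CSV export could be sketched as follows. This assumes records shaped like `{ topic, url, posts, subforum }`; the function names and column headers are illustrative choices. Fields containing commas, quotes, or line breaks are quoted so that topic titles do not break the file.

```javascript
// Sketch only: quote a CSV field when it contains a comma, quote, or newline.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Sort records by post count, highest first, and render them as CSV text.
// Records are assumed to look like { topic, url, posts, subforum }.
function toSortedCsv(records) {
  const sorted = [...records].sort((a, b) => b.posts - a.posts); // copy, then sort descending
  const header = ['Topic', 'Absolute URL', 'Total Number of Posts', 'Subforum'];
  const lines = sorted.map((r) =>
    [r.topic, r.url, r.posts, r.subforum].map(csvField).join(',')
  );
  return [header.join(','), ...lines].join('\r\n') + '\r\n';
}
```

The resulting string can be written to disk with Node's `fs.writeFileSync`, for example to a file named `topics.csv` (the file name is arbitrary).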

The forum itself is quite small, with fewer than 5,000 topics in total needing to be scraped and sorted.

No content has to be scraped from the forum posts themselves. Your only requirement is to scrape the Topic, the URL, and the Total Number of Posts. That said, if you think you can also capture the content of the individual threads and posts, it could make you more valuable for future projects.

Once the work is finished, you will instantly have the opportunity for a second project: scrape a blog and sort all blog posts by total number of Blog Comments.
