Initial project: You will be given access to a live web forum from which three data fields must be collected for each topic: Topic, Absolute URL, and Total Number of Posts.
Each Topic is to be grouped with its Absolute URL and Total Number of Posts. Each record should also be tagged with its Subforum, of which there are fewer than three. Because there are so few subforums, this tagging could be done manually if needed.
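As a rough illustration of how each topic might be grouped with its URL, post count, and subforum, here is a minimal Python sketch. The forum address, the `TopicRecord` and `make_record` names, and the assumption that links may be relative and post counts may contain thousands separators are all hypothetical; the real forum's markup is not described here.

```python
from dataclasses import dataclass
from urllib.parse import urljoin

# Hypothetical base URL; the actual forum address is not given in the brief.
FORUM_BASE = "https://forum.example.com/"


@dataclass(frozen=True)
class TopicRecord:
    """One output row: a topic grouped with its URL, post count, and subforum."""
    topic: str
    url: str          # always an absolute URL
    total_posts: int
    subforum: str


def make_record(title: str, href: str, posts_text: str, subforum: str) -> TopicRecord:
    """Build a record from raw scraped strings (href may be relative)."""
    # Resolve relative links such as "viewtopic.php?t=12" against the forum root;
    # links that are already absolute pass through unchanged.
    absolute = urljoin(FORUM_BASE, href)
    # Assumes counts may be rendered with thousands separators, e.g. "1,204".
    posts = int(posts_text.replace(",", "").strip())
    return TopicRecord(title.strip(), absolute, posts, subforum)


if __name__ == "__main__":
    print(make_record(" Welcome thread ", "viewtopic.php?t=12", "1,204", "General"))
```

Keeping the subforum as a plain field means it can be filled in by the scraper or by hand, which matches the small number of subforums involved.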
The final deliverable of this project is twofold:
1) Provide a file listing all forum topics, sorted by Total Number of Posts. A CSV file will suffice for this purpose, but I am open to a more creative solution if you propose one. As long as I can ultimately sort all of the forum's topics in descending order by Total Number of Posts, I will be satisfied.
2) Provide the source code in the form of a Git or Mercurial repository.
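For the first deliverable, a CSV sorted in descending order by post count could be produced along the lines of the minimal Python sketch below. The column headers and the `write_sorted_csv` name are assumptions for illustration; each row is assumed to be a dict whose post count is already an integer, so the sort is numeric rather than alphabetical.

```python
import csv
import io
from typing import Iterable, TextIO

# Assumed column headers, mirroring the field names used in the brief.
FIELDS = ["Topic", "Absolute URL", "Total Number of Posts", "Subforum"]


def write_sorted_csv(rows: Iterable[dict], out: TextIO) -> list[dict]:
    """Write rows to CSV sorted by post count, highest first; return the sorted rows."""
    # Sorting on an int key avoids the "9" > "10" pitfall of sorting strings.
    ordered = sorted(rows, key=lambda r: r["Total Number of Posts"], reverse=True)
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(ordered)
    return ordered


if __name__ == "__main__":
    demo = [
        {"Topic": "A", "Absolute URL": "https://forum.example.com/t/1",
         "Total Number of Posts": 9, "Subforum": "General"},
        {"Topic": "B", "Absolute URL": "https://forum.example.com/t/2",
         "Total Number of Posts": 10, "Subforum": "General"},
    ]
    buf = io.StringIO()
    write_sorted_csv(demo, buf)
    print(buf.getvalue())
```

In practice the output would go to a file opened with `open("topics.csv", "w", newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` module documentation recommends to avoid blank lines on Windows.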
The forum itself is quite small, with fewer than 5,000 topics in total to be scraped and sorted.
No content has to be scraped from the forum posts themselves; your only requirement is to collect the Topic, the URL, and the Total Number of Posts. That said, if you can also retrieve the content itself (e.g., the individual threads and their posts), that would make you a stronger candidate for future projects.
Once this work is finished, you will immediately have the opportunity to take on a second project: scraping a blog and sorting all of its posts by Total Number of Blog Comments.