Forums Parser

Status: In progress
Bids: 16
Avg Bid (CAD): $302
Project Budget (CAD): $300

Project Description:
BUDGET: $300
YOU HAVE 36 HOURS to do the job. Do not bid if you're not up to the challenge.

Source target: http://forums.unfiction.com/forums/

Features required:

1- Extract *all* threads from a discussion folder
PARAMETERS:
- Folder path (ex: http://forums.unfiction.com/forums/index.php?f=10 )
- Number of pages (ex: 2 pages to crawl, or * for all pages)
- Thread Filter - Regex to include or exclude threads by title. (Ex: with include "trailhead", you only extract data for threads with the word trailhead in the title)

OUTPUT IN CSV:
- Thread ID (ex: t=31495)
- Thread Title (ex: [Trailhead] Behind The Yellow Curtain BTYC Ep5)
- Author ID (ex: u=7899)
- Replies (ex: 1241)
- Views (ex: 123421)
- Last Post Date (ex: 2011/12/30)
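
The filter and CSV steps of this feature could look roughly like the sketch below. It is written in Python purely for illustration (the posting asks for PHP), and it assumes the folder pages have already been fetched and parsed into one dict per thread. The names `filter_threads`, `threads_to_csv`, the `mode` argument, and the second sample row are hypothetical, not part of the requirements.

```python
import csv
import io
import re

# Column order follows the OUTPUT list above; the key names are assumptions.
COLUMNS = ["thread_id", "title", "author_id", "replies", "views", "last_post_date"]


def filter_threads(threads, pattern, mode="include"):
    """Keep (include) or drop (exclude) threads whose title matches the regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    keep_on_match = mode == "include"
    return [t for t in threads if bool(regex.search(t["title"])) == keep_on_match]


def threads_to_csv(threads):
    """Serialize thread rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(threads)
    return buffer.getvalue()


# Hypothetical rows, shaped the way a page parser might produce them.
sample = [
    {"thread_id": "t=31495", "title": "[Trailhead] Behind The Yellow Curtain BTYC Ep5",
     "author_id": "u=7899", "replies": 1241, "views": 123421,
     "last_post_date": "2011/12/30"},
    {"thread_id": "t=24481", "title": "General chat",
     "author_id": "u=12", "replies": 3, "views": 40,
     "last_post_date": "2011/11/02"},
]
```

Calling `threads_to_csv(filter_threads(sample, "trailhead"))` would keep only the first row. Matching is case-insensitive here, which is a design choice the posting does not specify.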


2- Batch Extract thread stats
PARAMETERS:
- Load a list of Thread IDs from a CSV file. (Ex: t=31495, t=24481, etc...)

OUTPUT IN CSV:
- Thread ID (ex: t=31495)
- Number of posts in thread
- Number of unique authors in thread
- First Post Date (ex: 2011/12/30)
- Last Post Date (ex: 2011/12/30)
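
A minimal sketch of the batch step, in illustrative Python. It assumes each thread's posts have already been scraped into `(author_id, date)` tuples with dates in the `YYYY/MM/DD` form used in the examples. `parse_thread_ids` and `thread_stats` are hypothetical helper names.

```python
import re
from datetime import datetime

DATE_FORMAT = "%Y/%m/%d"


def parse_thread_ids(text):
    """Pull numeric thread IDs out of text such as 't=31495, t=24481'."""
    return [int(n) for n in re.findall(r"\bt=(\d+)", text)]


def thread_stats(thread_id, posts):
    """Summarize one thread from a list of (author_id, 'YYYY/MM/DD') tuples."""
    dates = [datetime.strptime(d, DATE_FORMAT) for _, d in posts]
    return {
        "thread_id": f"t={thread_id}",
        "post_count": len(posts),
        "unique_authors": len({author for author, _ in posts}),
        # An empty thread yields blank dates instead of raising an error.
        "first_post_date": min(dates).strftime(DATE_FORMAT) if dates else "",
        "last_post_date": max(dates).strftime(DATE_FORMAT) if dates else "",
    }
```

Dates are parsed before comparison so that first and last posts are found correctly even if the scraped posts arrive out of order.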


3- Deep extract thread stats
PARAMETERS:
- Unique Thread ID. (Ex: t=31495)
- Boolean (yes / no) - Strict word count. (Exclude quoted content and signatures from the word count and from spoiler / href tag detection)

OUTPUT IN CSV:
- Post ID
- Post Date & time (ex: 2011/12/30 23:09)
- Author ID
- Author Name
- Word count
- Spoiler tag present (true/false)
- Video tag present (true/false)
- URL href present (true/false)
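
The strict word count option could be sketched as below, in illustrative Python. The markup is an assumption: the sketch treats the post as BBCode-style text (`[quote]`, `[spoiler]`, `[video]` or `[youtube]`, `[url]`) and assumes the signature follows a line of underscores. The real forum pages may use different HTML, so the patterns would need adjusting. `analyze_post` is a hypothetical name.

```python
import re

# Innermost quote block: no nested "[quote" between the open and close tags.
INNER_QUOTE_RE = re.compile(
    r"\[quote\b[^\]]*\](?:(?!\[quote\b).)*?\[/quote\]",
    re.IGNORECASE | re.DOTALL,
)
# Assumed signature delimiter: a line of five or more underscores.
SIGNATURE_RE = re.compile(r"\n_{5,}\n.*\Z", re.DOTALL)
TAG_RE = re.compile(r"\[/?[a-z]+[^\]]*\]", re.IGNORECASE)


def analyze_post(text, strict=False):
    """Return the word count and tag flags for one post body."""
    if strict:
        text = SIGNATURE_RE.sub("", text)
        # Remove quotes from the inside out so nested quotes are handled.
        while True:
            text, removed = INNER_QUOTE_RE.subn(" ", text)
            if removed == 0:
                break
    flags = {
        "spoiler": bool(re.search(r"\[spoiler\b", text, re.IGNORECASE)),
        "video": bool(re.search(r"\[(video|youtube)\b", text, re.IGNORECASE)),
        "url": bool(re.search(r"\[url\b|https?://", text, re.IGNORECASE)),
    }
    # Tags themselves are not words; strip them before counting.
    words = TAG_RE.sub(" ", text).split()
    return {"word_count": len(words), **flags}
```

With `strict=True`, quoted text and the signature are removed before both the count and the tag detection, which matches the parameter description above. With `strict=False`, everything in the post is counted.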


4- Batch Extract users stats
PARAMETERS:
- Load a list of User IDs from a CSV file. (Ex: u=7899)

OUTPUT IN CSV:
- Joined Date
- Total Posts
- Posts per day
- Location
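
If the posts-per-day figure has to be recomputed rather than read from the profile page, one plausible definition is total posts divided by days since joining. That definition is an assumption, and the forum may calculate its own number differently. A small illustrative Python sketch, with `posts_per_day` as a hypothetical name:

```python
from datetime import date


def posts_per_day(total_posts, joined, today):
    """Average posts per day since the join date, rounded to two decimals."""
    # Treat a same-day join as one day to avoid dividing by zero.
    days = max((today - joined).days, 1)
    return round(total_posts / days, 2)
```

For example, `posts_per_day(730, date(2010, 1, 1), date(2012, 1, 1))` gives `1.0`, since 730 days separate the two dates.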


NICE TO HAVE:
- Throttle requests per second (so the extraction has no noticeable impact on the website)
- Automatically crawl everything and extract all data into an Access database with 4 tables and 'joins' to link all the data.
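
The request throttle could be as simple as enforcing a minimum gap between requests. The sketch below is illustrative Python under that assumption. The `Throttle` class, its `rate` parameter, and the injectable `clock` and `sleep` arguments (there only to make it testable) are hypothetical.

```python
import time


class Throttle:
    """Space out calls so that at most `rate` of them start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # No request has been sent yet.

    def wait(self):
        """Block until the next request may be sent; return the delay applied."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
        return delay
```

A crawler would create one `Throttle(rate=1)` and call `wait()` immediately before each page request, so the limit applies across all four features rather than per feature.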

Skills required:
Data Entry, Data Mining, PHP, Web Scraping
Project posted by: Anashel (Canada, verified)