Looking for someone to build HTML scrapers that scrape, clean and collect data from various websites into a database.
The code should run easily and be able to generate CSV files etc., or save the data locally.
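As a rough illustration of the expected scraper shape, here is a minimal sketch using only the Python standard library. It assumes, hypothetically, that a site lists its odds in a plain HTML table (event, selection, odds in `<td>` cells); the class and function names and the sample markup are invented for illustration. Real bookmaker pages will need their own selectors and may render odds with JavaScript or block automated requests.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical page layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows (lists of cell strings)
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty and are skipped
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any unclosed cell from malformed markup

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def fetch_html(url):
    """Download a page (real sites may also need cookies, retries or rate limiting)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def rows_to_csv(rows, header):
    """Serialise parsed rows to CSV text; save it with open(path, "w", newline="")."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical markup standing in for a downloaded bookmaker page.
SAMPLE_HTML = """
<table>
  <tr><th>Event</th><th>Selection</th><th>Odds</th></tr>
  <tr><td>Arsenal v Chelsea</td><td>Arsenal</td><td>6/4</td></tr>
  <tr><td>Arsenal v Chelsea</td><td>Draw</td><td>9/4</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
csv_text = rows_to_csv(parser.rows, ["event", "selection", "odds"])
print(csv_text)
```

In practice `fetch_html(url)` would replace `SAMPLE_HTML`, with one parser per site; the standard-library parser is used here only to keep the sketch dependency-free.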
You will be required to provide code for scraping at least 30 websites. Examples of the type of sites include William Hill, [url removed, login to view], World Betting Exchange...
You will need to parse the webpages, normalise the data, build a UI and provide basic functionality.
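One concrete normalisation step is putting all odds into a single format so that prices from different sites can be compared. The sketch below assumes each site reports odds as fractional (e.g. "6/4", or "EVS" for evens), American/moneyline (e.g. "+150", "-200") or decimal (e.g. "2.50"), and converts everything to decimal odds; the function name is hypothetical.

```python
from fractions import Fraction


def to_decimal_odds(raw):
    """Convert a fractional, American or decimal odds string to decimal odds."""
    s = raw.strip().upper()
    if s in ("EVS", "EVENS", "EVEN"):
        return 2.0                          # evens is 1/1, i.e. 2.0 decimal
    if "/" in s:                            # fractional, e.g. "6/4" -> 1 + 6/4
        num, den = s.split("/")
        return float(1 + Fraction(int(num), int(den)))
    if s.startswith(("+", "-")):            # American (moneyline)
        line = int(s)
        if line > 0:                        # "+150": profit per 100 staked
            return float(1 + Fraction(line, 100))
        return float(1 + Fraction(100, -line))  # "-200": stake needed to win 100
    return float(s)                         # assume it is already decimal


for raw in ["6/4", "EVS", "+150", "-200", "2.50"]:
    print(raw, "->", to_decimal_odds(raw))
```

`Fraction` keeps the arithmetic exact until the final conversion, so equivalent prices such as "6/4" and "+150" map to the same decimal value.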
Every line of code will need to be documented.
The data will need to be cleaned, normalised and stored both on AWS and in CSV files.
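A possible cleaning-and-storage step is sketched below, assuming records arrive as dicts with hypothetical `source`, `event`, `selection` and `decimal_odds` fields. It tidies whitespace, drops incomplete rows, keeps only the latest price per (source, event, selection), and writes CSV. The S3 upload uses boto3's `upload_file`; it is imported lazily so the rest runs without boto3, and it needs AWS credentials plus a bucket name of your choosing.

```python
import csv
import io
import re

FIELDS = ["source", "event", "selection", "decimal_odds"]


def _tidy(text):
    """Trim and collapse runs of whitespace left over from HTML."""
    return re.sub(r"\s+", " ", text or "").strip()


def clean_records(records):
    """Drop incomplete rows and de-duplicate on (source, event, selection); last one wins."""
    latest = {}
    for rec in records:
        source, event, selection = (_tidy(rec.get(f)) for f in FIELDS[:3])
        odds = rec.get("decimal_odds")
        if not (source and event and selection) or odds is None:
            continue  # skip rather than store partial data
        key = (source.lower(), event.lower(), selection.lower())
        latest[key] = {"source": source, "event": event,
                       "selection": selection, "decimal_odds": float(odds)}
    return list(latest.values())


def to_csv_text(records):
    """Serialise cleaned records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def save_csv(records, path):
    """Write cleaned records to a local CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(to_csv_text(records))


def upload_to_s3(path, bucket, key):
    """Copy a saved CSV to S3; requires boto3 and AWS credentials."""
    import boto3  # lazy import so the other functions work without boto3
    boto3.client("s3").upload_file(path, bucket, key)


# Hypothetical scraped rows: a duplicate with messy whitespace and an incomplete row.
raw = [
    {"source": "site_a", "event": " Arsenal  v Chelsea ", "selection": "Draw", "decimal_odds": 3.25},
    {"source": "site_a", "event": "Arsenal v Chelsea", "selection": "draw", "decimal_odds": 3.4},
    {"source": "site_a", "event": "Arsenal v Chelsea", "selection": "Arsenal", "decimal_odds": None},
]
cleaned = clean_records(raw)
print(to_csv_text(cleaned))
```

Keying the dictionary on lower-cased names is a simple way to merge near-duplicates; a real project would likely also need per-site team-name mappings.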
You will need to commit to an NDA before the project is assigned.
All real-time data (odds values, team details, etc.) will be scraped from various bookmaker sites and exchanges using their individual APIs/feeds. The script for this purpose will be written in PHP, with HipHop VM (HHVM) used to achieve superior performance through just-in-time compilation; C++ or Java are possible alternatives. All processed data will then be passed/streamed to the Trident API, which runs on top of Apache Storm.