This project is to scrape websites and load the results into a MySQL database.
Our content managers will then use this database to write short descriptions of the results and route users to the original website.
The data consists of interesting activities to do in a given city (restaurants, bars, visits, highlights, etc.), with addresses, descriptions, pictures, and so on.
The developer will have to:
- Develop a web scraper that scrapes 3 different websites (the sites will be provided once the freelancer is selected)
- Load HTML pages into a collection [url removed, login to view] class (object-oriented programming; the class is to be developed)
- Load the Activity objects into a MySQL database via an [url removed, login to view] object
- When loading the content into the database, the code will have to check whether a similar activity name already exists in the database for this city:
If the activity already exists and comes from the same website, the existing entry is updated.
If the activity already exists but comes from another website, a new entry is created and linked to the existing one.
If no matching name is found, a new entry is created.
- IMPORTANT note: the scraper MUST respect the [url removed, login to view] policy of the website. If the [url removed, login to view] of the provided websites prevents this project from being carried out, we will provide alternative websites
- Pictures will be downloaded into a local folder (the URL to this folder will be loaded in the database)
- Build a simple HTML page to test the crawler. The HTML page will allow the tester to do the following things:
1. Select one of the 3 websites
2. Manually enter the name of a city
3. The page will build the corresponding URL, start scraping it, and load the results into the database
- The source code will have to be commented
- Programming language: PHP 5.5.3
No PHP framework is allowed on this project; only standard PHP functions and packages may be used.
Programming must be object oriented.
PDO must be used for data access layer.
- Database: MySQL 5.5
- The deliverables will be tested using a MAMP (Apache and MySQL) installation in a Mac OS X environment
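To make the duplicate-handling rule above concrete, here is a rough sketch of the decision it describes, written to stay compatible with PHP 5.5. It is an illustration only, not the required design: the function names, the `id` and `website` row keys, and the way "similar" names are compared (lower-casing and collapsing whitespace) are all assumptions, and the real column names will come from the SQL file we provide.

```php
<?php
/**
 * Normalise a name before comparing it (hypothetical similarity rule:
 * ASCII lower-casing, trimmed, whitespace collapsed).
 */
function normalizeActivityName($name)
{
    return preg_replace('/\s+/', ' ', trim(strtolower($name)));
}

/**
 * Decide how to load a scraped activity, given the rows already stored
 * for the same city with a matching name.
 *
 * @param array  $matches Existing rows, each with assumed keys 'id' and 'website'
 * @param string $website Identifier of the website currently being scraped
 * @return array Keys 'action' (update|insert_linked|insert) and 'target_id'
 */
function decideLoadAction(array $matches, $website)
{
    $linkTo = null;
    foreach ($matches as $row) {
        if ($row['website'] === $website) {
            // Same name, same city, same website: update the existing entry.
            return array('action' => 'update', 'target_id' => $row['id']);
        }
        if ($linkTo === null) {
            // Remember the first match coming from another website.
            $linkTo = $row['id'];
        }
    }
    if ($linkTo !== null) {
        // Known activity, but from another website: new entry, linked.
        return array('action' => 'insert_linked', 'target_id' => $linkTo);
    }
    // No matching name at all: plain new entry.
    return array('action' => 'insert', 'target_id' => null);
}

// Usage: one existing row from another website leads to a linked insert.
$existing = array(array('id' => 7, 'website' => 'site_a'));
$decision = decideLoadAction($existing, 'site_b');
// $decision['action'] is 'insert_linked' and $decision['target_id'] is 7
```

In the actual deliverable, the matching rows would be fetched through PDO with a prepared statement filtered on city and name, and the returned action would drive the corresponding UPDATE or INSERT.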
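As an illustration of step 3 of the test page (building the URL from the selected website and the city entered by the tester), a minimal sketch could look like the following. The site keys and URL templates are placeholders invented for this example, since the real websites are only disclosed after selection; the `{city}` token is likewise an assumed convention.

```php
<?php
/**
 * Build the URL to scrape from a site key and a city name.
 *
 * @param array  $templates Map of site key => URL template containing "{city}"
 * @param string $siteKey   Key of the website selected on the test page
 * @param string $city      City name typed by the tester
 * @return string
 */
function buildCityUrl(array $templates, $siteKey, $city)
{
    if (!isset($templates[$siteKey])) {
        throw new InvalidArgumentException('Unknown website: ' . $siteKey);
    }
    $city = trim($city);
    if ($city === '') {
        throw new InvalidArgumentException('City name must not be empty');
    }
    // rawurlencode() makes the user input safe to embed in a URL.
    return str_replace('{city}', rawurlencode($city), $templates[$siteKey]);
}

// Placeholder templates (hypothetical, not the real target websites).
$templates = array(
    'site_a' => 'https://site-a.example/search?city={city}',
    'site_b' => 'https://site-b.example/{city}/activities',
);

echo buildCityUrl($templates, 'site_b', 'New York'), "\n";
// prints https://site-b.example/New%20York/activities
```

Keeping the templates in one array means adding the 2nd and 3rd websites only requires a new entry, not a change to the page logic.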
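For the picture requirement above, one possible approach (a sketch under assumptions, not a prescribed design) is to derive a stable local file name from the remote picture URL, save the file into the local folder, and store the resulting local URL in the database. The function names, the SHA-1 naming scheme, the list of accepted extensions, and the example base URL are all hypothetical.

```php
<?php
/**
 * Derive a stable local file name from a remote picture URL
 * (assumed scheme: SHA-1 of the URL plus the original extension).
 */
function localPictureName($pictureUrl)
{
    $path = parse_url($pictureUrl, PHP_URL_PATH);
    $ext = strtolower(pathinfo((string) $path, PATHINFO_EXTENSION));
    if (!in_array($ext, array('jpg', 'jpeg', 'png', 'gif'), true)) {
        $ext = 'bin'; // unknown or missing extension
    }
    return sha1($pictureUrl) . '.' . $ext;
}

/**
 * URL under which the downloaded picture is served locally;
 * this is the value that would be stored in the database.
 */
function localPictureUrl($baseUrl, $pictureUrl)
{
    return rtrim($baseUrl, '/') . '/' . localPictureName($pictureUrl);
}

/**
 * Download the picture into $folder. Returns the local path, or null
 * on failure. Requires allow_url_fopen to be enabled in php.ini.
 */
function downloadPicture($pictureUrl, $folder)
{
    $data = @file_get_contents($pictureUrl);
    if ($data === false) {
        return null;
    }
    $target = rtrim($folder, '/') . '/' . localPictureName($pictureUrl);
    return file_put_contents($target, $data) === false ? null : $target;
}
```

Because the name depends only on the source URL, scraping the same picture twice overwrites the same file instead of creating duplicates.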
The developer will be provided with:
- a graphical data model
- an SQL file to create the corresponding data structure
- the URLs of the 3 websites to scrape
The project will consist of 3 steps, each tied to a share of the payment:
- Delivery of the scraper + loading in the database for the 1st website: 50%
- Delivery of the scraper + loading in the database for the 2nd website: 25%
- Delivery of the scraper + loading in the database for the 3rd website: 25%