We need a simple web scraper that meets the following requirements. Please state your delivery time frame in your response, as this is important for this project. Requirements:
- The output must be a .csv file. The scraper should also retrieve and store images (see below).
- The scraper must be Java-based (Java 6 is acceptable) and use the [url removed, login to view] libraries/packages for network communication. The code must be clean and commented (e.g. using constants for all major attributes, such as the URL to be crawled).
- The project must be delivered as an Eclipse project to make testing easier on our side.
- The project must use the Google Geocoding API [url removed, login to view] to find the exact latitude and longitude of each business address. The returned data (XML or JSON) must be converted to text and stored in the .csv file.
- I am attaching a sample spreadsheet showing the data that needs to be collected.
- The scraping chain will start at [url removed, login to view], where you will have to retrieve and parse data for all businesses covered on that site (e.g. Accommodation in NSW). You will also need to visit 1-2 other sites to extract additional business information (see the attached .xls). Apart from extracting the required data, you will also need to download and store one image per business listing, again from [url removed, login to view]. Finally, the Google Geocoding API will be used to get the latitude and longitude of each business.
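The .csv requirement is mostly a matter of escaping fields correctly, since business names and addresses routinely contain commas and quotes. A minimal sketch, assuming RFC 4180-style quoting and CRLF line endings (our choice; the brief does not specify a CSV dialect); the class name `CsvUtil` is hypothetical, and the column order would come from the sample .xls:

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

/**
 * Minimal CSV helper (sketch). Quotes a field only when it contains a comma,
 * a double quote or a line break, and doubles embedded quotes (RFC 4180 style).
 */
final class CsvUtil {
    private static final String SEPARATOR = ",";
    private static final String LINE_END = "\r\n";

    private CsvUtil() { }

    /** Escapes a single field for CSV output; null becomes an empty field. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins the escaped fields into one CSV line, including the line terminator. */
    static String toLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(SEPARATOR);
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.append(LINE_END).toString();
    }

    /** Writes one business row to the given writer. */
    static void writeRow(Writer out, List<String> fields) throws IOException {
        out.write(toLine(fields));
    }
}
```

Each business would then be written as one row, with latitude and longitude stored as plain text columns like any other field.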
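The brief asks for constants for all major attributes and names a specific networking library (its link is elided above). The sketch below only illustrates the constants pattern: because that library is not named here, it uses the JDK's own `java.net.URLConnection` as a stand-in and assumes pages are UTF-8 encoded. All values in `ScraperConfig`, and both class names, are hypothetical placeholders; the code avoids post-Java-6 features such as try-with-resources.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

/** Central place for the crawler's major attributes (placeholder values). */
final class ScraperConfig {
    // Hypothetical placeholder: the real start URL is given in the project brief.
    static final String START_URL = "http://example.com/listings";
    static final String OUTPUT_CSV = "businesses.csv";
    static final String IMAGE_DIR = "images";
    static final String USER_AGENT = "BusinessScraper/0.1";
    static final int CONNECT_TIMEOUT_MS = 10000;
    static final int READ_TIMEOUT_MS = 20000;
    // Assumption: pages are UTF-8; a fuller version would read the Content-Type header.
    static final String DEFAULT_CHARSET = "UTF-8";

    private ScraperConfig() { }
}

/** Downloads a page as text using only the JDK (stand-in for the required library). */
final class PageFetcher {
    private PageFetcher() { }

    static String fetch(String url) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        conn.setConnectTimeout(ScraperConfig.CONNECT_TIMEOUT_MS);
        conn.setReadTimeout(ScraperConfig.READ_TIMEOUT_MS);
        conn.setRequestProperty("User-Agent", ScraperConfig.USER_AGENT);
        InputStream in = conn.getInputStream();
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, ScraperConfig.DEFAULT_CHARSET));
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            in.close(); // closing the underlying stream releases the connection
        }
    }
}
```

Keeping every URL, path and timeout in one class means the tester can point the crawler at a different start page by editing a single constant.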
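For the geocoding step, one option is to request the XML output format and read the first result with the JDK's built-in XPath support, which avoids adding a JSON library. A sketch, assuming the public endpoint `https://maps.googleapis.com/maps/api/geocode/xml` with an API key (the URL in the brief was elided, so this endpoint is an assumption) and the response layout `GeocodeResponse/result/geometry/location/lat|lng`; `GeocodeSketch` is a hypothetical name:

```java
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/** Sketch: build a Geocoding API request and read lat/lng from the XML response. */
final class GeocodeSketch {
    // Assumed endpoint (XML output format); the brief's own URL was elided.
    static final String GEOCODE_XML_ENDPOINT =
            "https://maps.googleapis.com/maps/api/geocode/xml";

    private GeocodeSketch() { }

    /** Builds the request URL; apiKey is a placeholder you must supply. */
    static String buildRequestUrl(String address, String apiKey) {
        try {
            return GEOCODE_XML_ENDPOINT
                    + "?address=" + URLEncoder.encode(address, "UTF-8")
                    + "&key=" + URLEncoder.encode(apiKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns {lat, lng} as text from the first result, or null when the
     * response status is not "OK" or the location elements are missing.
     */
    static String[] parseLatLng(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml);
            XPath xpath = XPathFactory.newInstance().newXPath();
            String status = xpath.evaluate("/GeocodeResponse/status", doc).trim();
            if (!"OK".equals(status)) {
                return null;
            }
            String lat = xpath.evaluate(
                    "/GeocodeResponse/result[1]/geometry/location/lat", doc).trim();
            String lng = xpath.evaluate(
                    "/GeocodeResponse/result[1]/geometry/location/lng", doc).trim();
            if (lat.length() == 0 || lng.length() == 0) {
                return null;
            }
            return new String[] { lat, lng };
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse geocoding response", e);
        }
    }
}
```

The returned strings can go straight into the latitude and longitude columns of the .csv. A production version would also need to handle statuses such as `OVER_QUERY_LIMIT` (for example by retrying after a delay) rather than simply returning `null`.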
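For the one-image-per-listing requirement, the storage side can be kept independent of how the image is downloaded. A sketch, assuming each business has some unique ID usable as a file name (hypothetical; the brief does not say how images should be named) and taking the extension from the image URL, with `.jpg` as a fallback:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Locale;

/** Sketch: saves one image per business listing under a predictable file name. */
final class ImageStore {
    private static final int BUFFER_SIZE = 8192;
    // Hypothetical fallback when the URL path has no usable extension.
    private static final String DEFAULT_EXTENSION = ".jpg";

    private ImageStore() { }

    /** Builds "<sanitisedBusinessId><ext>", taking the extension from the URL path. */
    static String fileNameFor(String businessId, String imageUrl) {
        String path = imageUrl;
        int cut = path.length();
        int query = path.indexOf('?');
        if (query >= 0) {
            cut = query;
        }
        int fragment = path.indexOf('#');
        if (fragment >= 0 && fragment < cut) {
            cut = fragment;
        }
        path = path.substring(0, cut);
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        String ext = (dot > slash && dot < path.length() - 1)
                ? path.substring(dot).toLowerCase(Locale.ENGLISH)
                : DEFAULT_EXTENSION;
        // Replace anything that is unsafe in a file name with '_'.
        String safeId = businessId.replaceAll("[^A-Za-z0-9_-]", "_");
        return safeId + ext;
    }

    /** Copies the stream into targetDir (created if needed); the caller closes 'in'. */
    static File save(InputStream in, File targetDir, String businessId, String imageUrl)
            throws IOException {
        if (!targetDir.isDirectory() && !targetDir.mkdirs()) {
            throw new IOException("Cannot create directory " + targetDir);
        }
        File target = new File(targetDir, fileNameFor(businessId, imageUrl));
        OutputStream out = new FileOutputStream(target);
        try {
            byte[] buffer = new byte[BUFFER_SIZE];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            out.close();
        }
        return target;
    }
}
```

The input stream would come from the same network code used for the listing pages; recording the saved file name in each .csv row links every business to its image.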
*** MAKE SURE THAT YOU EXAMINE THE PROVIDED .XLS AND VISIT THE RELEVANT SITES BEFORE BIDDING!!!