We need a simple web scraper with the following requirements. Please state your delivery time frame in your response, as this is important for this project. The requirements:
- The output must be a .csv file. The scraper must also retrieve and store images (see below).
- The scraper must be Java-based (Java 6 is sufficient) and must use the Apache HttpComponents libraries (http://hc.apache.org/downloads.cgi) for network communication. The code must be commented and clean (e.g. using constants for all major attributes, such as the URL to be crawled).
- The project must be delivered as an Eclipse project for easier testing on our side.
- The project must use the Google Geocoding API https://developers.google.com/maps/documentation/geocoding/ to find the exact latitude and longitude of each business address. The returned data (XML or JSON) must be converted to text and stored in the .csv file.
- I am attaching a sample spreadsheet showing the data that needs to be collected.
- The scraping chain starts at http://www.ambassadorcard.com.au/, where you will retrieve and parse the data for all businesses covered on the site (e.g. Accommodation in NSW). You will also need to visit one or two other sites to extract additional business information (see the attached .xls). Apart from extracting the required data, you will also need to download and store one image per business listing, again from http://www.ambassadorcard.com.au/. Finally, the Google Geocoding API will be used to get the latitude and longitude of each business.
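The .csv output and the "constants for all major attributes" requirement can be illustrated with the following minimal sketch. It assumes RFC 4180-style quoting: fields containing a comma, a double quote or a line break are wrapped in double quotes, and embedded quotes are doubled. The class names, constant names and file name are hypothetical placeholders. The actual columns come from the attached spreadsheet.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.List;

/** Hypothetical constants; names and values are placeholders, not a required layout. */
final class ScraperConstants {
    static final String START_URL = "http://www.ambassadorcard.com.au/";
    static final String CSV_FILE = "businesses.csv";   // hypothetical output file name
    static final String CSV_SEPARATOR = ",";
    static final String CSV_ENCODING = "UTF-8";

    private ScraperConstants() { }
}

/** Sketch of a CSV writer using RFC 4180-style quoting (Java 6 compatible syntax). */
final class CsvSketch {
    private CsvSketch() { }

    /** Quotes a field if it contains a separator, quote or line break; doubles embedded quotes. */
    static String escape(String field) {
        if (field == null) {
            return "";   // treat missing values as empty cells
        }
        boolean needsQuotes = field.contains(ScraperConstants.CSV_SEPARATOR)
                || field.contains("\"") || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins one row of fields into a single CSV line (without the line terminator). */
    static String toLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(ScraperConstants.CSV_SEPARATOR);
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    /** Writes all rows to the given file, terminating each line with CRLF as RFC 4180 suggests. */
    static void write(String path, List<List<String>> rows) throws IOException {
        Writer out = new OutputStreamWriter(new FileOutputStream(path), ScraperConstants.CSV_ENCODING);
        try {
            for (List<String> row : rows) {
                out.write(toLine(row));
                out.write("\r\n");
            }
        } finally {
            out.close();
        }
    }
}
```

Quoting matters here because business names and addresses routinely contain commas. A hand-rolled writer like this is only one option; an existing CSV library would serve equally well.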
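The following is a hedged sketch of the Geocoding step. It assumes the API's JSON output format (`.../geocode/json`) and a `key` request parameter holding an API key. The coordinates are assumed to sit in `results[0].geometry.location` as `"lat"` and `"lng"`. Fetching the URL itself would go through HttpComponents and is left out. The regex-based extraction is deliberately simplistic and assumes `lat` precedes `lng`; a production version would use a proper JSON parser. `GeocodeSketch` and its methods are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: build a Geocoding API request URL and pull lat/lng out of the JSON reply. */
final class GeocodeSketch {
    /** JSON endpoint of the Google Geocoding API (an API key is assumed to be required). */
    static final String GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json";

    /** Matches the first "location": {"lat": ..., "lng": ...} object; fragile, assumes that key order. */
    private static final Pattern LOCATION = Pattern.compile(
            "\"location\"\\s*:\\s*\\{\\s*\"lat\"\\s*:\\s*(-?[0-9.]+)\\s*,\\s*\"lng\"\\s*:\\s*(-?[0-9.]+)");

    private GeocodeSketch() { }

    /** Builds the request URL for one business address, URL-encoding both parameters. */
    static String buildRequestUrl(String address, String apiKey) {
        try {
            return GEOCODE_URL + "?address=" + URLEncoder.encode(address, "UTF-8")
                    + "&key=" + URLEncoder.encode(apiKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM must support UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns {latitude, longitude} from the first result, or null when no location is present
     * (e.g. a ZERO_RESULTS response).
     */
    static double[] extractLatLng(String json) {
        Matcher m = LOCATION.matcher(json);
        if (!m.find()) {
            return null;
        }
        return new double[] { Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2)) };
    }
}
```

The two numbers can then be written as plain text into their own .csv columns, which covers the "converted into text format" part of the requirement.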
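For the one-image-per-listing requirement, two small helpers are typically needed before the download itself, which would be done with HttpComponents. The first resolves an image `src` attribute that may be relative against the listing page URL. The second derives a safe local file name. The sketch below is hypothetical: it names files after the business and falls back to `.jpg` when the URL path has no extension. Both are assumptions, not requirements from the brief.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helpers for locating and naming one image per business listing. */
final class ImageNameSketch {
    /** Extension used when the image URL path has none (assumption). */
    static final String DEFAULT_EXTENSION = ".jpg";

    private ImageNameSketch() { }

    /** Resolves an image src (absolute or relative) against the page it was found on. */
    static String resolveImageUrl(String pageUrl, String imgSrc) {
        // URI.create throws IllegalArgumentException for malformed input (e.g. unescaped spaces).
        return URI.create(pageUrl).resolve(imgSrc.trim()).toString();
    }

    /** Builds a file-system-safe name from the business name plus the image's extension. */
    static String localFileName(String businessName, String imageUrl) {
        // Collapse every run of non-alphanumeric characters into one underscore, then trim underscores.
        String base = businessName.trim().replaceAll("[^A-Za-z0-9]+", "_").replaceAll("^_+|_+$", "");
        if (base.length() == 0) {
            base = "business";
        }
        String ext = DEFAULT_EXTENSION;
        String path = URI.create(imageUrl).getPath();
        if (path != null) {
            int slash = path.lastIndexOf('/');
            int dot = path.lastIndexOf('.');
            // Only treat a dot in the last path segment (and not at its end) as an extension.
            if (dot > slash && dot < path.length() - 1) {
                ext = path.substring(dot).toLowerCase(Locale.ROOT);
            }
        }
        return base + ext;
    }
}
```

Two businesses with the same name would map to the same file, so a listing ID from the site, if one exists, may be a safer basis for the name.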
***We have a large number of scraping projects in the pipeline and are looking for a long-term partnership. Check our feedback; it speaks for itself.
*** It would be great if this could be done promptly, as this is a small and easy (but interesting) project for someone who knows Java and web scraping well.
*** MAKE SURE THAT YOU EXAMINE THE PROVIDED .XLS and VISIT THE RELEVANT SITES BEFORE BIDDING!!!