Web Site Scraping

  • Status Closed
  • Budget min $5000 USD
  • Total Bids 72

Project Description

We want to build a service that scrapes web sites in order to maintain an external database and to extract data from dynamic web pages. Each targeted web site has to be entered through a login page.
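As a rough illustration of the login step, the sketch below prepares a form login with Java's built-in HTTP client and keeps the session cookies for later requests. The form field names (username, password) and the idea of a plain form POST are assumptions made for illustration; each targeted web site will have its own login form, and some may need a different approach.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class LoginSketch {
    // Build a URL-encoded form body. The field names are hypothetical.
    static String formBody(String user, String password) {
        return "username=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
             + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    // A client with a cookie handler, so the session cookie set at login
    // is sent again on every later request made with the same client.
    static HttpClient sessionClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .build();
    }

    // Prepare (but do not send) the login POST for a given login URL.
    static HttpRequest loginRequest(String loginUrl, String user, String password) {
        return HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(user, password)))
                .build();
    }
}
```

Sending the request with sessionClient().send(request, HttpResponse.BodyHandlers.ofString()) would perform the actual login; that call is left out here because it touches a live site.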

The service will be initiated by an external scheduler. The scheduler sends an XML message which contains all the information the service needs. The service shall execute the following steps:

a) receive XML

b) pass the login page

c) maintain the external database

d) extract data

e) send XML

Once the service is finished, it shall report its success (XML).
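The steps above could be wired together roughly as in the following Java sketch. The element names (site, user, result, status, data) are hypothetical, since the actual XML schema is not shown in this posting, and steps b) to d) are empty placeholders that would be written separately for each targeted web site.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ScrapeService {
    // a) receive XML: parse the scheduler's request into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Read the text of the first element with the given (hypothetical) tag name.
    static String field(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    // b) to d): placeholders; the real logic is specific to each web site.
    static void login(String site, String user) { }
    static void maintainDatabase(String site) { }
    static String extractData(String site) { return ""; }

    // Escape the characters that are not allowed in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // e) send XML: build the report that goes back to the scheduler.
    static String report(String site, String status, String data) {
        return "<result><site>" + escape(site) + "</site><status>" + status
             + "</status><data>" + escape(data) + "</data></result>";
    }

    // Run all steps in order; any exception is turned into a failure report.
    static String run(String requestXml) {
        String site = "";
        try {
            Document doc = parse(requestXml);
            site = field(doc, "site");
            String user = field(doc, "user");
            login(site, user);
            maintainDatabase(site);
            String data = extractData(site);
            return report(site, "success", data);
        } catch (Exception e) {
            return report(site, "failure", "");
        }
    }
}
```

Because run keeps no shared state, several copies of such a service could in principle run side by side on one machine, each handling its own request.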

Technical details: All communication takes place via the XML interface. The XML schema is provided. We expect an implementation based on cURL or Java. It must be possible to run multiple instances on the same machine.

As a contractor you can use a testing system for the XML interface. For each targeted third-party web site you will receive the login data for a user account and screenshot documentation of the manual maintenance procedure. Please note that we cannot provide a testing system for the third-party web sites: every change is made on the live site and has to be restored to the original data.

We want to scrape 250 web sites successively over the coming months. This is an enquiry for the first package of 25 web sites. After that we will need another 10 a month, eventually up to 25 a month.

At the moment we are asking for external development only and will do the ongoing maintenance ourselves. At a later stage we will outsource this work as well.
