Create a site scraper that outputs JSON

  • Status Closed
  • Budget $30 - $250 USD
  • Total Bids 20

Project Description

I need a web scraper for a site that's written in Spanish. You don't really need to know Spanish, although knowing a little might be helpful.

The scraper should start at the following address: [url removed, login to view]

I've attached a screenshot with instructions on how to get to the pages where the data resides.

You'll encounter two kinds of pages while scraping, here's an example from each:

[url removed, login to view]

[url removed, login to view]

From those pages I'll want several data fields, along with images (when available) and map coordinates.

Also, for each record, you'll be scraping extra data from this page: [url removed, login to view]

Please take the time to review that last one, as getting results from it might not be as straightforward as from the others. Use this value to submit the form and see a sample result: 09PBT0018D

I want the following features:

- Able to first scrape all the index pages and save the target URLs in a file that the script can read later. That way it first builds a list of everything, and I can then tell it to scrape using those URLs. It should comment out each URL once it has been scraped, so that in case of failure it can pick up where it left off.

- Able to scrape just one specific record where I provide the vcct, vsubn, and vturno parameters (and also scrape its matching extra data).

- It saves a file in valid JSON format with all the records (or just the one record in the case where I specify the parameters).
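
To make the first and third features concrete, here is a minimal Ruby sketch of a resumable URL queue with JSON output. It is only an illustration under stated assumptions, not a required design: the file layout (one URL per line, a leading "#" meaning "already scraped"), the method names pending_urls, mark_done and run_queue, and the file names are all hypothetical. The actual fetching and parsing is left to the block passed in by the caller.

```ruby
require "json"

# Assumed queue format: one target URL per line; a line starting with "#"
# is treated as already scraped and skipped on the next run.
DONE_MARK = "# "

# Returns the URLs in the queue file that have not been commented out yet.
def pending_urls(queue_path)
  File.readlines(queue_path, chomp: true)
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
end

# Rewrites the queue file with the given URL commented out.
def mark_done(queue_path, url)
  lines = File.readlines(queue_path, chomp: true).map do |line|
    line.strip == url ? "#{DONE_MARK}#{url}" : line
  end
  File.write(queue_path, lines.join("\n") + "\n")
end

# Processes every pending URL. The caller's block receives a URL and must
# return a Hash for that record (the real scraping would happen there).
# The JSON file is rewritten after each record, and only then is the URL
# commented out, so an interrupted run can resume without losing records.
def run_queue(queue_path, output_path)
  have_output = File.exist?(output_path) && !File.zero?(output_path)
  records = have_output ? JSON.parse(File.read(output_path)) : []
  pending_urls(queue_path).each do |url|
    records << yield(url)
    File.write(output_path, JSON.pretty_generate(records))
    mark_done(queue_path, url)
  end
  records
end
```

With this layout, a call such as run_queue("urls.txt", "records.json") { |url| scrape(url) } (where scrape is whatever method does the real work) could be re-run after a crash and would continue with the first URL that is not commented out. Rewriting the whole JSON file after every record is simple but slow for large lists; that trade-off is an assumption of this sketch.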

The language you use is not very important, but if you can do it in Ruby, that would be a plus.

Feel free to ask any questions. I don't want you to dive into a job that isn't clear and later run into unreasonable delays or trouble.
