Closed

Build a scraper in PHP using curl and regular expressions

This project received 15 bids from talented freelancers with an average bid price of $179 USD.

Employer working
Skills Required: not listed
Project Budget: N/A
Total Bids: 15
Project Description

I'm looking for a skilled developer with solid experience in curl and regular expressions. Please read the explanation below very carefully. I have many scrapers that need to be updated because the websites they scrape have changed. In some cases the whole script may have to be rewritten; however, you will receive the old script for reference. You will also receive a web login that you can use to upload and test your scripts easily.

Please bid the price you charge per scraper. We can then decide later how many scrapers you can take on.

GENERAL:
- The scraper should scrape journey details (prices, departure times, etc.) from travel websites for ONE-WAY trips.
- The scraper has two basic functions: 1. Fetch the page content containing the travel information from the website. 2. Parse the required data from it using regular expressions.
- The output consists of arrays, each holding one property of the journeys, such as price, departure time or departure date. Each array key MUST refer to the same journey in every array. For example, $price[1], $depdate[1] and $deptime[1] must all belong to the same journey.
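The two steps above might look roughly like the following. This is a minimal sketch in Python rather than the PHP the project asks for (in PHP the fetch step would use curl, e.g. `curl_exec`, and the parsing step `preg_split`/`preg_match`). The markup it parses, including the class names `journey`, `price`, `dep-date` and `dep-time`, and the function names `fetch` and `parse` are hypothetical. It splits the page into one chunk per journey before extracting fields, so index i refers to the same journey in every array.

```python
import re
import urllib.request


def fetch(url: str) -> str:
    """Step 1: download the page that lists the journeys (PHP: curl_exec)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse(html: str) -> dict[str, list[str]]:
    """Step 2: extract one value per field per journey, keeping the arrays aligned."""
    out: dict[str, list[str]] = {"price": [], "depdate": [], "deptime": []}
    # Hypothetical markup: each result starts with <li class="journey" ...>.
    # Anchoring on class names (not whitespace or layout) keeps the regex robust.
    blocks = re.split(r'<li class="journey"[^>]*>', html)[1:]
    for block in blocks:
        price = re.search(r'class="price"[^>]*>\s*([\d.,]+)', block)
        depdate = re.search(r'class="dep-date"[^>]*>\s*(\d{4}-\d{2}-\d{2})', block)
        deptime = re.search(r'class="dep-time"[^>]*>\s*(\d{2}:\d{2})', block)
        if not (price and depdate and deptime):
            continue  # skip incomplete journeys so every array keeps the same size
        out["price"].append(price.group(1))
        out["depdate"].append(depdate.group(1))
        out["deptime"].append(deptime.group(1))
    return out
```

Skipping a journey when any field is missing is one way to keep the arrays the same size; filling in an empty value instead is an equally valid design choice, and the existing scripts may handle it differently.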

IMPORTANT:
- Each array key MUST refer to the same journey in every array. For example, $price[1], $depdate[1] and $deptime[1] must all belong to the same journey.
- All arrays must have the same size.
- Manually check that the script's output is accurate by comparing it with the operator's website for several different journeys.
- This is the base file for making a script that scrapes a website
- Save the file using the name of the operator as: [url removed, login to view]
- When the scraper is finished, test it with at least 20 combinations of input for $travelfrom and $travelto.
- Check that the output follows the required format and that all data has been scraped.
- When you get a login to the website where you can test the scraper, make sure that the database input check is successful.
- The scraper should work for all available journeys on the website.
- Use regular expressions for the data parsing.
- When parsing, rely only on parts of the page that are unlikely to change quickly, such as class and id names. Don't rely on whitespace, pixel sizes, etc.
- The scraper should work for all available inputs from the destinations array.
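The size rule and the 20-combination test above can be partly automated before the manual comparison with the operator's website. A minimal Python sketch, assuming a hypothetical `scrape(travelfrom, travelto)` callable that returns the output arrays as a dict keyed by name; `check_equal_sizes`, `pick_test_routes` and `run_checks` are illustrative helpers, not part of the project's base file. Equal sizes are necessary but not sufficient: a length check cannot prove that key i refers to the same journey in every array, which still needs the manual check.

```python
import itertools
import random


def check_equal_sizes(arrays: dict[str, list]) -> None:
    """Raise ValueError if the output arrays do not all have the same length."""
    lengths = {name: len(values) for name, values in arrays.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"array sizes differ: {lengths}")


def pick_test_routes(destinations: list[str], n: int = 20, seed: int = 0) -> list[tuple[str, str]]:
    """Pick up to n (travelfrom, travelto) pairs with different endpoints, reproducibly."""
    pairs = list(itertools.permutations(destinations, 2))
    return random.Random(seed).sample(pairs, min(n, len(pairs)))


def run_checks(scrape, destinations: list[str], n: int = 20) -> None:
    """Scrape each sampled route and verify the output arrays have equal sizes."""
    for travelfrom, travelto in pick_test_routes(destinations, n):
        check_equal_sizes(scrape(travelfrom, travelto))
```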

OUTPUT ARRAYS:
- $price
- $currency always a three-letter code, e.g. EUR (euro) or USD (United States dollar). This is the currency in which the price is given.
- $pricetag this contains a string IF 1 or more carriers are given, which is the
- $depdate FORMAT: yyyy-mm-dd
- $deptime FORMAT: 00:00
- $arrdate FORMAT: yyyy-mm-dd
- $arrtime FORMAT: 00:00
- $transfers
- $bookinglink a deep link for that particular journey
- $bookinglinkretour a deep link for the same journey booked as a return trip. The return date is given by the placeholders retourday, retourmonth and retouryear, which stand for numbers. They are not PHP variables but literal strings that you put into the deep link wherever the return date appears.
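The field formats above and the return-link placeholders can be sketched as follows, again in Python with hypothetical helper names. `to_iso_date` assumes the site shows dates as dd/mm/yyyy, and the query-parameter names `retDay`, `retMonth` and `retYear` are invented for illustration; each operator's real deep-link format will differ. In PHP, `DateTime::createFromFormat` and `http_build_query` can play the same roles.

```python
import re
from datetime import datetime
from urllib.parse import urlencode

# Expected formats from the output array list; fullmatch() requires the whole value to fit.
FORMATS = {
    "currency": re.compile(r"[A-Z]{3}"),          # e.g. EUR, USD
    "depdate": re.compile(r"\d{4}-\d{2}-\d{2}"),  # yyyy-mm-dd
    "deptime": re.compile(r"\d{2}:\d{2}"),        # 00:00
    "arrdate": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "arrtime": re.compile(r"\d{2}:\d{2}"),
}


def format_errors(arrays: dict[str, list[str]]) -> list[str]:
    """Return a description of every value that does not match its expected format."""
    errors = []
    for name, pattern in FORMATS.items():
        for i, value in enumerate(arrays.get(name, [])):
            if not pattern.fullmatch(value):
                errors.append(f"{name}[{i}] = {value!r}")
    return errors


def to_iso_date(text: str, site_format: str = "%d/%m/%Y") -> str:
    """Convert a date as shown on the site (assumed dd/mm/yyyy) to yyyy-mm-dd."""
    return datetime.strptime(text.strip(), site_format).strftime("%Y-%m-%d")


def build_retour_link(base_url: str, params: dict[str, str]) -> str:
    """Build a return-trip deep link with literal placeholders for the return date.

    The parameter names retDay/retMonth/retYear are hypothetical.
    """
    query = dict(params, retDay="retourday", retMonth="retourmonth", retYear="retouryear")
    return base_url + "?" + urlencode(query)
```

Whoever consumes $bookinglinkretour later swaps the three placeholder strings for the actual return-date numbers with a plain string replace; none of the three placeholders is a substring of another, so the replacements cannot interfere.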
