PHP Crawler/Scraper

This project received 7 bids from talented freelancers with an average bid price of $274 USD.

Project Budget: $200 - $400 USD
Total Bids: 7
Project Description

We would like someone to build a PHP crawler/scraper using cURL.
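As an illustration only, here is a minimal sketch of the single-page cURL fetch such a crawler would be built around. It assumes PHP 7.4+ with the cURL extension. The function names `fetch_page` and `is_html_response`, the timeouts, and the User-Agent string are hypothetical choices, not part of the brief.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: only 2xx responses served as text/html are worth parsing.
 */
function is_html_response(int $status, ?string $contentType): bool
{
    return $status >= 200 && $status < 300
        && $contentType !== null
        && stripos($contentType, 'text/html') !== false;
}

/**
 * Hypothetical helper: fetch one page with cURL.
 * Returns the HTML body, or null on any network, HTTP, or content-type failure.
 */
function fetch_page(string $url, int $timeout = 15): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_USERAGENT      => 'ExampleCrawler/0.1', // placeholder UA (assumption)
        CURLOPT_ENCODING       => '',     // accept any compression cURL supports
    ]);
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $type   = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    $type   = is_string($type) ? $type : null;   // null when no Content-Type was sent

    if ($body === false || !is_html_response($status, $type)) {
        return null;
    }
    return $body;
}
```

Skipping non-HTML responses early keeps PDFs and images out of the string search and the scraper.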

The application should have a form with two input fields:

Input 1: a URL
Input 2: a text string to search for
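A possible shape for the form and its validation, sketched under assumptions: the field names `start_url` and `search` and both function names are hypothetical. The sketch accepts only http(s) URLs because cURL would otherwise also follow `file://` or `ftp://` input.

```php
<?php
declare(strict_types=1);

/** The two-field form; field names are hypothetical. */
function crawl_form_html(): string
{
    return <<<HTML
<form method="post">
  <label>Directory URL <input type="url" name="start_url" required></label>
  <label>Search text <input type="text" name="search" required></label>
  <button type="submit">Start crawl</button>
</form>
HTML;
}

/**
 * Validate the submitted fields (e.g. pass $_POST).
 * Returns ['url' => ..., 'needle' => ...], or null if either field is unusable.
 */
function read_crawl_form(array $post): ?array
{
    $url    = trim((string) ($post['start_url'] ?? ''));
    $needle = (string) ($post['search'] ?? '');   // kept as typed: whitespace may matter in HTML

    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return null;   // refuse file://, ftp://, etc.
    }
    if (trim($needle) === '') {
        return null;   // an empty search string would match every page
    }
    return ['url' => $url, 'needle' => $needle];
}
```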

Input 1 is the URL of the web directory where crawling starts. The application will crawl the directory and follow its outgoing links to the websites listed in it.
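One way to pick out the "outgoing links" from a directory page is sketched below, assuming PHP's DOM extension. It treats any absolute http(s) link to a host other than the directory's own as outgoing. Links without a scheme and host (relative paths, `mailto:`) are skipped. The function name `outgoing_links` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Return absolute http(s) links in $html whose host differs from the
 * directory's own host, de-duplicated, in document order.
 */
function outgoing_links(string $html, string $directoryUrl): array
{
    if (trim($html) === '') {
        return [];   // DOMDocument::loadHTML rejects empty input
    }
    $dirHost = strtolower((string) parse_url($directoryUrl, PHP_URL_HOST));

    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href  = trim($a->getAttribute('href'));
        $parts = parse_url($href);
        if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
            continue;   // relative, malformed, or mailto:/javascript: link
        }
        if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
            continue;
        }
        if (strtolower($parts['host']) === $dirHost) {
            continue;   // internal directory page, not a listed website
        }
        $links[$href] = true;   // array keys de-duplicate
    }
    return array_keys($links);
}
```

Internal pages of the directory (pagination, categories) would need a separate same-host pass; the sketch only covers the outgoing side.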

For each listed website, the application should search the HTML source for the text string given in Input 2, checking at most 5 pages of that site.

If the text string is not found in any of the first 5 pages of the site, the application should stop crawling that site and store its domain in the database as one not to crawl again in the future.
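The 5-page rule could be sketched as follows. It assumes the crawler already supplies the site's page URLs in visiting order (homepage first) and a fetch callable that returns HTML or null. It also assumes a failed fetch still counts toward the 5 pages and that the match is case-sensitive; the brief does not say either way. `site_contains` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * True if $needle appears in the HTML of one of the first $maxPages pages.
 * $fetch: callable(string $url): ?string, returning raw HTML or null on failure.
 */
function site_contains(iterable $pageUrls, callable $fetch, string $needle, int $maxPages = 5): bool
{
    $checked = 0;
    foreach ($pageUrls as $url) {
        if ($checked >= $maxPages) {
            break;                     // budget spent: give up on this site
        }
        $checked++;                    // failed fetches count too (assumption)
        $html = $fetch($url);
        if ($html !== null && strpos($html, $needle) !== false) {
            return true;               // found: crawl the whole site
        }
    }
    return false;                      // caller stores the domain as "do not crawl"
}
```

When this returns false, the caller would record the domain in the skip list and check that list before starting any new site.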

If it finds the text string in the code, the scraper should crawl the entire site and retrieve the following content:

The Domain
Page URL
Title Tag
Meta Description Tag
Email Address - each address should be associated with the domain it was found on, not with the page URL it was acquired from.
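A rough sketch of the per-page extraction, assuming PHP's DOM extension. The domain key is the lowercased host with a leading `www.` removed, which is an assumption. The email regex is a simple heuristic that can produce false positives such as `logo@2x.png`. All function names are hypothetical. Emails are keyed by `domain_of($pageUrl)` rather than by the page URL, as the brief requires.

```php
<?php
declare(strict_types=1);

/** Domain key for a URL: lowercase host without a leading "www." (assumption). */
function domain_of(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $host = strtolower($host);
    return strncmp($host, 'www.', 4) === 0 ? substr($host, 4) : $host;
}

/** Title and meta description of one page; either may be null. */
function page_meta(string $html): array
{
    $result = ['title' => null, 'description' => null];
    if (trim($html) === '') {
        return $result;
    }
    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $title = $xpath->query('//title')->item(0);
    if ($title !== null) {
        $result['title'] = trim($title->textContent);
    }
    // XPath 1.0 has no lower-case(), so translate() makes name="Description" match too.
    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $lower = 'abcdefghijklmnopqrstuvwxyz';
    $meta  = $xpath->query("//meta[translate(@name, '$upper', '$lower')='description']")->item(0);
    if ($meta instanceof DOMElement) {
        $result['description'] = trim($meta->getAttribute('content'));
    }
    return $result;
}

/** Email addresses anywhere in the page source, lowercased and de-duplicated. */
function emails_in(string $html): array
{
    preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $html, $m);
    return array_values(array_unique(array_map('strtolower', $m[0])));
}
```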

This data is to be stored in a MySQL database. One table should contain the domain, URL, title, and meta description tag of each page; a second table should contain the domain and email address information.
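A possible MySQL layout with PDO prepared statements is sketched below. All table and column names are hypothetical. The third table, `skipped_domains`, holds the "do not crawl again" list from the 5-page rule; the brief asks for it but does not say where it lives. The composite primary key on `emails` plus `INSERT IGNORE` keeps each address stored once per domain.

```php
<?php
declare(strict_types=1);

// Hypothetical schema; names and sizes are assumptions, not part of the brief.
const SCHEMA_SQL = [
    "CREATE TABLE IF NOT EXISTS pages (
        id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        domain VARCHAR(255) NOT NULL,
        url VARCHAR(2048) NOT NULL,
        title TEXT NULL,
        meta_description TEXT NULL,
        INDEX idx_pages_domain (domain)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4",
    "CREATE TABLE IF NOT EXISTS emails (
        domain VARCHAR(255) NOT NULL,
        email VARCHAR(255) NOT NULL,
        PRIMARY KEY (domain, email)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4",
    "CREATE TABLE IF NOT EXISTS skipped_domains (
        domain VARCHAR(255) PRIMARY KEY,
        skipped_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4",
];

function create_schema(PDO $pdo): void
{
    foreach (SCHEMA_SQL as $sql) {
        $pdo->exec($sql);
    }
}

function save_page(PDO $pdo, string $domain, string $url, ?string $title, ?string $description): void
{
    $stmt = $pdo->prepare('INSERT INTO pages (domain, url, title, meta_description) VALUES (?, ?, ?, ?)');
    $stmt->execute([$domain, $url, $title, $description]);
}

function save_emails(PDO $pdo, string $domain, array $emails): void
{
    // INSERT IGNORE skips rows that would duplicate the (domain, email) key.
    $stmt = $pdo->prepare('INSERT IGNORE INTO emails (domain, email) VALUES (?, ?)');
    foreach ($emails as $email) {
        $stmt->execute([$domain, $email]);
    }
}

function skip_domain(PDO $pdo, string $domain): void
{
    $pdo->prepare('INSERT IGNORE INTO skipped_domains (domain) VALUES (?)')->execute([$domain]);
}

function is_skipped(PDO $pdo, string $domain): bool
{
    $stmt = $pdo->prepare('SELECT 1 FROM skipped_domains WHERE domain = ?');
    $stmt->execute([$domain]);
    return $stmt->fetchColumn() !== false;
}
```

A connection would typically be opened with a `mysql:host=...;dbname=...;charset=utf8mb4` DSN and `PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION`.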

We would also like a throttle function to control the number of URLs the program crawls at any given time.
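One reading of "throttle" is a cap on concurrent requests. The sketch below implements that with `curl_multi`, keeping at most `$maxConcurrent` transfers in flight and starting a new one as each finishes. It assumes PHP 7.4+ with the cURL extension; `fetch_throttled` is a hypothetical name. A per-request delay (e.g. `usleep`) would be an alternative interpretation.

```php
<?php
declare(strict_types=1);

/**
 * Fetch $urls with at most $maxConcurrent transfers in flight.
 * Returns [url => body|null]; null marks a failed transfer or HTTP status >= 400.
 */
function fetch_throttled(array $urls, int $maxConcurrent = 5): array
{
    if ($maxConcurrent < 1) {
        throw new InvalidArgumentException('maxConcurrent must be at least 1');
    }
    $results = [];
    $queue   = array_values(array_unique($urls));
    if ($queue === []) {
        return $results;
    }

    $multi    = curl_multi_init();
    $inFlight = 0;
    $running  = 0;

    while ($queue !== [] || $inFlight > 0) {
        // Top up the pool until the concurrency cap is reached.
        while ($inFlight < $maxConcurrent && $queue !== []) {
            $url = array_shift($queue);
            $ch  = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_CONNECTTIMEOUT => 10,
                CURLOPT_TIMEOUT        => 30,
                CURLOPT_PRIVATE        => $url,   // lets us recover the URL when done
            ]);
            curl_multi_add_handle($multi, $ch);
            $inFlight++;
        }

        curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(100000);   // select unavailable: back off briefly instead of spinning
        }

        // Collect finished transfers, freeing a slot for each.
        while (($info = curl_multi_info_read($multi)) !== false) {
            if ($info['msg'] !== CURLMSG_DONE) {
                continue;
            }
            $ch  = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $ok  = $info['result'] === CURLE_OK
                && curl_getinfo($ch, CURLINFO_HTTP_CODE) < 400;
            $results[$url] = $ok ? curl_multi_getcontent($ch) : null;
            curl_multi_remove_handle($multi, $ch);
            $inFlight--;
        }
    }
    curl_multi_close($multi);
    return $results;
}
```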
