PHP Crawler/Scraper

  • Status Closed
  • Budget $200 - $400 USD
  • Total Bids 7

Project Description

We would like someone to build a PHP crawler/scraper using cURL.

The application should have a form with 2 input fields.

Input 1: a URL

Input 2: a text string to search for

Input 1 is the URL at which to start crawling a web directory. The application will crawl the directory and follow the outgoing links to the websites listed in it.
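To illustrate what "following outgoing links" could mean in practice, here is a minimal sketch that pulls the external sites out of one directory page. The function name `outgoingSites` is hypothetical, and the sketch assumes that listed sites are linked with absolute http/https URLs and that `www.example.com` and `example.com` count as separate hosts.

```php
<?php
declare(strict_types=1);

/**
 * Collect the external sites linked from one directory page.
 * Returns a map of host => homepage URL, skipping relative links,
 * non-HTTP schemes and links back to the directory itself.
 */
function outgoingSites(string $html, string $directoryUrl): array
{
    $directoryHost = strtolower((string) parse_url($directoryUrl, PHP_URL_HOST));
    $sites = [];

    // A simple href pattern; a DOM parser would be more robust on messy HTML.
    preg_match_all('/<a\b[^>]*\bhref\s*=\s*["\']([^"\']+)["\']/i', $html, $m);

    foreach ($m[1] as $href) {
        $scheme = strtolower((string) parse_url($href, PHP_URL_SCHEME));
        $host   = strtolower((string) parse_url($href, PHP_URL_HOST));

        if (!in_array($scheme, ['http', 'https'], true)) {
            continue; // relative link, mailto:, javascript:, etc.
        }
        if ($host === '' || $host === $directoryHost) {
            continue; // internal directory link
        }
        // Keep the first scheme seen for each host.
        $sites[$host] ??= $scheme . '://' . $host . '/';
    }

    return $sites;
}
```

Each returned homepage URL would then be handed to the per-site search step.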

For each website listed, the application should search the HTML code for the text string specified in Input 2, checking a maximum of 5 pages per site.

If the text string is not found in any of the first 5 pages of the site, the application should stop crawling that site. That domain should be stored in the database as a domain that must not be crawled again in the future.
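The 5-page rule could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `curlFetch`, `sameHostLinks` and `siteContainsString` are hypothetical, the search is case-insensitive, only absolute same-host links are followed, and pages are visited breadth-first from the start URL. The fetcher is passed in as a callable so the logic can be exercised without network access.

```php
<?php
declare(strict_types=1);

/** Fetch one page with cURL; returns the body, or null on any failure. */
function curlFetch(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

/** Absolute links in $html that point at $host. */
function sameHostLinks(string $html, string $host): array
{
    $links = [];
    preg_match_all('/href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    foreach ($m[1] as $href) {
        if (parse_url($href, PHP_URL_HOST) === $host) {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

/**
 * Check at most $maxPages pages of a site for $needle.
 * Returns true as soon as the string is found, false otherwise.
 */
function siteContainsString(string $startUrl, string $needle, callable $fetch, int $maxPages = 5): bool
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    if (!is_string($host) || $needle === '') {
        return false;
    }

    $queue   = [$startUrl];
    $seen    = [$startUrl => true];
    $checked = 0;

    while ($queue !== [] && $checked < $maxPages) {
        $url = array_shift($queue);
        $checked++;

        $html = $fetch($url);
        if (!is_string($html)) {
            continue; // a failed fetch still counts towards the limit
        }
        if (stripos($html, $needle) !== false) {
            return true;
        }
        foreach (sameHostLinks($html, $host) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }

    return false;
}
```

A caller might use `siteContainsString($homepage, $searchText, 'curlFetch')` and, on a `false` result, record the domain in the do-not-crawl list.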

If the text string is found in the code, the scraper should crawl the entire site and retrieve the following content:

The Domain

Page URL

Page Title

Meta Description Tag

Email Address - each email address should be associated with the domain it was found on, not the page URL it was acquired from.
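As a rough sketch of how these fields might be pulled from a single fetched page, the following uses PHP's DOM extension for the title and meta description and a simple regular expression for email addresses. The function name `extractPageData` and the array keys are hypothetical, and the email pattern is deliberately simple rather than a full address validator. Emails are returned alongside the domain so the caller can store them per domain rather than per page.

```php
<?php
declare(strict_types=1);

/**
 * Extract domain, URL, title, meta description and email addresses
 * from one page. Missing fields come back as empty strings or arrays.
 */
function extractPageData(string $url, string $html): array
{
    $data = [
        'domain'           => strtolower((string) parse_url($url, PHP_URL_HOST)),
        'url'              => $url,
        'title'            => '',
        'meta_description' => '',
        'emails'           => [],
    ];
    if (trim($html) === '') {
        return $data;
    }

    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    if ($titleNode !== null) {
        $data['title'] = trim($titleNode->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $data['meta_description'] = trim($meta->getAttribute('content'));
            break;
        }
    }

    // Emails are matched in the raw HTML so mailto: links are included too.
    preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $html, $m);
    $data['emails'] = array_values(array_unique(array_map('strtolower', $m[0])));

    return $data;
}
```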

This data is to be stored in a MySQL database. One table should contain the Domain, URL, Title and Meta Description Tag. A second table should contain the Domain and email information.
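One possible layout for these tables is sketched below using PDO. All table names, column names and sizes here are assumptions, not requirements. The third table, `skipped_domains`, is one way to record the domains that should not be crawled again. The `url_hash` column (a SHA-1 of the URL) is only there so that long URLs can carry a unique index.

```php
<?php
declare(strict_types=1);

/** CREATE TABLE statements for the sketch schema, keyed by table name. */
function schemaStatements(): array
{
    return [
        'pages' => "CREATE TABLE IF NOT EXISTS pages (
            id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            domain VARCHAR(255) NOT NULL,
            url VARCHAR(2048) NOT NULL,
            url_hash CHAR(40) NOT NULL,
            title VARCHAR(512) NOT NULL DEFAULT '',
            meta_description TEXT,
            UNIQUE KEY uniq_url (url_hash),
            KEY idx_domain (domain)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4",

        'emails' => "CREATE TABLE IF NOT EXISTS emails (
            id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            domain VARCHAR(255) NOT NULL,
            email VARCHAR(254) NOT NULL,
            UNIQUE KEY uniq_domain_email (domain, email)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4",

        'skipped_domains' => "CREATE TABLE IF NOT EXISTS skipped_domains (
            domain VARCHAR(255) NOT NULL PRIMARY KEY,
            skipped_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4",
    ];
}

/** Create the tables on an open MySQL connection. */
function createSchema(PDO $pdo): void
{
    foreach (schemaStatements() as $sql) {
        $pdo->exec($sql);
    }
}

/** Store emails against the domain; duplicates are ignored by the unique key. */
function saveEmails(PDO $pdo, string $domain, array $emails): void
{
    $stmt = $pdo->prepare('INSERT IGNORE INTO emails (domain, email) VALUES (?, ?)');
    foreach ($emails as $email) {
        $stmt->execute([$domain, $email]);
    }
}

/** Store one page row; a URL that is already present is left untouched. */
function savePage(PDO $pdo, string $domain, string $url, string $title, string $description): void
{
    $stmt = $pdo->prepare(
        'INSERT IGNORE INTO pages (domain, url, url_hash, title, meta_description)
         VALUES (?, ?, ?, ?, ?)'
    );
    $stmt->execute([$domain, $url, sha1($url), $title, $description]);
}
```

Because the unique key on `emails` covers the domain and the address together, the same address found on several pages of one site is stored once.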

We would also like a throttle function to control the number of URLs the program crawls at any given time.
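One way such a throttle might work is a rolling window over PHP's `curl_multi_*` functions, which keeps no more than a fixed number of transfers in flight. The sketch below is an assumption about the intended behaviour, not a specification. `fetchThrottled` and its parameters are hypothetical names. It limits concurrency only and does not add a delay between requests to the same host.

```php
<?php
declare(strict_types=1);

/**
 * Fetch $urls with at most $maxConcurrent transfers running at once.
 * Returns a map of URL => body, with null for failed transfers.
 */
function fetchThrottled(array $urls, int $maxConcurrent = 5, int $timeout = 10): array
{
    $maxConcurrent = max(1, $maxConcurrent);
    $pending  = array_values($urls);
    $results  = [];
    $inFlight = 0;
    $multi    = curl_multi_init();

    while ($pending !== [] || $inFlight > 0) {
        // Top up the window until the concurrency cap is reached.
        while ($inFlight < $maxConcurrent && $pending !== []) {
            $url = array_shift($pending);
            $ch  = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_TIMEOUT        => $timeout,
                CURLOPT_PRIVATE        => $url, // remember which URL this handle is for
            ]);
            curl_multi_add_handle($multi, $ch);
            $inFlight++;
        }

        // Drive the transfers, then wait briefly for network activity.
        curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(10000); // avoid a busy loop when select is unavailable
        }

        // Collect finished transfers and free their slots.
        while (($info = curl_multi_info_read($multi)) !== false) {
            $ch  = $info['handle'];
            $url = (string) curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($multi, $ch);
            $inFlight--;
        }
    }

    curl_multi_close($multi);
    return $results;
}
```

Calling `fetchThrottled($urls, 3)` would, under these assumptions, never have more than three requests open at once; the cap could be exposed as a setting on the form.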
