- Status: Closed
- Budget: $200 - $400 USD
- Total Bids: 7
We would like someone to build a PHP crawler/scraper using cURL.
The application should have a form with 2 input fields.
Input 1: a URL
Input 2: text string for search
Input 1 is the URL of a web directory where crawling starts. The application will crawl the directory and follow outgoing links to the websites listed in it.
For each listed website, it should search the HTML source for the text string specified in Input 2, checking a maximum of 5 pages.
If the text string is not found in any of the first 5 pages of the site, the application should stop crawling that site. That domain should be stored in the database so that it is not crawled again in the future.
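The 5-page check described above could be sketched as follows. This is only an illustration of the logic in Python, not the requested PHP/cURL implementation; the names `site_matches`, `probe_domain`, and `skip_list` are hypothetical, and `pages` stands in for whatever iterable of fetched HTML strings the crawler produces.

```python
from itertools import islice
from typing import Iterable, Set

MAX_PROBE_PAGES = 5  # page limit taken from the brief


def site_matches(pages: Iterable[str], needle: str, limit: int = MAX_PROBE_PAGES) -> bool:
    """Return True if `needle` occurs in the HTML of any of the first `limit` pages."""
    # islice stops consuming `pages` after `limit` items, so a lazy
    # generator of fetched pages never requests a sixth page.
    for html in islice(pages, limit):
        if needle in html:
            return True
    return False


def probe_domain(domain: str, pages: Iterable[str], needle: str, skip_list: Set[str]) -> bool:
    """Decide whether a site deserves a full crawl.

    `skip_list` is an in-memory stand-in (assumption) for the database
    table of domains that should not be crawled again.
    """
    if domain in skip_list:
        return False  # rejected on an earlier run
    if site_matches(pages, needle):
        return True  # string found: crawl the entire site
    skip_list.add(domain)  # remember the miss for future runs
    return False
```

In a real crawler, `skip_list` would presumably be backed by the MySQL database rather than a Python set.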
If it finds the text string in the code, the scraper should crawl the entire site and retrieve the following content:
- Meta Description Tag
- Email Address: each email address should be associated with the domain it was found on, not the page URL it was acquired from.
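As a rough illustration of the extraction step, the sketch below pulls the meta description and any email addresses out of one page and ties the emails to the domain rather than the page URL. It is written in Python for brevity, not in the PHP the brief asks for. The function `extract_page_data`, the class `MetaDescriptionParser`, and the email pattern are assumptions for this example; the pattern is deliberately simple and will not catch every valid address.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Simplified email pattern (assumption): good enough for a sketch only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class MetaDescriptionParser(HTMLParser):
    """Records the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def extract_page_data(url, html):
    """Return the domain, URL, meta description and (domain, email) pairs for one page."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    domain = urlparse(url).netloc.lower()
    # Deduplicate case-insensitively; emails are keyed by domain, not page URL.
    emails = sorted({match.lower() for match in EMAIL_RE.findall(html)})
    return {
        "domain": domain,
        "url": url,
        "meta_description": parser.description,
        "emails": [(domain, email) for email in emails],
    }
```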
This data is to be placed into a MySQL database. One table should contain the Domain, URL, Title and Meta Description Tag. A second table should contain the Domain and email information.
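One possible shape for the two tables is sketched below. To keep the example self-contained it uses Python's built-in `sqlite3` module as a stand-in for MySQL, so the SQL dialect differs slightly from what a MySQL deployment would need. The table names `pages` and `emails`, the column names, and the uniqueness constraints are all assumptions, not part of the brief.

```python
import sqlite3

# Hypothetical schema: one row per crawled page, one row per (domain, email) pair.
SCHEMA = """
CREATE TABLE pages (
    id INTEGER PRIMARY KEY,
    domain TEXT NOT NULL,
    url TEXT NOT NULL UNIQUE,
    title TEXT,
    meta_description TEXT
);
CREATE TABLE emails (
    id INTEGER PRIMARY KEY,
    domain TEXT NOT NULL,
    email TEXT NOT NULL,
    UNIQUE (domain, email)
);
"""


def create_database(path=":memory:"):
    """Open a database (in memory by default) and create both tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_email(conn, domain, email):
    """Store an email once per domain; repeats found on other pages are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO emails (domain, email) VALUES (?, ?)",
        (domain, email),
    )
```

The `UNIQUE (domain, email)` constraint is one way to honor the requirement that an address belongs to its domain, since the same address found on several pages of a site is stored only once.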
We would also like a throttle function to control the number of URLs the program crawls at any given time.
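A throttle of this kind could, for instance, cap the number of requests in flight with a fixed-size worker pool. The Python sketch below shows the idea only; `crawl_throttled`, `fetch`, and `max_concurrent` are hypothetical names, and `fetch` is whatever function downloads a single URL. In PHP, the cURL multi interface (`curl_multi_*`) would be one possible way to achieve a similar effect.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_throttled(urls, fetch, max_concurrent=3):
    """Call fetch(url) for every URL with at most `max_concurrent` calls running at once.

    Results are returned in the same order as `urls`.
    """
    # The pool size is the throttle: no more than `max_concurrent`
    # worker threads exist, so no more than that many fetches overlap.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(fetch, urls))
```

Exposing `max_concurrent` as a setting would let the operator tune how many URLs are crawled simultaneously.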