Closed

PHP5 CLI - Crawl for host and domain names

I need a PHP 5.3+ CLI crawler using CURL / DOM to extract host and domain names from websites. Crawling must read and follow [url removed, login to view] files. It must be multi-threaded so that crawling is fast and efficient. Given a host name supplied by a JSON data feed ([url removed, login to view]), the crawler should return, in JSON format, a list of ALL domains and host names that the site links to. This must be a unique list, so no host name / domain is repeated. The list will then be submitted via an API to another script. This system MUST be very memory efficient and follow PHP 5.3+ recommended programming standards.

The items to check for host / domain names are image sources, script sources and href values, but the code should be written so that more item types can be added later.
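A minimal sketch of the extraction step, assuming PHP's bundled DOM extension (`DOMDocument`) and `parse_url()`. The function name `extractHosts()` and its tag-to-attribute map are hypothetical; new item types can be added to the map without touching the loop. Host names are stored as array keys, so the list stays unique without a separate de-duplication pass, and relative links are skipped because they point back at the crawled host.

```php
<?php
/**
 * Hypothetical helper: collect the unique host names referenced by a page.
 * $targets maps tag name => attribute and can be extended later,
 * e.g. 'link' => 'href' or 'iframe' => 'src'.
 */
function extractHosts($html, $targets = null)
{
    if ($targets === null) {
        $targets = array('img' => 'src', 'script' => 'src', 'a' => 'href');
    }

    $hosts = array(); // host => true, so duplicates collapse automatically
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    // The XML encoding hint makes libxml parse the input as UTF-8
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($targets as $tag => $attribute) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $url = trim($node->getAttribute($attribute));
            // Relative URLs have no host and are skipped; protocol-relative
            // URLs ("//cdn.example.com/x.js") report a host from PHP 5.4.7 on
            $host = parse_url($url, PHP_URL_HOST);
            if (is_string($host) && $host !== '') {
                $hosts[strtolower($host)] = true;
            }
        }
    }

    return array_keys($hosts);
}
```

The result can be turned into the required JSON output with `json_encode(extractHosts($html))`. Keeping the configuration as a plain array is one way to meet the "allow for expansion" requirement; a class per item type would be another.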

This script will only be run from the Debian command line using PHP, so make sure you really know the CLI before bidding. This is the first of MANY small projects that will link together, so a clear, well-documented approach is essential.

Update: Must support UTF-8 and international domain names. CURL requests should use compression when the remote server supports it, to reduce bandwidth usage.
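A minimal sketch of both update requirements, assuming the cURL extension and, optionally, the intl extension. Setting `CURLOPT_ENCODING` to an empty string asks libcurl to advertise every compression method it was built with and to decode the response transparently, so servers without compression are unaffected. `idn_to_ascii()` converts a Unicode host name to its ASCII (Punycode) form; the sketch falls back to the original host when intl is missing. The helper names, timeouts and user-agent string are hypothetical.

```php
<?php
/**
 * Hypothetical helper: normalise a host name and convert international
 * domain names to their ASCII (Punycode) form when intl is available.
 */
function hostToAscii($host)
{
    $host = strtolower(rtrim($host, '.'));
    // Only non-ASCII hosts need conversion
    if (function_exists('idn_to_ascii') && preg_match('/[^\x20-\x7e]/', $host)) {
        $ascii = defined('INTL_IDNA_VARIANT_UTS46')
            ? idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46)
            : idn_to_ascii($host); // PHP 5.3 only has the IDNA 2003 variant
        if ($ascii !== false) {
            return $ascii;
        }
    }
    return $host;
}

/**
 * Hypothetical helper: cURL options for the crawler.
 */
function crawlerCurlOptions()
{
    return array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
        // '' = send Accept-Encoding with every encoding libcurl supports
        // (e.g. gzip, deflate) and decompress the body automatically
        CURLOPT_ENCODING       => '',
        CURLOPT_USERAGENT      => 'HostCrawler/0.1', // placeholder name
    );
}

// Usage:
// $ch = curl_init('http://' . hostToAscii($host) . '/');
// curl_setopt_array($ch, crawlerCurlOptions());
// $html = curl_exec($ch);
// curl_close($ch);
```

Converting hosts to ASCII before storing them also keeps the unique list consistent, since the Unicode and Punycode spellings of the same domain then collapse to one entry.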

Skills: PHP

About the Employer:
(0 reviews) Birmingham, United Kingdom

Project ID: #4343815

4 freelancers are bidding an average of ₱10000 for this job

linuxfreak1985

Hi there, we are experts in PHP open source (any kind of PHP/MySQL work), WordPress and Ajax/Web 2.0 technology. Some of the projects we completed in PHP are listed below: [url removed, login to view]

₱10000 PHP in 3 days
(57 Reviews)
5.7

jitendraparmar07

Automation expert here. I can easily write such a bot/[url removed, login to view] check your PMB.

₱10000 PHP in 5 days
(8 Reviews)
4.5

miniric3

on2itonline: Hi, KIWI Team here at on2itonline would love to work for you on this project. We do "full service" websites on a clear and fixed budget. We offer lots of things that others don't, like our li…

₱10000 PHP in 10 days
(0 Reviews)
0.0

tetsuo13

This is a very interesting project. The PCNTL functions in PHP will provide the means to make it act like a multi-threaded application (PHP has no such thing). There will need to be several built-in limitations as a sc…

₱10000 PHP in 3 days
(0 Reviews)
0.0
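The last bid suggests PCNTL as a way to make PHP behave like a multi-threaded crawler. A minimal sketch of that idea, assuming the pcntl extension (CLI on POSIX systems such as Debian only): the parent forks one child per chunk of hosts, each child writes its results to a temporary file as JSON, and the parent waits for every child and merges everything into one unique list. `crawlHost()` is a hypothetical placeholder for the real fetch-and-extract step, and temporary files are only one way to pass results back (a socket pair from `stream_socket_pair()` is another).

```php
<?php
/**
 * Hypothetical placeholder for the real crawl step: it would fetch $host
 * and return the host names that page links to.
 */
function crawlHost($host)
{
    return array($host);
}

/**
 * Crawl $hosts using up to $workers forked child processes and return
 * one de-duplicated list of host names.
 */
function crawlInParallel(array $hosts, $workers)
{
    $workers = max(1, (int) $workers);
    $size = max(1, (int) ceil(count($hosts) / $workers));
    $children = array(); // child pid => temporary result file

    foreach (array_chunk($hosts, $size) as $chunk) {
        $file = tempnam(sys_get_temp_dir(), 'crawl');
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('pcntl_fork() failed');
        }
        if ($pid === 0) {
            // Child process: crawl its chunk, save the unique hosts, then exit
            $found = array();
            foreach ($chunk as $host) {
                foreach (crawlHost($host) as $linked) {
                    $found[$linked] = true;
                }
            }
            file_put_contents($file, json_encode(array_keys($found)));
            exit(0);
        }
        $children[$pid] = $file; // parent process: remember the child
    }

    // Parent process: wait for every child, then merge the results
    $all = array();
    foreach ($children as $pid => $file) {
        pcntl_waitpid($pid, $status);
        $result = json_decode((string) file_get_contents($file), true);
        foreach ((array) $result as $host) {
            $all[$host] = true;
        }
        unlink($file);
    }

    return array_keys($all);
}
```

Because each child is a separate process, its memory is released as soon as it exits, which fits the memory-efficiency requirement; the trade-off is that results must be passed back explicitly rather than shared.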