We require a two-process crawler (or two separate crawlers, up to you) to be built that will crawl from a starting URL / site to find new web sites of a particular type. Once a matching site is found, that site will then be crawled for information.
The crawler should work as follows.
The crawler should crawl the web looking for 'matching' web sites.
* It should start from a URL.
* We should be able to set the start URL and the crawler should spider out to sites from there.
* It should be able to run either for a set period of time (a number of minutes that we set) or until a set number of matching sites has been found (a number that we set).
* It should be able to scrape the HTML source of each page to check whether 'specific' text or 'specific' HTML code is present.
* We should be able to set the specific text through a user-defined field.
* We should be able to set whether the match is case-sensitive or not.
* Once the crawler finds a matching site, it should store its URL in either an MS SQL or a MySQL database.
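To illustrate the first stage described above, here is a rough Python sketch (the spec prefers .NET; Python is used here only for brevity). It assumes a caller-supplied `fetch(url)` function that returns page HTML or `None`, treats a site as matching if any crawled page on it contains the user-defined text or HTML snippet, and uses SQLite as a stand-in for MS SQL / MySQL. All names (`find_matching_sites`, `page_matches`, the `matching_sites` table) are hypothetical.

```python
import sqlite3
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def page_matches(html, pattern, case_sensitive):
    """True if the user-defined text or HTML snippet occurs in the page source."""
    if case_sensitive:
        return pattern in html
    return pattern.casefold() in html.casefold()


def create_schema(db):
    # Hypothetical table; SQLite stands in for MS SQL / MySQL.
    db.execute("CREATE TABLE IF NOT EXISTS matching_sites ("
               "id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL)")


def find_matching_sites(start_url, fetch, pattern, case_sensitive=False,
                        max_minutes=None, max_matches=None, db=None):
    """Breadth-first crawl from start_url, recording each matching site once.

    fetch(url) must return the page HTML, or None if it cannot be fetched.
    The crawl stops when the time limit or the match limit is reached,
    or when there are no more links to follow.
    """
    deadline = None if max_minutes is None else time.monotonic() + max_minutes * 60
    queue, seen = deque([start_url]), {start_url}
    matches = []
    while queue:
        if deadline is not None and time.monotonic() >= deadline:
            break
        if max_matches is not None and len(matches) >= max_matches:
            break
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parts = urlparse(url)
        site_url = f"{parts.scheme}://{parts.netloc}"
        if site_url not in matches and page_matches(html, pattern, case_sensitive):
            matches.append(site_url)
            if db is not None:
                db.execute("INSERT OR IGNORE INTO matching_sites (url) VALUES (?)",
                           (site_url,))
                db.commit()
        # Spider out: queue every new http(s) link found on this page.
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).scheme in ("http", "https") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return matches
```

Passing `fetch` in keeps the crawl logic separate from HTTP, so it can be tested offline. In real use it could wrap `urllib.request.urlopen` with a timeout and return `None` on errors, and a polite crawler would also honour robots.txt (for example via `urllib.robotparser`) and rate-limit its requests.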
The crawler should then crawl all matching web sites stored in the database and search each one for specific details.
* The details we require are telephone numbers, email addresses or other contact information.
* Once this information is found, it should be stored in the database alongside the website URL.
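The second stage (visiting each stored site and pulling out contact details) might look roughly like the Python sketch below (again, not the preferred .NET). It assumes a caller-supplied `fetch(url)` function, stays on the site's own host for up to a hypothetical `max_pages` pages, finds email addresses and telephone numbers with deliberately loose regular expressions, and stores each item in a hypothetical SQLite table `site_contacts` keyed by site URL, standing in for the MS SQL / MySQL database. All function and table names are illustrative only.

```python
import re
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Deliberately loose patterns: expect some false positives (e.g. dates or
# file names such as "logo@2x.png") that need reviewing or filtering later.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\(?\d[\d ().-]{5,}\d")
TAG_RE = re.compile(r"<[^>]+>")


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def extract_contacts(html):
    """Return (emails, phones) found in one page, de-duplicated, in page order."""
    # Emails are searched in the raw HTML so mailto: links are picked up too.
    emails = EMAIL_RE.findall(html)
    # Phones are searched in the text only, with tags replaced by spaces,
    # and kept only if they contain 7 to 15 digits.
    text = TAG_RE.sub(" ", html)
    phones = [p for p in PHONE_RE.findall(text)
              if 7 <= len(re.sub(r"\D", "", p)) <= 15]
    return list(dict.fromkeys(emails)), list(dict.fromkeys(phones))


def crawl_site_for_contacts(site_url, fetch, max_pages=20):
    """Visit up to max_pages pages on the same host and merge their contacts."""
    host = urlparse(site_url).netloc
    queue, seen = deque([site_url]), {site_url}
    emails, phones = {}, {}  # dicts used as insertion-ordered sets
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages += 1
        if html is None:
            continue
        found_emails, found_phones = extract_contacts(html)
        emails.update(dict.fromkeys(found_emails))
        phones.update(dict.fromkeys(found_phones))
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]
            # Only follow links on the same host (this also skips mailto:/tel:).
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return list(emails), list(phones)


def store_contacts(db, site_url, emails, phones):
    """Save contacts alongside the site URL; re-running does not duplicate rows."""
    db.execute("""CREATE TABLE IF NOT EXISTS site_contacts (
                      site_url TEXT NOT NULL, kind TEXT NOT NULL, value TEXT NOT NULL,
                      UNIQUE (site_url, kind, value))""")
    rows = [(site_url, "email", e) for e in emails]
    rows += [(site_url, "phone", p) for p in phones]
    db.executemany("INSERT OR IGNORE INTO site_contacts VALUES (?, ?, ?)", rows)
    db.commit()
```

Simple patterns like these will miss numbers written in unusual formats and will pick up some noise, so 'other contact information' (postal addresses, contact forms and so on) would need further rules as requirements are set.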
We are really only interested in dealing with companies or individuals who can demonstrate:
* Proven experience of crawler / spider development.
* Proven track record for good work on Getafreelance.
We would prefer the application to be written in .NET and to be easy to expand / adapt later on as more requirements are set.
HOWEVER, if you feel that a better development platform than .NET should be used, then please make the case for it and be clear about how you would carry out this job.
Please state the keywords 'READ IT' in your bid so that we can be sure you have read this specification carefully and understood it clearly.
If you have any questions, please PMB me.