Web Crawler - Business Contact Details

$300-1500 USD

Closed

Posted

over 16 years ago

Paid on delivery

I require a web crawler to extract web-based contact information about businesses, including Name, Website URL, Address, Phone, Mobile, Fax Number and business specialty if one is present. The crawler must also be able to accommodate multiple addresses and multiple contact phone and fax numbers for one business. The primary contact sites to crawl are [login to view URL] and www.yellowpages.com.au. Data results will be checked against an existing database for accuracy.

Requirements:

1. I must be able to set the starting URL from which the spider will start on each website. The format of the data on each website should be examined closely before commencing, as multiple data fields are displayed when the information is present.
2. The spider should contain its own database of products, professions and services so that it can use these as the basis for initiating searches. Data is to be extracted into XML or ASCII format and then imported directly into a MySQL or Postgres database.
3. The spider must crawl through multiple pages until the final page for that category is completed. However, at the very beginning of most categories there are businesses listed under the "Yellow Pages - Advertisers" heading. These are businesses that are not from the area I have chosen but are advertising in that area. I do not want these entries included. The spider does not necessarily need to know how my list was created, only to avoid entries under the "Advertisers" section.
4. When a search is completed, an update function should let me choose a new profession name and initiate a new search.
5. A search and purge function that can be run at any time on any of the database files that have been created, to ensure no two entries have the same telephone or fax number. If duplicate telephone or fax numbers are found, the records with the least information are automatically deleted. For example, if 2 records have the same telephone/fax numbers but one lists a website and the other doesn't, then delete the one without the website.
6. I require that this program be functional for both websites and that the system can rerun the searches to capture updated information after, say, 4-5 months.
7. Finally, the crawler must function despite any anti-crawler or anti-search / DoS protection (if any) being run by the site administrators.

My Requirements:

1. You must be easy to contact, either by phone or by answering any e-mail I send you within 10 hours.
2. Must speak and write English well.
3. Code must be well commented in English.
4. All source code must be given to me.
5. I would prefer this to be written in Java, Perl or Python, but XML is also OK.
6. I would like this done no later than November 21st, 2007.
7. Must be able to run on my Windows XP machine or be hosted in a USA data centre. Data usage is not an issue.
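To make the search and purge rule in requirement 5 concrete, here is a minimal Python sketch of one way it could work. It is only an illustration under stated assumptions: the record layout (dictionaries with hypothetical `phones` and `faxes` list fields), the function names, and the choice to measure "least information" by counting non-empty fields are not part of the posting. Phone and fax numbers are compared separately, and only after stripping everything except digits.

```python
import re


def normalize_number(number):
    """Keep only the digits so that formatting differences do not hide duplicates."""
    return re.sub(r"\D", "", number)


def completeness(record):
    """Count non-empty fields; list fields count once per non-empty entry."""
    score = 0
    for value in record.values():
        if isinstance(value, (list, tuple)):
            score += sum(1 for item in value if item)
        elif value:
            score += 1
    return score


def number_keys(record):
    """Return the set of (kind, digits) pairs for a record's phone and fax numbers."""
    keys = set()
    for kind in ("phones", "faxes"):  # hypothetical field names
        for number in record.get(kind, []):
            digits = normalize_number(number)
            if digits:
                keys.add((kind, digits))
    return keys


def purge_duplicates(records):
    """Keep the most complete record among those sharing a phone or fax number."""
    seen = set()
    kept = []
    # Visit the most complete records first so that they claim their numbers;
    # sorted() is stable, so equally complete records keep their original order.
    for record in sorted(records, key=completeness, reverse=True):
        keys = number_keys(record)
        if keys & seen:
            continue  # a more complete record already holds one of these numbers
        seen |= keys
        kept.append(record)
    return kept
```

With this sketch, two records that share a telephone number, where only one lists a website, would be reduced to the one with the website, which matches the example in the requirement. Records without any phone or fax number are always kept, since there is nothing to compare them on.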

Data Processing

Project ID: 183261

About the project

7 proposals

Remote project

Active 16 yrs ago

7 freelancers are bidding on average $537 USD for this job

@aruhat

Hi, thanks for giving "ARUHAT TECHNOLOGIES" this opportunity. Kindly go through the PM for a detailed analysis of your requirements. Regards, Maulik

$500 USD in 10 days

4.0

(2 reviews)

5.7

@aronBD

Hi, I am new at GAF but I have four years of experience as a Software Engineer. I have working experience with Perl, C, C++, PHP and MySQL. For the last 4 years I have been developing web crawlers for various video sharing sites such as YouTube, Dailymotion, Metacafe, AOL and many more, using Perl and MySQL. I have extensive experience with Perl. I am bidding on this project as it's pretty similar to what I am doing right now, and I can provide you with an efficient solution.

$600 USD in 20 days

0.0

(0 reviews)

0.0

@eduardobaret

Check PM please.

$550 USD in 29 days

0.0

(0 reviews)

2.5

@andreidumitrescu

Please check PM

$560 USD in 10 days

0.0

(0 reviews)

0.0

@extremetarun

Hello, I have built a similar project for e-marketing. I have a demo email crawler which I can show you if you like.

$400 USD in 7 days

0.0

(0 reviews)

0.0

@zeesoftpk

Hi, I have three years of experience as a Software Engineer. Over the last couple of years I have developed two projects of this type. One of my web crawlers (client-server architecture) fetches data from 3 property websites simultaneously, and the other fetched data from a sports website. I am still working on projects of this type and I know this domain well. I can provide you with an efficient solution with as many features as you want.

$600 USD in 30 days