Web Scraping PRO

Status: CANCELLED
Bids: 21
Avg Bid (USD): $3812
Project Budget (USD): $3000 - $5000

Project Description:
Hi,
I need a script (or application) that takes a list of company names, searches Google for each one, and extracts the phone numbers and emails from the first 2 pages of results.

Example: search in Google like this (no inurl, no intext): <"Company Name"> email:*arond*.*
As a specific example, we are searching for a specific company:
"Dexia région Tubize SCRL". This is a company from Denmark, so it would be wise to search on google.dk instead of google.com, so:

https://www.google.dk/#q=Dexia+r%C3%A9gion+Tubize+SCRL++email%3A*.*
The results are:
(2nd position), (6th position), (8th position).
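
A minimal Python sketch of how such a search URL could be assembled, assuming the query is simply the company name followed by the email pattern, as in the example URL above. The function name build_search_url and the GOOGLE_DOMAINS table are hypothetical, made up for illustration; the `/#q=` form is copied from the example and may not be what the final tool should request.

```python
from urllib.parse import quote_plus

# Hypothetical lookup table: the countries requested in this posting
# mapped to their Google country domains.
GOOGLE_DOMAINS = {
    "Italy": "www.google.it",
    "Denmark": "www.google.dk",
    "Norway": "www.google.no",
    "Switzerland": "www.google.ch",
    "Austria": "www.google.at",
    "Germany": "www.google.de",
    "Belgium": "www.google.be",
    "International": "www.google.com",
}


def build_search_url(company: str, country: str) -> str:
    """Return a search URL shaped like the example in the posting."""
    # Two spaces before "email" mirror the "++" in the example URL.
    query = f"{company}  email:*.*"
    # safe="*" keeps the wildcard unescaped, as in the example;
    # special characters such as "é" are percent-encoded as UTF-8.
    return f"https://{GOOGLE_DOMAINS[country]}/#q={quote_plus(query, safe='*')}"


if __name__ == "__main__":
    print(build_search_url("Dexia région Tubize SCRL", "Denmark"))
```

Keeping the domain table separate from the URL builder means a country can be added without touching the query logic.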

1. Avoid any possibility of getting blacklisted: study the time between two searches so that it does not trigger any alarm at Google, and put a limit per hour or per day on scraping company emails, in order to avoid a ban or any other problem that might appear. It does not matter how slowly the application runs, as long as we are not blacklisted or banned and nothing illegal happens. Do not make the application use proxies, unless you can make it work in such a way that users (like me) do not have to deal with obtaining proxies (maybe use TOR, use your imagination).
2. Input the company list as an Excel or .csv file (depending on the country it contains special characters, like umlauts in German) -> Button - Upload Company List
3. Scrape company after company; each email found should be kept along with the phone number, if one exists -> Button - Start Scraping
4. The time between searches should be a random value between 3 and 24 seconds -> Button - Scrap Step
5. Scrape the email and phone of each company and output them to a .csv file (the output has to be written automatically after each scraped email and phone). The exact URL of the page where that company's email address was found should also be saved in the same .csv file.
6. A Settings button which allows adding an FTP address to which the .csv file will be sent every 5 minutes. The name of the .csv file will be composed as: <country of scraping>_<object of scraping>_<date of scraping>_<military time>.csv. Example of a file name: Belgium_emailsANDphones_23.04.2013_1824.csv. Belgium is taken automatically from google.be if the user decides to scrape from Belgium, and the time is in military format (1824 = 18:24).
7. Do not scrape any data if there is no email in plain text (I mean in the 3 lines of plain text belonging to the result where the company is found, on the first or second page of the results).
8. Possibility to select the Google search engine of different countries: Italy, Denmark, Norway, Switzerland, Austria, Germany and Belgium, plus Google International (google.com).
9. The script (or application) and the source code will become the employer's property after the final payment.
10. If you are a Linux, Perl or C++ specialist, prepare the application in such a way that a non-IT person can use it very easily (on Linux, make a Debian package, and so on).
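
To make points 4, 6 and 7 more concrete, here is a rough Python sketch, not a finished design. All names (next_delay, output_filename, extract_contacts) are hypothetical, and both regular expressions are loose assumptions that would need tuning against real result snippets.

```python
import random
import re
from datetime import datetime

# Assumed patterns: a simple email matcher and a loose phone matcher
# (optional "+", then at least 8 digits with common separators).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s()./-]{6,}\d")


def next_delay(low: float = 3.0, high: float = 24.0) -> float:
    """Point 4: random pause in seconds between two searches."""
    return random.uniform(low, high)


def output_filename(country: str, subject: str, when: datetime) -> str:
    """Point 6: <country>_<object>_<date>_<military time>.csv"""
    return f"{country}_{subject}_{when:%d.%m.%Y}_{when:%H%M}.csv"


def extract_contacts(snippet: str):
    """Point 7: return None unless the snippet shows a plain-text email."""
    emails = EMAIL_RE.findall(snippet)
    if not emails:
        return None
    phones = [p.strip() for p in PHONE_RE.findall(snippet)]
    return {"emails": emails, "phones": phones}


if __name__ == "__main__":
    print(output_filename("Belgium", "emailsANDphones",
                          datetime(2013, 4, 23, 18, 24)))
    print(extract_contacts("Contact: info@example.be, tel. +32 2 123 45 67"))
    print(f"sleeping {next_delay():.1f} s before the next search")
```

A caller would sleep for next_delay() seconds between searches and append one .csv row per result for which extract_contacts returns a value. The hourly or daily cap from point 1 would sit on top of this and is not shown.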

Skills required:
Data Entry, Data Mining, Excel, Web Scraping
Additional Files: project_extract_emails.txt
About the employer:
Verified


$3157 in 12 days
$3092 in 15 days
$4210 in 3 days
Hire webscrapinggurus
$3608 in 14 days
$3711 in 20 days
Hire mhmhz
$4210 in 10 days
$3000 in 15 days
$3888 in 15 days
$4444 in 3 days
$4444 in 3 days