Web Scraping PRO

  • Status Closed
  • Budget $3000 - $5000 USD
  • Total Bids 21

Project Description


I need a script (or application) that takes a list of company names, searches Google for each one, and extracts the phone numbers and email addresses from the first 2 pages of results.

Example: search Google like this (no inurl, no intext): <"Company Name"> email:*arond*.*

As a specific example, we are searching for this company:

"Dexia région Tubize SCRL". This is a company from Denmark, so it would be wise to search on google.dk instead of google.com, so:

https://www.google.dk/#q=Dexia+r%C3%A9gion+Tubize+SCRL++email%3A*.*

The results are :

(2nd position), (6th position), (8th position).
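As a rough illustration, the example URL above could be assembled with Python's standard library as sketched below. The function name `build_search_url` is hypothetical and not part of this posting; the query layout (company name, two spaces, then `email:*.*`) is only an assumption that mirrors the example URL.

```python
from urllib.parse import quote_plus


def build_search_url(company: str, domain: str = "google.com") -> str:
    """Return a search URL shaped like the example in this posting (hypothetical helper)."""
    # Two spaces before "email" reproduce the "++" seen in the example URL.
    query = f"{company}  email:*.*"
    # safe="*" keeps the asterisks unescaped, as they appear in the example URL.
    return f"https://www.{domain}/#q={quote_plus(query, safe='*')}"


print(build_search_url("Dexia région Tubize SCRL", "google.dk"))
```

Special characters such as "é" are percent-encoded as UTF-8, which is also how names with umlauts would be handled.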

1. Avoid any possibility of getting blacklisted: study the time between two searches so that it does not trigger any alarm at Google, and put a per-hour or per-day limit on scraping company emails, in order to avoid a ban or any other problem that might appear. It does not matter how slow the application runs, as long as we do not get blacklisted or banned and do not run into anything illegal. Do not make the application use proxies, unless you can make it work in a way that users (like me) do not have to deal with obtaining proxies (maybe use Tor; use your imagination)

2. Input: the company list in Excel or .csv format (it contains special characters that differ from country to country, such as umlauts in German) -> Button: Upload Company List

3. Scrape company after company; each email found should be kept along with the phone number, if one exists -> Button: Start Scraping

4. The time between searches should be random, between 3 and 24 seconds -> Button: Scrap Step

[url removed, login to view] email and phone of that company and output it in a .csv file (the output has to be written automatically after each scraped email and phone). Also, the exact URL of the page where the company's email address resides should be saved in the same Excel file

6. A Settings button which will allow adding an FTP address to which the .csv file will be sent every 5 minutes. The name of the .csv file will be composed of: <country of scraping><_><object of scraping>_<date of scraping>_<militarytime>.csv. Example of file name: [url removed, login to view], so Belgium will be taken automatically from the [url removed, login to view] if users decide to scrape from Belgium, followed by the date and the military time (1824 = 18:24).

7. Do not scrape any data if there is no email in plain text (I mean in the 3 rows of plain text belonging to the result position where the company is found, on the first or second page of the results)

[url removed, login to view] to select the Google search engine for different countries: Italy, Denmark, Norway, Switzerland, Austria, Germany and Belgium, plus Google International (google.com)

9. The script (or application) and the source code will become the employer's property after the final payment

10. If you are a Linux specialist, or work in Perl or C++, then package the application so that a non-IT person can use it very easily (on Linux, make a Debian package, and so on)
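To make requirements 4, 6 and 7 more concrete, here is a minimal Python sketch of three helpers: a random wait between searches, extraction of contacts from a result snippet, and the output file name. All names (`next_delay`, `extract_contacts`, `output_filename`) are hypothetical. The phone pattern is a loose guess, and the `YYYYMMDD` date format is an assumption, since the posting does not specify one.

```python
import random
import re
from datetime import datetime

# Simple email pattern; good enough for plain-text snippets, not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose phone pattern (assumption): optional "+", then digits with common separators.
PHONE_RE = re.compile(r"\+?\d[\d\s()./-]{5,}\d")


def next_delay(low: float = 3.0, high: float = 24.0) -> float:
    """Requirement 4: pick a random wait, in seconds, between two searches."""
    return random.uniform(low, high)


def extract_contacts(snippet: str):
    """Requirement 7: return None when the snippet has no plain-text email."""
    emails = EMAIL_RE.findall(snippet)
    if not emails:
        return None
    # Phones are optional; they are kept only alongside at least one email.
    phones = [p.strip() for p in PHONE_RE.findall(snippet)]
    return {"emails": emails, "phones": phones}


def output_filename(country: str, subject: str, when: datetime) -> str:
    """Requirement 6: <country>_<object>_<date>_<militarytime>.csv (date format assumed)."""
    return f"{country}_{subject}_{when:%Y%m%d}_{when:%H%M}.csv"


if __name__ == "__main__":
    print(extract_contacts("Tel: +32 2 123 45 67, info@example.com"))
    print(output_filename("Belgium", "companies", datetime.now()))
    print(f"next search in {next_delay():.1f} s")
```

A real run would call `time.sleep(next_delay())` between searches and also count searches per hour or per day, as requirement 1 asks; that counter is left out here to keep the sketch short.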
