Database from web scraping of Google results

This project was awarded to Evs1 for $275 USD.

Project Budget
$250 - $750 USD
Project Description

This job requires an automated method to create a database of pages, the text on those pages, and the links from those pages along with the text of the linked pages.

What I am trying to do is build a database of pages that will enable me to figure out:

1) Which cities, towns, schools and other public institutions in the US and Canada have an emergency notification system.
2) Which vendor they are using.

Because the text this search will find is not consistently formatted, we've come up with the method below, but would be interested in any suggestions you have for improving it.

1) Run a Google search on these search terms: register OR registration "emergency notification"
2) Identify the URLs of the found pages
For each URL:
3) Copy all of the text on the page
4) Copy all of the source of the page (stored separately from the text of the page)
5) If there is a link on the page with any of the text below in its address, go to that link and store its address as "Linked URL"
6) Copy the text on that Linked URL page and store it as "Linked URL Text"
7) If any of the link texts listed below appears in the first page's source, the first page's text, or the Linked URL, enter the corresponding brand (noted below in parentheses)
8) If a US state appears in the text, put it in the State column or field
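
As one possible way to handle steps 3 to 5, here is a minimal sketch that pulls the visible text and the link addresses out of a page's source. It assumes Python and only its standard library. The names `PageExtractor` and `extract_text_and_links` are made up for illustration, and fetching the page itself (and running the Google search) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects visible text and absolute link addresses from one HTML page."""

    # Content inside these tags is source code, not page text.
    SKIP_TAGS = {"script", "style"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_text_and_links(page_source, base_url):
    """Return (page_text, list_of_link_urls) for one page's HTML source."""
    parser = PageExtractor(base_url)
    parser.feed(page_source)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

The page source from step 4 is simply the string passed in as `page_source`, so it can be stored as-is next to the extracted text. Pages that build their content with JavaScript would need a different approach than this sketch.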

The collected data needs to be stored in an Excel spreadsheet or other format we agree on.
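
One format that opens directly in Excel is CSV. The sketch below shows how the fields named in the steps could be laid out as columns, assuming Python's standard `csv` module. The column names other than "Linked URL", "Linked URL Text" and "State" are assumptions, as is the helper name `write_rows`.

```python
import csv

# Column layout: an assumption based on the fields named in the steps.
COLUMNS = [
    "URL",
    "Page Text",
    "Page Source",
    "Linked URL",
    "Linked URL Text",
    "Brand",
    "State",
]


def write_rows(rows, out_file):
    """Write one CSV row per found page; missing fields are left empty."""
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

For a real file, open it with `open("pages.csv", "w", newline="", encoding="utf-8")` and pass the file object as `out_file`. Note that Excel limits a single cell to 32,767 characters, so the full page source may need to be kept in separate files if it has to be viewed in Excel.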

Links to registration pages will have this text in them. Each line represents one link:

[url removed, login to view] (brand is CityWatch)
[url removed, login to view] (brand is FirstCallNetwork)
[url removed, login to view] (brand is CodeRed)
[url removed, login to view] (brand is Everbridge)
[url removed, login to view] (brand is TwentyFirst Century)
[url removed, login to view] (brand is Rave)
[url removed, login to view] (brand is Deltalert)
[url removed, login to view] (brand is OneCallNow)
[url removed, login to view] (brand is RapidNotify)
[url removed, login to view] (brand is Nixle)
[url removed, login to view] (brand is Swift911)

[url removed, login to view] (brand is Cassidian) - this one will be in the form of [url removed, login to view], where XXXXXX is the name of their client, as in
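
The brand lookup in step 7 could be sketched as a simple substring match. Because the actual link texts are not shown in this listing, the fragments in `BRAND_PATTERNS` below are placeholders on made-up `.example` domains and would have to be replaced with the real ones. The per-client Cassidian form is not reproduced here either. The function name `detect_brand` is hypothetical.

```python
# Placeholder fragments only: the real link texts are not shown in the listing.
BRAND_PATTERNS = {
    "citywatch.example": "CityWatch",
    "codered.example": "CodeRed",
    "everbridge.example": "Everbridge",
    "nixle.example": "Nixle",
}


def detect_brand(texts, patterns=BRAND_PATTERNS):
    """Return the first brand whose fragment occurs in any of the given strings.

    `texts` is meant to hold the first page's source, the first page's text,
    and the Linked URL. Empty or missing entries are skipped.
    """
    for text in texts:
        if not text:
            continue
        lowered = text.lower()
        for fragment, brand in patterns.items():
            if fragment.lower() in lowered:
                return brand
    return ""
```

A call such as `detect_brand([page_source, page_text, linked_url])` would fill the brand field. This sketch returns only the first match, so a page that links to two vendors would need a variant that collects every match.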

US State: Abbreviation:
Alabama AL
Alaska AK
Arizona AZ
Arkansas AR
California CA
Colorado CO
Connecticut CT
Delaware DE
Florida FL
Georgia GA
Hawaii HI
Idaho ID
Illinois IL
Indiana IN
Iowa IA
Kansas KS
Kentucky KY
Louisiana LA
Maine ME
Maryland MD
Massachusetts MA
Michigan MI
Minnesota MN
Mississippi MS
Missouri MO
Montana MT
Nebraska NE
Nevada NV
New Hampshire NH
New Jersey NJ
New Mexico NM
New York NY
North Carolina NC
North Dakota ND
Ohio OH
Oklahoma OK
Oregon OR
Pennsylvania PA
Rhode Island RI
South Carolina SC
South Dakota SD
Tennessee TN
Texas TX
Utah UT
Vermont VT
Virginia VA
Washington WA
West Virginia WV
Wisconsin WI
Wyoming WY
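
Step 8 could be sketched with the table above as follows, assuming Python's `re` module. Full state names are matched as whole words. Abbreviations are only accepted in a "City, ST 12345" address pattern, because bare two-letter matches such as OR, IN or ME would hit ordinary words. That address rule and the name `detect_state` are assumptions of this sketch.

```python
import re

# The state table from the listing, several entries per line.
STATE_TABLE = """
Alabama AL; Alaska AK; Arizona AZ; Arkansas AR; California CA; Colorado CO
Connecticut CT; Delaware DE; Florida FL; Georgia GA; Hawaii HI; Idaho ID
Illinois IL; Indiana IN; Iowa IA; Kansas KS; Kentucky KY; Louisiana LA
Maine ME; Maryland MD; Massachusetts MA; Michigan MI; Minnesota MN
Mississippi MS; Missouri MO; Montana MT; Nebraska NE; Nevada NV
New Hampshire NH; New Jersey NJ; New Mexico NM; New York NY
North Carolina NC; North Dakota ND; Ohio OH; Oklahoma OK; Oregon OR
Pennsylvania PA; Rhode Island RI; South Carolina SC; South Dakota SD
Tennessee TN; Texas TX; Utah UT; Vermont VT; Virginia VA; Washington WA
West Virginia WV; Wisconsin WI; Wyoming WY
"""

# Map lower-cased full name -> abbreviation.
STATES = {}
for entry in STATE_TABLE.replace("\n", ";").split(";"):
    entry = entry.strip()
    if entry:
        name, abbr = entry.rsplit(" ", 1)
        STATES[name.lower()] = abbr

# Word boundaries keep "Kansas" from matching inside "Arkansas".
_NAME_RE = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in STATES) + r")\b", re.IGNORECASE
)
# Upper-case abbreviation after a comma and before a 5-digit ZIP code.
_ABBR_RE = re.compile(r",\s*(" + "|".join(STATES.values()) + r")\s+\d{5}\b")


def detect_state(text):
    """Return the abbreviation of the first US state found in text, or ''."""
    match = _NAME_RE.search(text)
    if match:
        return STATES[match.group(1).lower()]
    match = _ABBR_RE.search(text)
    if match:
        return match.group(1)
    return ""
```

For example, `detect_state("123 Main St, Austin, TX 78701")` gives `"TX"`. Names that are also common place or personal names (Washington, Georgia) can still produce false matches, so the State column would likely need a manual check.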
