Closed

Database from web scraping of Google results

This project was awarded to Evs1 for $275 USD.

Project Budget: $250 - $750 USD
Total Bids: 12

Project Description

This job requires an automated method to create a database of pages, the text on those pages, and the links from those pages together with their text.

I am trying to build a database of pages that will let me figure out:

1) Which cities, towns, schools and other public institutions in the US and Canada have an emergency notification system

and

2) Which vendor they are using.

Because the text on these pages is not consistently formatted, we've come up with the method below, but we would be interested in any suggestions you have for improving it.

1) Run a Google search on these search terms: register OR registration "emergency notification"

2) Identify the URLs of the found pages

For each URL:

3) Copy all of the text on the page

4) Copy all of the source of the page (separate from text of page)

5) If a link on the page has any of the text listed below in its address, follow that link and store its address as "Linked URL"

6) Copy the text on that Linked URL page and store it as "Linked URL Text"

7) If any of the links listed below appears in the first page's source, the first page's text, or the Linked URL, enter the corresponding brand (noted below in parentheses)

8) If a US state is named in the text, put it in the State column or field
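
Steps 3 to 5 above could be sketched roughly as follows. This is only an illustration, assuming Python and its standard library; `PageExtractor` and `extract_page` are hypothetical names, and fetching the page itself is left out. The raw HTML you pass in is what step 4 would store as the page source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects visible text and absolute link targets from one HTML page."""

    # Content inside these tags is not visible text, so it is skipped.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_page(html_source, base_url):
    """Return (page_text, links) for one page's HTML source."""
    parser = PageExtractor(base_url)
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

Pages that build their content with JavaScript would need a different approach; this sketch only reads the HTML as delivered.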

The collected data needs to be stored in an Excel spreadsheet or other format we agree on.

Links to registration pages will have this text in them. Each line represents one link:
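
One possible "other format" is a CSV file, which Excel can open. The sketch below assumes Python's standard `csv` module. "Linked URL", "Linked URL Text" and "State" come from the steps above; the remaining column names and the `write_rows` helper are assumptions made for illustration.

```python
import csv

# "URL", "Page Text", "Page Source" and "Brand" are assumed column names.
COLUMNS = [
    "URL",
    "Page Text",
    "Page Source",
    "Linked URL",
    "Linked URL Text",
    "Brand",
    "State",
]


def write_rows(rows, out_file):
    """Write one CSV row per found page; missing fields are left blank."""
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    sample = [{"URL": "http://example.org", "Brand": "Nixle", "State": "OH"}]
    # newline="" lets the csv module control line endings itself.
    with open("results.csv", "w", newline="", encoding="utf-8") as f:
        write_rows(sample, f)
```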

[url removed, login to view] (brand is CityWatch)

[url removed, login to view] (brand is FirstCallNetwork)

[url removed, login to view] (brand is CodeRed)

[url removed, login to view] (brand is Everbridge)

[url removed, login to view] (brand is TwentyFirst Century)

[url removed, login to view] (brand is Rave)

[url removed, login to view] (brand is Deltalert)

[url removed, login to view] (brand is OneCallNow)

[url removed, login to view] (brand is RapidNotify)

[url removed, login to view] (brand is Nixle)

[url removed, login to view] (brand is Swift911)

[url removed, login to view] (brand is Cassidian) - this one will be in the form of [url removed, login to view], where XXXXXX is the name of their client, as in madisoncounty.onthealtert.com.
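
Steps 5 and 7 could be sketched as a simple lookup from address text to brand. The real address fragments are not shown in this posting, so every fragment in `PLACEHOLDER_FRAGMENTS` below is a made-up placeholder that would have to be replaced with the actual text; the function names are also hypothetical. A fragment that starts with a dot is one way to cover the Cassidian case, where the client name appears in front of the vendor's domain.

```python
# Placeholder fragments only: the real ones are not shown in this posting.
PLACEHOLDER_FRAGMENTS = {
    "citywatch.example": "CityWatch",
    "nixle.example": "Nixle",
    ".cassidian-host.example": "Cassidian",  # matches client.cassidian-host.example
}


def detect_brand(candidates, fragments):
    """Return the brand of the first candidate string containing a fragment.

    `candidates` can be page source, page text, or link addresses (step 7).
    Matching is case-insensitive; None means no known vendor was found.
    """
    for candidate in candidates:
        lowered = candidate.lower()
        for fragment, brand in fragments.items():
            if fragment.lower() in lowered:
                return brand
    return None


def find_linked_url(links, fragments):
    """Return (linked_url, brand) for the first matching link (step 5)."""
    for link in links:
        brand = detect_brand([link], fragments)
        if brand is not None:
            return link, brand
    return None, None
```

Plain substring matching is the simplest reading of "has this text in them"; it may produce false matches if a fragment is very short.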

US State: Abbreviation:

Alabama AL

Alaska AK

Arizona AZ

Arkansas AR

California CA

Colorado CO

Connecticut CT

Delaware DE

Florida FL

Georgia GA

Hawaii HI

Idaho ID

Illinois IL

Indiana IN

Iowa IA

Kansas KS

Kentucky KY

Louisiana LA

Maine ME

Maryland MD

Massachusetts MA

Michigan MI

Minnesota MN

Mississippi MS

Missouri MO

Montana MT

Nebraska NE

Nevada NV

New Hampshire NH

New Jersey NJ

New Mexico NM

New York NY

North Carolina NC

North Dakota ND

Ohio OH

Oklahoma OK

Oregon OR

Pennsylvania PA

Rhode Island RI

South Carolina SC

South Dakota SD

Tennessee TN

Texas TX

Utah UT

Vermont VT

Virginia VA

Washington WA

West Virginia WV

Wisconsin WI

Wyoming WY
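
Step 8 could be sketched with the table above. Matching bare abbreviations is risky because several of them (OR, IN, ME, OK, HI) are also ordinary words, so this sketch makes an assumption that is not part of the posting: full state names are matched anywhere, while abbreviations only count when they look like part of an address (", ST 12345"). `find_state` is a hypothetical name.

```python
import re

STATE_TABLE = """
Alabama AL, Alaska AK, Arizona AZ, Arkansas AR, California CA, Colorado CO,
Connecticut CT, Delaware DE, Florida FL, Georgia GA, Hawaii HI, Idaho ID,
Illinois IL, Indiana IN, Iowa IA, Kansas KS, Kentucky KY, Louisiana LA,
Maine ME, Maryland MD, Massachusetts MA, Michigan MI, Minnesota MN,
Mississippi MS, Missouri MO, Montana MT, Nebraska NE, Nevada NV,
New Hampshire NH, New Jersey NJ, New Mexico NM, New York NY,
North Carolina NC, North Dakota ND, Ohio OH, Oklahoma OK, Oregon OR,
Pennsylvania PA, Rhode Island RI, South Carolina SC, South Dakota SD,
Tennessee TN, Texas TX, Utah UT, Vermont VT, Virginia VA, Washington WA,
West Virginia WV, Wisconsin WI, Wyoming WY
"""

# Map lower-cased state name -> abbreviation, e.g. "new york" -> "NY".
ABBR_BY_NAME = {}
for entry in STATE_TABLE.split(","):
    name, abbr = entry.strip().rsplit(" ", 1)
    ABBR_BY_NAME[name.lower()] = abbr
ABBRS = set(ABBR_BY_NAME.values())

# Longest names first, so multi-word names are tried before shorter ones.
_names = sorted(ABBR_BY_NAME, key=len, reverse=True)
NAME_RE = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in _names) + r")\b", re.IGNORECASE
)
# Assumed address shape: comma, two capital letters, five-digit ZIP code.
ADDRESS_RE = re.compile(r",\s*([A-Z]{2})\s+\d{5}\b")


def find_state(text):
    """Return the abbreviation of the first state found in text, else None."""
    match = NAME_RE.search(text)
    if match:
        return ABBR_BY_NAME[match.group(1).lower()]
    for match in ADDRESS_RE.finditer(text):
        if match.group(1) in ABBRS:
            return match.group(1)
    return None
```

Names such as "Washington" can also refer to cities or counties, so results from a sketch like this would still need a manual check.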
