Database from web scraping of Google results
This project was awarded to Evs1 for $275 USD.
Project Budget: $250-$750 USD
This job requires an automated method to create a database of pages, the text on those pages, and links from those pages and their text.
What I am trying to do is build a database of pages that will enable me to figure out:
1) Which cities, towns, schools and other public institutions in the US and Canada have an emergency notification system
2) Which vendor they are using.
Because the text this will find is not consistently formatted, we've come up with this method, but would be interested in any suggestions you have for improving it.
1) Run a Google search on these search terms: register OR registration "emergency notification"
2) Identify the URLs of the found pages
For each URL:
3) Copy all of the text on the page
4) Copy all of the source of the page (separate from text of page)
5) If there is a link on the page with any of the text below in the address, go to that link and store as "Linked URL"
6) copy the text on that Linked URL page and store that as "Linked URL Text"
7) If any of the links appear in the first page's source, the first page's text, or the Linked URL, enter the appropriate brand (noted below in parentheses)
8) If a US state appears in the text, put it in the State column or field
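Steps 3 to 5 above could be sketched roughly as follows. This is only a minimal illustration using Python's standard library, assuming the page source has already been downloaded as a string. The names PageExtractor, extract_page and find_linked_url are made up for this sketch, and the link fragment in the usage note is a placeholder, not one of the real vendor addresses.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects visible text and link targets from one HTML page."""

    SKIPPED_TAGS = {"script", "style"}  # their contents are not page text

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_page(source):
    """Return (page text, list of link addresses) for one page source (steps 3-4)."""
    parser = PageExtractor()
    parser.feed(source)
    parser.close()
    return " ".join(parser.text_parts), parser.links


def find_linked_url(links, fragments):
    """Return the first link whose address contains a known fragment (step 5)."""
    for link in links:
        lowered = link.lower()
        if any(fragment.lower() in lowered for fragment in fragments):
            return link
    return None
```

For example, extract_page(source) returns the text and links of a page, and find_linked_url(links, ["vendor.example"]) picks out the "Linked URL" if one of the links contains that (placeholder) fragment. The raw source string itself is what would be stored for step 4.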
The collected data needs to be stored in an Excel spreadsheet or other format we agree on.
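One possible "other format" is a CSV file, which Excel can open directly. The sketch below is an assumption about how the output could be laid out, not an agreed format: "Linked URL", "Linked URL Text" and "State" come from the steps above, while the remaining column names and the write_rows helper are invented for illustration.

```python
import csv
import io

# "Linked URL", "Linked URL Text" and "State" are named in the posting;
# the other column names are assumptions made for this sketch.
COLUMNS = [
    "URL",
    "Page Text",
    "Page Source",
    "Linked URL",
    "Linked URL Text",
    "Brand",
    "State",
]


def write_rows(rows, handle):
    """Write one CSV row per found page; missing fields are left blank."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow({column: row.get(column, "") for column in COLUMNS})


if __name__ == "__main__":
    buffer = io.StringIO(newline="")
    write_rows([{"URL": "http://example.test/", "State": "NY"}], buffer)
    print(buffer.getvalue())
```

When writing to a real file, opening it with open(path, "w", newline="", encoding="utf-8") keeps multi-line page text inside a single quoted cell.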
Links to registration pages will have this text in them. Each line represents one link:
[url removed, login to view] (brand is CityWatch)
[url removed, login to view] (brand is FirstCallNetwork)
[url removed, login to view] (brand is CodeRed)
[url removed, login to view] (brand is Everbridge)
[url removed, login to view] (brand is TwentyFirst Century)
[url removed, login to view] (brand is Rave)
[url removed, login to view] (brand is Deltalert)
[url removed, login to view] (brand is OneCallNow)
[url removed, login to view] (brand is RapidNotify)
[url removed, login to view] (brand is Nixle)
[url removed, login to view] (brand is Swift911)
[url removed, login to view] (brand is Cassidian) - this one will be in the form of [url removed, login to view], where XXXXXX is the name of their client, as in madisoncounty.onthealtert.com.
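Step 7 and the client-name form of the last entry could be handled along these lines. This is a rough sketch only: the real link fragments are not visible in this posting, so the entries in BRAND_FRAGMENTS are placeholders that would need to be replaced with the actual text, and detect_brand and client_from_url are hypothetical helper names. The host in the usage note is copied from the example above.

```python
from urllib.parse import urlsplit

# Placeholder fragments: the real link text is not shown in the posting,
# so these ".example" values are stand-ins to be replaced.
BRAND_FRAGMENTS = {
    "citywatch.example": "CityWatch",
    "codered.example": "CodeRed",
}


def detect_brand(haystacks, fragments=BRAND_FRAGMENTS):
    """Return the brand whose fragment appears in any of the given strings.

    haystacks would be the first page's source, its text, and the Linked URL.
    """
    for haystack in haystacks:
        lowered = haystack.lower()
        for fragment, brand in fragments.items():
            if fragment.lower() in lowered:
                return brand
    return None


def client_from_url(url, base_domain):
    """For XXXXXX.<base_domain> addresses, return the XXXXXX client part."""
    # urlsplit only fills in the host when the address has a scheme or "//".
    host = urlsplit(url if "://" in url else "//" + url).hostname or ""
    suffix = "." + base_domain.lower()
    if host.endswith(suffix):
        return host[: -len(suffix)]
    return None
```

With the example given above, client_from_url("madisoncounty.onthealtert.com", "onthealtert.com") returns "madisoncounty".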
US State: Abbreviation:
New Hampshire NH
New Jersey NJ
New Mexico NM
New York NY
North Carolina NC
North Dakota ND
Rhode Island RI
South Carolina SC
South Dakota SD
West Virginia WV
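Step 8 could then be a simple lookup against this table. The sketch below uses only the states listed above and would need the remaining states added; detect_state is a made-up name. It assumes full state names are the more reliable signal, and falls back to upper-case abbreviations, which can still produce false matches on some pages.

```python
import re

# Only the states shown in the table above; the rest would be added the same way.
STATES = {
    "New Hampshire": "NH",
    "New Jersey": "NJ",
    "New Mexico": "NM",
    "New York": "NY",
    "North Carolina": "NC",
    "North Dakota": "ND",
    "Rhode Island": "RI",
    "South Carolina": "SC",
    "South Dakota": "SD",
    "West Virginia": "WV",
}


def detect_state(text):
    """Return the abbreviation of the first state found in text, else None."""
    # Full names first, matched case-insensitively as whole words.
    for name, abbreviation in STATES.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            return abbreviation
    # Then abbreviations, as upper-case whole words only.
    for abbreviation in STATES.values():
        if re.search(r"\b" + abbreviation + r"\b", text):
            return abbreviation
    return None
```

A page that mentions more than one state would get only the first match here; collecting all matches instead is an easy change if that turns out to be more useful.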