This job requires an automated method to create a database of pages, the text on those pages, and links from those pages and their text.
What I am trying to do is build a database of pages that will enable me to figure out:
1) Which cities, towns, schools and other public institutions in the US and Canada have an emergency notification system
2) Which vendor they are using.
Because the text this will find is not consistently formatted, we've come up with this method, but would be interested in any suggestions you have for improving it.
1) Run a Google search on these search terms: register OR registration "emergency notification"
2) Identify the URLs of the found pages
For each URL:
3) Copy all of the text on the page
4) Copy all of the source of the page (separate from text of page)
5) If there is a link on the page with any of the text below in the address, go to that link and store as "Linked URL"
6) Copy the text on that Linked URL page and store that as "Linked URL Text"
7) If any of the links listed below appear in the first page's source, the first page's text, or the Linked URL, enter the appropriate brand (noted below in parentheses)
8) If a US state is named in the text, put that in the State column or field
The collected data needs to be stored in an Excel spreadsheet or other format we agree on.
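As a starting point for steps 3, 4 and 5, here is a minimal sketch of how the page text and the links could be pulled out of a page's source and written to a CSV file that Excel can open. It uses only the Python standard library. The names PageParser, parse_page and rows_to_csv and the column names are my own placeholders, not something already agreed on, and fetching the pages themselves is left out.

```python
import csv
import io
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects visible text and link targets from one page's HTML source."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def parse_page(source):
    """Return (page_text, links) for one page's HTML source."""
    parser = PageParser()
    parser.feed(source)
    parser.close()
    return " ".join(parser.text_parts), parser.links


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The page source (step 4) would be stored unchanged in its own column next to the extracted text, so both can be searched later.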
Links to registration pages will have this text in them. Each line represents one link:
ww2.citywatchonline.com (brand is CityWatch)
alertregistration.com/ (brand is FirstCallNetwork)
cne.coderedweb.com/ (brand is CodeRed)
ww2.everbridge.net/citizen/ (brand is Everbridge)
signup2.tfcci.com/signup-web/ (brand is TwentyFirst Century)
www.getrave.com/ (brand is Rave)
alerts-1.deltalert.com/ (brand is Deltalert)
secure.onecallnow.com/ (brand is OneCallNow)
alert.rapidnotify.com/ (brand is RapidNotify)
www.nixle.com (brand is Nixle)
swift911.swiftreach.com/ (brand is Swift911)
onthealert.com (brand is Cassidian) - this one will be in the form of XXXXXX.onthealert.com, where XXXXXX is the name of their client, as in madisoncounty.onthealert.com.
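The brand lookup in steps 5 and 7 could be done with a plain substring match against the list above. The sketch below is only an illustration: the function names detect_brands and first_vendor_link are placeholders I made up, and matching is case-insensitive by assumption. Because it looks for the text anywhere in the address, the onthealert.com entry also catches client subdomains.

```python
# Link fragments and brands taken from the list above.
VENDOR_PATTERNS = [
    ("ww2.citywatchonline.com", "CityWatch"),
    ("alertregistration.com/", "FirstCallNetwork"),
    ("cne.coderedweb.com/", "CodeRed"),
    ("ww2.everbridge.net/citizen/", "Everbridge"),
    ("signup2.tfcci.com/signup-web/", "TwentyFirst Century"),
    ("www.getrave.com/", "Rave"),
    ("alerts-1.deltalert.com/", "Deltalert"),
    ("secure.onecallnow.com/", "OneCallNow"),
    ("alert.rapidnotify.com/", "RapidNotify"),
    ("www.nixle.com", "Nixle"),
    ("swift911.swiftreach.com/", "Swift911"),
    ("onthealert.com", "Cassidian"),
]


def detect_brands(*haystacks):
    """Return brands whose link fragment appears in any given string.

    Pass the page source, the page text and the Linked URL as separate
    arguments. Brands are returned once each, in order of first match.
    """
    found = []
    for haystack in haystacks:
        lowered = haystack.lower()
        for pattern, brand in VENDOR_PATTERNS:
            if pattern in lowered and brand not in found:
                found.append(brand)
    return found


def first_vendor_link(links):
    """Return the first link containing a listed fragment, or None."""
    for link in links:
        if detect_brands(link):
            return link
    return None
```

For example, detect_brands(page_source, page_text, linked_url) would fill the brand column, and first_vendor_link(links) would pick the "Linked URL" for step 5.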
US State: Abbreviation:
New Hampshire NH
New Jersey NJ
New Mexico NM
New York NY
North Carolina NC
North Dakota ND
Rhode Island RI
South Carolina SC
South Dakota SD
West Virginia WV
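For step 8, one possible way to spot a state in the page text is a word-boundary search for the full name or the abbreviation. This is a rough sketch under my own assumptions: the table only holds the states listed above and would need extending to cover the rest, full names are matched case-insensitively, and abbreviations are matched only as standalone uppercase words, which can still produce false positives. The name find_states is a placeholder.

```python
import re

# Only the states listed above; extend this table as needed.
STATES = {
    "New Hampshire": "NH",
    "New Jersey": "NJ",
    "New Mexico": "NM",
    "New York": "NY",
    "North Carolina": "NC",
    "North Dakota": "ND",
    "Rhode Island": "RI",
    "South Carolina": "SC",
    "South Dakota": "SD",
    "West Virginia": "WV",
}


def find_states(text):
    """Return abbreviations of the listed states that appear in text."""
    found = []
    for name, abbr in STATES.items():
        # Full name: any capitalization. Abbreviation: uppercase word only.
        name_hit = re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
        abbr_hit = re.search(r"\b" + abbr + r"\b", text)
        if name_hit or abbr_hit:
            found.append(abbr)
    return found
```

If a page mentions more than one state, all of them are returned, so a person can decide which one belongs in the State column.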
Additional Project Description:
04/12/2013 at 13:36 EDT
I have added a spreadsheet, filled out by hand, to show what we want. The Word document shows the starting point (the Google search) and also the final result we are trying to achieve (although that final result is not part of this project).