I would like someone to scrape http://dsbs.sba.gov/dsbs/search/dsp_dsbs.cfm and load all of the profile pages into a database. For instance, if you click on Arizona and then go down to "search using these criteria", you will get roughly 4,800 results. Under the "name and trade name of firm" field, you will see a link to each company's profile. http://dsbs.sba.gov/dsbs/search/dsp_profile.cfm?DUNS=039301993 is a random example of a returned page.
There are two ways I can think of to gather this data. One is a brute-force query over every possible DUNS number: DUNS=000000000, DUNS=000000001, and so on. The other is to automate the form-filling procedure (using something like Selenium) to collect the DUNS numbers of the profiles that currently exist, and then use that generated DUNS list to fetch the pages themselves. I would then like each field on the page (email, phone number, ownership and self-certification, etc.) to be entered into a searchable database.
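
To make the request a bit more concrete, here is a rough Python sketch of the pieces I have in mind. It is only an illustration, not a required design: the function names, the SQLite table, and its three columns are my own placeholders, and it leaves out the fetching and HTML parsing entirely because I have not mapped out the page layout. Note that a 9-digit DUNS gives one billion brute-force candidates, which is why the form-driven list is probably the more practical route.

```python
import sqlite3

# Profile URL pattern taken from the example link above.
PROFILE_URL = "http://dsbs.sba.gov/dsbs/search/dsp_profile.cfm?DUNS={duns}"


def duns_candidates(start=0, stop=1_000_000_000):
    """Yield zero-padded 9-digit DUNS strings (brute-force approach)."""
    for n in range(start, stop):
        yield f"{n:09d}"


def profile_url(duns):
    """Build the profile page URL for one DUNS number."""
    return PROFILE_URL.format(duns=duns)


def open_db(path=":memory:"):
    """Open the database; the schema here is a hypothetical placeholder."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "duns TEXT PRIMARY KEY, email TEXT, phone TEXT, ownership TEXT)"
    )
    return conn


def save_profile(conn, duns, fields):
    """Insert or update one profile; `fields` is a dict of parsed values."""
    conn.execute(
        "INSERT OR REPLACE INTO profiles (duns, email, phone, ownership) "
        "VALUES (?, ?, ?, ?)",
        (duns, fields.get("email"), fields.get("phone"), fields.get("ownership")),
    )
    conn.commit()
```

With either approach, the loop would be the same: take each DUNS (from `duns_candidates()` or from the list collected through the search form), fetch `profile_url(duns)`, parse the fields, and pass them to `save_profile`. The real table would need a column for every field on the profile page.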