The site to be scraped is the one named in the update below:
http://leginfo.legislature.ca.gov
The information I would like scraped is the “Today’s Law As Amended” content for each version of a bill's text.
To see this for a bill, go to the Quick Search bar at the upper right of the screen and enter AB 1. From the list of measures displayed, choose AB 1 Water quality: integrated plan: Salinas Valley.
You will see a tab labeled “Today’s Law As Amended”. I would like the HTML content that is displayed when you choose that tab to be scraped.
Above the content, to the right, you will notice a pull-down labeled Version:. I want the content for each version listed there.
If it is of any help, I can provide a daily updated list of the Measures and Versions that have a new “Today’s Law As Amended” for your scraper to work from. Once a version has been posted, its content will not change.
The scrape frequency would be once a day.
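Because a posted version never changes, a daily run only needs to fetch the measure/version pairs it has not stored yet. A minimal sketch of that filtering step follows, assuming the daily list arrives as (measure, version) pairs; the function name and the version labels in the usage note are hypothetical and not taken from the site.

```python
def pending_versions(daily_list, already_scraped):
    """Return the (measure, version) pairs that still need fetching.

    daily_list:      iterable of (measure, version) tuples from the daily list
    already_scraped: iterable of (measure, version) tuples already stored

    Input order is preserved and duplicates in the daily list are dropped.
    """
    seen = set(already_scraped)
    pending = []
    for pair in daily_list:
        if pair not in seen:
            pending.append(pair)
            seen.add(pair)  # guards against the same pair appearing twice
    return pending
```

For example, if ("AB 1", "version-1") is already stored and the daily list contains both "version-1" and "version-2" of AB 1, only ("AB 1", "version-2") is returned for fetching.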
Thanks everyone for responding. I'd like to add some specifics to the project before I start following up with you.
* I would like to host the scraper on scraperwiki.com.
* I've attached the example results I would like to have scraped from the http://leginfo.legislature.ca.gov site, using SB 322 as an example. The screenshot also shows the pull-down with both versions of "Today's Law As Amended" for the bill.
A bill can have one or more versions. I want every version of each bill.
* If needed, I can provide a database with the Measures and Versions that need to be scraped. It could possibly also be used to store the results.
* The content I want from the page is the div with an id of "bill_all".
* I do not have any experience with scraping, but I do want to point out that the site being scraped logs me out after a period of inactivity.
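To illustrate the "bill_all" requirement, here is a minimal sketch that pulls the inner HTML of that div out of a page that has already been downloaded. It uses only the Python standard library. The class and function names are hypothetical, and it assumes the div is present in the HTML returned for the tab, which has not been verified against the live site. Fetching the page, selecting each version, and dealing with the inactivity logout are left out because the site's request flow is not described here.

```python
from html.parser import HTMLParser


class DivByIdExtractor(HTMLParser):
    """Collects the inner HTML of the first <div> whose id matches target_id."""

    def __init__(self, target_id):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # <div> nesting depth inside the target; 0 = outside
        self.parts = []     # pieces of the inner HTML, in document order
        self.done = False   # True once the target div has been closed

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif not self.done and tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append("&#%s;" % name)


def extract_div(html, target_id="bill_all"):
    """Return the inner HTML of the div with the given id, or None if absent."""
    parser = DivByIdExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

Calling extract_div(page_html) on a saved copy of a "Today's Law As Amended" page would return the markup inside the div, or None when no such div is found. HTML comments inside the div are dropped by this sketch.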