U.S. Federal Regulations (RIN) -- Data Collection by XML or Web Scraping

Status: In Progress
Bids: 12
Avg Bid (USD): $377
Project Budget (USD): $250 - $750

Project Description:
I need data on United States federal agency rules scraped from the regulations.gov website. I will provide a list of Regulatory Information Numbers (RINs), and each RIN must be searched for on the site. There are about 569 unique RINs for which I need data. Each RIN will be one row. The columns will hold data scraped from the “View Rule” page for that RIN (see, for example, http://www.reginfo.gov/public/do/eAgendaViewRule?pubId=200904&RIN=0560-AH84).
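As a rough illustration of the per-RIN lookup, the example link suggests that each “View Rule” page can be addressed directly with a `pubId` and a `RIN` query parameter. The sketch below only builds the URLs; it assumes the publication ID for each RIN is already known, and the names `view_rule_url` and `rin_list` are hypothetical.

```python
from urllib.parse import urlencode

# Base of the "View Rule" page, copied from the example URL in the description.
VIEW_RULE_BASE = "http://www.reginfo.gov/public/do/eAgendaViewRule"


def view_rule_url(rin: str, pub_id: str) -> str:
    """Build the "View Rule" URL for one RIN and publication ID."""
    # The parameter names (pubId, RIN) are taken from the example URL.
    query = urlencode({"pubId": pub_id, "RIN": rin})
    return f"{VIEW_RULE_BASE}?{query}"


# Hypothetical input: (RIN, publication ID) pairs from the supplied list.
rin_list = [("0560-AH84", "200904")]
urls = [view_rule_url(rin, pub_id) for rin, pub_id in rin_list]
```

Each URL would then be fetched and parsed in a separate step, one row per RIN.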

Note: The website includes a link to the data for all RINs in XML. It may be easier to simply put all of this XML data into a spreadsheet. I would be willing to pay for this as well, including data for RINs I have not specifically asked for.
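If the XML route is taken, the conversion could look roughly like the sketch below, which flattens one record element per row into CSV that a spreadsheet can open. The element names (`RULES`, `RULE`, `RIN`, `TITLE`) and the sample values are placeholders, not the real schema; the actual file would need to be inspected first.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's element names must be checked first.
SAMPLE_XML = """<RULES>
  <RULE><RIN>0560-AH84</RIN><TITLE>Example rule</TITLE></RULE>
  <RULE><RIN>0000-AA00</RIN></RULE>
</RULES>"""


def xml_to_rows(xml_text, record_tag, fields):
    """Return one dict per record element, with a value for each field."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        # findtext returns the default ("") when a child element is absent.
        rows.append(
            {f: (record.findtext(f, default="") or "").strip() for f in fields}
        )
    return rows


def rows_to_csv(rows, fields):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


fields = ["RIN", "TITLE"]
csv_text = rows_to_csv(xml_to_rows(SAMPLE_XML, "RULE", fields), fields)
```

Nested or repeating elements would need their own flattening rule, since a single CSV row cannot hold them directly.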

If the XML data cannot be easily converted to spreadsheet form, the specific data I need from the web pages are:

1. RIN [1 column]
2. Publication ID [1 column]
3. Title [1 column]
4. Abstract [1 column]
5. Agency [1 column]
6. Priority [1 column]
7. RIN Status [1 column]
8. Agenda Stage of Rulemaking [1 column]
9. Major [1 column]
10. Unfunded Mandates [1 column]
11. CFR Citation [1 column]
12. Legal Authority [1 column]
13. Legal Deadline: [4 columns; In most cases this will say “None”. However, in some cases it will include data on a source, description, and date. There may also be cases where there is more than one entry here. I need the number of entries (i.e., the number of rows) and data from the entry (source, description, and date). If there is more than one entry, use the data from the last entry.]
14. Timetable: [3 columns; This portion of the page often includes multiple entries (i.e., multiple rows of data). I do not need all of it. I need the number of entries (i.e., the number of rows) and data from entries called “Final Action.” Specifically, I need Date and FR Cite for “Final Action” entries.]
15. Regulatory Flexibility Analysis [1 column]
16. Government Levels Affected [1 column]
17. Federalism [1 column]
18. Included in the Regulatory Plan [1 column]
19. RIN Data Printed in the FR [1 column]
20. Agency Contact: [9 columns; I need one column each for Name, Title, Agency, Sub-agency, Address 1, Address 2, Address 3, Phone, and Email]

This data should be collected for each RIN and placed in the columns.
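The two multi-entry fields (Legal Deadline and Timetable) are the only ones with a reduction rule, so a short sketch may help pin it down. It assumes each section has already been parsed into a list of dicts; the key names and sample values are made up. Using the last “Final Action” row when several exist is an assumption that mirrors the Legal Deadline rule, and leaving the deadline fields empty when the page says “None” is likewise a guess.

```python
def summarize_legal_deadlines(entries):
    """Return the 4 Legal Deadline columns: count, source, description, date."""
    # An empty list stands for a page that says "None" (assumed convention).
    last = entries[-1] if entries else {}
    return {
        "deadline_count": len(entries),
        "deadline_source": last.get("source", ""),
        "deadline_description": last.get("description", ""),
        "deadline_date": last.get("date", ""),
    }


def summarize_timetable(entries):
    """Return the 3 Timetable columns: count, Final Action date, FR cite."""
    # Keep only rows whose action is exactly "Final Action" (case-insensitive).
    finals = [
        e for e in entries
        if e.get("action", "").strip().lower() == "final action"
    ]
    # Assumption: if several Final Action rows exist, use the last one.
    last_final = finals[-1] if finals else {}
    return {
        "timetable_count": len(entries),
        "final_action_date": last_final.get("date", ""),
        "final_action_fr_cite": last_final.get("fr_cite", ""),
    }


# Made-up sample entries, for illustration only.
deadlines = [
    {"source": "Statutory", "description": "First deadline", "date": "01/01/2010"},
    {"source": "Judicial", "description": "Second deadline", "date": "06/30/2010"},
]
timetable = [
    {"action": "NPRM", "date": "03/00/2009", "fr_cite": ""},
    {"action": "Final Action", "date": "12/00/2009", "fr_cite": "74 FR 00000"},
]
deadline_columns = summarize_legal_deadlines(deadlines)
timetable_columns = summarize_timetable(timetable)
```

The resulting dicts can be merged into the row for each RIN alongside the single-column fields.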

Skills required:
Data Entry, Excel, Web Scraping
Project posted by:
DavidLewisVU United States
Verified


Bids:
$300 in 2 days
$550 in 6 days
$300 in 3 days
$275 in 3 days
$250 in 7 days
$500 in 7 days
$544 in 7 days
$250 in 3 days
$550 in 0 days
$250 in 2 days