I need data on United States federal agency rules scraped from the reginfo.gov website. I will provide a list of Regulatory Information Numbers (RINs). The job requires a search of the website for each RIN (http://www.reginfo.gov/public/do/eAgendaSimpleSearch). There are about 569 unique RINs for which I need data. Some of the searches will not produce results because the database only covers rules produced after 1995.
Each search will produce one or more records for a RIN, and I need data from every record. So, each RIN will produce a number of rows of data equal to the number of records returned by the search (see, for example, what is produced by a search for RIN #2060-AM06). This can vary from 1 to as many as 20 rows of data for each RIN.
The columns would be data scraped from the “View all RIN Data” page for each search (see, for example, what this looks like for RIN #2060-AM06).
Note: The website includes a link to RIN data in XML. It may be easier to simply put all of this XML data into a spreadsheet.
At minimum, the specific data I need from the web pages are:
1. RIN [1 column]
2. Publication ID [1 column]
3. Publication year [1 column]: From the Publication ID
4. Publication season [1 column]: From the Publication ID. This should be either Fall or Spring.
5. Title [1 column]
6. Agency [1 column]
7. Priority [1 column]
8. RIN Status [1 column]
9. Agenda Stage of Rulemaking [1 column]
10. Major [1 column]
11. Unfunded Mandates [1 column]
12. CFR Citation [1 column]
13. Legal Authority [1 column]
14. Legal Deadline [Many columns]: In some cases this will say “None”. In other cases it will include data on an action, source, description, and date, and there may be more than one entry. I need the action, source, description, and date for each deadline listed in the table that follows “Legal Deadline” (if there is one). So, in a case where there are two entries for legal deadline, there should be 8 columns of data filled (action, source, description, date; action, source, description, date). In cases where there are no deadlines, these columns would be blank.
15. Timetable [Many columns]: This portion of the page often includes multiple entries (i.e., multiple rows of data). Each entry has an Action, a Date, and an FR Cite. In most cases there is more than one entry here. I need all of this data. So, in a case where there are four entries for timetable, there should be 12 columns of data filled (action, date, fr cite; action, date, fr cite; etc.).
16. Deadline [1 column]: Agencies indicate that they are setting a deadline by setting the day within a date to 00. So, for example, a date of 06/00/2005 (mm/dd/yyyy) is a deadline because there is no specific day listed. I would like a column that is coded with a 1 if any of the dates in the timetable have this format (indicating a deadline).
17. Regulatory Flexibility Analysis [1 column]
18. Government Levels Affected [1 column]
19. Federalism [1 column]
20. Included in the Regulatory Plan [1 column]
21. Related RINs [1 column]
22. Agency Contact [9 columns: one column each for Name, Title, Agency, Sub-agency, Address 1, Address 2, Address 3, Phone, Email]
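The coding rule in item 16 could be sketched roughly as follows. This is only an illustration: the function name `deadline_flag` is hypothetical, it assumes the timetable dates have already been scraped as mm/dd/yyyy strings, and it assumes a 0 is wanted when no date matches (the description above only specifies the 1).

```python
import re

# Matches mm/00/yyyy: two-digit month, a literal 00 day, four-digit year.
DAY_ZERO = re.compile(r"\d{2}/00/\d{4}")


def deadline_flag(timetable_dates):
    """Return 1 if any timetable date has its day set to 00, else 0.

    Assumption: dates are strings in mm/dd/yyyy form; empty or missing
    values are skipped.
    """
    return int(any(DAY_ZERO.fullmatch(d.strip()) for d in timetable_dates if d))


# 06/00/2005 is the example date given in item 16.
print(deadline_flag(["06/00/2005", "01/15/2004"]))
print(deadline_flag(["01/15/2004"]))
```

The first call should flag the record because one date has no specific day; the second should not.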
This data should be collected for every record of every RIN and placed in the columns listed above.
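One possible way to picture the row layout for the repeating fields (items 14 and 15) is sketched below. It is not a scraper: the helper names `widen` and `record_to_row`, the column names, and the sample record are all hypothetical placeholders, and the column counts (2 deadlines, 4 timetable entries) just mirror the examples in the list. In practice the counts would presumably be the largest number of entries seen across all records.

```python
def widen(entries, fields, prefix, max_entries):
    """Spread repeated entries across numbered columns, leaving unused ones blank."""
    row = {}
    for i in range(max_entries):
        entry = entries[i] if i < len(entries) else {}
        for field in fields:
            row[f"{prefix}_{i + 1}_{field}"] = entry.get(field, "")
    return row


def record_to_row(record, max_deadlines, max_timetable):
    """Build one spreadsheet row (column name -> value) for one record."""
    row = {"rin": record["rin"], "publication_id": record["publication_id"]}
    row.update(widen(record["legal_deadlines"],
                     ["action", "source", "description", "date"],
                     "legal_deadline", max_deadlines))
    row.update(widen(record["timetable"],
                     ["action", "date", "fr_cite"],
                     "timetable", max_timetable))
    return row


# Made-up placeholder record, not real data from the site.
sample = {
    "rin": "0000-XX00",
    "publication_id": "PLACEHOLDER",
    "legal_deadlines": [],  # the page said "None", so the columns stay blank
    "timetable": [
        {"action": "Example action A", "date": "06/00/2005", "fr_cite": ""},
        {"action": "Example action B", "date": "01/15/2006", "fr_cite": "Example cite"},
    ],
}

row = record_to_row(sample, max_deadlines=2, max_timetable=4)
print(len(row))  # 2 fixed + 2*4 deadline + 4*3 timetable columns
```

Each record becomes one dict like `row`, so a list of them can be written out with `csv.DictWriter` using the keys as the header.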