U.S. Regulations (RIN) Data Collection 11-2013

IN PROGRESS
Bids
39
Avg Bid (USD)
$372
Project Budget (USD)
$250 - $750

Project Description:
I need data on United States federal agency rules scraped from the reginfo.gov website. I will provide a list of Regulatory Information Numbers (RINs). This requires a search of the reginfo.gov website for each RIN (http://www.reginfo.gov/public/do/eAgendaSimpleSearch). There are about 569 unique RINs for which I need data. Some of the searches will not produce results because the database only covers rules produced after 1995.

Each search will produce one or more records for each RIN, and I need data from every record. So, each RIN will produce a number of rows of data equal to the number of records returned by the search (see, for example, what is produced by a search for RIN #2060-AM06). This can vary from 1 to as many as 20 rows of data for each RIN.

The columns would be data scraped from the “View all RIN Data” page for each search (see, for example, what this looks like for RIN #2060-AM06).

Note: The website includes a link to RIN data in XML. It may be easier to simply put all of this XML data into a spreadsheet.
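
As a rough illustration of the XML route mentioned above, here is a minimal Python sketch that flattens one XML record into a spreadsheet row and writes CSV. The element names (`RIN_INFO`, `RIN`, `RULE_TITLE`, `PRIORITY`) and the sample values are hypothetical placeholders; the real tag names would have to be read from the XML the site actually serves.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample record. Tag names and values are placeholders,
# not the site's real schema.
SAMPLE_XML = """<RIN_INFO>
  <RIN>2060-AM06</RIN>
  <RULE_TITLE>Placeholder title</RULE_TITLE>
  <PRIORITY>Placeholder priority</PRIORITY>
</RIN_INFO>"""


def xml_to_row(xml_text):
    """Flatten the direct children of the root element into a dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def rows_to_csv(rows):
    """Return CSV text for a list of dict rows.

    The header is the union of all keys, in first-seen order, so records
    with different fields still line up; missing values are left blank.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv([xml_to_row(SAMPLE_XML)]))
```

Nested elements (for example, repeating timetable entries) would need their own handling; this sketch only covers top-level fields.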

At minimum, the specific data I need from the web pages are:


1. RIN [1 column]
2. Publication ID [1 column]
3. Publication year [1 column]: From the Publication ID
4. Publication season [1 column]: From the Publication ID. This should be either Fall or Spring.
5. Title [1 column]
6. Agency [1 column]
7. Priority [1 column]
8. RIN Status [1 column]
9. Agenda Stage of Rulemaking [1 column]
10. Major [1 column]
11. Unfunded Mandates [1 column]
12. CFR Citation [1 column]
13. Legal Authority [1 column]
14. Legal Deadline [Many columns]: In some cases this will say “None”. In other cases it will include data on an action, source, description, and date, and there may be more than one entry. I need the action, source, description, and date for each deadline listed in the table that follows “Legal Deadline” (if there is one). So, in a case where there are two entries for legal deadline, there should be 8 columns of data filled (action, source, description, date; action, source, description, date). In cases where there are no deadlines, these columns would be blank.
15. Timetable [Many columns]: This portion of the page often includes multiple entries (i.e., multiple rows of data). It includes entries for Action, Date, and FR Cite. In most cases there is more than one entry here. I need all of this data. So, in a case where there are four entries for timetable, there should be 12 columns of data filled (action, date, fr cite; action, date, fr cite, etc.).
16. Deadline [1 column]: The way the agency indicates that they are setting a deadline is that the day within a date is set at 00. So, for example, a date of 06/00/2005 (mm/dd/yyyy) is a deadline because there is no specific day listed. I would like a column that is coded with a 1 if any of the dates in the timetable have this format (indicating a deadline).
17. Regulatory Flexibility Analysis [1 column]
18. Government Levels Affected [1 column]
19. Federalism [1 column]
20. Included in the Regulatory Plan [1 column]
21. Related RINs [1 column]
22. Agency Contact: [9 columns; I need one column each for Name, Title, Agency, Sub-agency, Address 1, Address 2, Address 3, Phone, Email]
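
The derived columns in items 3, 4, and 16 could be computed along these lines. This is only a sketch: it assumes the Publication ID appears either as text such as "Fall 2005" or as a six-digit `YYYYMM` value (with months up to 06 treated as Spring), which should be checked against the actual pages. The function names are illustrative.

```python
import re


def parse_publication_id(pub_id):
    """Return (year, season) derived from a Publication ID.

    Assumed formats (not confirmed by the posting):
      - six digits 'YYYYMM', month <= 6 meaning Spring, otherwise Fall
      - text containing 'Spring YYYY' or 'Fall YYYY'
    """
    text = pub_id.strip()
    match = re.fullmatch(r"(\d{4})(\d{2})", text)
    if match:
        year, month = int(match.group(1)), int(match.group(2))
        return year, "Spring" if month <= 6 else "Fall"
    match = re.search(r"(Spring|Fall)\s+(\d{4})", text, flags=re.IGNORECASE)
    if match:
        return int(match.group(2)), match.group(1).capitalize()
    raise ValueError(f"Unrecognized Publication ID: {pub_id!r}")


def has_deadline(timetable_dates):
    """Return 1 if any mm/dd/yyyy date has its day set to 00, else 0."""
    pattern = re.compile(r"\d{2}/00/\d{4}")
    return int(any(pattern.fullmatch(d.strip()) for d in timetable_dates if d))


if __name__ == "__main__":
    print(parse_publication_id("Fall 2005"))
    print(has_deadline(["06/00/2005", "01/15/2006"]))
```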

This data should be collected for each record for each RIN and placed in the columns.
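
For the "many columns" fields (Legal Deadline and Timetable), one possible layout is a fixed number of column groups per row, with unused groups left blank. The sketch below assumes each entry has already been scraped into a dict; the column naming scheme (`timetable_1_action`, and so on) and the sample values are illustrative only.

```python
def flatten_group(entries, fields, prefix, max_entries):
    """Spread a list of repeated entries across fixed, numbered columns.

    entries:     list of dicts, one per table row on the page
    fields:      keys to emit for every entry, in column order
    prefix:      column name prefix, e.g. 'timetable'
    max_entries: number of column groups to emit; extra groups stay blank
    """
    row = {}
    for i in range(max_entries):
        entry = entries[i] if i < len(entries) else {}
        for field in fields:
            row[f"{prefix}_{i + 1}_{field}"] = entry.get(field, "")
    return row


if __name__ == "__main__":
    # Placeholder timetable with two entries, padded to three groups.
    timetable = [
        {"action": "NPRM", "date": "06/00/2005", "fr_cite": ""},
        {"action": "Final Action", "date": "01/15/2006", "fr_cite": ""},
    ]
    columns = flatten_group(timetable, ["action", "date", "fr_cite"], "timetable", 3)
    for name, value in columns.items():
        print(name, "=", value)
```

Setting `max_entries` to the largest number of entries seen across all records keeps every row the same width, which matters for a spreadsheet.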

Skills required:
Data Entry, Excel, Web Scraping, XML
Qualifications required:
US English - Level 1