U.S. Federal Regulations (RIN) -- Data Collection by XML or Web Scraping

This project was successfully completed by lafor for $300 USD in 2 days.

Project Budget
$250 - $750 USD
Completed In
2 days
Project Description

I need data on United States federal agency rules scraped from the [url removed, login to view] website. I will provide a list of Regulatory Information Numbers (RINs). This requires searching the [url removed, login to view] website for each RIN. There are about 569 unique RINs for which I need data. Each RIN would be a row. The columns would be data scraped from the “View Rule” page for each RIN (see, for example, [url removed, login to view]).

Note: The website includes a link to RIN data for all RINs in XML. It may be easier to simply put all of this XML data into a spreadsheet. I would be willing to pay for this as well, even for RINs I have not specifically asked for.
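As a rough illustration of the XML-to-spreadsheet route, here is a minimal Python sketch that flattens each record element into one CSV row. The record tag `RIN_INFO`, the field tags, and the sample RIN values are made-up placeholders; the real XML layout is not described in this posting and would need to be checked first.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List

# Made-up sample; the tag names and RIN values are placeholders, not the real schema.
SAMPLE = """<ROOT>
  <RIN_INFO><RIN>0000-AA00</RIN><TITLE>Example Rule</TITLE></RIN_INFO>
  <RIN_INFO><RIN>0000-AA01</RIN><TITLE>Another Rule</TITLE><PRIORITY>Other</PRIORITY></RIN_INFO>
</ROOT>"""


def xml_to_rows(xml_text: str, record_tag: str) -> List[Dict[str, str]]:
    """Turn every <record_tag> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        row: Dict[str, str] = {}
        for child in record:
            # itertext() also collects text from nested elements
            text = " ".join(t.strip() for t in child.itertext() if t.strip())
            # Repeated tags are joined rather than overwritten
            row[child.tag] = f"{row[child.tag]}; {text}" if child.tag in row else text
        rows.append(row)
    return rows


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Write rows as CSV text, using the union of all keys as the header."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `rows_to_csv(xml_to_rows(SAMPLE, "RIN_INFO"))` gives CSV text that opens directly in a spreadsheet. Nested sections are squeezed into a single cell here, so multi-row sections such as Legal Deadline or Timetable would still need their own handling.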

If the XML data cannot be easily converted to spreadsheet form, the specific data I need from the web pages are:

1. RIN [1 column]

2. Publication ID [1 column]

3. Title [1 column]

4. Abstract [1 column]

5. Agency [1 column]

6. Priority [1 column]

7. RIN Status [1 column]

8. Agenda Stage of Rulemaking [1 column]

9. Major [1 column]

10. Unfunded Mandates [1 column]

11. CFR Citation [1 column]

12. Legal Authority [1 column]

13. Legal Deadline: [4 columns; In most cases this will say “None”. However, in some cases it will include data on a source, description, and date. There may also be cases where there is more than one entry here. I need the number of entries (i.e., the number of rows) and the data from the entry (source, description, and date). If there is more than one entry, use the data from the last entry.]

14. Timetable: [3 columns; This portion of the page often includes multiple entries (i.e., multiple rows of data). I do not need all of it. I need the number of entries (i.e., the number of rows) and data from entries called “Final Action.” Specifically, I need Date and FR Cite for “Final Action” entries.]

15. Regulatory Flexibility Analysis [1 column]

16. Government Levels Affected [1 column]

17. Federalism [1 column]

18. Included in the Regulatory Plan [1 column]

19. RIN Data Printed in the FR [1 column]

20. Agency Contact: [9 columns; I need one column each for Name, Title, Agency, Sub-agency, Address 1, Address 2, Address 3, Phone, Email]

This data should be collected for each RIN and placed in the columns.
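The rules for the two multi-row sections (items 13 and 14) could be sketched in Python as below. This assumes the rows have already been scraped into lists of dicts; the key names (`source`, `description`, `date`, `action`, `fr_cite`) and the output column names are hypothetical. Joining several “Final Action” entries with “; ” is also an assumption, since the description does not say what to do when there is more than one.

```python
from typing import Dict, List


def summarize_legal_deadline(entries: List[Dict[str, str]]) -> Dict[str, object]:
    """Collapse Legal Deadline rows into 4 columns: entry count plus the last entry."""
    last = entries[-1] if entries else {}
    return {
        "legal_deadline_count": len(entries),
        # "None" mirrors what the page shows when there is no deadline (assumption)
        "legal_deadline_source": last.get("source", "None"),
        "legal_deadline_description": last.get("description", "None"),
        "legal_deadline_date": last.get("date", "None"),
    }


def summarize_timetable(entries: List[Dict[str, str]]) -> Dict[str, object]:
    """Collapse Timetable rows into 3 columns: entry count plus Final Action data."""
    finals = [
        e for e in entries
        if e.get("action", "").strip().lower() == "final action"
    ]
    return {
        "timetable_count": len(entries),
        # Multiple Final Action entries are joined with "; " (assumption)
        "final_action_date": "; ".join(e.get("date", "") for e in finals),
        "final_action_fr_cite": "; ".join(e.get("fr_cite", "") for e in finals),
    }
```

Merging both results into the per-RIN row, for example `row.update(summarize_legal_deadline(deadlines))`, keeps one row per RIN however many entries a section contains.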
