Closed

Web Scraping (PDFs)

This project was awarded to chaituse for $275 USD.

Project Budget: $250 - $750 USD
Total Bids: 13
Project Description

Hi there,

I am trying to develop a way to automate the daily retrieval of PDFs from a State Government website and to extract (scrape) specific information from each document. The procedure is as follows:

1. Go to URL [url removed, login to view]

2. Enter Docket Code: ORDER

3. Enter Case Type: CD

4. Enter Date Range (to be done daily)

5. Hit ‘Search’

6. Open the first document by clicking the hyperlink in the ‘ID’ column

a. Identify the RELIEF SOUGHT

b. If RELIEF SOUGHT = ‘POOLING’, continue to step 7

c. Otherwise, return to the results and open the next document, then repeat a/b

7. If the order was DISMISSED, return to step 6 and open the next document

8. Otherwise, identify the fields highlighted in the example documents

9. Export the results to an Excel spreadsheet, with one column for each field name marked in red on the example documents

10. Return to the search results and continue through the remaining documents using the criteria from step 6
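The control flow of steps 2 to 10 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a finished scraper: the form-field names (`docket_code`, `case_type`, `date_from`, `date_to`), the `Document` type and its `dismissed` flag are hypothetical, because the real site's search form and the way dismissal is recorded are not known here. The daily date range is assumed to be the previous calendar day.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


def build_search_params(run_day: date) -> dict:
    """Steps 2-4: search criteria for one daily run.

    Assumption: each run covers the previous calendar day. The dictionary keys
    are hypothetical; the real form-field names on the site are unknown.
    """
    day = (run_day - timedelta(days=1)).isoformat()
    return {
        "docket_code": "ORDER",
        "case_type": "CD",
        "date_from": day,
        "date_to": day,
    }


@dataclass
class Document:
    """One search result (step 6) after its PDF has been parsed (hypothetical type)."""
    doc_id: str
    fields: dict = field(default_factory=dict)
    dismissed: bool = False  # hypothetical flag: how dismissal is detected is an assumption


def select_pooling_orders(documents):
    """Steps 6-10: keep the fields of POOLING orders that were not DISMISSED."""
    rows = []
    for doc in documents:  # steps 6 and 10: walk every search result in turn
        relief = doc.fields.get("RELIEF SOUGHT", "").strip().upper()
        if relief != "POOLING":  # step 6c: not a pooling order, move on
            continue
        if doc.dismissed:  # step 7: dismissed orders are skipped
            continue
        rows.append({"ID": doc.doc_id, **doc.fields})  # step 8: keep the fields
    return rows
```

The dictionary from `build_search_params` would then be submitted to the search form, for example with an HTTP client, or with a browser-automation tool if the site relies on JavaScript.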

I only need the PDFs whose RELIEF SOUGHT is POOLING.
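Steps 6a/6b and 7 depend on reading the text of each order. One possible approach is a pair of regular-expression checks, assuming the PDFs carry a text layer (a library such as pypdf can extract it with `PdfReader(path).pages[i].extract_text()`) and that the label reads like "RELIEF SOUGHT: POOLING". Treating any occurrence of the word DISMISSED as a dismissal is a crude assumption that would need checking against real orders.

```python
import re

# Assumed label layout, e.g. "RELIEF SOUGHT: POOLING" (the value may sit on the next line).
RELIEF_RE = re.compile(
    r"RELIEF\s+SOUGHT\s*[:\-]?\s*(?P<value>[A-Z][A-Z ,&]*)", re.IGNORECASE
)
DISMISSED_RE = re.compile(r"\bDISMISSED\b", re.IGNORECASE)


def relief_sought(text: str):
    """Return the RELIEF SOUGHT value in upper case, or None if no label is found."""
    match = RELIEF_RE.search(text)
    return match.group("value").strip().upper() if match else None


def is_pooling(text: str) -> bool:
    """Step 6b: True when RELIEF SOUGHT is exactly POOLING."""
    return relief_sought(text) == "POOLING"


def is_dismissed(text: str) -> bool:
    """Step 7 (crude assumption): any standalone word DISMISSED marks a dismissal."""
    return DISMISSED_RE.search(text) is not None
```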

I am looking to organize all of this data in a program such as Excel. I'd like the data organized by Order Date and Cause CD No., followed by a column for each piece of information highlighted and identified in red in the example documents.
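For the Excel output, one low-dependency option is a CSV file, which Excel opens directly; a library such as openpyxl could write a native .xlsx file instead. The sketch below puts Order Date and Cause CD No. first, sorts by them, and adds one column per remaining field. The MM/DD/YYYY date format is an assumption about how dates appear in the orders.

```python
import csv
import io
from datetime import datetime

LEADING_COLUMNS = ["Order Date", "Cause CD No."]


def order_key(row: dict):
    # Assumption: order dates look like "03/05/2024"; adjust to the real format.
    return (datetime.strptime(row["Order Date"], "%m/%d/%Y"), row["Cause CD No."])


def rows_to_csv(rows: list) -> str:
    """Sort by Order Date, then Cause CD No., and render one column per field."""
    ordered = sorted(rows, key=order_key)
    extra = sorted({key for row in ordered for key in row} - set(LEADING_COLUMNS))
    buffer = io.StringIO()
    # restval="" leaves a blank cell when a document lacks one of the fields.
    writer = csv.DictWriter(buffer, fieldnames=LEADING_COLUMNS + extra, restval="")
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()
```

Saving it is then `open("pooling_orders.csv", "w", newline="", encoding="utf-8").write(...)` inside a `with` block; the filename is only an example, and `newline=""` keeps the CSV line endings intact.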

I have provided two examples to show that the documents may vary somewhat in formatting and in how the data is presented.
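Because the two examples differ in layout, each field may need several label patterns, tried in order until one matches. The column names and patterns below are hypothetical placeholders for the fields marked in red; the real ones would have to be read off the example documents.

```python
import re

# Hypothetical column names and label spellings; the real ones come from the
# fields marked in red on the example documents.
FIELD_PATTERNS = {
    "Cause CD No.": [r"CAUSE\s+CD\s+NO\.?\s*:?\s*(\d+)", r"CD\s*#\s*(\d+)"],
    "Order Date": [
        r"ORDER\s+DATE\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})",
        r"DATED\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})",
    ],
}


def extract_fields(text: str, patterns: dict = FIELD_PATTERNS) -> dict:
    """Try each pattern for a column in order; the first match wins, else ""."""
    result = {}
    for column, candidates in patterns.items():
        result[column] = ""
        for pattern in candidates:
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                result[column] = match.group(1).strip()
                break
    return result
```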
