Web Scraping (PDFs)

Project Budget (USD): $250 - $750

Project Description:
Hi there,

I am trying to develop a way to automate the daily retrieval of PDFs from a state government website and extract (scrape) specific information from each document. The procedure is as follows:

1. Go to URL http://imaging.occeweb.com/imaging/OAP.aspx
2. Enter Docket Code: ORDER
3. Enter Case Type: CD
4. Enter Date Range (to be done daily)
5. Hit ‘Search’
6. Open first document by clicking hyperlink under ‘ID’ Column
b. If RELIEF SOUGHT = ‘POOLING’ continue to step 7
c. Else, return to results and open next document, then repeat a/b
7. If DISMISSED return to step 6
8. Else, identify fields highlighted in example documents
9. Export results to an Excel spreadsheet, with one column for each field name marked in red on the example documents
10. Return to search results and continue searching through documents with criteria from step 6
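
The decision logic in steps 4 and 6 through 8 can be sketched roughly as follows. This is only a minimal illustration that assumes each PDF has already been converted to plain text by some other tool. The dictionary keys in `daily_search_params` are hypothetical placeholders (the real form field names would have to be read from the search page), and treating any occurrence of the word DISMISSED as a dismissal is an assumption that should be checked against real orders.

```python
import re
from datetime import date


def daily_search_params(day=None):
    """Build the search criteria from steps 2-4 for a single day."""
    day = day or date.today()
    stamp = day.strftime("%m/%d/%Y")  # date format is an assumption
    # Hypothetical key names; the real form fields are not known here.
    return {
        "docket_code": "ORDER",
        "case_type": "CD",
        "date_from": stamp,
        "date_to": stamp,
    }


def relief_sought(text):
    """Return the text that follows 'RELIEF SOUGHT', upper-cased, or ''."""
    match = re.search(r"RELIEF\s+SOUGHT\s*:?\s*(.+)", text, flags=re.IGNORECASE)
    return match.group(1).strip().upper() if match else ""


def is_dismissed(text):
    """Assumption: a dismissed order contains the word DISMISSED somewhere."""
    return re.search(r"\bDISMISSED\b", text, flags=re.IGNORECASE) is not None


def should_extract(text):
    """Steps 6-8: keep only POOLING documents that are not dismissed."""
    return "POOLING" in relief_sought(text) and not is_dismissed(text)


def select_documents(texts):
    """Step 10: walk every search result and keep the qualifying ones."""
    return [text for text in texts if should_extract(text)]
```

Keeping the filter as pure functions over text means it can be tested without touching the website, and the download step can be swapped out independently.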

Obviously, I only need the PDFs that have POOLING as the RELIEF SOUGHT.

I am looking to organize all this data in a program like Excel for my use. I'd like the data organized by Order Date, then Cause CD No., followed by a column for each piece of information highlighted and identified in red in the example documents.
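
One possible way to produce that layout is to write a CSV file, which Excel opens directly. The sketch below uses only the Python standard library. The column `Applicant` in the usage note is a made-up stand-in for the red-marked fields, and the `MM/DD/YYYY` date format is an assumption.

```python
import csv
from datetime import datetime


def sort_key(row):
    """Order rows by Order Date first, then by Cause CD No."""
    order_date = datetime.strptime(row["Order Date"], "%m/%d/%Y")
    return (order_date, row["Cause CD No."])


def write_rows(rows, handle, extra_columns):
    """Write rows to an open text file with the two key columns first."""
    columns = ["Order Date", "Cause CD No."] + list(extra_columns)
    # Missing fields are written as empty cells; unknown keys are skipped.
    writer = csv.DictWriter(
        handle, fieldnames=columns, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for row in sorted(rows, key=sort_key):
        writer.writerow(row)
```

For example, `write_rows(rows, open("orders.csv", "w", newline=""), ["Applicant"])` would create a file with the columns `Order Date`, `Cause CD No.` and `Applicant`, where `orders.csv` is just an illustrative file name.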

I have provided two examples to show that the documents may vary somewhat in formatting and in how the data is presented.
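
Because the layout varies between documents, label matching probably needs to tolerate differences in spacing, capitalization and line breaks. The sketch below shows one way to do that with regular expressions over text already extracted from a PDF. Only the two labels named in this description are covered, and the exact patterns (including the shape of a cause number) are guesses that would need adjusting against the annotated examples.

```python
import re

# Patterns are assumptions based on the labels mentioned in the description.
FIELD_PATTERNS = {
    "Cause CD No.": re.compile(
        r"CAUSE\s+CD\s+NO\.?\s*:?\s*([0-9][0-9A-Z-]*)", re.IGNORECASE
    ),
    "Relief Sought": re.compile(
        r"RELIEF\s+SOUGHT\s*:?\s*([^\n]+)", re.IGNORECASE
    ),
}


def normalize(text):
    """Collapse runs of spaces and tabs so layout differences matter less."""
    return re.sub(r"[ \t]+", " ", text)


def extract_fields(text):
    """Return a dict with one entry per known field, '' when not found."""
    cleaned = normalize(text)
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(cleaned)
        result[name] = match.group(1).strip() if match else ""
    return result
```

Returning an empty string for a missing field, rather than raising an error, keeps one oddly formatted document from stopping the whole daily run; such rows can be reviewed by hand afterwards.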

Skills required:
Excel, PDF, Web Scraping
Qualifications required:
US English - Level 1
Additional Files: Annotated+Example.pdf
About the employer:
Public Clarification Board

Bids:
$360 in 3 days
$257 in 7 days
$303 in 9 days
$550 in 30 days
$275 in 5 days
$385 in 3 days
$367 in 3 days
$515 in 9 days
$275 in 3 days
$275 in 3 days