Closed

Custom Bot to Extract Data

This project received 22 bids from talented freelancers with an average bid price of $88 USD.

Skills Required
Project Budget
N/A
Total Bids
22
Project Description

I need someone who can make a custom bot. I use Windows and can run Perl, .NET, etc. I do not want it to be web-based, as that has proved to be slower, but if you insist, I'll give it a try.

Here's how it could work.

To extract the data, you first need an active session with the website. That means I need a window (or a portion of one that I can visibly see) that stays open and that I log in through. (I'll keep this window open and occasionally click around to keep the session active.)

I need the ability to load a list of URLs to extract data from. I am OK with either the text or the code being extracted, whichever is faster. The output needs to be saved in CSV format. The HTML code is at the bottom and runs from line 156 to line 228 on the page. Each page is the same.
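As a rough illustration of the two steps above (loading the URL list and keeping only lines 156 to 228 of each page), here is a minimal Python sketch. The function names `read_url_list` and `slice_block` are hypothetical, and it assumes the line numbers refer to the raw page source split on line breaks.

```python
def read_url_list(lines):
    """Return non-blank URLs in their original order, skipping duplicates.

    `lines` can be any iterable of strings, for example an open text file
    with one URL per line.
    """
    seen = set()
    urls = []
    for raw in lines:
        url = raw.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def slice_block(page_source, first=156, last=228):
    """Return lines `first` through `last` (1-based, inclusive) of the page source."""
    return page_source.splitlines()[first - 1:last]
```

A list saved as a text file could be loaded with `read_url_list(open("urls.txt"))`, where `urls.txt` is a placeholder file name.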

Notice in the code that each field has a field name (e.g., Name, Address, etc.). I want to keep these in place so that each value looks like "Name: Chris Jones". That way, when I sort the data, it will stay better organized in case things get shifted around.

Each field name needs its own column in the CSV: Name, Address, City, State, Zip, etc.
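The column layout described above could look roughly like the following Python sketch. It assumes the extracted text has already been reduced to lines of the form "Label: value"; the real markup is not shown here, so that step is left out. The `FIELDS` list covers only the field names mentioned in this description, and the function names are hypothetical.

```python
import csv

# Only the field names named in the description; the real pages may have more.
FIELDS = ["Name", "Address", "City", "State", "Zip"]


def record_from_lines(lines, fields=FIELDS):
    """Build one CSV row, keeping the label inside each cell ("Name: Chris Jones")."""
    row = {field: "" for field in fields}
    for line in lines:
        # Split on the first colon only, so values may contain colons themselves.
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in row:
            row[label] = f"{label}: {value.strip()}"
    return row


def write_csv(records, out, fields=FIELDS):
    """Write a header row plus one row per record to an open text file."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
```

Keeping the label in every cell means a value that ends up in the wrong column is still identifiable after sorting. On Windows, open the output file with `newline=""` so the `csv` module does not write extra blank lines.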

This will need to support multiple threads since it needs to be fast.
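One possible way to get several downloads running at once is a thread pool, sketched below in Python. The sketch is only a starting point and rests on one large assumption: that the site accepts the session cookie copied from the logged-in browser window. If the session is tied to the window itself, this will not work. `make_fetcher` and `fetch_all` are hypothetical names, and the worker count is a guess.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_fetcher(cookie, timeout=30):
    """Return a function that downloads one URL, sending the given Cookie header.

    Assumption: `cookie` is the cookie string copied from the browser window
    that is kept logged in.
    """
    def fetch(url):
        request = urllib.request.Request(url, headers={"Cookie": cookie})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    return fetch


def fetch_all(urls, fetch, workers=8):
    """Call `fetch` on every URL using a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Because `fetch_all` takes the fetch function as a parameter, it can be tried with a stand-in function before it is pointed at the real site.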

I do not need to download any images or other files. I just need the text from each page.

Each URL is in this format: [url removed, login to view]

PS - Due to the nature of the data, I cannot provide a link for you to test. I realize this impacts the job but I cannot give out the data.

SAMPLE CODE ATTACHED (runs from line 156 to line 228)
