Closed

web data extraction/web scraping

This project received 3 bids from freelancers.

Project Budget: $10 - $30 USD
Total Bids: 3
Project Description

Hi -- I need to extract some information from a few different websites and put it into a 'nice', easy-to-read format for myself. It's a relatively easy job, and provided it is done well, I'll have a number of additional sites for you to scrape.

I have some sample code that I can send you (from a previously completed project; it is a sample of the data extraction). It works for the first two sites I will be asking you to do, but I am going to be asking for a bit more information to be extracted as well.

Criteria:
----------
1. Please only bid if you are American, Canadian, European or Filipino. All other bids will most likely be ignored (simply a matter of coding/quality issues).
2. It should be done in PHP/MySQL. You may need to use cURL (as some sites will have basic password/login authentication).
3. Preferably a good English speaking/reading/writing level.
4. If you have some code samples, that would help. (I am looking for someone who implements 'good' coding standards.)
5. This bid is for 'two' scrapes/websites (in my existing code). However, should you do a good job, I'll be extending this to another 10-15 sites. Some data will be scraped from 'WordPress'-type websites, while other sites will simply be 'directory'-style websites.

I'm estimating probably 3-5 hours to complete this (especially because I have some sample code I'm sending you, and it mainly requires tweaking, making it look good, etc.).

Other technical details:
Ideally someone will have experience in the following. (I had a previous fellow working on it, but he cancelled due to other commitments.)
--------------------------------------
- PHP version 5.4 or newer
- Framework: Yii
- Scraping library: Goutte
- Database: MySQL
--------------------------------------
I have existing code that you can work off of if you wish.

Actual project:
------------------
1. Please see the attached MS Word document for "complete" details, but basically you will be scraping data from websites via PHP. I'll start off with one site, and provided you do a good job, this will most likely be a job of about 10-15 sites, and maybe more.
2. You'll go to the webpage, download all applicable pages, and scrape the data. You will then 'reformat' this data and insert it into a MySQL table. While it is "extracting", it would be nice to have some kind of counter (e.g., processing page 1/50) as it works, as well as making sure the script doesn't time out (i.e., it's possible some scripts may take, say, 5-10 minutes to process).
3. I'd like a separate link included (PHP) that simply does a 'database' dump in HTML format.
4. In future (separate job from this), there will most likely be a 'maintenance' job. So probably 1-2x per month (arranged as a separate project, of course) I'd want you to go through the code to ensure everything is working A-OK.
5. Bonus: if you know how to use online PDF-to-text pages (and/or can do that via cURL, etc.), that is a bonus. I'll have a separate project for you for that.

Thanks!
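
As a rough illustration of items 2 and 3 under "Actual project" (this is not the existing sample code), a minimal sketch might look like the following. Everything named here is an assumption made for illustration: the functions `fetchWithCurl`, `scrapePages` and `dumpTableAsHtml`, the `listings` table, and the choice to extract only each page's `<title>`. It uses PHP's built-in DOMDocument rather than Goutte to stay dependency-free, and the usage at the bottom swaps in an in-memory SQLite database and a fake fetcher so it runs without a MySQL server or network access.

```php
<?php
// Illustrative sketch only: all function and table names are hypothetical.

// Fetch a page with cURL, optionally using basic authentication.
function fetchWithCurl($url, $user = null, $pass = null)
{
    $ch = curl_init($url);
    $options = [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,   // per-request timeout in seconds
    ];
    if ($user !== null) {
        $options[CURLOPT_USERPWD] = $user . ':' . $pass;
    }
    curl_setopt_array($ch, $options);
    return curl_exec($ch); // HTML string, or false on failure
}

// Download each URL, extract its <title>, and insert one row per page.
// $fetch is any callable that returns the HTML for a URL (or false).
function scrapePages(PDO $db, array $urls, callable $fetch)
{
    set_time_limit(0);                 // allow runs of several minutes
    libxml_use_internal_errors(true);  // tolerate malformed real-world HTML
    $insert = $db->prepare('INSERT INTO listings (url, title) VALUES (?, ?)');
    $urls   = array_values($urls);
    $total  = count($urls);
    $saved  = 0;
    foreach ($urls as $i => $url) {
        echo sprintf("Processing page %d/%d\n", $i + 1, $total);
        flush();                       // push the counter to the browser
        $html = $fetch($url);
        if ($html === false || $html === '') {
            continue;                  // skip pages that failed to download
        }
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        $nodes = $doc->getElementsByTagName('title');
        $title = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
        $insert->execute([$url, $title]);
        $saved++;
    }
    return $saved;
}

// Render the stored rows as a plain HTML table (the 'database dump' page).
function dumpTableAsHtml(PDO $db)
{
    $rows = $db->query('SELECT url, title FROM listings ORDER BY id')
               ->fetchAll(PDO::FETCH_ASSOC);
    $html = "<table>\n<tr><th>url</th><th>title</th></tr>\n";
    foreach ($rows as $row) {
        $html .= '<tr><td>' . htmlspecialchars($row['url']) . '</td><td>'
               . htmlspecialchars($row['title']) . "</td></tr>\n";
    }
    return $html . "</table>\n";
}

// Usage with stand-ins so the sketch runs offline. With MySQL the DSN would be
// something like 'mysql:host=...;dbname=...' and the id column AUTO_INCREMENT.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE listings (id INTEGER PRIMARY KEY, url TEXT, title TEXT)');
$fakeFetch = function ($url) {
    return '<html><head><title>Page for ' . $url . '</title></head></html>';
};
$saved = scrapePages($db, ['http://example.com/a', 'http://example.com/b'], $fakeFetch);
echo dumpTableAsHtml($db);
```

Passing the fetcher in as a callable keeps the download step (plain cURL, cURL with a login, or a library such as Goutte) separate from the parsing and storage, which should make it easier to add further sites later. For a real site, `'fetchWithCurl'` could be passed in place of the fake fetcher.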
