web data extraction/web scraping


Hi,

I need to extract some information from a few different websites and put it into a 'nice', easy-to-read format for myself. It's a relatively easy job, and provided it is done well, I'll have a number of additional sites for you to scrape.

I have some sample code that I can send you from a previously completed project. It works for the first two sites I will be asking you to do; however, I will be asking for a bit more information to be extracted as well.

Criteria:
----------
1. Please only bid if you are American, Canadian, European or Filipino. All other bids will most likely be ignored (simply a matter of coding/quality issues).
2. It should be done in PHP/MySQL. You may need to use cURL (as some sites will have basic password/login authentication).
3. Preferably a good English speaking/reading/writing level.
4. If you have some code samples, that would help. (I am looking for someone who follows 'good' coding standards.)
5. This bid is for two scrapes/websites (the ones in my existing code). However, should you do a good job, I'll be extending this to another 10-15 sites. Some data will be scraped from WordPress-type websites, while other sites will simply be directory-style websites.

I'm estimating probably 3-5 hours to complete this, especially because I'm sending you some sample code and it mainly requires tweaking/making it look good/etc.

Other technical details:
--------------------------------------
Ideally you will have experience in the following. (I had a previous fellow working on it, but he cancelled due to other commitments.)

- PHP version 5.4 or newer
- Framework: Yii
- Scraping library: Goutte
- Database: MySQL
--------------------------------------
I have existing code that you can work off of if you wish.

Actual project:
------------------
1. Please see the attached MS Word document for complete details, but basically you will be scraping data from websites via PHP. I'll start off with one site, and provided you do a good job, this will most likely become a job of about 10-15 sites, and maybe more.
2. You'll go to the webpage, download all applicable pages, and scrape the data. You will then reformat this data and insert it into a MySQL table. While it is extracting, it would be nice to have some kind of counter (e.g., "processing page 1/50") as it works, and the script must not time out (some scripts may take, say, 5-10 minutes to process).
3. I'd like a separate link (PHP) that simply does a database dump in HTML format.
4. In the future there will most likely be a maintenance job, arranged as a separate project from this one: probably 1-2x per month I'd want you to go through the code to ensure everything is still working OK.
5. Bonus: if you know how to use online PDF-to-text pages (and/or can do that via cURL/etc.), that is a plus. I'll have a separate project for you for that.

Thanks!
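
P.S. To give a rough idea of what I mean by the page counter and the time-out handling in item 2, here is a minimal sketch. It is not taken from my existing code: the function name scrapeAll() and the two callbacks ($fetchPage, $saveRow) are made up for illustration, and the real fetching (Goutte/cURL) and the MySQL insert would be plugged in where the callbacks are.

```php
<?php
// Hypothetical sketch only: scrapeAll(), $fetchPage and $saveRow are
// illustrative names, not part of the existing project code.

/**
 * Processes each URL in turn and reports progress as "Processing page i/N".
 *
 * @param array    $urls      Page URLs to scrape.
 * @param callable $fetchPage Takes a URL, returns an array of extracted rows.
 * @param callable $saveRow   Takes one row and stores it (e.g. a MySQL INSERT).
 * @return int Number of rows passed to $saveRow.
 */
function scrapeAll(array $urls, $fetchPage, $saveRow)
{
    // Remove PHP's execution time limit so a 5-10 minute run is not cut off.
    set_time_limit(0);

    $total = count($urls);
    $saved = 0;

    // array_values() guarantees 0-based positions even if $urls has custom keys.
    foreach (array_values($urls) as $index => $url) {
        echo sprintf("Processing page %d/%d\n", $index + 1, $total);
        // Ask PHP to send the counter to the client now rather than at the end.
        flush();

        foreach (call_user_func($fetchPage, $url) as $row) {
            call_user_func($saveRow, $row);
            $saved++;
        }
    }

    return $saved;
}
```

Passing the fetch and save steps in as callbacks is just one way to keep the loop the same for every site; a class per site would work equally well. Note that a web server or proxy may have its own time-out in addition to PHP's, so running long scrapes from the command line may be the safer option.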

Skills: Adobe Flash, Shopping Carts

Project ID: #5712392