I run a data analytics consultancy in Sydney, Australia. I need to set up weekly scrapes of at least 5 large (50,000+ SKUs) online retail websites. The project will start with a single website and extend to the others if I am satisfied with the completed work.
- Scrapes must be written in Ruby using Mechanize/Nokogiri or Watir as appropriate
- Where possible, avoid JS and scrape via direct HTTP requests using Mechanize, i.e. this requires strong skills in tracing/debugging HTTP requests/responses
- Alternatively, load page using Watir webdriver then pass off cookies to Mechanize
- Last resort would be brute force Watir webdriver approach (scroll, point, click, etc)
- Implement utilities for use of proxy / IP rotation where required (access to pool of anonymous IP addresses will be provided)
- Scheduling via cron job to run at least weekly
- Must use an object-oriented approach and a DRY design so that code can be reused for future scrapes of additional sites and remains manageable
- Must use ActiveRecord to persist scraped records
- Must implement logging (production logging to file + verbose debug option to STDOUT)
- Must be a great communicator both in terms of coding concepts and project management
- Good at clearly laying out the workplan, milestones and progress
- Must be available for 2x daily stand-up calls with Sydney AUS (timing TBC at mutually agreed time)
- Must be able to demonstrate prior experience with Ruby web scraping projects similar to the above
- Code must be well-structured and well-documented / commented
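As a rough illustration of the object-oriented, DRY structure and proxy rotation described above, something like the following would fit. This is only a sketch under assumptions: `ProxyPool`, `BaseScraper`, `ExampleStoreScraper` and the proxy addresses are hypothetical names, and the network call is stubbed so the example runs offline; a real subclass would issue the request through Mechanize (or Watir) instead.

```ruby
# Hypothetical sketch: all class names and addresses are illustrative only.

# Round-robin pool over the proxy addresses supplied for the project.
class ProxyPool
  def initialize(proxies)
    raise ArgumentError, "proxy list is empty" if proxies.empty?

    @proxies = proxies
    @index = 0
  end

  # Returns the next proxy, wrapping around at the end of the list.
  def next_proxy
    proxy = @proxies[@index % @proxies.size]
    @index += 1
    proxy
  end
end

# Shared workflow for every site; subclasses supply only the site-specific parts.
class BaseScraper
  def initialize(proxy_pool:)
    @proxy_pool = proxy_pool
  end

  # Template method: the same steps run for every site.
  def run
    product_urls.map do |url|
      proxy = @proxy_pool.next_proxy
      parse_product(fetch(url, proxy)).merge(url: url)
    end
  end

  private

  def product_urls
    raise NotImplementedError, "#{self.class} must implement product_urls"
  end

  def fetch(_url, _proxy)
    raise NotImplementedError, "#{self.class} must implement fetch"
  end

  def parse_product(_page)
    raise NotImplementedError, "#{self.class} must implement parse_product"
  end
end

# Example site scraper. fetch is stubbed here (assumption) so no request is made.
class ExampleStoreScraper < BaseScraper
  private

  def product_urls
    ["https://example.com/p/1", "https://example.com/p/2"]
  end

  def fetch(url, proxy)
    { body: "SKU-#{url.split('/').last}", via: proxy }
  end

  def parse_product(page)
    { sku: page[:body], proxy: page[:via] }
  end
end

pool = ProxyPool.new(["10.0.0.1:8080", "10.0.0.2:8080"])
records = ExampleStoreScraper.new(proxy_pool: pool).run
```

Adding a new site then means writing one subclass with three methods, while rotation and the overall run loop stay in one place.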
Opportunity for ongoing Ruby backend and RoR web app build if this project is a success.
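For the logging requirement above (production log to file plus a verbose debug option to STDOUT), one possible shape, using only Ruby's standard `Logger`, is sketched below. `ScrapeLog`, the `DEBUG` environment variable and the temporary log path are assumptions made for illustration, not a prescribed design.

```ruby
require "logger"
require "tmpdir"

# Hypothetical helper: fans each message out to one or more Logger instances.
class ScrapeLog
  def initialize(file_path:, verbose: false)
    # Production log: INFO and above go to the file.
    file_logger = Logger.new(file_path)
    file_logger.level = Logger::INFO
    @loggers = [file_logger]

    return unless verbose

    # Verbose option: everything, including DEBUG, also goes to STDOUT.
    stdout_logger = Logger.new($stdout)
    stdout_logger.level = Logger::DEBUG
    @loggers << stdout_logger
  end

  # Define debug/info/warn/error, each forwarding to every underlying logger.
  %i[debug info warn error].each do |severity|
    define_method(severity) do |message|
      @loggers.each { |logger| logger.public_send(severity, message) }
    end
  end
end

# Usage: a temp directory stands in for the real log location (assumption).
log_path = File.join(Dir.mktmpdir, "scrape.log")
log = ScrapeLog.new(file_path: log_path, verbose: ENV["DEBUG"] == "1")
log.info("scrape started")
log.debug("GET https://example.com/p/1")
```

With this arrangement the file stays compact for weekly cron runs, and setting `DEBUG=1` during development shows request-level detail on the console without changing the code.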
21 freelancers are bidding on average $37/hour for this job
We have worked on large-scale scrapers that work in the same way as described in your project requirements. Are you available to discuss the details now? Talk soon, Zoran
Hello, I am a scraping expert and have done many similar projects. Please check my feedback to see for yourself. Can you tell me more details? Then I will provide demo data for you.