Closed

Scrapy Web Spider

This project was awarded to happycoder2010 for $100 USD.

Project Budget: $30 - $250 USD
Total Bids: 15
Project Description

Hi guys,
I'm looking for someone to work on a Scrapy project for me. I need a simple generic crawler that starts at a given domain and, on each page (only within that domain), extracts:
-> the anchor text (or anything between the <a> and </a> tags)
-> and the corresponding link
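
A minimal sketch of the extraction step described above. It uses Python's standard-library `html.parser` as a stand-in rather than Scrapy's own selectors, so it is not the Scrapy implementation the project asks for. The names `AnchorExtractor`, `extract_links` and `same_domain` are hypothetical, and treating subdomains as part of the same domain is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorExtractor(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []     # finished (url, anchor) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is collected as well.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def extract_links(html, base_url):
    """Return every (url, anchor) pair found in one page."""
    parser = AnchorExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def same_domain(url, domain):
    """True if url is on domain or one of its subdomains (assumption)."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)
```

A crawler would call `extract_links` on each fetched page, store every pair, and follow only the URLs for which `same_domain` returns true.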


The crawler needs to be able to pull the domain to start crawling from out of a MySQL database, along with one other variable, which needs to be passed back as a value when the results are written back to the database.

It must also allow more than one spider to run at the same time, as I'll have it on a cron job. It should work something like this:

START SCRIPT
CONNECT TO DB

SELECT FROM TABLE

WHILE(TRUE)
GET URL PLUS CORRESPONDING DOMAIN-ID VARIABLE FROM TABLE
START NEW SPIDER
LOAD URLS
EXTRACT ALL URLS AND ANCHORS FOUND ON EACH PAGE
SAVE RESULT TO DB (insert into %s (myurl, myanchor, urlid) values (url, anchor, %s domain-id))
LOOP
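
The loop above could be sketched roughly as follows. This is an illustration under stated assumptions, not a finished implementation: it uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server, the table names `domains` and `links` are hypothetical (the post leaves them as `%s`), and `crawl` is a hypothetical callable standing in for the spider. Only the column names `myurl`, `myanchor` and `urlid` come from the post.

```python
import sqlite3


def pending_domains(conn):
    """Return (domain_id, start_url) rows; 'domains' is a hypothetical table."""
    return conn.execute("SELECT id, url FROM domains").fetchall()


def save_links(conn, domain_id, links):
    """Insert one row per (url, anchor) pair, tagged with the domain id."""
    conn.executemany(
        # sqlite3 uses '?' placeholders; MySQL drivers typically use '%s'.
        "INSERT INTO links (myurl, myanchor, urlid) VALUES (?, ?, ?)",
        [(url, anchor, domain_id) for url, anchor in links],
    )
    conn.commit()


def run(conn, crawl):
    """For each domain row, run the crawl and store what it found.

    crawl is any callable taking a start URL and returning an iterable
    of (url, anchor) pairs (hypothetical; a real spider would go here).
    """
    for domain_id, start_url in pending_domains(conn):
        save_links(conn, domain_id, crawl(start_url))
```

Values are passed as query parameters rather than formatted into the SQL string. Table names cannot be passed that way, so if the table name really has to be variable it would need to be checked against a fixed list first.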

When each spider is done crawling, I need it to update another table to say it's finished:
update crawldone where id = %s,domain-id
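
A small sketch of the "finished" update, again using `sqlite3` as a stand-in for MySQL. The post does not say which column of `crawldone` should change, so the `done` column here is an assumption, as is the function name `mark_done`.

```python
import sqlite3


def mark_done(conn, domain_id):
    """Flag one domain as finished and return the number of rows updated.

    The 'done' column is hypothetical; the post only names the
    'crawldone' table and its 'id' column.
    """
    cur = conn.execute(
        "UPDATE crawldone SET done = 1 WHERE id = ?", (domain_id,)
    )
    conn.commit()
    return cur.rowcount
```

In a Scrapy project, a call like this would most likely sit in the spider's `closed` method or in an item pipeline's `close_spider` method, so that it runs once when the crawl for that domain ends.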

If you already have a Scrapy spider running that you can modify to do something similar, that would be fine as well.
