Scrapy Web Spider

Status: IN PROGRESS
Bids: 15
Avg Bid (USD): $119
Project Budget (USD): $30 - $250

Project Description:
Hi Guys,
I'm looking for someone to work on a Scrapy project for me. I need a simple generic crawler that starts at a given domain and, on each page (only within that domain), extracts:
->the anchor text (anything between the <a> and </a> tags)
->the corresponding link
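
To make the extraction requirement concrete, here is a minimal sketch using only the Python standard library rather than Scrapy itself (a swapped-in technique, chosen so it runs without extra installs). The names `AnchorParser`, `extract_links` and `same_domain` are hypothetical, and treating subdomains as part of the domain is an assumption the posting does not settle.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorParser(HTMLParser):
    """Collects (absolute_url, anchor_text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html, page_url):
    """Return every (url, anchor_text) pair found in one page."""
    parser = AnchorParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.links


def same_domain(url, domain):
    """True if url is on domain or one of its subdomains (an assumption)."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)
```

In an actual Scrapy spider the same job would more likely be done with the spider's `allowed_domains` setting and a `LinkExtractor`, whose link objects carry both the URL and the anchor text.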


The crawler needs to be able to pull the start domain from a MySQL database, along with one other variable that must be passed back as a value when the results are written back to the database.

It must allow more than one spider to run at the same time, as I'll have it on a cron job. It should work something like this:

START SCRIPT
CONNECT TO DB

SELECT FROM TABLE

WHILE(TRUE)
    GET URL PLUS CORRESPONDING DOMAIN-ID VARIABLE FROM TABLE
    START NEW SPIDER
    LOAD URLS
    EXTRACT ALL URLS AND ANCHORS FOUND ON EACH PAGE
    SAVE RESULT TO DB (insert into %s (myurl, myanchor, urlid) values (url, anchor, domain-id))
LOOP
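
Because several cron-started runs may execute this loop at once, each run needs to pick a domain that no other run has already taken. The sketch below shows one possible way to do that with a conditional update. It uses `sqlite3` as a stand-in for MySQL so that it runs anywhere, and the `domains` table with its `status` column, as well as the function name `claim_next_domain`, are assumptions that do not come from the posting.

```python
import sqlite3


def claim_next_domain(conn):
    """Claim one pending domain row; return (id, url), or None if none are left."""
    while True:
        row = conn.execute(
            "SELECT id, url FROM domains "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The extra status check makes the claim safe: if another process
        # got there first, this UPDATE matches zero rows.
        cur = conn.execute(
            "UPDATE domains SET status = 'running' "
            "WHERE id = ? AND status = 'pending'",
            (row[0],),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row
        # Lost the race for this row; loop and try the next pending one.
```

With a MySQL driver the queries would look the same, except that the parameter placeholder is usually `%s` instead of `?`.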

When each spider is done crawling, I need it to update another table to say it's finished:
update crawldone where id = %s,domain-id
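
The save and completion steps could be sketched roughly as follows. Again `sqlite3` stands in for MySQL. The column names `myurl`, `myanchor`, `urlid` and the table `crawldone` come from the description above, while the `done` column and the function names `save_links` and `mark_done` are assumptions, since the posting does not say which column the update should set.

```python
import sqlite3


def save_links(conn, table, domain_id, links):
    """Insert each (url, anchor) pair together with the domain id."""
    # Table names cannot be bound as query parameters, so check the
    # name before formatting it into the SQL string.
    if not table.isidentifier():
        raise ValueError("unsafe table name: %r" % table)
    conn.executemany(
        f"INSERT INTO {table} (myurl, myanchor, urlid) VALUES (?, ?, ?)",
        [(url, anchor, domain_id) for url, anchor in links],
    )
    conn.commit()


def mark_done(conn, domain_id):
    """Flag the crawl for this domain id as finished."""
    # 'done' is a hypothetical column name.
    conn.execute("UPDATE crawldone SET done = 1 WHERE id = ?", (domain_id,))
    conn.commit()
```

Passing the values as bound parameters, rather than formatting them into the SQL string, keeps anchor text containing quotes from breaking the insert.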

If you already have a Scrapy spider running that you can modify to do something similar, that would be fine as well.

Skills required:
Python, Web Scraping
About the employer:
Verified


$99 in 3 days (webscrapinggurus)
$30 in 7 days
$350 in 7 days
$100 in 5 days
$250 in 5 days (matideveloper)
$100 in 2 days
$99 in 4 days (happycoder2010)
$100 in 3 days
$30 in 2 days (B5UuAK8w0)
$250 in 1 day