Web scraper in Python with Scrapy for Google

Cancelled

I need to scrape Google search results using Python with Scrapy.

My problem is that Google blocks automated scraping.

I need help figuring out how to configure the scraper (increase the download delay?) and/or an anonymous proxy (such as Tor+Privoxy) so that I can scrape Google search results.
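On the delay question: Scrapy controls request pacing through its project settings. Below is a minimal sketch of a settings.py fragment, assuming a standard Scrapy project layout. The numbers are illustrative guesses, not values known to keep Google from blocking the crawler.

```python
# settings.py (fragment): slow the crawler down.
# All numeric values are illustrative assumptions, not tested thresholds.

# Base wait, in seconds, between consecutive requests to the same site.
DOWNLOAD_DELAY = 10

# Multiply each wait by a random factor between 0.5 and 1.5
# (Scrapy's default behaviour, set explicitly here for clarity).
RANDOMIZE_DOWNLOAD_DELAY = True

# Keep at most one request in flight to the same domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# AutoThrottle adapts the delay to the observed response latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 10          # initial delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 120           # upper bound when responses are slow
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallel requests per remote site
```

With AutoThrottle enabled, Scrapy uses DOWNLOAD_DELAY as the lower bound for the adjusted delay. A larger delay makes the crawler slower, but it does not guarantee that Google will stop blocking it.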

What I have so far:

1) Simple Google parser:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    if [url removed, login to view]('[url removed, login to view]'):
        for url in [url removed, login to view]('//div[@id="ires"]/ol/li//h3[@class="r"]/a/@href').extract():
            ...  # parse the Google result links here
        for url in [url removed, login to view]('//a[@id="pnnext"]/@href').extract():
            url = "https://" + [url removed, login to view]('/')[2] + url
            yield Request(url)
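The parser rebuilds the next-page URL by hand from the host part of the current URL. A sketch of the same step using the standard library's urljoin follows. It gives the same result for root-relative hrefs such as "/search?q=...&start=10", and it also handles hrefs that are already absolute or that contain only a query string. The helper name next_page_url and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def next_page_url(page_url: str, href: str) -> str:
    """Resolve the "pnnext" href against the URL of the page it came from.

    For root-relative hrefs this matches "https://" + host + href.
    Absolute hrefs are returned unchanged, and query-only hrefs
    ("?q=...") replace the query of page_url.
    """
    return urljoin(page_url, href)


if __name__ == "__main__":
    current = "https://www.google.com/search?q=scrapy"
    print(next_page_url(current, "/search?q=scrapy&start=10"))
    # https://www.google.com/search?q=scrapy&start=10
```

In Scrapy 1.0 and later, response.urljoin(href) performs the same resolution against the response's URL. Inside the spider, that would be yield Request(response.urljoin(href)).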

This simple parser, without any proxy, gets recognized as an automated scraper and blocked.

2) I installed Tor+Privoxy and wrote this middleware class:

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        [url removed, login to view]['proxy'] = "http://localhost:8118"

and enabled it in the settings:

DOWNLOADER_MIDDLEWARES = {
    '[url removed, login to view]': 110,
    '[url removed, login to view]': 100,
}
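Below is a sketch of the same middleware and setting with the elided parts written out. It assumes Privoxy listens on its default address localhost:8118 and that the class lives in a hypothetical module myproject/middlewares.py. The middleware sets request.meta['proxy'], which is the key Scrapy's downloader reads to route a request through a proxy. For https:// URLs, current Scrapy releases tunnel through that proxy with an HTTP CONNECT request, so Privoxy must accept CONNECT.

```python
# myproject/middlewares.py  (hypothetical module path)
# Assumption: Privoxy runs locally on its default port 8118 and forwards to Tor.

PRIVOXY_URL = "http://localhost:8118"


class ProxyMiddleware:
    """Downloader middleware that routes every request through Privoxy."""

    def process_request(self, request, spider):
        # Scrapy's downloader reads the proxy URL from request.meta["proxy"].
        # setdefault keeps any proxy a spider has already set on a request.
        request.meta.setdefault("proxy", PRIVOXY_URL)
        # Returning None tells Scrapy to continue processing the request normally.
        return None


# settings.py (fragment): process_request hooks run in ascending order,
# so 350 sets the proxy before the built-in HttpProxyMiddleware
# (default order 750) handles the request.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ProxyMiddleware": 350,
}
```

Any order value below 750 should work. The original's 100 also places the custom middleware ahead of the built-in one.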

But Scrapy does not seem to work with Tor+Privoxy on HTTPS pages. Over plain HTTP, Scrapy+Tor+Privoxy works, but Google now serves its results only over HTTPS.
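To find out whether the HTTPS failure comes from Scrapy or from the Tor+Privoxy chain, you can fetch an https:// page through Privoxy without Scrapy. Below is a standard-library-only sketch, assuming Privoxy listens on localhost:8118. check.torproject.org is the Tor Project's page that reports whether a visit came through Tor. It is used here only as a convenient HTTPS target.

```python
import urllib.request

PRIVOXY_URL = "http://localhost:8118"  # assumption: Privoxy's default listen address


def make_proxy_opener(proxy_url: str = PRIVOXY_URL) -> urllib.request.OpenerDirector:
    """Build an opener that sends both http:// and https:// requests via proxy_url.

    For https:// URLs, urllib sends an HTTP CONNECT to the proxy, which is
    the same tunnelling step Privoxy has to accept for Scrapy.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def https_through_proxy_works(url: str = "https://check.torproject.org/") -> bool:
    """Return True if an HTTPS page can be fetched through the proxy."""
    try:
        with make_proxy_opener().open(url, timeout=30) as resp:
            return resp.status == 200
    except OSError as exc:  # URLError, HTTPError and timeouts are all OSError subclasses
        print(f"HTTPS via proxy failed: {exc}")
        return False


if __name__ == "__main__":
    print(https_through_proxy_works())
```

If this check also fails, the likely culprit is the Privoxy-to-Tor forwarding configuration rather than Scrapy. If it succeeds, the Scrapy side (middleware order, Scrapy version) is the next thing to inspect.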

So what I actually need is a sample project with a detailed proxy configuration (Tor/Privoxy or an alternative) that shows how to avoid being blocked by Google for automated scraping.

Skills: PHP, Software Architecture


Project ID: #4253255

5 freelancers are bidding on average $190 for this job

SigmaVisual

I can help with your project. Please check PMB and our ratings/reviews to get an idea of our experience. Please let me know if you have any queries.

$199 USD in 5 days
(218 Reviews)
7.7
bob1982

Is PHP OK with you? Thanks.

$250 USD in 5 days
(337 Reviews)
6.8
mantislin

Hi sir, please check PM. Thanks, Kimi.

$250 USD in 5 days
(118 Reviews)
6.4
exprtsolution

I can do this project; please check PM. Thanks.

$150 USD in 7 days
(5 Reviews)
2.3
AstreyLabs

Hi, I have a solution for your task. Go ahead.

$100 USD in 1 day
(0 Reviews)
0.0