PHP HTML DOM Web Scraping Issues

Closed

Issue:

I'm using the 'Simple HTML DOM Parser' to scrape a few pages from Amazon. It works some of the time, but Amazon keeps returning a CAPTCHA request that stops the script, because it recognizes that I'm using a scraper. I can usually request around 3 pages before the script is stopped with the following message: "Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies."

What I've Tried:

I've tried setting user-agent strings. I've also tried spacing the requests 45 to 60 seconds apart. Neither has worked consistently.

What I Need:

I've set up a test page, containing the scraping function and the included 'Simple HTML DOM' library, so that whoever takes this on can get it working properly (consistently). The script currently just echoes all of the returned HTML, so you can see whether the page came back or whether Amazon blocked the request with a CAPTCHA. I'd like to keep the library I'm using ([url removed, login to view]) because I have other scripts built on it. I also need this completed ASAP: tonight or tomorrow at the latest.

Thank you.
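Since the test page currently echoes the whole response to see whether Amazon served the real page or the CAPTCHA, that check can be automated. The sketch below is a hypothetical helper (`is_captcha_page` is not part of Simple HTML DOM) that looks for the phrase quoted in the post; it assumes Amazon keeps using that exact wording, which may not hold.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): reports whether fetched
// HTML looks like the "not a robot" interstitial instead of the real page.
// Assumption: Amazon keeps using the wording quoted in the post.
function is_captcha_page(string $html): bool
{
    // Decode entities such as &#39; so the apostrophe in "you're" matches.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Case-insensitive search for the marker phrase.
    return stripos($text, "make sure you're not a robot") !== false;
}

// Usage: check the raw HTML before handing it to the parser.
$blocked = "<p>Sorry, we just need to make sure you&#39;re not a robot.</p>";
$normal  = "<html><body><h1>Product page</h1></body></html>";

var_dump(is_captcha_page($blocked)); // bool(true)
var_dump(is_captcha_page($normal));  // bool(false)
```

When the check returns true, stopping or waiting much longer is a safer reaction than retrying immediately, since repeated requests are what trigger the block in the first place.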

Skills: PHP, Software Architecture, Web Scraping


Project ID: #5487309

17 freelancers are bidding an average of $167 for this job

SigmaVisual

Dear Client, I can help with your project. We already have experience working on similar projects. Please see below for an idea of our experience: Amazon/eBay Bots: http://sigma-dns.sigmavirtual.com/PDemo1/Am…

$144 USD in 3 days
(255 Reviews)
8.0
mantislin

Hi sir, I am a scraping expert and have done many similar projects; please check my feedback and you will see. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi

$230 USD in 5 days
(191 Reviews)
7.0
faizan101010

Hi, my name is Faizan and I can provide a solution so that you can parse data from Amazon consistently. I need to see your code, and then I can fix it. Are you using cURL or file_get_contents()? Best Regards, Fai…

$210 USD in 3 days
(14 Reviews)
6.6
Mezh

Hello, I'm familiar with the 'Simple HTML DOM Parser' library and could try to help you with your task. Have you looked at the cookies set by Amazon? They may be the key to the solution. Thanks, Alex

$150 USD in 1 day
(35 Reviews)
5.8
mituld

Hi, I work toward providing reliable, relevant and robust IT solutions to my customers at the most competitive prices. I ensure 100% customer satisfaction, so let's start. Thanks

$206 USD in 7 days
(38 Reviews)
5.7
stevecorsi

Hi, it's Steve. Ready for the task. Please see my reviews and previous projects. Can we discuss the details further? Waiting for your reply!

$151 USD in 7 days
(26 Reviews)
5.7
ruipimentel

Hello, my name is Rui Pimentel and I have more than 6 years of experience developing web crawling applications. I have a lot of suggestions that can be applied to avoid that situation. Layer 1, might not be tha…

$83 USD in 3 days
(32 Reviews)
5.2
mmadi

Hi, I'll be happy to do that for you. I have extensive experience in scraping using cURL, regular expressions, DOM and Selenium RC. I worked for the travelfox.com and planeandtrain.com search engines, where I gained my experience…

$144 USD in 3 days
(20 Reviews)
5.2
zakir375

Sir, I'm a professional, innovative web developer with a positive approach and a good command of PHP/MySQL, Ajax, jQuery, HTML, CSS, JavaScript, WordPress, Drupal and Joomla. I have developed a wide range of websites past…

$230 USD in 5 days
(5 Reviews)
4.5
zeflex

Hello Sir, I thoroughly read your project description. I have extensive experience with PHP/MySQL + jQuery and more (JSON, XML, API, ...). I have been a freelancer for 10 years. I have also been working for a big co…

$160 USD in 3 days
(9 Reviews)
4.4
n1team

Hello, I've done many similar tasks before for my own needs. This can be fixed by using CAPTCHA recognition services, which cost about $1 per 1,000 CAPTCHAs.

$88 USD in 1 day
(8 Reviews)
4.2
cybert2t

Greetings Sir. About me: Resume: over 9 years of involvement in computer software development, working successfully with individuals and in group engagements through the Software Development Life Cycle, software des…

$200 USD in 3 days
(8 Reviews)
3.9
derek8691

Hi, are you saving the cookies between each page scrape? You have to set up a file or database to store the cookies. Amazon may also be blocking you because of JavaScript. Since you're using PHP to fetch the page, JavaScript doesn't…

$210 USD in 2 days
(9 Reviews)
3.5
ppandare

We are '3stechmind', a team of dedicated software professionals developing web projects for our clients. Our area of expertise is designing websites and implementing the functional aspects in software technologies like ph…

$185 USD in 3 days
(2 Reviews)
1.3
orGuxpIQhShl

Hi, we are a team of freelance software developers. If you contact us through our website, we can discuss the details of the project: www.solver.io

$155 USD in 3 days
(0 Reviews)
0.0
webticsindia

A proposal has not yet been provided

$166 USD in 10 days
(0 Reviews)
0.0
devop

Hi! CAPTCHAs are hard to bypass. To achieve your goals, you will need to use a "CAPTCHA solver" system, like bypasscaptcha.com or deathbycaptcha.com. If you are willing to pay for their service (DeathByCaptcha charges…

$88 USD in 3 days
(0 Reviews)
0.0
Quickf1x

Hi, we built some software to scrape Google, so there should be no problem scraping Amazon. If you can, please send me an example page that you would like to scrape information from. Regards

$200 USD in 3 days
(0 Reviews)
0.0
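Two of the bids (Mezh and derek8691) point at cookies, and Amazon's own message asks the client to accept them. A minimal sketch of persisting cookies between requests with PHP's cURL extension follows; the function name `fetch_with_cookies` and the cookie-file path are hypothetical, and this is only one of the approaches the bidders suggested, not a guaranteed fix for the CAPTCHA.

```php
<?php
// Sketch: fetch a page with cURL while persisting cookies in a file, so each
// request sends back the cookies the site set on earlier responses.
// The function name and cookie-file path are hypothetical.
function fetch_with_cookies(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies saved by earlier requests
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is released
        CURLOPT_TIMEOUT        => 30,          // give up after 30 seconds
    ]);
    $html  = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch); // releases the handle (a no-op in PHP 8, where it is freed on scope exit)

    if ($html === false) {
        throw new RuntimeException("Request failed: $error");
    }
    return $html;
}
```

Because the client wants to keep Simple HTML DOM, the returned string can be passed to that library's `str_get_html()` instead of letting `file_get_html()` fetch the URL itself; that way the cookie handling stays in cURL while the existing parsing code is unchanged.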