I'm using the 'Simple HTML DOM Parser' to scrape a few pages from Amazon. It works some of the time, but Amazon keeps detecting the scraper and returning a captcha page, which stops the script. I can normally request around 3 pages before it is blocked with the following message: "Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies."
What I've Tried:
I've tried setting user-agent headers. I've also tried spacing the requests 45 to 60 seconds apart. Neither has worked consistently.
What I Need:
I set up a test page so that someone can get it working consistently. It contains the function that scrapes the data and includes the 'Simple HTML DOM' library. For now the script just echoes all of the returned HTML, so you can see whether the page came back or Amazon blocked the request with a captcha. I'd like to keep the library I'm using because I have other scripts that depend on it. I also need this completed ASAP: tonight, or tomorrow at the latest.
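Instead of reading the echoed HTML by eye, the blocked case could be checked in code. The following is only a minimal sketch: `looksLikeCaptchaPage` is a hypothetical helper, not part of Simple HTML DOM, and it assumes the block page always contains the phrase quoted in the message above, which may not hold for every variant of the page.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): reports whether the
// fetched HTML looks like the robot-check page rather than a real page.
// Assumption: the block page contains the phrase quoted in this post.
function looksLikeCaptchaPage(string $html): bool
{
    // Decode entities first so an apostrophe sent as &#39; or &apos; still matches.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Case-insensitive search for the phrase from the block message.
    return stripos($text, "make sure you're not a robot") !== false;
}
```

The check would run on the raw response string before it is handed to the parser, so the script can stop or wait instead of trying to parse a captcha page.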
17 freelancers are bidding on average $167 for this job
Hi sir, I am a scraping expert and have done many similar projects; please check my feedback to see for yourself. Can you give me more details? Then I will provide demo data for you. Thanks, Kimi
Hello, I'm familiar with the 'Simple HTML DOM Parser' library and could try to help you with your task. Have you looked at the cookies Amazon sets? They may be part of the solution. Thanks, Alex
Hi, I work towards providing reliable, relevant and robust IT solutions at the most competitive prices to my customers. I ensure 100% customer satisfaction, so let's start. Thanks
Hello, I've done many similar tasks before for my own needs. This can be fixed by using a captcha recognition service. That will cost about $1 per 1000 captchas.
Hi, we built some software to scrape Google, so there should be no problem scraping Amazon. If you can, please send me an example page that you would like to scrape information from. Regards