PHP HTML DOM Web Scraping Issues

$30-250 USD

Closed

Posted

about 10 years ago

Paid on delivery

Issue: I'm using the 'Simple HTML DOM Parser' to scrape a few pages from Amazon. It works some of the time, but Amazon keeps detecting the scraper and stopping the script with a captcha request. I can normally request around 3 pages before the script is stopped with the following message: "Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies."

What I've Tried: I've tried adding user-agents. I've also tried spacing out the requests by 45-60 seconds. Neither has worked consistently.

What I Need: I set up a test page so that someone can get it working properly (consistently). It contains the scraping function and includes the 'Simple HTML DOM' library. The script currently just echoes all of the returned HTML, so you can see whether the page is returned or Amazon is blocking the request with a captcha. I'd like to keep the library I'm using ([login to view URL]) because I have other scripts based on it. I also need this completed ASAP: tonight or tomorrow at the latest. Thank you.

PHP

Software Architecture

Web Scraping

Project ID: 5487309

About the project

17 proposals

Remote project

Active 10 yrs ago

17 freelancers are bidding on average $167 USD for this job

@mantislin

Hi sir, I am a scraping expert and have done many similar projects; please check my feedback to see for yourself. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi

$230 USD in 5 days

4.9

(191 reviews)

7.0

@faizan101010

Hi, my name is Faizan and I can provide a solution so that you can parse data from Amazon consistently. I need to see your code and I can fix it. Are you using cURL or file_get_contents? Best Regards, Faizan

$210 USD in 3 days

4.6

(14 reviews)

6.6

@Mezh

Hello, I'm familiar with the 'Simple HTML DOM Parser' library and could try to help you with your task. Have you looked at the cookies set by Amazon? They may be part of the solution. Thanks, Alex

$150 USD in 1 day

4.9

(35 reviews)

5.8

@mituld

Hi, I work towards providing reliable, relevant and robust IT solutions at the most competitive prices to my customers. I ensure 100% customer satisfaction, so let's start. Thanks

$206 USD in 7 days

4.9

(38 reviews)

5.7

@stevecorsi

Hi, it's Steve. Ready for the task. Please see my reviews and previous projects. Can we discuss the details further? Waiting for your reply!

$151 USD in 7 days

5.0

(26 reviews)

5.7

@zakir375

Sir, I'm a professional, innovative Web Developer with a positive approach and a good command of PHP/MySQL, Ajax, jQuery, HTML, CSS, JavaScript, WordPress, Drupal and Joomla. I have developed a wide range of websites over the past 5 years. I am sure you will not be disappointed if you give me this opportunity. Best regards.

$230 USD in 5 days

5.0

(5 reviews)

4.5

@zeflex

Hello Sir, I have thoroughly read your project description. I have extensive experience with PHP/MySQL + jQuery and more (JSON, XML, APIs, ...). I have been a freelancer for 10 years. I have also worked for a big company that owns a network of adult tubes (youporn network). You can check my portfolio, which features a few websites I have programmed. I am available to work with you from today onward. If you have any questions, feel free to contact me on Freelancer.com or through skype: djflexlive. Thanks

$160 USD in 3 days

5.0

(9 reviews)

4.4

@n1team

Hello, I've done many similar tasks before for my own needs. This can be fixed by using captcha-recognizing services. That will cost about $1 per 1000 captchas.

$88 USD in 1 day

5.0

(8 reviews)

4.2

@cybert2t

Greetings Sir, About me. Resume: Over 9 years of involvement in computer software development, successfully working with individuals and groups through the Software Development Life Cycle, software design, e-commerce, security applications and websites on different platforms. Over 8 years of development experience with Internet-based applications, many of them on the Microsoft .NET (ASP.NET) platform and 5 within the Open Source (mostly PHP / MySQL) community. You can check my profile to learn more about my skills and experience. I have done web scraping before, and I can see that what you want is to simulate a web browser request so that Amazon does not send you a captcha. I was thinking about using third-party captcha solvers, but you don't want that; you want to simulate a real web browser request, and I can do that. I will use all the tricks I know to get past Amazon's detection, but you should know that Amazon could upgrade its algorithm in the future to detect any trick. Nevertheless, I will send all of the browser's information in each request to avoid detection; we will still need to space out the requests to avoid any suspicious activity from your server. It will take some time to run a lot of tests against Amazon, so I think it will take at most 3 days. Waiting for your answer. Kind Regards, Henny Labrada

$200 USD in 3 days

4.7

(8 reviews)

3.9
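
The approach described in @cybert2t's proposal (sending browser-like request information and spacing out requests) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `browser_like_curl_options` and `random_delay_seconds` are hypothetical, the User-Agent and header values are example values, and nothing here is known to prevent Amazon's captcha.

```php
<?php
// Hypothetical helper: builds cURL options that resemble an ordinary browser request.
// The User-Agent and header values below are example values, not known-good ones.
function browser_like_curl_options($url)
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects like a browser would
        CURLOPT_ENCODING       => '',    // advertise every encoding cURL supports
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            . 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.9',
        ],
    ];
}

// Hypothetical helper: picks a random pause so requests are not evenly spaced.
// The 45-60 second defaults mirror the range mentioned in the project description.
function random_delay_seconds($min = 45, $max = 60)
{
    return random_int($min, $max);
}

// Usage sketch (performs a network request, so it is left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, browser_like_curl_options('https://www.example.com/'));
// $html = curl_exec($ch);
// sleep(random_delay_seconds());
```

The returned string could then be passed to Simple HTML DOM's `str_get_html()` so the rest of the existing parsing code stays unchanged.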

@derek8691

Hi, are you saving the cookies between each page scrape? You have to set up a file/db to store the cookies. Amazon may also be blocking you because of JavaScript. Since you're using PHP to get the page, the JavaScript doesn't get run, and they try to validate you with a captcha. I can build you a scraper for Amazon with a built-in browser that runs on Windows and executes the JavaScript, in order to test it out. There are also several APIs out there that can enter the captchas for you automatically. Please let me know if you have any questions or would like to discuss the project further. Thanks, Derek

$210 USD in 2 days

5.0

(9 reviews)

3.5
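
The cookie persistence suggested in @derek8691's proposal could look something like the sketch below, using PHP's cURL extension with a cookie file. The function names `cookie_curl_options` and `fetch_with_cookies` and the cookie file path are hypothetical; whether persisting cookies is enough to avoid the captcha is not established by the posting.

```php
<?php
// Hypothetical helper: cURL options that read and write cookies in one file,
// so cookies set by one response are sent with the next request.
function cookie_curl_options($cookieFile)
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies loaded before the request
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies saved when the handle is released
    ];
}

// Hypothetical helper: fetches $url and returns the body, or false on failure.
function fetch_with_cookies($url, $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, cookie_curl_options($cookieFile));
    $html = curl_exec($ch);
    unset($ch); // releasing the handle flushes cookies to the jar file
    return $html;
}

// Usage sketch with Simple HTML DOM (str_get_html comes from that library):
// $html = fetch_with_cookies('https://www.example.com/', __DIR__ . '/cookies.txt');
// $dom  = $html !== false ? str_get_html($html) : false;
```

Fetching the page with cURL and handing the string to `str_get_html()` keeps the existing library in place while giving control over cookies that `file_get_html()` alone does not expose.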

@webticsindia

A proposal has not yet been provided

$166 USD in 10 days

0.0

(0 reviews)

0.0

@Quickf1x

Hi, we built some software to scrape Google, so there should be no problem scraping Amazon. If you can, please send me the example page you would like to scrape information from. Regards

$200 USD in 3 days