In Progress

EzineArticles Scraper

I am looking for a PHP script that scrapes all of EzineArticles and saves each article as a MySQL row containing the URL, Title, Category, and Article Text.

Your script should find, scrape, and store every single article on EzineArticles (there are millions of them), so it should use some form of threading to speed things up.

I have thousands of private proxies, so I should also be able to supply them in a text file. Some of the proxies have usernames and passwords and some don't, so you will need to handle both. I would recommend running several hundred threads with some sort of proxy switcher in place.

A good way to do this without getting IPs banned is to keep a shared list that tracks which proxies are currently in use by a thread and which aren't. Then, every couple of articles, a thread switches to a new IP that isn't currently in use.

If a page fails to load properly (either because EzineArticles rate-limited you or because the proxy itself had issues), retry it with a different proxy. If a page fails 10 consecutive times, record the failure in the database (leave every field blank except the URL) and move on.

Lastly, it needs to save its progress so that if the script is closed for any reason, it can continue where it left off. Progress can be tracked in the MySQL database as well.
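One way to make a crawl resumable is a progress table keyed by URL: discovered URLs go in as pending, each finished page is marked, and after a restart the script simply asks for what is still pending. The sketch below is only an illustration under assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere, and the table name `crawl_progress`, its status values, and the helper functions are all hypothetical. In MySQL, `INSERT OR IGNORE` would be written `INSERT IGNORE`, and `url` would need to be a `VARCHAR` rather than `TEXT` to serve as the primary key.

```python
import sqlite3

# Hypothetical progress table; sqlite3 stands in for MySQL in this sketch.
PROGRESS_SCHEMA = """
CREATE TABLE IF NOT EXISTS crawl_progress (
    url    TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'  -- 'pending', 'done' or 'failed'
)
"""


def open_progress(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the progress store; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(PROGRESS_SCHEMA)
    return conn


def enqueue(conn: sqlite3.Connection, urls: list[str]) -> None:
    """Add newly discovered article URLs; URLs already known keep their status."""
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT OR IGNORE INTO crawl_progress (url) VALUES (?)",
            [(u,) for u in urls],
        )


def mark(conn: sqlite3.Connection, url: str, status: str) -> None:
    """Record the outcome for one URL so a restarted run skips it."""
    with conn:
        conn.execute(
            "UPDATE crawl_progress SET status = ? WHERE url = ?", (status, url)
        )


def pending_urls(conn: sqlite3.Connection) -> list[str]:
    """URLs still waiting to be scraped; a restart resumes from this list."""
    rows = conn.execute(
        "SELECT url FROM crawl_progress WHERE status = 'pending' ORDER BY url"
    )
    return [row[0] for row in rows]
```

Because re-discovered URLs are ignored rather than reset, crawling a category page a second time after a restart does not undo work that was already finished.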

MySQL structure:

| URL | Title | Category | Article |
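The four columns above map directly onto a table, and the "failed page" rule (every field blank except the URL) becomes a row with empty strings. This is a minimal sketch, assuming Python's `sqlite3` as a stand-in for MySQL; the table name `articles` and the functions `save_article` and `save_failed` are hypothetical. As with any MySQL key, `url` would have to be a `VARCHAR` rather than `TEXT` there.

```python
import sqlite3

# Columns follow the URL | Title | Category | Article layout.
# sqlite3 stands in for MySQL; names here are hypothetical.
ARTICLES_SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url      TEXT PRIMARY KEY,
    title    TEXT NOT NULL DEFAULT '',
    category TEXT NOT NULL DEFAULT '',
    article  TEXT NOT NULL DEFAULT ''
)
"""


def save_article(conn: sqlite3.Connection, url: str, title: str,
                 category: str, article: str) -> None:
    """Insert a scraped article, overwriting any earlier row for the same URL."""
    with conn:  # commits on success
        conn.execute(
            "INSERT OR REPLACE INTO articles (url, title, category, article) "
            "VALUES (?, ?, ?, ?)",
            (url, title, category, article),
        )


def save_failed(conn: sqlite3.Connection, url: str) -> None:
    """A page that never loaded is stored with every field blank except the URL."""
    save_article(conn, url, "", "", "")
```

Using `INSERT OR REPLACE` means a URL that failed earlier and succeeds on a later run simply gets its blank row overwritten.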

Proxy document structure:

IP:Port:Username:Password

IP:Port

A colon (:) separates the IP and port (and the username and password, if present). Each proxy is on its own line.
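The file format above can be read by splitting each non-empty line on colons: two fields mean an unauthenticated proxy, four mean one with credentials. This is a sketch only; the `Proxy` type and the function names are hypothetical, and it assumes usernames and passwords never contain a colon themselves, since the stated format has no way to escape one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Proxy:
    """One proxy entry; username/password are None when the line has only IP:Port."""
    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None


def parse_proxy_line(line: str) -> Proxy:
    """Parse 'IP:Port' or 'IP:Port:Username:Password'."""
    parts = line.strip().split(":")
    if len(parts) == 2:
        host, port = parts
        return Proxy(host, int(port))
    if len(parts) == 4:
        host, port, username, password = parts
        return Proxy(host, int(port), username, password)
    raise ValueError(f"expected IP:Port or IP:Port:Username:Password, got {line!r}")


def load_proxies(text: str) -> List[Proxy]:
    """One proxy per line; blank lines are skipped."""
    return [parse_proxy_line(line) for line in text.splitlines() if line.strip()]
```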

When testing, you do not need to run it all the way to completion (you will only have a couple of proxies to test with), but when it is done I will need to run it myself and confirm that it works, and that it finds every article, before paying.

The actual scraping and parsing should be relatively easy, since the articles are always inside a well-defined tag.

A good way to find every article is to go through each category page and every listing on it.

I will also accept applications from developers who don't use PHP; let me know your language and I will decide. However, PHP is preferred.

When contacting me, please tell me how you will scrape the site (which framework).

Skills: PHP, Software Architecture


About the Employer:
(8 reviews) Baltimore, United States

Project ID: #4014744

Awarded to:

arickbro

Dear Sir, I've completed an Ezine scraper before. Please check your messages.

$100 USD in 3 days (8 reviews, rating 4.5)

6 freelancers are bidding an average of $177 for this job

SigmaVisual

I can help with your project; please check PMB and our ratings/reviews to get an idea of our experience. Please let me know if you have any queries.

$250 USD in 7 days (220 reviews, rating 7.7)

mantislin

Hi sir, please check your PM. Thanks, Kimi.

$60 USD in 2 days (118 reviews, rating 6.4)

sayno2bugs

Seasoned scraper writer with hundreds of scrapers scraping millions of pages each day. Please check my reviews to learn more about my work. More in PM. Cheers, SayN…

$500 USD in 10 days (18 reviews, rating 6.0)

bistanil98

Please check your PM.

$55 USD in 2 days (9 reviews, rating 4.8)

meldev

Hello, I have messaged your inbox. We should discuss.

$97 USD in 1 day (6 reviews, rating 2.8)