Wikipedia Crawler: Put HTML Articles into a MySQL Database

Status: IN PROGRESS
Bids: 12
Avg Bid (USD): $1257
Project Budget (USD): $30 - $250

Project Description:
We need to crawl the entire Wikipedia (around 4 million articles) and copy the HTML articles into a MySQL database.

Images should keep linking to their original URLs on Wikipedia. Links between articles, however, should point to the local copies on our own server, not to Wikipedia.
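The link handling described above could be sketched roughly as follows. This is only an illustration under assumptions: the local route `/article.php?title=...`, the `https://en.wikipedia.org` origin, and the function name `rewrite_links` are hypothetical and not part of the project specification, and a production crawler would more likely use a real HTML parser than regular expressions.

```python
import re

WIKI_ORIGIN = "https://en.wikipedia.org"
# Hypothetical local route; the real URL scheme of the project site is not specified.
LOCAL_ARTICLE_URL = "/article.php?title={title}"

_HREF = re.compile(r'href="/wiki/([^"#]+)(#[^"]*)?"')
_SRC = re.compile(r'src="(//[^"]+|/[^"/][^"]*)"')


def rewrite_links(html: str) -> str:
    """Point article links at the local site and image sources at Wikipedia."""

    def fix_href(match):
        title, fragment = match.group(1), match.group(2) or ""
        if ":" in title:
            # Simplification: any title with a colon is treated as a
            # namespaced page (File:, Special:, ...) and left on Wikipedia.
            # Some real article titles contain colons and would need extra care.
            return f'href="{WIKI_ORIGIN}/wiki/{title}{fragment}"'
        return f'href="{LOCAL_ARTICLE_URL.format(title=title)}{fragment}"'

    def fix_src(match):
        url = match.group(1)
        if url.startswith("//"):
            # Protocol-relative URL: add a scheme so it resolves off-site.
            return f'src="https:{url}"'
        # Root-relative URL: prefix the Wikipedia origin.
        return f'src="{WIKI_ORIGIN}{url}"'

    return _SRC.sub(fix_src, _HREF.sub(fix_href, html))
```

For example, `rewrite_links('<a href="/wiki/Art">Art</a>')` would produce a link to `/article.php?title=Art`, while an image whose source starts with `//upload.wikimedia.org/` keeps pointing at Wikipedia with an explicit `https:` scheme.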

Snapshots of the database structure are available in the two images attached.
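Since the attached schema images are not reproduced here, the following is only a minimal sketch of the storage step against an assumed table. The table name `articles`, its columns `title` and `html`, and the function `store_article` are hypothetical; the real structure is the one in the attachments. The function accepts any Python DB-API connection, and the demo uses the standard-library `sqlite3` module purely as a stand-in so the sketch runs without a MySQL server.

```python
import sqlite3  # stand-in for a MySQL driver, used only in the demo below


def store_article(conn, title, html, placeholder="%s"):
    """Insert or overwrite one article in a hypothetical `articles` table.

    `placeholder` is the driver's parameter marker: common MySQL drivers
    such as PyMySQL and mysql-connector-python accept "%s", sqlite3 uses "?".
    REPLACE INTO overwrites the existing row when `title` is a unique key,
    which suits a crawl that is repeated every month.
    """
    sql = f"REPLACE INTO articles (title, html) VALUES ({placeholder}, {placeholder})"
    cur = conn.cursor()
    cur.execute(sql, (title, html))
    conn.commit()


def demo():
    """Store the same article twice and return the table contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT PRIMARY KEY, html TEXT)")
    store_article(conn, "Art", "<p>first crawl</p>", placeholder="?")
    store_article(conn, "Art", "<p>second crawl</p>", placeholder="?")
    return conn.execute("SELECT title, html FROM articles").fetchall()
```

Passing values as query parameters rather than formatting them into the SQL string matters here, because article HTML routinely contains quotes and other characters that would otherwise break the statement.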

The crawler will probably have to be "non-aggressive", since an aggressive crawler will cause the process to get killed by Wikipedia's servers.
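One common way to keep a crawler non-aggressive is to enforce a minimum delay between requests and to identify the crawler with a descriptive User-Agent header. The sketch below assumes a single sequential crawler; the class `Throttle`, the function `fetch`, the chosen interval, and the User-Agent string are all illustrative assumptions, not requirements taken from Wikipedia or from this project.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch(url, throttle, user_agent="ExampleCrawler/0.1 (contact@example.com)"):
    """Download one page after waiting for the throttle (placeholder User-Agent)."""
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

A crawl loop would create one `Throttle`, for instance `Throttle(1.0)` for at most one request per second, and pass it to every `fetch` call.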

The crawler should crawl the entire Wikipedia automatically once every month.

We expect it to take about 1 to 2 weeks to crawl Wikipedia in a non-aggressive way.
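As a rough sanity check on that estimate, the request rate implied by about 4 million articles in 1 to 2 weeks can be computed directly. The calculation assumes one request per article, continuous operation, and no retries; the function name `required_rate` is illustrative.

```python
def required_rate(pages, days):
    """Average pages per second needed to fetch `pages` within `days`."""
    return pages / (days * 24 * 60 * 60)


for days in (7, 14):
    rate = required_rate(4_000_000, days)
    print(f"{days} days: {rate:.2f} pages/s, one request every {1 / rate:.3f} s")
```

Under these assumptions the crawl needs roughly 6.6 pages per second to finish in one week and roughly 3.3 pages per second to finish in two, so the acceptable request rate and the time budget have to be weighed against each other.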

EXAMPLE OF PROJECT WEBSITE PAGE
http://motipedia.com/category.php?catId=29&subcat=sub

CORRESPONDING WIKIPEDIA PAGE
http://en.wikipedia.org/wiki/Art

If you need more information or have a proposal, please contact us.

Thanks!

Skills required:
Java, MySQL, PHP, Python, Wikipedia
Additional Files: content.png, table.png
About the employer: Verified


$10309 in 3 days (juliasoft)
$248 in 7 days
$257 in 7 days (arickbro)
$222 in 3 days (webstechno)
$155 in 3 days
$231 in 9 days
$277 in 5 days
$84 in 3 days (techlotus)
$1340 in 25 days (WaterWebDev1)
$88 in 3 days