Wikipedia Crawler, Put HTML article into MySQL database

This project was successfully completed by juliasoft for $248 USD in 7 days.

Get free quotes for a project like this
Employer working
Project Budget
$30 - $250 USD
Completed In
7 days
Total Bids
Project Description

We need to crawl the entire Wikipedia (around 4 million articles) and copy the HTML articles into a MySQL database.

Images will link to their original URL in Wikipedia. However, articles should be linked locally in our own server, not to Wikipedia.

Snapshots of the database structure are available in the two images attached.

The crawler might have to be "non-aggressive", given that an aggressive crawler will be cause the process to get killed by Wikipedia's servers.

The crawler should crawl the entire Wikipedia automatically once time every month.

We expect that it takes about 1 to 2 weeks to crawl Wikipedia in a non- aggressive way.


[url removed, login to view]


[url removed, login to view]

If you need more information or have any proposal please contact us.


Completed by:
Skills Required

Browse Related Skills

Related Projects

Other things people do on Freelancer

Related Articles

Latest Articles

Looking to make some money?

  • Set your budget and the timeframe
  • Outline your proposal
  • Get paid for your work

Hire Freelancers who also bid on this project

    • Forbes
    • The New York Times
    • Time
    • Wall Street Journal
    • Times Online