Completed

Wikipedia Crawler, Put HTML article into MySQL database

This project was successfully completed by juliasoft for $248 USD in 7 days.

Get free quotes for a project like this
Employer working
Completed by:
Skills Required
Project Budget
$30 - $250 USD
Completed In
7 days
Total Bids
12
Project Description

We need to crawl the entire Wikipedia (around 4 million articles) and copy the HTML articles into a MySQL database.

Images will link to their original URL in Wikipedia. However, articles should be linked locally in our own server, not to Wikipedia.

Snapshots of the database structure are available in the two images attached.

The crawler might have to be "non-aggressive", given that an aggressive crawler will be cause the process to get killed by Wikipedia's servers.

The crawler should crawl the entire Wikipedia automatically once time every month.

We expect that it takes about 1 to 2 weeks to crawl Wikipedia in a non- aggressive way.

EXAMPLE OF PROJECT WEBSITE PAGE
[url removed, login to view]

CORRESPONDING WIKIPEDIA PAGE
[url removed, login to view]

If you need more information or have any proposal please contact us.

Thanks!

Looking to make some money?

  • Set your budget and the timeframe
  • Outline your proposal
  • Get paid for your work

Hire Freelancers who also bid on this project

    • Forbes
    • The New York Times
    • Time
    • Wall Street Journal
    • Times Online