We need to crawl all of Wikipedia (around 4 million articles) and store the HTML of each article in a MySQL database.
Images will keep linking to their original Wikipedia URLs. Article links, however, should point to our own server, not to Wikipedia.
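To illustrate the link-rewriting requirement, here is a minimal sketch of one way to map article hrefs to a local path while leaving every other URL (including images) untouched. The local base path and the "en.wikipedia.org" host check are assumptions for illustration; the real values depend on the project's server layout.

```python
from urllib.parse import urlparse

# Hypothetical local base path; the real value depends on our server setup.
LOCAL_ARTICLE_BASE = "/articles/"

def rewrite_href(href: str) -> str:
    """Rewrite a Wikipedia article link so it points at our own server.

    Article links appear either relative ('/wiki/Some_Title') or absolute
    ('https://en.wikipedia.org/wiki/Some_Title'); both are mapped to
    LOCAL_ARTICLE_BASE + title. Anything else (image URLs on
    upload.wikimedia.org, external links) is returned unchanged.
    """
    parsed = urlparse(href)
    if parsed.path.startswith("/wiki/") and parsed.netloc in ("", "en.wikipedia.org"):
        title = parsed.path[len("/wiki/"):]
        return LOCAL_ARTICLE_BASE + title
    return href  # images and external links keep their original URLs

print(rewrite_href("/wiki/MySQL"))
print(rewrite_href("https://upload.wikimedia.org/wikipedia/commons/a/a1/Example.png"))
```

In a full crawler this function would be applied to every href/src attribute while parsing each downloaded page, before the HTML is written to the database.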
Snapshots of the database structure are available in the two images attached.
The crawler must be "non-aggressive": an aggressive crawler would have its requests blocked or killed by Wikipedia's servers.
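A simple way to keep the crawler non-aggressive is to enforce a minimum delay between successive requests. The sketch below shows one such throttle; the 1-second default is an assumption, and the actual delay (plus a descriptive User-Agent header) should follow Wikipedia's robots.txt and crawling etiquette.

```python
import time

class Throttle:
    """Enforce a minimum interval between successive requests.

    The default of 1.0 seconds is an illustrative assumption; the real
    value should be tuned against Wikipedia's crawling guidelines.
    """

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous request

    def wait(self) -> None:
        # Sleep just long enough so at least min_interval seconds
        # pass between consecutive calls.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

# Usage sketch: call throttle.wait() before every HTTP request.
throttle = Throttle(min_interval=1.0)
```

The main crawl loop would then be: `throttle.wait()`, fetch one article, rewrite its links, insert the HTML into MySQL, repeat.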
The crawler should re-crawl the entire Wikipedia automatically once a month.
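The monthly schedule could be driven by cron. The entry below is a hypothetical config fragment (script path, log path, and start time are all assumptions for illustration):

```
# Hypothetical crontab entry: start the crawl at 02:00 on the 1st of each month.
0 2 1 * * /usr/bin/python3 /opt/crawler/crawl_wikipedia.py >> /var/log/wikicrawl.log 2>&1
```

Since a full crawl takes one to two weeks, the crawler should also guard against a new run starting while the previous one is still in progress (e.g. via a lock file).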
We expect a non-aggressive crawl of Wikipedia to take about 1 to 2 weeks.
EXAMPLE OF PROJECT WEBSITE PAGE
CORRESPONDING WIKIPEDIA PAGE
If you need more information or have a proposal, please contact us.