In Progress

Wikipedia Crawler, Put HTML article into MySQL database

We need to crawl the entire Wikipedia (around 4 million articles) and copy the HTML articles into a MySQL database.

Images will link to their original URL in Wikipedia. However, articles should be linked locally in our own server, not to Wikipedia.

Snapshots of the database structure are available in the two images attached.

The crawler might have to be "non-aggressive", given that an aggressive crawler will be cause the process to get killed by Wikipedia's servers.

The crawler should crawl the entire Wikipedia automatically once time every month.

We expect that it takes about 1 to 2 weeks to crawl Wikipedia in a non- aggressive way.

EXAMPLE OF PROJECT WEBSITE PAGE

[url removed, login to view]

CORRESPONDING WIKIPEDIA PAGE

[url removed, login to view]

If you need more information or have any proposal please contact us.

Thanks!

Skills: Java, MySQL, PHP, Python, Wikipedia

See more: wikipedia crawler, wiki website us, wikipedia's website, php catid, mysql database server, get wikipedia, get a wikipedia, example proposal project, example of project proposal, example of it project proposal, article proposal, an example of project proposal, mysql wiki, php wikipedia, database crawler, article sub, html java mysql, linked server database, killed, java crawl, java crawler url database, information crawler, java url information, articles java, html wikipedia

About the Employer:
( 1 review ) Tokyo, Japan

Project ID: #4500789

Awarded to:

juliasoft

Please, see private message for details.

$248 USD in 7 days
(54 Reviews)
5.3

10 freelancers are bidding on average $470 for this job

akhila27

Scraping Experts Here. Check the message and contact us. Scraping samples are also attached.

$10309 USD in 3 days
(15 Reviews)
6.1
on2it

Hi, We at on2itonline would love to work with you on this, we do MAD "full service" websites on a clear and fixed budget. We have lots to offer that others don't ,like our lifetime FREE hosting on every website for a s More

$257 USD in 7 days
(5 Reviews)
5.0
arickbro

Hi sir, please check PM for more detail, I've completed similar project before which crawl more than 1.5million page

$222 USD in 3 days
(10 Reviews)
4.7
webstechno

Please check our private message for more details. Thanks

$155 USD in 3 days
(14 Reviews)
4.6
expertc0ding

Expert Solution here. Check PM

$231 USD in 9 days
(3 Reviews)
2.4
MostafaSM

please check your PM

$211 USD in 4 days
(0 Reviews)
0.0
au3

. .

$1666 USD in 30 days
(0 Reviews)
1.4
WaterWebDev1

Hello my name is Duane and i am new to [url removed, login to view], i am ready to start working on this project asap. I am in need of positive feedback. Please see my personal message to you in your inbox.

$88 USD in 3 days
(0 Reviews)
0.0
techlotus

It's an easy task for us. We have gone through your requirements and we are ready to start the work immediately on your project. We will send you the complete list of company's projects and portfolio once you reply us More

$1340 USD in 25 days
(0 Reviews)
0.0
jaysonreis

Hi, I have strong crawling skills and if i get this project I would do it with scrappy a python "scrapping framework".

$277 USD in 5 days
(0 Reviews)
0.0
BHARATTECH3

We have 8+ experienced software developer [url removed, login to view] allow us to do your desired project skillfully and accurately.

$84 USD in 3 days
(1 Review)
0.0