Wikipedia Scraper

Status: CANCELLED
Bids: 8
Avg Bid (USD): $434
Project Budget (USD): $250 - $750

Project Description:
I need a small standalone desktop application to scrape information from Wikipedia. To apply for this job you should have strong experience with Wikipedia data mining.

The software should do the following:
1. The software should have a URL input box. The user should be able to copy and paste several Wikipedia URLs into the input box. The software will then go to Wikipedia and extract the text and images from each article.

2. It will save the article content and images from a single URL into a Word document. It will do this for each URL separately.

3. Scrape the databox on the right-hand side of the page. See http://en.wikipedia.org/wiki/Great_white_pelican and check out the "Great White Pelican" databox on the right. I need this information saved to a table and added to the scraped text.

4. It will save all the images from each URL to a separate folder, and name each folder after the title of the URL the images came from.

5. Search for and remove all Wikipedia internal link numbers like [12].
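
To illustrate requirements 4 and 5, here is a minimal Python sketch of the two text-handling steps: removing link numbers like [12] and deriving a per-article image folder from the URL. The function names (strip_citations, title_from_url, image_folder) and the "images" base folder are made up for illustration, and the sketch assumes URLs of the form http://en.wikipedia.org/wiki/<Title>. It is one possible approach, not a required design.

```python
import re
from pathlib import Path
from urllib.parse import urlparse, unquote

# Numeric link markers such as [12], plus any whitespace directly before them.
CITATION_RE = re.compile(r"\s*\[\d+\]")


def strip_citations(text: str) -> str:
    """Remove markers like [12] from scraped article text."""
    return CITATION_RE.sub("", text)


def title_from_url(url: str) -> str:
    """Return the article title from a /wiki/<Title> URL (assumed URL shape)."""
    slug = urlparse(url).path.partition("/wiki/")[2]
    return unquote(slug).replace("_", " ")


def image_folder(base_dir: str, url: str) -> Path:
    """Build a folder path named after the article title (hypothetical layout)."""
    # Replace characters that are not allowed in Windows folder names.
    safe = re.sub(r'[<>:"/\\|?*]', "_", title_from_url(url))
    return Path(base_dir) / safe


if __name__ == "__main__":
    url = "http://en.wikipedia.org/wiki/Great_white_pelican"
    print(strip_citations("The pelican is large.[12] It fishes[3] in groups."))
    print(image_folder("images", url))
```

Only purely numeric markers are removed, so bracketed text such as [citation needed] is left alone unless the pattern is widened.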

After this project, I will run another project to combine this information into an easily searchable database, which will be used for an app.

Feel free to suggest the best method to do this.
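
As one example of a method a bidder might suggest, the article HTML and image list could be requested through the MediaWiki API (action=parse) instead of scraping the rendered page. The Python sketch below only builds the request URL from a pasted article URL and makes no network call. The function name parse_api_url is hypothetical, and the sketch assumes URLs of the form http://en.wikipedia.org/wiki/<Title>.

```python
from urllib.parse import urlparse, unquote, urlencode


def parse_api_url(article_url: str) -> str:
    """Build a MediaWiki action=parse request URL for a pasted article URL."""
    parts = urlparse(article_url)
    # Assumes the article title follows /wiki/ in the path.
    title = unquote(parts.path.partition("/wiki/")[2])
    query = urlencode({
        "action": "parse",       # parse a page and return its content
        "page": title,
        "prop": "text|images",   # rendered HTML plus the list of image names
        "format": "json",
    })
    return f"https://{parts.netloc}/w/api.php?{query}"


if __name__ == "__main__":
    print(parse_api_url("http://en.wikipedia.org/wiki/Great_white_pelican"))
```

The returned JSON would still need to be parsed for the databox table and for the image files themselves. This sketch covers only the request step.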

Happy bidding.

Skills required:
C Programming, PHP, Software Architecture
Project posted by: Zackmo, Ireland (Verified)
Bids are hidden by the project creator.


$721 in 21 days
$315 in 5 days
$250 in 3 days (johan777)
$350 in 10 days (vikasglobus13)
$333 in 3 days (SolidCoding)
$250 in 4 days
$700 in 21 days
$555 in 3 days