Wikipedia Scraper

  • Status Closed
  • Budget $250 - $750 USD
  • Total Bids 8

Project Description

I need a small standalone desktop application to scrape information from Wikipedia. To apply for this job, you should have strong experience with Wikipedia data mining.

The software should do the following:

1. The software should have a URL input box. The user should be able to copy and paste several Wikipedia URLs into the input box. The software will then go to Wikipedia and extract the text and images from each article.

2. It will save the article content and images from a single URL into a Word document. It will do this for each URL separately.

3. Scrape the databox on the right-hand side of the page. See [url removed, login to view] and check out the "Great White Pelican" databox on the right. I need this information saved to a table and added to the scraped text.

4. It will save all the images from each URL to a separate folder and name each folder with the title of the URL the images came from.

5. Search for and remove all Wikipedia internal link numbers like [12].
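
To make requirements 1, 4 and 5 concrete, here is a minimal Python sketch of the text handling involved. It is only an illustration and not a required design: the function names (`split_urls`, `title_from_url`, `strip_citation_markers`) and the example URL are made up for this sketch, and it assumes article URLs follow the usual `/wiki/<Title>` pattern.

```python
import re
from urllib.parse import unquote, urlparse

# Bracketed numeric markers such as [12] (requirement 5).
CITATION_RE = re.compile(r"\[\d+\]")

# Characters that are not allowed in Windows folder names.
UNSAFE_CHARS_RE = re.compile(r'[<>:"/\\|?*]')


def split_urls(raw):
    """Split text pasted into the input box into individual URLs.

    Assumes the URLs are separated by whitespace or line breaks.
    """
    return [part for part in raw.split()
            if part.startswith(("http://", "https://"))]


def title_from_url(url):
    """Derive a folder-friendly title from a /wiki/<Title> URL."""
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    title = unquote(last_segment).replace("_", " ")
    return UNSAFE_CHARS_RE.sub("-", title).strip()


def strip_citation_markers(text):
    """Remove markers like [12] from the scraped article text."""
    return CITATION_RE.sub("", text)
```

For example, `title_from_url("https://en.wikipedia.org/wiki/Example_Article")` gives the folder name `Example Article`, and `strip_citation_markers("Some fact.[12]")` gives `Some fact.` Markers that are not purely numeric, such as `[citation needed]`, are left alone by this pattern.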

After this project, I will do another project that will combine this information into an easily searchable database, which will be used for an app.

Feel free to suggest the best method to do this.
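
As one possible starting point for the databox in requirement 3, the sketch below reads label/value rows out of an already downloaded page using only Python's standard `html.parser`. It rests on assumptions: that the databox is an HTML table whose `class` attribute includes `infobox`, and that each data row has exactly two cells. The names `InfoboxParser` and `parse_infobox` are invented for this example.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect (label, value) pairs from the first-level rows of an infobox table."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # table nesting depth, counted from the infobox table
        self.in_cell = False  # True while inside a <th> or <td> of the infobox
        self.cells = []       # finished cell texts of the current row
        self.parts = []       # text fragments of the current cell
        self.rows = []        # collected (label, value) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth or "infobox" in classes:
                self.depth += 1
        elif self.depth == 1:
            if tag == "tr":
                self.cells = []
            elif tag in ("th", "td"):
                self.in_cell = True
                self.parts = []

    def handle_endtag(self, tag):
        if tag == "table":
            if self.depth:
                self.depth -= 1
        elif self.depth == 1:
            if tag in ("th", "td") and self.in_cell:
                # Collapse runs of whitespace inside the cell text.
                self.cells.append(" ".join("".join(self.parts).split()))
                self.in_cell = False
            elif tag == "tr":
                # Keep only rows that look like a label followed by a value.
                if len(self.cells) == 2 and all(self.cells):
                    self.rows.append((self.cells[0], self.cells[1]))
                self.cells = []

    def handle_data(self, data):
        # Text inside nested tables (depth > 1) is ignored.
        if self.depth == 1 and self.in_cell:
            self.parts.append(data)


def parse_infobox(html):
    """Return a list of (label, value) tuples found in the infobox table."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The returned pairs could then be written as a two-column table in the Word document. Rows with a single spanning cell (headings, images) are skipped, so the image in the databox would still need to be handled with the other images.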

Happy bidding.
