I need a small standalone desktop application to scrape information from Wikipedia. To apply for this job you should have strong experience with Wikipedia data mining.
The software should do the following:
1. The software should have a URL input box. The user should be able to copy and paste several Wikipedia URLs into the input box. The software will go to Wikipedia and extract the text and images from each article.
2. It will save the article content and images from each URL into a Word document, producing a separate document for each URL.
3. Scrape the databox on the right-hand side of the page. See http://en.wikipedia.org/wiki/Great_white_pelican and check out the "Great White Pelican" databox on the right. I need this information saved to a table and added to the scraped text.
4. It will save all the images from each URL to a separate folder, naming each folder after the title of the URL the images came from.
5. Search for and remove all Wikipedia internal link numbers like .
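To make the requirements more concrete, here is a minimal sketch in Python of the text-handling parts of steps 1, 4 and 5. It is only an illustration, not a required design. All function names (`parse_urls`, `title_from_url`, `folder_name`, `strip_markers`) are hypothetical, and it assumes that the "internal link numbers" in step 5 are bracketed numeric markers such as `[1]`. Downloading pages, saving images and writing Word documents are not shown.

```python
import re
from urllib.parse import urlparse, unquote


def parse_urls(pasted: str) -> list[str]:
    """Split pasted text on whitespace and keep only Wikipedia article URLs."""
    urls = []
    for token in pasted.split():
        parsed = urlparse(token)
        host = parsed.netloc.lower()
        is_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
        if is_wikipedia and parsed.path.startswith("/wiki/"):
            urls.append(token)
    return urls


def title_from_url(url: str) -> str:
    """Turn .../wiki/Great_white_pelican into 'Great white pelican'."""
    slug = urlparse(url).path[len("/wiki/"):]
    return unquote(slug).replace("_", " ")


def folder_name(title: str) -> str:
    """Replace characters that are not allowed in Windows folder names."""
    return re.sub(r'[<>:"/\\|?*]', "_", title).strip()


# Assumption: link numbers look like [1], [23], and so on.
MARKER = re.compile(r"\[\d+\]")


def strip_markers(text: str) -> str:
    """Remove bracketed numeric markers from scraped text."""
    return MARKER.sub("", text)
```

For example, `title_from_url("http://en.wikipedia.org/wiki/Great_white_pelican")` gives `Great white pelican`, which could then be passed to `folder_name` to name the image folder for that article.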
After this project, I will do another project that will combine this information into an easily searchable database, which will be used for an app.
Feel free to suggest the best method to do this.