Script To Scrape Website From Archive.org - repost
This project was successfully completed by tzo for $210 USD in 3 days.
Project Budget: $30 - $250 USD
Completed In: 3 days
I need a script written to scrape a website from www.archive.org.
The script must remove all [url removed, login to view] tags/ads from the code and download all files into the root directory, keeping the original folders and subfolders.
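As a rough illustration of the tag-removal step, here is a minimal Python sketch. The posting elides which tags are meant, so the marker comments below are an assumption: they are the `BEGIN/END WAYBACK TOOLBAR INSERT` comments that archived pages have commonly carried around the injected toolbar. The function name `strip_injected_block` is hypothetical.

```python
import re

# Assumed markers: the posting does not name the injected tags, so this
# pattern targets the comment pair commonly seen around the archive toolbar.
TOOLBAR_RE = re.compile(
    r"<!--\s*BEGIN WAYBACK TOOLBAR INSERT\s*-->"
    r".*?"
    r"<!--\s*END WAYBACK TOOLBAR INSERT\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_injected_block(html: str) -> str:
    """Return the page with every marked injected block removed."""
    return TOOLBAR_RE.sub("", html)
```

Pages that carry other injected scripts or comments would need additional patterns; this only covers the one assumed marker pair.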
The downloaded website should be complete as it is on [url removed, login to view], and able to be uploaded without further code modification.
I provide a URL like [url removed, login to view]://[url removed, login to view] to the script, and it will get ALL content on the page (including subpages).
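One way to read "ALL content (including subpages)" is a breadth-first crawl that follows every same-host link from the starting page. The sketch below is only an illustration under that assumption: `crawl`, `LinkCollector`, and the pluggable `fetch` callable are hypothetical names, and every response is treated as HTML text, which a real script would not do for images.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl of same-host URLs reachable from start_url.

    fetch is a caller-supplied function (an assumption of this sketch)
    that returns the page text for a URL, or None if it is unavailable.
    """
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        body = fetch(url)
        if body is None:
            continue  # missing snapshot: skip it, keep crawling
        pages[url] = body
        collector = LinkCollector()
        collector.feed(body)
        for link in collector.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if urlsplit(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing `fetch` in as a parameter keeps the crawl logic testable without network access; in practice it would wrap an HTTP request for the archived copy of each URL.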
The URL structure of the site mustn't change.
Each site recovery should contain all pages in HTML format.
All images that the site was using should be downloaded.
The URL structure of the site should be exactly as it was on the original site, including links to images and internal and outbound links.
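Archived pages usually have their links rewritten to point back into the archive, so restoring the original URL structure means unwrapping them. The sketch below assumes the rewritten form is `/web/<timestamp>/<original URL>`, optionally with a two-letter modifier such as `im_` after the timestamp and optionally prefixed with `https://web.archive.org`. That pattern, and the name `unwrap_archived_urls`, are assumptions rather than anything stated in the posting.

```python
import re

# Assumed rewritten-link shape: [host]/web/<digits>[xx_]/<original URL>
ARCHIVE_PREFIX_RE = re.compile(
    r"(?:https?://web\.archive\.org)?"   # optional archive host
    r"/web/\d{1,14}(?:[a-z]{2}_)?/"      # timestamp plus optional modifier
    r"(https?://[^\s\"'<>]+)"            # the original URL (captured)
)


def unwrap_archived_urls(html: str) -> str:
    """Replace each archive-wrapped link with the original URL it wraps."""
    return ARCHIVE_PREFIX_RE.sub(lambda match: match.group(1), html)
```

This restores absolute URLs for internal and outbound links alike. Whether a link was relative on the original site cannot be recovered from the wrapped form, so a further pass would be needed if relative links are required.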
Files passing variables (for example, ending with ?dvar=variable) should also be saved as original.
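A possible way to keep folders and query strings intact is to derive each local file path directly from the URL. The sketch below is one interpretation of "saved as original": the query string is kept as part of the file name, which Mac OS X file systems allow. The function name `local_path_for`, the `site` output directory, and the `index.html` default for directory URLs are assumptions.

```python
from pathlib import Path
from urllib.parse import urlsplit


def local_path_for(url: str, root: str = "site") -> Path:
    """Map a page URL to a file path under root, mirroring the URL path."""
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"  # assumed default document for directory URLs
    if parts.query:
        path += "?" + parts.query  # keep the variables in the file name
    return Path(root) / path.lstrip("/")
```

For example, `local_path_for("http://example.com/page.php?dvar=variable")` gives `site/page.php?dvar=variable`. The sketch does not guard against `..` segments in the path, which a real script should reject before writing files.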
I need a simple web interface where I enter the starting [url removed, login to view] URL.
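The web interface could be as small as a single form served locally. The following is a minimal sketch using only Python's standard library; the form field name `start_url`, the handler class, and the `serve` helper are hypothetical, and the POST handler only echoes the URL where a real script would start the recovery.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

FORM = b"""<form method="post">
<input name="start_url" size="80" placeholder="Starting URL">
<button>Recover site</button>
</form>"""


def parse_start_url(form_body: bytes) -> str:
    """Extract the start_url field from a URL-encoded form body."""
    fields = parse_qs(form_body.decode("utf-8"))
    return fields.get("start_url", [""])[0].strip()


class RecoveryHandler(BaseHTTPRequestHandler):
    def _send_html(self, body: bytes) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._send_html(FORM)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        start_url = parse_start_url(self.rfile.read(length))
        # Placeholder: a real script would kick off the site recovery here.
        self._send_html(b"Received: " + html.escape(start_url).encode("utf-8"))


def serve(port: int = 8000) -> None:
    """Serve the form on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RecoveryHandler).serve_forever()
```

Calling `serve()` and opening `http://127.0.0.1:8000/` in a browser shows the form. Binding to localhost keeps the tool private to the machine it runs on.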
I don't care what language it's written in as long as it works on Mac OS X.