We need a PHP script to scrape all the PHP and HTML web pages on our Linux server.
We need a coder to create the script that will spider the root directory of the server, and all sub-directories. It will find all the web pages, and read the contents of each file. By reading the contents, it will determine which images are linked to which page. It will then create a CSV file which will show us a list of all images on this server, and all the pages that each image is linked to.
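To illustrate the kind of output we have in mind, here is a minimal sketch of the extract-and-report step. It is not a required implementation. The function names (`extractImageSources`, `indexPagesByImage`, `writeImageCsv`) and the two-column CSV layout are assumptions made for this example, and only `<img src="...">` references are handled here.

```php
<?php
// Hypothetical helper: return the src value of every <img> tag in a file's contents.
// A regex is used instead of an HTML parser because PHP files mix markup with code.
function extractImageSources(string $contents): array
{
    $sources = [];
    if (preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*["\']([^"\']+)["\']/i', $contents, $matches)) {
        $sources = array_values(array_unique($matches[1]));
    }
    return $sources;
}

// Hypothetical helper: turn a page => images map into an image => pages map.
function indexPagesByImage(array $imagesByPage): array
{
    $pagesByImage = [];
    foreach ($imagesByPage as $page => $images) {
        foreach ($images as $image) {
            $pagesByImage[$image][] = $page;
        }
    }
    ksort($pagesByImage);
    return $pagesByImage;
}

// Hypothetical helper: write one CSV row per (image, page) pair.
// The "image,page" column layout is an assumption, not a requirement.
function writeImageCsv(array $pagesByImage, string $csvPath): void
{
    $handle = fopen($csvPath, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath for writing");
    }
    fputcsv($handle, ['image', 'page'], ',', '"', '\\');
    foreach ($pagesByImage as $image => $pages) {
        foreach ($pages as $page) {
            fputcsv($handle, [$image, $page], ',', '"', '\\');
        }
    }
    fclose($handle);
}
```

With one row per image and page pair, the CSV can be sorted or filtered by either column in a spreadsheet.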
Note that most images are in the /images folder, but we have some Joomla websites that have their own images folders as well, so you have to crawl everything from the root directory downwards.
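The crawl from the root directory downwards could be sketched roughly as follows, using PHP's built-in SPL directory iterators. The function name `findWebPages` and the inclusion of the `.htm` extension are assumptions for illustration; the starting directory would be whatever the server root is in our environment.

```php
<?php
// Hypothetical helper: list every web page file under $root, including all sub-directories.
// The default extension list is an assumption ('htm' is not named in this request).
function findWebPages(string $root, array $extensions = ['php', 'html', 'htm']): array
{
    $pages = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::LEAVES_ONLY,
        // Skip directories that cannot be read instead of aborting the whole crawl.
        RecursiveIteratorIterator::CATCH_GET_CHILD
    );
    foreach ($iterator as $file) {
        // Compare extensions case-insensitively so INDEX.HTML is found too.
        if ($file->isFile() && in_array(strtolower($file->getExtension()), $extensions, true)) {
            $pages[] = $file->getPathname();
        }
    }
    sort($pages);
    return $pages;
}
```

Because the walk does not depend on folder names, pages inside the Joomla sites are picked up the same way as pages next to the main /images folder.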
The output file needs to list the URL of each web page and all the image URLs associated with that page. The script should be able to capture all types of referenced files: images referenced from CSS, as well as PDF, SWF, JPEG, and GIF files.
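As a rough sketch of how the CSS and file-type requirement might be handled, the example below pulls `url(...)` references out of CSS text and keeps only the file types named above. The names `ASSET_EXTENSIONS`, `extractCssUrls`, and `isTrackedAsset` are assumptions for illustration, and the extension list would probably need extending (for example with PNG) once the real content is known.

```php
<?php
// Assumed extension list, taken from the file types named in this request.
const ASSET_EXTENSIONS = ['jpg', 'jpeg', 'gif', 'pdf', 'swf'];

// Hypothetical helper: return every url(...) reference found in CSS text,
// with or without quotes around the URL.
function extractCssUrls(string $css): array
{
    preg_match_all('/url\(\s*["\']?([^"\')]+?)["\']?\s*\)/i', $css, $matches);
    return array_values(array_unique($matches[1]));
}

// Hypothetical helper: true when the URL's path ends in one of the tracked extensions.
// Query strings such as ?v=2 are ignored when reading the extension.
function isTrackedAsset(string $url, array $extensions = ASSET_EXTENSIONS): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }
    return in_array(strtolower(pathinfo($path, PATHINFO_EXTENSION)), $extensions, true);
}
```

The same `isTrackedAsset` check could also be applied to `href` and `src` values found in the pages themselves, so that linked PDF and SWF files are reported alongside the images.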
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Internet