I need a program that takes a directory (a web server document root), scans every file that may contain HTML links (.htm, .html, .shtml, .php, .css, .js, etc.), and cross-references each link against the Google Safe Browsing API to determine whether it is malicious. If a malicious link is found, I would like the output in the following form:
Format: <path to file containing link relative to document root> <malicious link>
Example: [url removed, login to view] [url removed, login to view]
It should not output the same file/link pair more than once, even if the link appears multiple times within the file.
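To illustrate the walk/extract/deduplicate logic described above, here is a minimal sketch in Python. The `is_malicious` stub is hypothetical and stands in for the actual reputation lookup (Safe Browsing or an alternative); the URL regex is deliberately naive and a real implementation might use an HTML parser instead.

```python
import os
import re
import sys

# File extensions worth scanning for links, per the spec above.
SCAN_EXTS = {".htm", ".html", ".shtml", ".php", ".css", ".js"}

# Naive URL pattern; good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s'\"<>()]+")

def is_malicious(url):
    """Hypothetical placeholder for the reputation lookup
    (e.g. Google Safe Browsing). Swap in a real API call."""
    return False

def scan_docroot(docroot, check=is_malicious):
    seen = set()   # (relative file path, link) pairs already reported
    hits = []
    for dirpath, _dirs, files in os.walk(docroot):
        for name in files:
            if os.path.splitext(name)[1].lower() not in SCAN_EXTS:
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, docroot)
            try:
                text = open(path, encoding="utf-8", errors="ignore").read()
            except OSError:
                continue
            for url in URL_RE.findall(text):
                key = (rel, url)
                if key in seen:
                    continue   # never report the same file/link twice
                seen.add(key)
                if check(url):
                    # Output format: <file relative to docroot> <link>
                    hits.append(f"{rel} {url}")
    return hits

if __name__ == "__main__" and len(sys.argv) > 1:
    for line in scan_docroot(sys.argv[1]):
        print(line)
```

The `seen` set keyed on the (file, link) pair is what enforces the no-duplicates requirement; it also avoids paying for repeated API lookups of the same URL from the same file.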
Please note that I'm not dead set on the Google Safe Browsing API - if there is another URIBL or malicious-site database, I am open to it. Please send your suggestions along with your bid. I am also open to languages other than PHP - I can handle pretty much anything except Perl.
If you do choose to use the Google API, please see the Google API documentation here:
[url removed, login to view]
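Since the documentation link has been removed, here is a hedged sketch of what a lookup request looks like, assuming the current v4 Lookup API (`threatMatches:find`). The endpoint, field names, and the `link-scanner` client id are assumptions to verify against the live docs; the function only builds the request and does not send it.

```python
import json

def build_lookup_request(urls, api_key):
    """Build the POST target and JSON body for a Safe Browsing v4
    threatMatches:find lookup. Endpoint and field names assume the
    v4 Lookup API; confirm against the current documentation."""
    endpoint = ("https://safebrowsing.googleapis.com/v4/"
                "threatMatches:find?key=" + api_key)
    body = {
        # clientId/clientVersion identify the caller; values are arbitrary.
        "client": {"clientId": "link-scanner", "clientVersion": "0.1"},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING",
                            "UNWANTED_SOFTWARE"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }
    return endpoint, json.dumps(body)
```

A non-empty `matches` array in the response would indicate a flagged URL; an empty response body means no match. Batching several URLs per request, as the `threatEntries` list allows, keeps the scanner within API quota.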