I need someone to recover articles from a website that I have on disk.
Write a program that reads the HTML page by page and produces something like a MySQL load file with the fields user, title and text for each page.
The items that I want could be in the meta tags, in the body, or anywhere else in the page.
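As a rough illustration of the extraction described above, here is a minimal sketch using only the Python standard library. Python is used purely to keep the example short; the deliverable would still be C# or PHP. Everything specific in it is an assumption: that the user sits in a `<meta name="author">` tag, that the title is the page's `<title>`, that the text is the visible body text, and the names `ArticleParser`, `escape_field`, `page_to_row` and `main` are hypothetical. The real pages will probably need different rules.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path


class ArticleParser(HTMLParser):
    """Collects a user, a title and the visible body text from one page."""

    TRACKED = {"title", "body", "script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.user = ""
        self.title = ""
        self.text_parts = []
        self._stack = []  # currently open tags from TRACKED

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the user is stored as <meta name="author" content="...">.
        if tag == "meta" and (attrs.get("name") or "").lower() == "author":
            self.user = attrs.get("content") or ""
        if tag in self.TRACKED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop back to (and including) the matching open tag.
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        if not self._stack:
            return
        top = self._stack[-1]
        if top == "title":
            self.title += data
        elif top == "body":  # script/style content is skipped
            self.text_parts.append(data)


def escape_field(value):
    """Collapse whitespace (removing tabs/newlines) and escape backslashes."""
    value = " ".join(value.split())
    return value.replace("\\", "\\\\")


def page_to_row(html):
    """Return one tab-separated line: user, title, text."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps adjacent paragraphs from running together.
    fields = (parser.user, parser.title, " ".join(parser.text_parts))
    return "\t".join(escape_field(f) for f in fields)


def main(root, out_path):
    """Walk every .htm/.html file under root and write one row per page."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for path in sorted(Path(root).rglob("*.htm*")):
            html = path.read_text(encoding="utf-8", errors="replace")
            out.write(page_to_row(html) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

The output uses tab-separated fields and newline-terminated rows with backslash as the escape character, which matches the defaults of MySQL's `LOAD DATA` statement. Collapsing whitespace loses paragraph breaks in the text field, which is a simplification of this sketch and not a requirement.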
There should be up to 50,000 articles.
I require the written and tested software source as the deliverable, so that I can run it a number of times, adjusting for content or changing the spec.
I'll authorise payment on receiving software that works.
Language: preferably C#, but I could accept PHP run from the command line on Windows.