Script required to combine the "visible content" (text only) of multiple htm files into one large text file.
Attention: the script is required within 48 hours (tested and working!). Please do not apply if you cannot deliver within this timeframe.
- htm files are stored in one folder and in multiple subfolders
- other file types are stored in these folders as well, but are not to be combined; only htm files are to be processed
- the text content of the htm files is to be combined into one single text file, one below the other, separated by a separator phrase
- files with certain key phrases in the filename are to be excluded (e.g. "old", "ref")
- the script must be able to handle a large amount of data without problems (>100,000 htm files) and should process more than 1 file per second
- optional: progress should be indicated (number of files processed and total number of files)
Data sample, interface and sample output are attached.
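
For orientation, the requirements above could be met roughly as in the following sketch. It is only an illustration under stated assumptions, not the expected deliverable: the choice of Python, the names `SEPARATOR`, `EXCLUDE`, `html_to_text`, `find_htm_files` and `combine`, the separator format, UTF-8 input encoding, and treating `script`, `style` and `title` content as "not visible" are all assumptions that are not part of the attached interface. Key phrases are matched as case-insensitive substrings of the filename, so "ref" would also exclude a file such as "preface.htm".

```python
import os
import sys
from html.parser import HTMLParser

# Assumed values; the real separator and key phrases come from the attached spec.
SEPARATOR = "\n===== {name} =====\n"
EXCLUDE = ("old", "ref")
SKIP_TAGS = {"script", "style", "title"}  # content treated as not visible


class VisibleText(HTMLParser):
    """Collects text nodes, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text content of one HTML document, one chunk per line."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def find_htm_files(root):
    """Yield .htm files under root (recursively), skipping excluded names."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            lower = name.lower()
            if lower.endswith(".htm") and not any(k in lower for k in EXCLUDE):
                yield os.path.join(dirpath, name)


def combine(root, out_path):
    """Write the text of every matching file to out_path; return the file count."""
    paths = list(find_htm_files(root))  # collected first so a total can be shown
    total = len(paths)
    with open(out_path, "w", encoding="utf-8") as out:
        for i, path in enumerate(paths, 1):
            with open(path, encoding="utf-8", errors="replace") as f:
                text = html_to_text(f.read())
            out.write(SEPARATOR.format(name=path))
            out.write(text + "\n")
            if i % 100 == 0 or i == total:
                # Progress goes to stderr so it never mixes with the output file.
                print(f"\r{i}/{total} files processed", end="", file=sys.stderr)
    return total
```

A call such as `combine("data", "combined.txt")` would process the folder `data` and its subfolders. Files are read and written one at a time, so memory use should stay roughly constant regardless of the number of files; actual throughput depends on file size and disk speed and would need to be measured on the sample data.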