I am the author of 3 books. Each book exists as a series of Word/PDF files. I need to know if any whole paragraph appears verbatim in more than one book.
I imagine this is fairly simple: extract plain text from the chapter files, combine it into 3 large files (one per book, each containing that book's complete text), pull paragraphs from one file one at a time, and use perl/grep/whatever to check whether each paragraph appears verbatim in the other files.
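As a rough illustration of the comparison step, here is a minimal Python sketch. It assumes the plain text has already been extracted into one file per book, and that paragraphs are separated by blank lines. It also collapses runs of whitespace before comparing, which is an assumption on my part so that line wrapping from PDF extraction does not hide a match; drop that step if "verbatim" should include spacing. The function names and the `min_words` parameter are hypothetical, chosen only for this example.

```python
import re
from collections import defaultdict
from pathlib import Path


def paragraphs(text):
    """Yield paragraphs split on blank lines, with internal whitespace collapsed."""
    for block in re.split(r"\n\s*\n", text):
        para = " ".join(block.split())  # assumption: ignore line wrapping and extra spaces
        if para:
            yield para


def shared_paragraphs(books, min_words=1):
    """Return {paragraph: [book names]} for paragraphs found in two or more books.

    `books` maps a book name to its full plain text. `min_words` can be raised
    to skip very short paragraphs such as headings or "Chapter 1".
    """
    seen = defaultdict(set)
    for name, text in books.items():
        for para in paragraphs(text):
            if len(para.split()) >= min_words:
                seen[para].add(name)
    return {para: sorted(names) for para, names in seen.items() if len(names) > 1}


def report(paths):
    """Read one plain-text file per book and print every shared paragraph."""
    books = {str(p): Path(p).read_text(encoding="utf-8") for p in paths}
    for para, names in shared_paragraphs(books).items():
        print("Found in: " + ", ".join(names))
        print(para)
        print()
```

With three extracted files (the names here are placeholders), calling `report(["book1.txt", "book2.txt", "book3.txt"])` would list each shared paragraph along with the books it appears in. Using a dictionary keyed by paragraph text means each book is read once, rather than searching every paragraph against every other file.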
## Deliverables
This is a relatively simple job. I just need you to create the script, run it on my files, and tell me the answer.