We are looking for someone who can build a public search engine using Elasticsearch, with either Apache Nutch or the Constellio system for crawling.
What we need:
1. Crawl filestube.com, filestram.com, general-files.com, kat.ph and torrent.eu
2. Use the most suitable technique to crawl 1 to 2 million pages per day.
3. Extract all the file names and download links.
4. Store them in our database.
5. Make them searchable inside our search engine.
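
To make the extraction, indexing, and throughput steps above more concrete, here is a minimal sketch in Python using only the standard library. It is not a proposed implementation: the field names (file_name, download_url), the index name "files", and the helper function names are all assumptions made for illustration, and a real crawl would run through Nutch or Constellio rather than this parser. The only Elasticsearch detail relied on is the newline-delimited JSON body format of the _bulk API.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect one record per <a href=...> tag: link text plus absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.records = []
        self._href = None   # URL of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            if name:
                # Field names are hypothetical, chosen for this sketch.
                self.records.append(
                    {"file_name": name, "download_url": self._href}
                )
            self._href = None


def extract_records(html, base_url):
    """Return the list of file name / download link records found in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.records


def to_bulk_payload(records, index_name="files"):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API.

    Each record becomes an action line followed by the document itself.
    The index name "files" is an assumption.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def pages_per_second(pages_per_day):
    """Average fetch rate needed to sustain a daily page target."""
    return pages_per_day / 86400  # seconds in a day


if __name__ == "__main__":
    sample = '<a href="/dl/report.pdf">report.pdf</a>'
    print(to_bulk_payload(extract_records(sample, "https://example.com/list")))
    print(pages_per_second(1_000_000), pages_per_second(2_000_000))
```

The last helper shows what the daily target means in practice: 1 to 2 million pages per day works out to a sustained average of roughly 12 to 23 pages per second across all crawled sites, before allowing for retries or per-site rate limits.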
We have the overall idea, but we are looking for someone who can advise us, as a consultant, on how to carry out this project and then provide the technology to get it started.