I need help implementing a solution that can scrape and process a large amount of data per minute.
The end product should support scraping approximately 1000 random webpages per minute.
We will assume that these pages come from random websites across the internet, take approximately 3-5 seconds to load, and need a further 2 seconds to process (extract patterns and insert into a database). You, however, will only be required for the server/language recommendation and some basic programming to show me how it all fits together.
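For context on sizing, the numbers above imply a minimum concurrency level via Little's law (concurrency ≈ throughput × latency). A quick back-of-envelope check, assuming the worst case of 5 s to load plus 2 s to process per page:

```python
# Back-of-envelope concurrency estimate using Little's law:
# concurrency = throughput (pages/sec) * latency per page (sec)

target_pages_per_minute = 1000
throughput = target_pages_per_minute / 60   # ~16.7 pages/sec
latency = 5 + 2                             # worst case: 5 s load + 2 s processing

concurrency = throughput * latency          # pages "in flight" at any moment
print(round(concurrency))                   # ~117 concurrent fetch/process slots
```

So whatever stack is chosen needs to comfortably sustain roughly 120 pages in flight at once; that figure drives the RAM, CPU, and instance-count questions below.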
Ideally I would like to work with PHP/multi-threading/PHP Simple HTML DOM, but I have a strong feeling this is too resource-intensive for what I require; hopefully someone can prove me wrong. What's the fastest way we can get this done?
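Since the workload is dominated by waiting on page loads rather than CPU, an async I/O worker pool is one common alternative to OS threads. Below is a minimal sketch in Python's `asyncio` under stated assumptions: the page fetch is simulated with `asyncio.sleep` matching the 3-5 s load and 2 s processing figures above, and the `fetch_and_process`/`scrape` names are illustrative, not an existing API. In production the sleep would be replaced by a real HTTP request (e.g. via a client library such as aiohttp) plus parsing.

```python
import asyncio
import random

async def fetch_and_process(url: str) -> str:
    # Placeholder: simulate a 3-5 s page load plus 2 s of processing.
    # A real implementation would perform an HTTP request and parse the body.
    await asyncio.sleep(random.uniform(3, 5) + 2)
    return f"processed {url}"

async def worker(queue: asyncio.Queue, results: list) -> None:
    # Each worker pulls URLs off the shared queue until cancelled.
    while True:
        url = await queue.get()
        results.append(await fetch_and_process(url))
        queue.task_done()

async def scrape(urls: list, concurrency: int = 120) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: list = []
    workers = [asyncio.create_task(worker(queue, results))
               for _ in range(concurrency)]
    await queue.join()   # block until every queued URL has been processed
    for w in workers:
        w.cancel()       # workers loop forever; stop them once the queue drains
    return results
```

With ~120 workers, each batch of URLs completes in roughly one page's latency rather than serially, which is the property the 1000 pages/minute target depends on; a threaded or multi-process design can hit the same numbers, at a higher per-connection memory cost.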
Now that you know exactly what is needed, sell yourself to me by answering these questions:
How much RAM would we need?
How much CPU would we need?
How many server instances?
Approximate monthly server costs?
What language would you do it in?
Is multi-threading supported in this language and if so, how does it work?
There's no point bidding without telling me what your plan is, so please, no copy-and-paste replies.
Just be honest with your ideas and answer my questions in full and you'll be more likely to be chosen!