I have a web crawler based on PHP curl.
The crawler fetches content from websites (blogs) that contains a specific keyword. The crawler works with parallel processes.
The crawler starts with a list of interesting URLs and follows all internal links, but no external ones.
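To make the internal/external rule concrete, here is a minimal sketch of how such a check could look. The function name isInternalLink and the example URLs are made up for illustration and are not taken from the existing crawler. It assumes "internal" means "same host as the start URL", so www.example.com and example.com would count as different sites.

```php
<?php
// Hypothetical helper (not from the existing crawler): returns true when
// $link stays on the same host as $baseUrl, or is a relative link.
function isInternalLink(string $link, string $baseUrl): bool
{
    $baseHost = parse_url($baseUrl, PHP_URL_HOST);
    if (!is_string($baseHost) || $baseHost === '') {
        return false; // cannot decide without a valid base host
    }

    // Skip non-web schemes such as mailto: or javascript:
    $scheme = parse_url($link, PHP_URL_SCHEME);
    if (is_string($scheme) && !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return false;
    }

    $linkHost = parse_url($link, PHP_URL_HOST);
    if ($linkHost === null) {
        return true; // no host part: a relative link, so it is internal
    }
    if ($linkHost === false) {
        return false; // seriously malformed URL
    }

    // Host names are case-insensitive
    return strcasecmp($linkHost, $baseHost) === 0;
}
```

The crawler would call this for every link it extracts and only queue the ones that return true.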
The crawler's execution time should be limited to less than one hour, and it is started as a cron job every hour. But there are bugs that cause the crawler to keep running in some way past that limit, and it is consuming a lot of capacity on our server.
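As a rough illustration of the behaviour we want, the sketch below shows one common way to keep an hourly cron job from overlapping with itself and from running over its time budget. It is not the existing code: the lock path, the 55-minute budget, and all function names are assumptions chosen for the example.

```php
<?php
// Hypothetical values; adjust to the real deployment.
const LOCK_FILE = '/tmp/crawler.lock';
const MAX_RUNTIME_SECONDS = 55 * 60; // stop a few minutes before the next cron run

// Try to take an exclusive, non-blocking lock.
// Returns the open file handle on success, or false if the file cannot be
// opened or another run already holds the lock.
function acquireLock(string $path)
{
    $handle = fopen($path, 'c');
    if ($handle === false) {
        return false;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    return $handle; // keep the handle open for as long as the lock is needed
}

// Seconds remaining in the time budget, never negative.
function timeLeft(float $startedAt, int $budgetSeconds): float
{
    return max(0.0, $budgetSeconds - (microtime(true) - $startedAt));
}

// Process URLs until the list is empty or the budget is used up.
// $fetch is whatever callable does the actual curl request.
function crawlWithinBudget(array $urls, callable $fetch, int $budgetSeconds): int
{
    $startedAt = microtime(true);
    $processed = 0;
    foreach ($urls as $url) {
        if (timeLeft($startedAt, $budgetSeconds) <= 0.0) {
            break; // out of time: stop cleanly instead of running on
        }
        $fetch($url);
        $processed++;
    }
    return $processed;
}
```

The cron script would call acquireLock(LOCK_FILE) first and exit immediately if it returns false, then run crawlWithinBudget(...) with MAX_RUNTIME_SECONDS. Because the crawler uses parallel processes, each child would need the same deadline check, and each curl request should probably also have its own limit (for example via CURLOPT_TIMEOUT) so that a single hanging request cannot outlive the budget.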
We have two programs written in PHP that analyze the crawler's results, search for keywords, and update the database. We want to combine the two programs, debug them, and get them working better.
So this project involves no new development, but we need a real expert in curl and PHP who can start immediately and work on this continuously until it works perfectly. I expect to have this finalised within a week. Please bid only if you fulfill these criteria.
You will have to download the existing programs and a few database tables so that you can run and test them on your own server, so you also need a good Internet connection.