We have a client who needs a custom web crawler developed.
The project requirements are as follows:
- Must be a WordPress plugin
- Must use the WordPress HTTP API
- Create custom post types from data gathered
- Set custom taxonomy terms in the custom post type
- Save certain data patterns as custom post meta fields
- Revisit previously created posts once a week for XX amount of time to check for changes
The plugin needs to be API based, meaning that you develop the core plugin first and then develop an extension designed to crawl one specific site. The two interact with each other only via action hooks and filters. Standard stuff for a professional developer; we're only stating it so everyone is on the same page.
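To make the core/extension split concrete, here is a rough, language-neutral sketch in Python. It is not WordPress code. The small registry only imitates what `add_filter`, `apply_filters`, and `do_action` do in WordPress, and every name in it (`crawler_parse_page`, `crawler_record_saved`, `core_crawl`, `parse_example_site`) is hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical hook registry imitating WordPress filters and actions.
_hooks = defaultdict(list)


def add_hook(name, callback):
    """Register a callback on a named hook."""
    _hooks[name].append(callback)


def apply_filters(name, value, *args):
    """Pass a value through every callback on the hook and return the result."""
    for callback in _hooks[name]:
        value = callback(value, *args)
    return value


def do_action(name, *args):
    """Notify every callback on the hook; return values are ignored."""
    for callback in _hooks[name]:
        callback(*args)


# Core plugin: knows nothing about any particular site.
def core_crawl(url, html):
    record = {"url": url, "title": "", "terms": [], "meta": {}}
    # Site-specific parsing is delegated to whichever extension hooked in.
    record = apply_filters("crawler_parse_page", record, html)
    do_action("crawler_record_saved", record)
    return record


# Extension: all knowledge of one specific site lives here.
def parse_example_site(record, html):
    start, end = html.find("<h1>"), html.find("</h1>")
    if start != -1 and end > start:
        record["title"] = html[start + 4:end].strip()
    record["terms"].append("example-site")   # would become a taxonomy term
    record["meta"]["length"] = len(html)     # would become a post meta field
    return record


add_hook("crawler_parse_page", parse_example_site)

saved = []
add_hook("crawler_record_saved", saved.append)

result = core_crawl("https://example.com/page", "<h1> Hello </h1>")
```

The point of the sketch is that `core_crawl` never calls the extension directly, so a second site can be supported by registering another callback without touching the core.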
This is not an auto-blog concept, as the plugin will need to crawl sites with no feeds. This will be an internal tool for the client. Crawled data will in no way be redistributed, and no copyrights will be violated. The plugin acts more like a custom internet archive time machine.
We will walk you through exactly what the client is looking to obtain from each site.