Need a script that will grab events from a number of sites and put them into a common database. Along with events, venues, artists, promoters and other related data will also need to be gathered, stored and updated in a consistent way.
The script will scrape events, along with media and metadata, from 3-4 different sites and consolidate them into one database. The data will be merged and checked for duplicates. Some data can be gathered via APIs and RSS feeds; the rest will be scraped from websites. You will need to come up with a flexible data structure that covers multiday events, recurring events, etc. The scraper should be configurable to spread requests out over time so it does not overload the source servers.
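To illustrate the kind of data structure and dupe check meant here, a rough sketch in Python (hypothetical names throughout, not a required design): each event holds a list of concrete occurrences, so a multiday festival can be one long occurrence or one per day, and a weekly club night is expanded into dated occurrences. Duplicates across sources are detected with a normalized title/venue/date key, and a per-host throttle enforces a configurable minimum gap between requests.

```python
import hashlib
import re
import time
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta
from typing import Optional


@dataclass
class Occurrence:
    """One concrete time slot of an event (hypothetical model)."""
    start: datetime
    end: Optional[datetime] = None  # None when the source gives no end time


@dataclass
class Event:
    title: str
    venue: str
    source: str  # hypothetical: which site/feed the record came from
    occurrences: list[Occurrence] = field(default_factory=list)
    artists: list[str] = field(default_factory=list)


def weekly_occurrences(first_start: datetime, duration: timedelta, until: date) -> list[Occurrence]:
    """Expand a simple weekly recurrence into occurrences up to and including `until`."""
    result = []
    start = first_start
    while start.date() <= until:
        result.append(Occurrence(start, start + duration))
        start += timedelta(weeks=1)
    return result


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so 'Roxy Club!' matches 'roxy club'."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup_key(event: Event, occ: Occurrence) -> str:
    """Exact-match key on normalized title, venue and start date."""
    raw = f"{normalize(event.title)}|{normalize(event.venue)}|{occ.start.date().isoformat()}"
    return hashlib.sha1(raw.encode()).hexdigest()


def merge(events: list[Event]) -> dict[str, Event]:
    """Keyed per occurrence; the first record wins, later duplicates only add missing artists."""
    merged: dict[str, Event] = {}
    for ev in events:
        for occ in ev.occurrences:
            key = dedup_key(ev, occ)
            if key not in merged:
                merged[key] = Event(ev.title, ev.venue, ev.source, [occ], list(ev.artists))
            else:
                existing = merged[key]
                for artist in ev.artists:
                    if artist not in existing.artists:
                        existing.artists.append(artist)
    return merged


class Throttle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last: dict[str, float] = {}

    def wait(self, host: str) -> None:
        now = time.monotonic()
        last = self._last.get(host)
        if last is not None:
            delay = self.min_interval_s - (now - last)
            if delay > 0:
                time.sleep(delay)
        self._last[host] = time.monotonic()
```

In practice fuzzier matching (similar venue names, slightly different titles) would likely be needed; the exact-key version only shows where the dupe check sits in the pipeline.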
A password-protected backend is needed to view and filter logs/statistics and to view, edit, approve and remove events and other data. The backend needs to support an approval queue for merged and imported data. There should also be an admin function for adding events manually.
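One possible shape for the approval queue, sketched in Python with hypothetical names: imported, merged and manual items enter as pending, an admin approves or rejects each one exactly once, and every decision is logged with the admin's name. In the real system this would be a database table rather than an in-memory dict.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class QueueItem:
    item_id: int
    kind: str    # e.g. "event", "venue", "artist"
    origin: str  # e.g. "import", "merge", "manual"
    payload: dict[str, Any]
    status: Status = Status.PENDING
    history: list[tuple[str, str]] = field(default_factory=list)  # (admin, action)


class ApprovalQueue:
    """In-memory stand-in for a moderation table in the real database."""

    def __init__(self) -> None:
        self._items: dict[int, QueueItem] = {}
        self._next_id = 1

    def submit(self, kind: str, origin: str, payload: dict[str, Any]) -> int:
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = QueueItem(item_id, kind, origin, payload)
        return item_id

    def pending(self, kind: Optional[str] = None) -> list[QueueItem]:
        """Items still awaiting a decision, optionally filtered by kind."""
        return [item for item in self._items.values()
                if item.status is Status.PENDING and (kind is None or item.kind == kind)]

    def _decide(self, item_id: int, admin: str, new_status: Status) -> None:
        item = self._items[item_id]
        if item.status is not Status.PENDING:
            raise ValueError(f"item {item_id} is already {item.status.value}")
        item.status = new_status
        item.history.append((admin, new_status.value))

    def approve(self, item_id: int, admin: str) -> None:
        self._decide(item_id, admin, Status.APPROVED)

    def reject(self, item_id: int, admin: str) -> None:
        self._decide(item_id, admin, Status.REJECTED)
```

Whether manually added events go through the queue or are approved immediately is left open in this sketch.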
There should be an API to access all the data.
Automated status emails should be sent to the admin at a configurable address.
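A minimal sketch of the status email, assuming Python's standard `email` and `smtplib` modules; the `SCRAPER_ADMIN_EMAIL` environment variable, the sender address and the stats keys are all hypothetical placeholders for whatever configuration the final system uses.

```python
import os
import smtplib
from email.message import EmailMessage


def admin_address(default: str = "admin@example.com") -> str:
    """Read the configurable admin address (hypothetical env var name)."""
    return os.environ.get("SCRAPER_ADMIN_EMAIL", default)


def build_status_email(stats: dict[str, int], to_addr: str,
                       from_addr: str = "scraper@example.com") -> EmailMessage:
    """Summarize one scraper run as a plain-text email."""
    msg = EmailMessage()
    msg["Subject"] = f"Scraper status: {stats.get('errors', 0)} errors"
    msg["From"] = from_addr
    msg["To"] = to_addr
    body = "\n".join(f"{name}: {count}" for name, count in sorted(stats.items()))
    msg.set_content(body)
    return msg


def send_status_email(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to an SMTP server (host/port would come from config)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```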
A frontend is needed to view and filter the data, with very fast search and realtime updating via [url removed] or similar.
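As a toy illustration of the fast-search requirement, here is an in-memory inverted index over title, venue and artist names (hypothetical names; a dedicated search engine would more likely be used in production). Re-adding an event first removes its old index entries, which is the point where a realtime update would hook in.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class EventIndex:
    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)
        self._docs: dict[int, dict] = {}

    @staticmethod
    def _text(event: dict) -> str:
        return " ".join([event.get("title", ""), event.get("venue", ""), *event.get("artists", [])])

    def remove(self, event_id: int) -> None:
        event = self._docs.pop(event_id, None)
        if event is None:
            return
        for tok in set(tokenize(self._text(event))):
            self._postings[tok].discard(event_id)

    def add(self, event_id: int, event: dict) -> None:
        """Insert or update: any previous version of the event is unindexed first."""
        self.remove(event_id)
        self._docs[event_id] = event
        for tok in set(tokenize(self._text(event))):
            self._postings[tok].add(event_id)

    def search(self, query: str) -> list[int]:
        """IDs of events containing every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return []
        matches = [self._postings.get(tok, set()) for tok in tokens]
        return sorted(set.intersection(*matches))
```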
No cut-and-paste bids, please. Be specific about your experience with data scrapers and similar projects.