I need a script that can take an RSS feed from WordPress or Blogger, fetch the blog pages, and scrape the title, date, and content of each post entry (<div>) listed in the RSS feed. Only the text is needed; HTML tags and scripts should be filtered out or replaced with a user-defined tag.
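To make the requirement concrete, here is a minimal sketch of the feed-parsing and tag-stripping steps. It is written in Python, which the posting allows as an alternative to PHP; a PHP version would follow the same steps. It uses only the standard library. The names `TextExtractor`, `html_to_text`, and `parse_rss` are hypothetical. The sketch assumes an RSS 2.0 feed whose post body is in `content:encoded` (falling back to `description`), and it reads the body from the feed itself rather than fetching each blog page.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespaced tag for <content:encoded> (RSS content module); assumed to carry the post body.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


class TextExtractor(HTMLParser):
    """Collects text, drops <script>/<style> bodies, and swaps each tag for a replacement."""

    SKIP = {"script", "style"}

    def __init__(self, tag_replacement=" "):
        super().__init__(convert_charrefs=True)
        self.tag_replacement = tag_replacement
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif self._skip_depth == 0:
            self.parts.append(self.tag_replacement)

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._skip_depth == 0:
            self.parts.append(self.tag_replacement)

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html, tag_replacement=" "):
    """Return the text of an HTML fragment with tags replaced and whitespace collapsed."""
    parser = TextExtractor(tag_replacement)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def parse_rss(xml_text, tag_replacement=" "):
    """Return a list of dicts (title, date, link, content) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        body = item.findtext(CONTENT_ENCODED) or item.findtext("description") or ""
        posts.append({
            "title": (item.findtext("title") or "").strip(),
            "date": (item.findtext("pubDate") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "content": html_to_text(body, tag_replacement),
        })
    return posts
```

Calling `parse_rss(feed_xml, tag_replacement="|")` would mark every removed tag with `|` instead of a space. Atom feeds, which Blogger serves by default, use different element names and would need a separate branch.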
Proof of correct function will be a test on 7 random WordPress blogs and 3 Blogger blogs. A correct result is a page listing each post's title, date, and content, similar to Google Reader.
The provider should have some experience with scraping web pages or spidering, or be very good with regular expressions. The language used should be PHP. I will consider Perl or Python if there is a performance advantage and integration hooks into PHP are provided. Also, please design the code for performance wherever possible.
Follow-up work is possible.
I can help you and can complete this in a few hours. I have experience with scrapers (veoh.com, youtube.com, gofish.com, video.google.com, guba.com, metacafe.com, and others).