We require someone to manipulate RSS and MySQL data for use on an automated WordPress blog. The WordPress side of things is already set up and working. However, the feeds we currently import are not of high enough quality for our needs.
We currently use a feed aggregator to pull in numerous feeds and then extract an XML feed based on keywords we set. The quality of the extracted information in the XML output can be quite low: very short description text, embedded HTML links, and so on.
What we need is to improve this so that only articles containing no HTML and exceeding a certain word count are kept. We would also like to be able to filter certain things out, and perhaps pull in a picture from Flickr that relates to the article.
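To make the kind of filtering we have in mind concrete, here is a rough sketch in Python, using only the standard library. It assumes the aggregator output is a plain RSS 2.0 document with title, link and description elements; the function name filter_items, the default word limit and the blocked_terms parameter are all placeholders of ours, not an existing tool.

```python
import re
import xml.etree.ElementTree as ET

# Rough test for markup: anything that looks like an opening or closing tag.
HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def filter_items(rss_xml, min_words=150, blocked_terms=()):
    """Return the feed items that pass every quality check.

    rss_xml       -- the aggregator's XML output as a string (assumed RSS 2.0)
    min_words     -- hypothetical threshold; shorter descriptions are dropped
    blocked_terms -- hypothetical list of words or phrases to filter out
    """
    root = ET.fromstring(rss_xml)
    kept = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        description = (item.findtext("description") or "").strip()

        # Drop articles whose description contains HTML (links, images, ...).
        if HTML_TAG.search(description):
            continue

        # Drop articles that are too short to be useful.
        if len(description.split()) < min_words:
            continue

        # Drop articles that mention any unwanted term (case-insensitive).
        haystack = (title + " " + description).lower()
        if any(term.lower() in haystack for term in blocked_terms):
            continue

        kept.append({"title": title, "link": link, "description": description})
    return kept
```

This only covers the filtering step. Fetching a related Flickr picture and writing the surviving items to MySQL for WordPress to import are separate pieces and are not shown here.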
We are open to suggestions on the best way to achieve this, so please drop us a line if you have any great ideas.