See the attached PDF for the full Statement of Work (SoW).
Overview (from the PDF):
Infinit.e is an open source document analysis platform that allows the fusion, enrichment, analysis, and visualization of many different "unstructured" (think textual) data sources: web pages, Office documents, social media, and emails. Data can be harvested from the Internet, intranets, file shares, and databases. More details about the platform are available on our wiki (link in PDF).
As you might expect, one of the key activities is building and testing the harvesting pipeline that turns sources such as file shares and RSS feeds into sets of documents, with various types of processing applied to them. Currently this is done by writing a JSON document (link in PDF; note that this format is changing). The JSON has some limited GUI support (link in PDF), but it remains complex and difficult to use, and it is one of the major obstacles to the platform's overall usability.
Two activities are ongoing to simplify the harvest process:
1] The JSON format linked above is being simplified.
2] We are building a Flex-based GUI to generate the JSON document.
An outline of the tool has been developed, and it is about 33% of the way through the Phase 1 functionality. We are looking for someone to complete the Alpha and then spend the remaining budget adding Beta features. Going forward, there may be a continuing stream of work on this and our other Flex-based tools (link in PDF), since we are resource-constrained and, at (very!) best, average at Flex development.