We have a number of mail servers running Postfix and amavisd-new. Each email that comes in generates anywhere from x to x lines of logging data (more if a destination relay is down and we have to periodically retry relaying the email).
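To give a sense of the parsing involved: each Postfix line follows the standard syslog shape (timestamp, host, program[pid], then a message usually prefixed by a queue id). Below is a minimal parsing sketch; the regex field names and the sample line are illustrative assumptions, not taken from our actual logs.

```python
import re

# Rough shape of a Postfix syslog line. The sample further down is
# invented for illustration.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "   # e.g. "May  3 10:15:22"
    r"(?P<host>\S+) "                             # originating server
    r"(?P<prog>postfix/\S+)\[(?P<pid>\d+)\]: "    # daemon and pid
    r"(?:(?P<queue_id>[0-9A-F]+): )?"             # queue id, when present
    r"(?P<msg>.*)$"
)

def parse_line(line):
    """Return a dict of fields, or None if the line doesn't match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

sample = ("May  3 10:15:22 mx1 postfix/smtpd[12345]: "
          "4F2B1C0A3B: client=mail.example.com[192.0.2.1]")
parsed = parse_line(sample)
```

The queue id is the key field: it is what ties the scattered lines for one message together in the coalescing step.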
We need to support ad-hoc searching of the coalesced data, efficient generation of reports on a per-domain and per-user basis, and storage of historical patterns (i.e. email volume per user and per domain).
We need to gather the logs from the various servers to a central location, using a suitable log-shipping mechanism.
The data should be coalesced and stored in a database. We probably don't want every single line logged, and we need to relate log events that happen over a timespan, so some sort of coalescing or log consolidation has to be done. We are currently considering MongoDB with the Toku storage engine, but are open to other options. We also need something like statsd + Graphite for easy graphing of things like email traffic and message types, on a per-domain and per-user basis.
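The coalescing step described above can be sketched as grouping parsed lines by queue id into one record per message. This is a simplified sketch under assumed field names (`queue_id`, `msg`) from the parsing step; real Postfix messages have more states (deferred, bounced, multiple recipients) that a production version would have to handle.

```python
from collections import defaultdict

def coalesce(parsed_lines):
    """Fold parsed log lines into one record per message, keyed by queue id.

    Lines without a queue id (e.g. bare connect/disconnect events) are
    skipped; sender and recipient are pulled out as indexable fields.
    """
    messages = defaultdict(lambda: {"events": []})
    for line in parsed_lines:
        qid = line.get("queue_id")
        if qid is None:
            continue  # not tied to a specific message
        rec = messages[qid]
        rec["events"].append(line["msg"])
        # Extract fields worth indexing in the database.
        if line["msg"].startswith("from=<"):
            rec["sender"] = line["msg"].split("<", 1)[1].split(">", 1)[0]
        elif line["msg"].startswith("to=<"):
            rec["recipient"] = line["msg"].split("<", 1)[1].split(">", 1)[0]
    return dict(messages)

# Hypothetical parsed lines for one message:
lines = [
    {"queue_id": "A1B2C3", "msg": "from=<alice@example.com>, size=1234, nrcpt=1"},
    {"queue_id": "A1B2C3", "msg": "to=<bob@example.net>, relay=mx.example.net, status=sent"},
    {"queue_id": None, "msg": "connect from unknown"},
]
msgs = coalesce(lines)
```

Each resulting record is one document per message, which maps naturally onto a MongoDB collection with indexes on the extracted fields.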
We then need to be able to search for messages by sender, recipient, date, subject, message-id, etc.
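The kind of query we have in mind can be illustrated over coalesced per-message records. The record shape and field names here are assumptions; the linear scan stands in for what would, in production, be an indexed database query on these fields.

```python
# Hypothetical coalesced records, keyed by Postfix queue id.
messages = {
    "A1B2C3": {"sender": "alice@example.com", "recipient": "bob@example.net",
               "subject": "hello", "date": "2014-05-03"},
    "D4E5F6": {"sender": "carol@example.org", "recipient": "bob@example.net",
               "subject": "report", "date": "2014-05-04"},
}

def search(messages, **criteria):
    """Return queue ids of records matching every given field exactly.

    A linear scan for illustration only; the real system would issue an
    indexed query (e.g. a MongoDB find() on sender/recipient/date).
    """
    return [qid for qid, rec in messages.items()
            if all(rec.get(f) == v for f, v in criteria.items())]

# e.g. all mail delivered to one recipient:
hits = search(messages, recipient="bob@example.net")
```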
We will provide sample logging data to work with. We need someone to configure the logging software to parse the data properly, store it in a database, and supply us with a number of "ready to go" queries. We will then integrate these into a tool that lets our users view their email logs, search for information, and see near-real-time and historical data about their email usage patterns.
We will give strong preference to someone who has already worked on large-scale logging systems. We currently generate about 40 million lines of logging data per day and expect that to grow substantially.
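For rough sizing, the stated daily volume works out to a few hundred lines per second on average; peaks will be several times higher, and the ingest pipeline should be sized for the peak, not the average.

```python
lines_per_day = 40_000_000
seconds_per_day = 24 * 60 * 60   # 86,400
avg_rate = lines_per_day / seconds_per_day  # average sustained ingest rate
```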