Main application workflow:
1. The user signs up and selects an account type (IMAP, Gmail or MS Exchange).
2. The application asks the user to grant access to the mailbox (IMAP -> credentials, Gmail -> OAuth, MS Exchange -> EWS).
3. After validating the user's information, the application adds the mailbox to a job queue.
4. The application's server(s) select a free worker to process the queue.
5. The job "syncs" the user's mailbox with the application.
6. Each (unique) email has to be saved (as a raw eml file) to Amazon S3.
7. Each (unique) email has to have its metadata extracted and inserted into Solr: subject, date, attachments (names / types), addresses (to, from, cc, bcc etc.) and message body (text).
8. Deduplication (of emails) is needed to save storage / resources (on attachment level if possible).
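To make steps 6 to 8 more concrete, here is a minimal Python sketch of how a worker might derive a deduplication key and the Solr metadata from one raw eml file. It is only an illustration under assumptions: the function names, the dictionary keys and the choice of a SHA-256 hash over the complete raw message are not part of the requirements. Hashing the whole file only catches byte-identical copies, so the real deduplication rule (for example per attachment) is still up to the developer.

```python
import hashlib
from email import message_from_bytes, policy
from email.utils import getaddresses


def content_key(raw_eml: bytes) -> str:
    """Hypothetical dedup key: SHA-256 of the complete raw message."""
    return hashlib.sha256(raw_eml).hexdigest()


def extract_metadata(raw_eml: bytes) -> dict:
    """Collect the fields listed in step 7 (key names are assumptions)."""
    msg = message_from_bytes(raw_eml, policy=policy.default)

    def addresses(header: str) -> list:
        # Keep only the address part of "Name <address>" entries.
        values = [str(v) for v in msg.get_all(header, [])]
        return [addr for _, addr in getaddresses(values)]

    body_part = msg.get_body(preferencelist=("plain",))
    return {
        "id": content_key(raw_eml),
        "subject": str(msg["subject"] or ""),
        "date": str(msg["date"] or ""),
        "from": addresses("from"),
        "to": addresses("to"),
        "cc": addresses("cc"),
        "bcc": addresses("bcc"),
        "attachments": [
            {"name": part.get_filename(), "type": part.get_content_type()}
            for part in msg.iter_attachments()
        ],
        "body": body_part.get_content() if body_part is not None else "",
    }


def store_if_new(raw_eml: bytes, seen_keys: set, store) -> bool:
    """Call store(key, raw_eml) only for messages not seen before.

    `store` stands in for the S3 upload and `seen_keys` for a
    persistent index; both are placeholders for this sketch.
    """
    key = content_key(raw_eml)
    if key in seen_keys:
        return False
    store(key, raw_eml)
    seen_keys.add(key)
    return True
```

In a real deployment the set of seen keys would have to live in shared storage so that all workers agree on which emails are already saved.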
User functionality:
1. Users create an account and can log in afterwards.
2. Users should be able to search their emails (-> Solr).
3. Users should be able to view the emails they have found (from Solr, or from the raw eml files on S3?).
4. Users should be able to download their attachments.
5. Users should be able to restore emails, folders or a whole account.
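As an illustration of the search requirement, the following Python sketch builds a Solr select URL that limits results to one user. The core URL, the field names (user_id, subject, body) and the use of the edismax query parser are assumptions for this example, not decisions that have been made for the project.

```python
from urllib.parse import urlencode


def build_search_url(core_url: str, user_id: int, text: str, rows: int = 20) -> str:
    """Build a Solr /select URL for one user's search.

    Field names are hypothetical; the filter query (fq) keeps users
    from seeing each other's emails.
    """
    params = {
        "q": text,
        "defType": "edismax",         # parser meant for free-form user input
        "qf": "subject body",         # assumed searchable fields
        "fq": f"user_id:{int(user_id)}",
        "rows": rows,
        "wt": "json",
    }
    return f"{core_url}/select?{urlencode(params)}"
```

For example, `build_search_url("http://localhost:8983/solr/emails", 42, "invoice march")` yields a URL that can be fetched with any HTTP client; the host and core name here are placeholders.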
Requirements:
1. Work with at least those (3) main email protocols / providers (IMAP, Gmail, MS Exchange); eventually more will be connected.
2. Highly scalable: it should be able to grow easily to tens of thousands of users. Of course extra server resources will be needed, but it has to work as efficiently as possible.
3. Highly reliable: as this is a backup solution, it needs perfect error handling and careful checks at every step (especially in the job / queue part).
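One common way to approach the reliability requirement in the job / queue part is to retry failed jobs with an increasing delay and to surface the error once the attempts are used up. The Python sketch below shows that idea only; the function name, the default number of attempts and the delays are assumptions, and a production queue would also need logging and persistent job state.

```python
import time


def run_with_retries(job, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run job() and retry on any exception with exponential backoff.

    The delay doubles after each failed attempt. When the last attempt
    fails, the exception is re-raised so the caller can mark the job
    as failed instead of losing it silently.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists so the waiting can be replaced in tests; a worker would normally leave it at the default.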
Some companies have already done work that could be compared with this (private) application:
- [url removed, login to view] (doing a great job in "syncing" IMAP accounts. Bad points: they don't support other protocols (Exchange), their API does not expose the message body, and the status of deduplication is unknown. Still worth a look, as they define a nice structure of accounts, sources, folders, emails, contacts and files.)
- [url removed, login to view] (pretty nice and very similar in its workflow, though they lack deduplication and their web app is kind of horrible to use (slow and buggy).)
Please respond with price, timeframe and at least 1 reference to an application where you needed scalability and / or reliability.