I’m experimenting with importing a set of .eml files (each .eml file containing a raw email message) into a SQL database.
Producing the .eml files from an IMAP account is easy. I experimented with different tools:
- Gmail Backup – can be used to back up a Gmail account (via IMAP).
- IMAPSize – can be used to back up any IMAP account.
- Thunderbird in combination with the ImportExportTools add-on.
All of these tools are ideal for mail backup and do their job well.
Now, suppose we have a bunch of .eml files produced by one of the above-mentioned backup tools. What can we do with them?
First of all, the tools that produce .eml backups can also restore your e-mail by uploading the backed-up messages to your IMAP account. But maybe you don’t want to store your mail backup in separate files, or you want to encrypt the raw messages, or …fill in specific wish here... In that case you’ll want to parse the raw .eml files in your favorite language (I’ll assume here that that is C# 😉 ) so that you can do something useful with the different parts (subject, body, attachments, etc.) of the messages.
Although the .eml file format itself is not complicated (the content of an .eml file is the literal content of an e-mail message as received by an SMTP server), interpreting the content, especially MIME content, is not that easy.
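To give an idea of where the difficulty starts, here is a minimal sketch that reads only the header block of a raw message and joins folded (continued) header lines. The class and method names are made up for this illustration, and joining with a single space is a simplification of the real unfolding rules. It does not decode encoded words such as `=?UTF-8?B?...?=` and does not look at the MIME body at all, which is exactly the part that makes a ready-made parser attractive.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class EmlHeaderSketch
{
    // Hypothetical helper: returns the (name, value) pairs of the header block.
    // Reading stops at the first empty line, which separates headers from body.
    public static List<KeyValuePair<string, string>> ReadHeaders(TextReader reader)
    {
        var headers = new List<KeyValuePair<string, string>>();
        string line;
        while ((line = reader.ReadLine()) != null && line.Length > 0)
        {
            bool isContinuation = line[0] == ' ' || line[0] == '\t';
            if (isContinuation && headers.Count > 0)
            {
                // Folded header: append this line to the previous header's value.
                var last = headers[headers.Count - 1];
                headers[headers.Count - 1] = new KeyValuePair<string, string>(
                    last.Key, last.Value + " " + line.Trim());
                continue;
            }

            int colon = line.IndexOf(':');
            if (colon <= 0)
            {
                continue; // not a "Name: value" line; this sketch just skips it
            }

            headers.Add(new KeyValuePair<string, string>(
                line.Substring(0, colon).Trim(),
                line.Substring(colon + 1).Trim()));
        }
        return headers;
    }
}
```

Calling `EmlHeaderSketch.ReadHeaders(File.OpenText(path))` on an .eml file would return the raw header values, still undecoded.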
Fortunately, Microsoft already implemented mail parsing for us. The method below shows how to parse a single .eml file:
CDO.Message msg = new CDO.MessageClass();
ADODB.Stream stream = new ADODB.StreamClass();
stream.Open(Type.Missing, ADODB.ConnectModeEnum.adModeUnknown, ADODB.StreamOpenOptionsEnum.adOpenStreamUnspecified, String.Empty, String.Empty);
(original code from this Stack Overflow topic).
You’ll need to reference the Microsoft CDO for Windows 2000 Library, which can be found on the ‘COM’ tab in the Visual Studio ‘Add reference’ dialog. The CDO library is included in IIS.
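With the reference in place, a complete method might look roughly like the sketch below. It assumes the CDO and ADODB interop types are available in the project; the names `EmlReaderSketch`, `ReadMessage`, `PrintSummary` and `emlFileName` are chosen for this example only. It loads the file into the stream, hands the stream to the message, and then reads a few of the parsed parts.

```csharp
using System;

static class EmlReaderSketch
{
    // Hypothetical method: loads one .eml file into a CDO message object.
    public static CDO.Message ReadMessage(string emlFileName)
    {
        // If "Embed Interop Types" is enabled for the references, the *Class
        // types cannot be used; write new CDO.Message() and new ADODB.Stream().
        CDO.Message msg = new CDO.MessageClass();
        ADODB.Stream stream = new ADODB.StreamClass();
        stream.Open(Type.Missing, ADODB.ConnectModeEnum.adModeUnknown,
            ADODB.StreamOpenOptionsEnum.adOpenStreamUnspecified,
            String.Empty, String.Empty);

        // Read the raw message from disk and let CDO parse it from the stream.
        stream.LoadFromFile(emlFileName);
        stream.Flush();
        msg.DataSource.OpenObject(stream, "_Stream");
        return msg;
    }

    // Hypothetical helper: prints a few parsed parts of the message.
    public static void PrintSummary(CDO.Message msg)
    {
        Console.WriteLine("Subject: " + msg.Subject);
        Console.WriteLine("From:    " + msg.From);
        Console.WriteLine("Body:    " + msg.TextBody);
        foreach (CDO.IBodyPart attachment in msg.Attachments)
        {
            Console.WriteLine("Attachment: " + attachment.FileName);
        }
    }
}
```

From here, each of these values could be mapped to a column when importing the message into a database.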
- The other way around: save a message to a file