The Linux Page

Filtering a mailbox with formail and procmail

I had to delete many emails that ended up in my archives.

For each email that Postfix receives for me, I have a virtual alias entry that looks like this:

alexis@example.com    archive@archive.example.com

example.com is the main mail server receiving all the external mail.

archive.example.com is a separate system where mail gets sent for archival.

On that system, archived emails are appended to /var/mail/archive, and when that file grows over a certain size, it gets compressed and saved in a different location.
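The post does not show how the size-based rotation is done. As a rough sketch only, assuming a POSIX shell run periodically (from cron, for example), a hypothetical helper rotate_archive could compress the mailbox into a dated archive-*.txt.gz file once it exceeds a byte limit, so the result matches the archive*.txt.gz pattern used by the filtering script further down. The function name, naming scheme and limit are illustrative assumptions, and a real setup would also have to lock the mailbox so that nothing is appended while it is being rotated.

```sh
#!/bin/sh
# Hypothetical sketch: rotate an mbox file once it grows past a size limit.
# Not the actual archival setup; names and limit are assumptions.
# NOTE: no locking is done here; real code must keep the MTA from
# appending to the mailbox during rotation.

rotate_archive() {
    mbox=$1     # e.g. /var/mail/archive
    dest=$2     # directory receiving the compressed archives
    limit=$3    # size threshold in bytes

    # wc -c pads its output with spaces on some systems; strip them
    size=$(wc -c <"$mbox" | tr -d ' ')
    if [ "$size" -gt "$limit" ]; then
        out="$dest/archive-$(date +%Y%m%d%H%M%S).txt.gz"
        # compress first, and only empty the mailbox if gzip succeeded
        if gzip -c "$mbox" >"$out"; then
            : >"$mbox"
            echo "$out"
        fi
    fi
}
```

For example, `rotate_archive /var/mail/archive /srv/mail-archive 100000000` would print the name of the new archive when a rotation happens and print nothing otherwise.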

The files are therefore in mailbox (mbox) format: emails are written one after the other, each one starting with a line that begins with "From " (From followed by a space).
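Because of that layout, counting the lines that start with "From " gives a quick message count for an archive, compressed or not. This is a minimal sketch that assumes body lines starting with "From " were escaped as ">From " when the emails were written, as mbox writers normally do; the helper name count_messages is hypothetical.

```sh
#!/bin/sh
# Count the messages in an mbox file, or in a gzip-compressed one.
# Assumes body lines starting with "From " were escaped as ">From ".

count_messages() {
    case $1 in
        *.gz) gunzip -c "$1" ;;
        *)    cat "$1" ;;
    esac | grep -c '^From '
}
```

Running it on an archive before and after filtering, e.g. `count_messages /var/mail/archive`, shows how many emails a filter removed.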

The problem was that I received emails from a provider who then asked for those emails to be deleted. That was easy enough in my mail system (where they were being deleted quickly anyway). The archives, however, were not going to just "lose" those emails.

To remove them, I looked for a mail filter and found a Stack Overflow answer that mentioned procmail. procmail decides where an email should end up based on a set of matches (conditions) and corresponding actions.

First, here is the script I ended up with. It goes through my archive files and removes the unwanted emails by not including them in the output. Everything else is output as is.

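# Script used to filter out emails
#
# Note that procmail uses the ~/.procmailrc file, which appends every
# kept email to /home/someuser/filtered.txt; run this script from that
# same directory so the relative filtered.txt below is the same file

for f in archive*.txt.gz
do
        # keep a one-time backup of the original archive
        b="${f}.bak"
        if test ! -f "${b}"
        then
                cp "${f}" "${b}"
        fi

        # the output of procmail is cumulative so make sure the file
        # starts empty; truncating (rather than deleting) also means
        # gzip still finds the file if every email gets filtered out
        : >filtered.txt

        gunzip -c "${f}" >archive.txt
        formail -s procmail <archive.txt
        gzip -c filtered.txt >"${f}"
done

rm -f archive.txt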
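Before deleting the .bak copies, it may be worth checking that no From: header from the unwanted sender survived. This is a rough sketch, assuming the same archive*.txt.gz naming as the script and the example address; since it greps the whole decompressed file, a body line that happens to start with "From:" (a quoted forward, say) would also be reported. The function name check_archives is hypothetical.

```sh
#!/bin/sh
# Report archives that still contain a From: line for the unwanted sender.
# Rough check: body lines starting with "From:" are also matched.

check_archives() {
    status=0
    for f in archive*.txt.gz
    do
        # -i: case-insensitive, like procmail's default matching;
        # "|| true" because grep -c exits with 1 when the count is 0
        n=$(gunzip -c "$f" | grep -ci '^From:.*unwanted@example\.com' || true)
        if [ "$n" -gt 0 ]; then
            echo "$f: $n matching From: line(s) left"
            status=1
        fi
    done
    return $status
}
```

Run from the archive directory, a zero exit status means none of the archives still mention the address in a From: line.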

The -s command line option tells formail to split its input into individual emails and to run the given command (here procmail) once per email, feeding it that email on its standard input.
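To picture what formail -s hands to procmail, here is a rough awk stand-in. It is not formail, which handles more details of the mbox format; it only splits an mbox at each line starting with "From " and writes each email to its own numbered file. The function name split_mbox and the msg.N output names are made up for illustration.

```sh
#!/bin/sh
# Illustration only: approximate formail -s by splitting an mbox into
# one file per email (msg.1, msg.2, ...) in the given directory.
# Lines before the first "From " line are ignored.

split_mbox() {
    awk -v dir="$2" '
        /^From / {
            if (out != "") close(out)   # avoid running out of file handles
            n++
            out = dir "/msg." n
        }
        out != "" { print > out }
    ' "$1"
}
```

Each msg.N file then holds what a single procmail run would see, e.g. `split_mbox archive.txt /tmp/split`.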

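# From ~/.procmailrc

# Drop anything sent by the unwanted address
:0
* ^From:.*unwanted@example\.com
/dev/null

# Append everything else to the output mailbox; the trailing ':'
# makes procmail use a lockfile while writing to it
:0:
/home/someuser/filtered.txt

The file includes two entries (procmail calls them recipes). Each entry starts with a line that begins with :0, optionally followed by flags.

The first entry has a condition and a delivering action. When the condition matches, the email is delivered to /dev/null, and since a delivering recipe ends the processing of that email, the second entry is never reached. No special flag is needed for that. (In procmail, the E flag actually means "else": the recipe only runs if the preceding one did not. If you ever want processing to continue after a delivery, use the c flag, which delivers a copy.)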

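The line starting with an asterisk is a condition: a regular expression that must match for the action to run. By default, procmail matches conditions against the header only, and case-insensitively; use the B flag to search the body instead, or HB for both (see man procmailrc for the details). In my case, I had a specific email address found in the From: header field. Note that the first line of each email, the "From " line, carries the envelope sender, which my mail server transforms in most cases, making it really complicated to match with a regular expression. The From: header field, however, is not transformed, so it is generally much easier to test. Also escape the dots in the address (\.), otherwise each . matches any character.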
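To make the header-only matching concrete, here is a small awk approximation of the two entries, with awk standing in for procmail purely for illustration. It reads an mbox, looks for the pattern only in each email's header (the lines up to the first empty line), drops the emails whose From: field matches, and prints the rest unchanged. drop_from is a hypothetical helper name, and unlike procmail it does not unfold multi-line header fields.

```sh
#!/bin/sh
# Illustration only: approximate the two procmail entries with awk.
#   1. drop emails whose header has a From: line matching pattern $2
#   2. print every other email to stdout unchanged
# Matching is header-only and case-insensitive, like procmail's default;
# folded (multi-line) header fields are not unfolded here.

drop_from() {
    # pass the pattern through the environment so awk does not
    # interpret the backslashes in it (as it would with -v)
    PAT="$2" awk '
        function emit() { if (buf != "" && !drop) printf "%s", buf }
        /^From / { emit(); buf = ""; drop = 0; inhdr = 1 }
        inhdr && /^$/ { inhdr = 0 }
        inhdr && tolower($0) ~ ("^from:.*" tolower(ENVIRON["PAT"])) { drop = 1 }
        { buf = buf $0 "\n" }
        END { emit() }
    ' "$1"
}
```

For example, `drop_from archive.txt 'unwanted@example\.com' >filtered.txt` keeps an email whose body merely mentions the address, just as the procmail entry would without the B flag.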

Finally, an entry ends with an action line, which by default is a destination: a file or a directory name. If you use a directory, each email is saved as a separate file, which takes more space (every file occupies at least one file system block, typically 4 KiB) and adds many entries to the directory. In my case, I wanted those emails gone, so I simply sent them to /dev/null.

The second entry has no condition, meaning that every email reaching it is appended to the specified output file. Just make sure that this file is not the same as the input file. In my case, I wanted the output to also be in the standard mailbox format, so I specified a filename. As mentioned above, the output could instead be a directory, in which case each email would be saved in a separate file.
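One way to enforce that, sketched here as an assumption rather than as part of the original script, is a small guard built on test's -ef operator (supported by bash, dash and ksh, although POSIX does not require it). It is true when both paths exist and name the same file, even through a symlink or a hard link. The helper name refuse_same_file is hypothetical.

```sh
#!/bin/sh
# Hypothetical guard: fail if the procmail output file is the input file.
# test -ef is true when both paths exist and refer to the same file
# (same device and inode), so symlinks and hard links are caught too.

refuse_same_file() {
    if [ "$1" -ef "$2" ]; then
        echo "error: $2 is the same file as $1" >&2
        return 1
    fi
    return 0
}
```

In the loop, a line such as `refuse_same_file archive.txt /home/someuser/filtered.txt || exit 1` before running formail would stop the script instead of letting procmail append to the file it is reading.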