Zimbra Fail2Ban Best Practices

This post contains our current Zimbra Fail2Ban Best Practices and will show you how to configure Fail2Ban optimally on your Zimbra servers. Bad actors, overly aggressive marketing companies and others clog up our Inboxes with unwanted emails, and increase the load on our Zimbra servers. While Fail2Ban will effectively block spammers, it’s not enough on its own. Please consult our Anti-Spam Best Practices post; the techniques in that post, together with the techniques in this post, will help keep your users’ Inboxes as free of spam as is practicable.

Level Set: What Is Fail2Ban?
Fail2Ban is a longstanding Python application that scans log files for user-defined regular expressions containing IP addresses and, when a regular expression matches often enough within a user-defined time period, performs a user-defined action, typically a ban of the offending IP address. The regular expressions are defined in one or more “filter” files, and the ban criteria and the action to take are described in “jail” files. Fail2Ban is distributed with pretty much all Linux operating systems, and is maintained via GitHub. Fail2Ban runs as an operating system service and maintains a database of the IP addresses its filters have observed and banned, along with their metadata, so it can tell if and when an IP meets the criteria for banning (and automatic unbanning) described in the companion jail configuration file.
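To make the filter/jail interplay concrete, here is a minimal Python sketch of the counting logic described above. It is not Fail2Ban's actual code: the regular expression, the log lines and the function name `find_bans` are all made up for illustration, and only the parameter names `maxretry` and `findtime` come from the jail files.

```python
import re
from collections import defaultdict, deque

# Hypothetical stand-in for a filter's failregex; the named group plays the
# role of Fail2Ban's <HOST> tag (IPv4 only, for illustration).
FAILREGEX = re.compile(
    r"authentication failed .*\[(?P<host>\d{1,3}(?:\.\d{1,3}){3})\]"
)

def find_bans(events, maxretry, findtime):
    """Return IPs that reach `maxretry` matches within `findtime` seconds.

    `events` is an iterable of (unix_timestamp, log_line) pairs in time order.
    """
    recent = defaultdict(deque)  # ip -> timestamps of matches still in the window
    banned = []
    for ts, line in events:
        match = FAILREGEX.search(line)
        if not match:
            continue
        ip = match.group("host")
        if ip in banned:
            continue
        hits = recent[ip]
        hits.append(ts)
        # Drop matches that have aged out of the findtime window.
        while hits and ts - hits[0] > findtime:
            hits.popleft()
        if len(hits) >= maxretry:
            banned.append(ip)
    return banned

# Made-up log lines using documentation-range IP addresses.
events = [
    (0,     "authentication failed for user a [203.0.113.5]"),
    (10,    "authentication failed for user b [198.51.100.9]"),
    (20,    "login ok [203.0.113.5]"),
    (30,    "authentication failed for user a [203.0.113.5]"),
    (90000, "authentication failed for user b [198.51.100.9]"),
]
print(find_bans(events, maxretry=2, findtime=86400))  # ['203.0.113.5']
```

The second IP is not banned because its two failures are more than `findtime` seconds apart, which is the "sufficient numbers over a time period" rule in action.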

Implementing Zimbra Fail2Ban Best Practices
During the summer of 2022, I and others collaborated with Barry deGraaff at Zimbra on his excellent Fail2Ban wiki article. Please use the steps in that article to deploy Fail2Ban in your environment as a first step. This post presumes that you have already deployed Fail2Ban using that article; the additional steps here are articulated as add-ons to a base installation of Fail2Ban, using “route” as the ban action.

With the better part of a year having passed since that article was first published (we’ve been using Fail2Ban for many years prior, just to be clear), we’ve learned a few more things, and now deploy additional regular expressions in the filter files from the Fail2Ban wiki. We also are much “stricter” in that, as compared to the wiki article, we ban IPs after far fewer failed attempts, and when we ban an IP, we ban the IP for more than 11 years. Yes, really!

Each day, we also review Zimbra’s Daily Mail Report email for prospective bad actor activity. Over time, this has led us to fine-tune existing regular expressions, or create new ones, for the filter files. A few times a week, we will manually ban additional IP addresses and save the log file entries. If we see the same pattern continuing after a few weeks, we adjust or add a regular expression to automate dealing with that bad behavior.

While we are reviewing the Daily Mail Report, we also make note of badly behaving domains, and periodically manually ban domains using Postfix’s PCRE. We’ve documented how to do that by leveraging former Zimbra employee Rick King’s excellent article on how to prevent nested address spoofing. You can read our blog post on this at this link here.

At the end of the day, if you take a “set it and forget it” approach to implementing Zimbra Fail2Ban best practices, you’ll get pretty terrific results. But if you make periodic adjustments based on what you see in each day’s Daily Mail Report, you (and your end users!) will enjoy spectacular results.

Regardless, by blocking IPs before they even have a chance to open connections to Zimbra, you will significantly lighten the load on your Zimbra MTA servers. Recall that processing emails through Amavis is a very expensive operation in terms of resource consumption, so anything we can do to keep garbage emails from getting that far is helpful. Again, you can find actionable information in our Anti-Spam Best Practices post, to deal with email of unknown quality that gets past Fail2Ban and the Postfix PCRE filtering.

Zimbra Fail2Ban Best Practices Findings and Results
Despite our incredibly long ban time, we have yet to block a legitimate email sending server. About once a month or so, we find we do block one of our customers’ end users (we host more than 100 domains presently), but most often this is because the impacted user has a new computer or mobile device, or is not using one of their regular devices, and enters the incorrect password… repeatedly. Over time, our customers have been generally successful at training their users to stop trying to log in after two failed login attempts, and reach out instead to the Help Desk for a password reset.

With that same incredibly long ban time, we find we are adding between 500 and 1,500 new networks each month to Fail2Ban on average. As of this writing, we are banning more than 15,000 networks, totaling more than 300K IP addresses. (Fail2Ban will automagically block only single IP addresses, i.e. /32 networks. But when we manually ban IPs, based on our research, we will sometimes add whole networks to the Fail2Ban database; networks as large as a /17 or 32,768 IP addresses.)
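For a sense of how a mix of single IPs and whole networks adds up, here is a small sketch using Python's standard `ipaddress` module. The ban list is made up for illustration (placeholder ranges, not addresses we actually ban); the /17 entry mirrors the largest network size mentioned above.

```python
import ipaddress

# Hypothetical ban list mixing a single host (/32) with larger networks.
banned = ["203.0.113.5/32", "198.51.100.0/24", "10.128.0.0/17"]

networks = [ipaddress.ip_network(entry) for entry in banned]
total = sum(net.num_addresses for net in networks)

for net in networks:
    print(f"{net} -> {net.num_addresses} addresses")
print("total addresses:", total)  # 1 + 256 + 32768 = 33025
```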

As regards performance, I can’t speak to ipset, but through testing we have done, we have found that Ubuntu’s ufw and Red Hat’s firewalld do not scale efficiently to handling the 15K+ networks that we are banning. Using Linux routing table entries to perform the ban is much more efficient and much more scalable. If you are interested in the internals of Linux route, you can check out for example this post.

Recall that the route table is recreated from scratch on each reboot, so when a Zimbra server is booted up, the Fail2Ban service reads its database, and writes the entries therein into the Linux routing table. On shutdown or reboot, the Fail2Ban service removes all of the entries it added to the route table. On Ubuntu 18 and above, this happens really fast, like within a minute or two. The older version of Fail2Ban that ships with Ubuntu 16 is much slower at adding and removing route table entries, and so reboots take several minutes longer with our sized route table.
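As a rough illustration of what a route-based ban amounts to, the sketch below builds (but does not run) the kind of `ip route` commands involved. It assumes the stock Fail2Ban `route` action with its default block type of `unreachable`; check `action.d/route.conf` on your own system, since the helper names here are hypothetical and the exact commands may differ by version.

```python
import ipaddress

def route_ban_command(target, blocktype="unreachable"):
    """Build the `ip route add` command for one banned IP or network.

    Assumes the default `unreachable` block type; nothing is executed here.
    """
    net = ipaddress.ip_network(target)  # raises ValueError on malformed input
    return ["ip", "route", "add", blocktype, str(net)]

def route_unban_command(target, blocktype="unreachable"):
    """Build the matching `ip route del` command."""
    net = ipaddress.ip_network(target)
    return ["ip", "route", "del", blocktype, str(net)]

# Placeholder addresses from the documentation ranges.
print(" ".join(route_ban_command("203.0.113.5")))
print(" ".join(route_unban_command("198.51.100.0/24")))
```

One such entry is added per ban at service start and removed again at service stop, which is why start and stop times grow with the size of the ban list.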

Zimbra Fail2Ban Best Practices Implementation Steps
After performing a basic Fail2Ban implementation, as described in Barry’s excellent wiki, we make the following updates:

To the /etc/fail2ban/filter.d/zimbra-smtp file, add the regular expressions so that the file looks like this:

root@zmta01:~# cat /etc/fail2ban/filter.d/zimbra-smtp 

[Definition]
failregex = postfix\/submission\/smtpd\[\d+\]: warning: .*\[<HOST>\]: SASL \w+ authentication failed: authentication failure$
            postfix\/smtps\/smtpd\[\d+\]: warning: .*\[<HOST>\]: SASL \w+ authentication failed: authentication failure$
            postfix\/submission\/smtpd\[\d+\]: warning: non-SMTP command from \w+\[<HOST>\]:.*$
            postfix\/smtpd\[\d+\].*amazonaws\.com\[<HOST>\]:.*@yahoo\.com>.*helo=<\[127\.0\.0\.1\]>$
            postfix\/smtpd\[\d+\].*amazonaws\.com\[<HOST>\]:.*@yahoo\.com>.*helo=<127\.0\.0\.1>$
            postfix\/smtpd\[\d+\].*unknown\[<HOST>\]:.*helo=<\[192\.168.*\]>$
            postfix\/smtpd\[\d+\].*unknown\[<HOST>\]:.*helo=<192\.168.*>$

ignoreregex =
root@zmta01:~#

The first extra line we added (the “non-SMTP command” expression) is there because we periodically see IP addresses trying to execute things like HTTP GET or POST commands on TCP ports 25, 587 and 465. No legitimate sending server would do that, so we ban those IPs. Sometimes those tests are part of a security scan. We welcome customers vetting the security of our infrastructure with prior notice, but in addition to Fail2Ban at the Zimbra server level, we also have other protections higher up the network chain that block bad behavior.

The remaining lines look for very specific use cases.

The next two lines look for a sending server with an Amazon Web Services IP address, where the email From: address is a spoofed yahoo.com email address, and where the AWS instance is configured to HELO with 127.0.0.1. We’ve reported these to AWS several times. They replied that they were in touch with the company that does this, that AWS considers the activity to be OK, and that their customer offered to stop this “email address verification” activity against our servers if we would kindly provide our mail servers’ IPs for exclusion (we refused). We don’t understand how AWS can consider this activity to be legitimate, and we would think Yahoo would not take kindly to their users’ email addresses and Yahoo’s own domain being spoofed for the purpose of “email address verification”, so we block these IPs. BTW, we host on AWS, and have found their security posture otherwise to be first rate, so we really don’t understand why AWS considers email address spoofing to be legitimate activity…

The last two lines look for sending servers with no PTR record in public DNS and where the sending server sends a HELO with an IP address from one of the RFC 1918 private networks. No legitimate sending server would do this. While we don’t accept emails from sending servers without a PTR record (regardless of their HELO) in any event, the log file entries from this bad actor led us to ban their IP addresses, rather than let them continue to try to talk to our servers.
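If you want to sanity-check expressions like these outside of Fail2Ban, a quick Python sketch along the following lines can help. Two loud assumptions: the `<HOST>` tag is replaced with a simplified IPv4-only group (Fail2Ban's real expansion is more elaborate), and the log lines are synthetic examples written for this sketch, not real log entries.

```python
import re

# Simplified, IPv4-only stand-in for Fail2Ban's <HOST> tag (assumption).
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

# Three of the failregex lines from the filter file above.
FAILREGEXES = [
    r"postfix\/submission\/smtpd\[\d+\]: warning: non-SMTP command from \w+\[<HOST>\]:.*$",
    r"postfix\/smtpd\[\d+\].*amazonaws\.com\[<HOST>\]:.*@yahoo\.com>.*helo=<127\.0\.0\.1>$",
    r"postfix\/smtpd\[\d+\].*unknown\[<HOST>\]:.*helo=<\[192\.168.*\]>$",
]
COMPILED = [re.compile(rx.replace("<HOST>", HOST)) for rx in FAILREGEXES]

def match_host(line):
    """Return the IP captured by the first matching expression, else None."""
    for rx in COMPILED:
        found = rx.search(line)
        if found:
            return found.group("host")
    return None

# Synthetic log lines (hostnames and IPs are placeholders).
SAMPLE_LINES = [
    "May 21 10:00:00 zmta01 postfix/submission/smtpd[1234]: warning: "
    "non-SMTP command from unknown[203.0.113.5]: GET / HTTP/1.1",
    "May 21 10:01:00 zmta01 postfix/smtpd[1235]: NOQUEUE: reject: RCPT from "
    "ec2-198-51-100-9.compute-1.amazonaws.com[198.51.100.9]: 450 4.1.1 "
    "<user@example.com>: rejected; from=<someone@yahoo.com> "
    "to=<user@example.com> proto=ESMTP helo=<127.0.0.1>",
    "May 21 10:02:00 zmta01 postfix/smtpd[1236]: NOQUEUE: reject: RCPT from "
    "unknown[192.0.2.44]: 450 4.7.25 Client host rejected; "
    "from=<a@example.net> to=<user@example.com> proto=ESMTP helo=<[192.168.1.10]>",
    "May 21 10:03:00 zmta01 postfix/smtpd[1237]: connect from mail.example.org[192.0.2.80]",
]

for line in SAMPLE_LINES:
    print(match_host(line))
```

The first three lines should each yield the offending IP and the last, an ordinary connect line, should yield nothing. The authoritative check is still fail2ban-regex against your real logs.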

You could, if you wish, duplicate the last two lines to cover all of the RFC 1918 networks. If we find activity in our log files that warrants this, we will add such regular expressions to this filter file at a later date.
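As one possible starting point for covering all three RFC 1918 ranges, here is a hedged sketch of a single alternation, cross-checked against Python's `ipaddress` module. The pattern is our illustration only and is not part of the filter file above; it deliberately accepts both the bracketed and the bare HELO forms in one expression, and you would still need to splice it into a full failregex and test it with fail2ban-regex.

```python
import ipaddress
import re

# The three RFC 1918 private networks.
RFC1918_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# One alternation for 10.x.x.x, 172.16-31.x.x and 192.168.x.x (a sketch).
RFC1918_RE = r"(?:10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])|192\.168)\.\d{1,3}\.\d{1,3}"
HELO_PRIVATE = re.compile(r"helo=<\[?(?P<helo>" + RFC1918_RE + r")\]?>$")

def helo_is_private(line):
    """True if the line ends with a HELO that is an RFC 1918 address."""
    return HELO_PRIVATE.search(line) is not None

def in_rfc1918(ip):
    """Reference check using real network arithmetic instead of a regex."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETS)

for helo in ["10.1.2.3", "172.16.0.1", "172.31.255.254", "172.32.0.1",
             "192.168.1.10", "203.0.113.5", "110.1.2.3"]:
    line = f"proto=ESMTP helo=<[{helo}]>"
    print(helo, helo_is_private(line), in_rfc1918(helo))
```

The regex and the network-based check should agree on every address in the loop, including the near misses 172.32.0.1 and 110.1.2.3.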

Next, we need to scour nginx.log for issues. Thanks to Jim Dunphy for organizing this approach. Let’s create a new file called /etc/fail2ban/filter.d/zimbra-nginx.conf which should contain the following:

[Definition]
failregex = client\s+sent\s+plain\s+HTTP\s+request\s+to\s+HTTPS\s+port\s+while\s+reading\s+client\s+request\s+headers,\s+client:\s+<HOST>,\s+server
            client\s+sent\s+invalid\s+method\s+while\s+reading\s+client\s+request\s+line,\s+client:\s+<HOST>,\s+server

ignoreregex =
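Here is a small Python sketch of what those two nginx expressions pick up, counting hits per client IP. As before, `<HOST>` is swapped for a simplified IPv4-only group, and the log lines are synthetic examples modeled on nginx's log format rather than entries from a real nginx.log.

```python
import re
from collections import Counter

# Simplified, IPv4-only stand-in for Fail2Ban's <HOST> tag (assumption).
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

NGINX_FAILREGEXES = [
    r"client\s+sent\s+plain\s+HTTP\s+request\s+to\s+HTTPS\s+port\s+while\s+reading"
    r"\s+client\s+request\s+headers,\s+client:\s+<HOST>,\s+server",
    r"client\s+sent\s+invalid\s+method\s+while\s+reading\s+client\s+request\s+line,"
    r"\s+client:\s+<HOST>,\s+server",
]
COMPILED = [re.compile(rx.replace("<HOST>", HOST)) for rx in NGINX_FAILREGEXES]

def count_hits(lines):
    """Count failregex hits per client IP."""
    hits = Counter()
    for line in lines:
        for rx in COMPILED:
            found = rx.search(line)
            if found:
                hits[found.group("host")] += 1
                break
    return hits

# Synthetic nginx log lines with placeholder addresses.
SAMPLE_LINES = [
    '2023/05/21 10:00:00 [info] 1234#0: *42 client sent plain HTTP request to HTTPS '
    'port while reading client request headers, client: 203.0.113.7, server: , '
    'request: "GET / HTTP/1.1", host: "mail.example.com:443"',
    '2023/05/21 10:00:05 [info] 1234#0: *43 client sent invalid method while reading '
    'client request line, client: 198.51.100.23, server: , request: "abc"',
    '2023/05/21 10:00:09 [info] 1234#0: *44 client closed connection while waiting '
    'for request, client: 192.0.2.1, server: 0.0.0.0:443',
]

print(dict(count_hits(SAMPLE_LINES)))
```

Only the first two lines count as hits; the ordinary "closed connection" line is ignored.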

To the /etc/fail2ban/jail.d/zimbra.local file, we have adjusted the values of several parameters so that the file looks like this:

root@zmta01:~# cat /etc/fail2ban/jail.d/zimbra.local

[zimbra-smtp]
enabled = true
filter = zimbra-smtp
port = 25,465,587
logpath = /var/log/zimbra.log
maxretry = 2
findtime = 86400
bantime = 360000000
action = route

[zimbra-mail]
enabled = true
filter = zimbra-mail
port = 80,443
logpath = /opt/zimbra/log/mailbox.log
maxretry = 5
findtime = 86400
bantime = 360000000
action = route

[zimbra-nginx]
enabled = true
filter = zimbra-nginx
port = 80,443
logpath = /opt/zimbra/log/nginx.log
maxretry = 1
findtime = 86400
bantime = 360000000
action = route
root@zmta01:~#

As you can see above, the long ban time of 360,000,000 seconds translates to 11.42 years. You may want to increase the maxretry parameter when you first introduce Fail2Ban into your organization, to give users some runway on entering bad credentials, and then reduce the maxretry value over time, after notifying users of the impending tightening up. We are essentially banning IP addresses after two failed SMTP-Auth login attempts, and after three failed web client login attempts. (Yes, a maxretry of 5 equals three failed web client login attempts, because the zimbra-mail filter in the Zimbra Fail2Ban wiki article results in two hits for each login attempt.)
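The arithmetic behind those numbers is easy to check. The sketch below uses a 365-day year, and the helper name `attempts_until_ban` is ours; it simply assumes a ban fires once the hit count reaches maxretry.

```python
import math

BANTIME = 360_000_000     # seconds, as set in the jail file above
SECONDS_PER_DAY = 86_400  # also the findtime value used above

days = BANTIME / SECONDS_PER_DAY
years = days / 365
print(f"{days:.1f} days, about {years:.2f} years")  # 4166.7 days, about 11.42 years

def attempts_until_ban(maxretry, hits_per_attempt):
    """Login attempts needed before the hit count reaches maxretry."""
    return math.ceil(maxretry / hits_per_attempt)

print(attempts_until_ban(2, 1))  # zimbra-smtp: 2 attempts
print(attempts_until_ban(5, 2))  # zimbra-mail: 3 attempts (two hits per attempt)
print(attempts_until_ban(1, 1))  # zimbra-nginx: 1 attempt
```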

We use a maxretry of 1 for the nginx jail, because that’s what we have consistently observed in our log files. No need to start with a higher value here.

After you have created all of these files, you can run a check to see what Fail2Ban, if it had been running already, would have found in your log files. Here’s what the check of the nginx filter run looks like:

root@mail2:~# fail2ban-regex /opt/zimbra/log/nginx.log /etc/fail2ban/filter.d/zimbra-nginx --print-all-matched

Running tests
=============

Use failregex filter file : zimbra-nginx, basedir: /etc/fail2ban
Use log file : /opt/zimbra/log/nginx.log
Use encoding : UTF-8


Results
=======

Failregex: 35 total
|- #) [# of hits] regular expression
| 1) [6] client\s+sent\s+plain\s+HTTP\s+request\s+to\s+HTTPS\s+port\s+while\s+reading\s+client\s+request\s+headers,\s+client:\s+<HOST>,\s+server
| 2) [29] client\s+sent\s+invalid\s+method\s+while\s+reading\s+client\s+request\s+line,\s+client:\s+<HOST>,\s+server
`-

Ignoreregex: 0 total

Date template hits:
|- [# of hits] date format
| [43404] {^LN-BEG}ExYear(?P<_sep>[-/.])Month(?P=_sep)Day(?:T| ?)24hour:Minute:Second(?:[.,]Microseconds)?(?:\s*Zone offset)?
`-

Lines: 43404 lines, 0 ignored, 35 matched, 43369 missed
[processed in 1.04 sec]

Following the above output will be the 35 lines that matched the regex. Since we have maxretry set to 1 for the nginx filter, this means we would have banned these 35 IP addresses had Fail2Ban already been running with this updated configuration. You should repeat the above with all three filters, and if you are satisfied with the results you can restart the Fail2Ban service.
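To repeat the check for all three filters without retyping, a short Python sketch like this can generate the commands. It pairs each filter name with the logpath from the jail file above and mirrors the fail2ban-regex invocation shown earlier; the helper name is hypothetical, and the sketch only prints the commands rather than running them.

```python
# Filter name -> log file, taken from the jail definitions above.
JAILS = {
    "zimbra-smtp": "/var/log/zimbra.log",
    "zimbra-mail": "/opt/zimbra/log/mailbox.log",
    "zimbra-nginx": "/opt/zimbra/log/nginx.log",
}
FILTER_DIR = "/etc/fail2ban/filter.d"

def regex_check_command(filter_name, logpath):
    """Build the fail2ban-regex command line for one filter."""
    return [
        "fail2ban-regex",
        logpath,
        f"{FILTER_DIR}/{filter_name}",
        "--print-all-matched",
    ]

for name, logpath in JAILS.items():
    # Printed only; pass the list to subprocess.run() to execute it as root.
    print(" ".join(regex_check_command(name, logpath)))
```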

Note that restarting Fail2Ban can take a few minutes as it will remove everything it added to the route table as part of service stop, and then put all the banned IPs back into the route table.

For maintenance, we recommend running a “vacuum” against the sqlite database periodically. To root’s crontab we have added:

0 2 * * 0 /usr/bin/sqlite3 /var/lib/fail2ban/fail2ban.sqlite3 'vacuum;'

In a somewhat busy system that had been running Fail2Ban for several months, we ran the vacuum manually and noticed a significant reduction in the database size, like so:

root@zimbra:~# ls -alh /var/lib/fail2ban/fail2ban.sqlite3
-rw------- 1 root root 12M Jul 11 12:37 /var/lib/fail2ban/fail2ban.sqlite3
root@zimbra:~# 
root@zimbra:~# /usr/bin/sqlite3 /var/lib/fail2ban/fail2ban.sqlite3 'vacuum;'
root@zimbra:~# ls -alh /var/lib/fail2ban/fail2ban.sqlite3
-rw------- 1 root root 8.4M Jul 11 12:38 /var/lib/fail2ban/fail2ban.sqlite3
root@zimbra:~#

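You can see above that the database size went from 12M to 8.4M after the vacuum, a reduction of about 30%. This particular server has more than 22K routing table entries, nearly all of them bans (the count below also includes route’s two header lines and the server’s ordinary routes):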

root@zimbra:~# route -n | wc -l
22152
root@zimbra:~#
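If you would like to see why the vacuum shrinks the file, the effect is easy to reproduce with Python's built-in `sqlite3` module on a throwaway database. The table and its contents below are made up for the demo and are not Fail2Ban's actual schema; the point is only that deleted rows leave free pages in the file until a VACUUM rebuilds it.

```python
import os
import sqlite3
import tempfile

def vacuum_demo(rows=20000):
    """Return (size_before, size_after) in bytes for a throwaway database."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        conn = sqlite3.connect(path)
        # Hypothetical table, for illustration only.
        conn.execute("CREATE TABLE bans (ip TEXT, data TEXT)")
        conn.executemany(
            "INSERT INTO bans VALUES (?, ?)",
            ((f"192.0.2.{i % 256}", "x" * 200) for i in range(rows)),
        )
        # Delete most rows; the freed pages stay inside the file.
        conn.execute("DELETE FROM bans WHERE rowid % 4 != 0")
        conn.commit()  # VACUUM cannot run inside an open transaction
        before = os.path.getsize(path)
        conn.execute("VACUUM")
        conn.close()
        after = os.path.getsize(path)
    return before, after

before, after = vacuum_demo()
print(f"before: {before} bytes, after: {after} bytes")
```

The exact sizes will vary by system, but the file after the vacuum should be noticeably smaller than before.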

Zimbra Fail2Ban Best Practices Conclusions
Implementing Fail2Ban is straightforward and will greatly increase the level of protection provided to your end users from bad actors’ emails. Deploying Fail2Ban provides a net positive performance impact as well. While you can take a “set it and forget it” approach to deploying Fail2Ban, using the information in Zimbra’s Daily Mail Report to make incremental improvements to your Fail2Ban filters over time will provide you with additional protections from bad actors’ newly introduced behaviors.

By the way, if you are inexperienced when it comes to regular expressions, or you just want a place to validate your proposed regular expressions, I recommend creating a free ChatGPT account! Just browse to OpenAI’s signup web page and click the “Sign up” button. There’s no need to install any of the ChatGPT client applications; the web interface is straightforward to use.

If you’d like help with your Fail2Ban deployment or other Zimbra security enhancing task, please start the conversation by filling out this form:

Hope that helps,
L. Mark Stone
Mission Critical Email LLC
21 May 2023
Updated 11 July 2023, to add vacuuming of the sqlite database

The information provided in this blog is intended for informational and educational purposes only. The views expressed herein are those of Mr. Stone personally. The contents of this site are not intended as advice for any purpose and are subject to change without notice. Mission Critical Email makes no warranties of any kind regarding the accuracy or completeness of any information on this site, and we make no representations regarding whether such information is up-to-date or applicable to any particular situation. All copyrights are reserved by Mr. Stone. Any portion of the material on this site may be used for personal or educational purposes provided appropriate attribution is given to Mr. Stone and this blog.
