:: Performance Tuning Challenges ::
For many years, Zimbra’s Performance Tuning Guidelines for Large Deployments wiki page has been the de facto standard reference for tweaking large and medium-sized Zimbra systems alike. There are four problems with that wiki page, however…
The first problem is that the 8.8 series already incorporates pretty much all of the Zimbra-specific tweaks listed in the old Performance Tuning Guidelines wiki. That may explain why the wiki hasn’t been updated in nearly a year.
The second problem is that much of the data there applies to older versions of Zimbra, almost all of which at this writing are past end-of-support, and which have different architectures than 8.8, so applying some of those wiki-recommended changes either won’t work (Tomcat hasn’t been in Zimbra since version 4.5…) or may bork your Zimbra system.
The third problem with the Performance Tuning Guidelines wiki is that a lot of the operating system specifics there don’t really apply if you are hosting in modern virtualized environments (XenServer, VMware, KVM, AWS, Azure, Oracle Cloud, etc.).
The fourth problem is that Cloud- and virtualized environment-specific tweaks are missing from the wiki, again, because the wiki page really hasn’t been given a wholesale update in a long time.
So what I’m going to do here is simplify everything down for you to just a few of the most important and impactful performance tweaks. This is 80:20 rule stuff; the tweaks I’m suggesting are perhaps 20% of what you could do, but that will get you 80% of the added performance available.
By way of background, I’ve been architecting, deploying and hosting Zimbra systems for more than a dozen years, starting back in the Zimbra 4.0 days. The largest Zimbra multi-server I ever architected, deployed and hosted domiciled more than 20K email domains. The largest single Zimbra server I ever was called to help with contained more than 6,000 active mailboxes. I know of many Zimbra mailbox servers with more than 10,000 active mailboxes each.
Whether single-server or multi-server, the bottlenecks to Zimbra performance all happen on the mailbox stores (again, more than 80% of the time, but not always of course…).
All email servers hammer disk I/O, so when you compromise on disks, you will come to regret it at some point. Further, CPU and RAM are important because Zimbra’s MariaDB, Lucene index engine, and all the other housekeeping bits baked into Zimbra will keep your server “busy” even when there’s not a lot of email flow. Lastly, as compared to even just a few years ago, these days each single user will often have several devices chatting with Zimbra simultaneously: the work laptop, a mobile device, a home iMac or similar with the Zimbra web UI open, and maybe even a tablet. The old rules for how many mailboxes you can fit on a Zimbra server therefore no longer really apply.
Further, IMAP can bring a Zimbra server (any IMAP server actually) to its knees in certain circumstances. This is because the IMAP standard is like our tax laws — open to interpretation. Apple Mail, for OCD users with hundreds of nested folders, will hammer IMAP because of the way Apple interpreted the IMAP specs. Zimbra’s breakout imapd service is not quite GA as of this writing, but when it is, that will greatly help mailbox server scalability.
Lastly, virtualized environments have their own overhead. XenServer, Amazon Web Services (AWS, which uses a customized version of Xen), and other hypervisors often allocate one thread (via tapdisk in XenServer’s Dom0) per virtual disk. So while it’s really convenient to create a single big disk for your Zimbra server, that’s not the best way to go: all of the server’s disk I/O then funnels through that one thread.
Zimbra 8.8 Performance Tip #1: Initial Build Instance and Disk Optimization
On AWS, an m4.xlarge instance is a great, cost-effective size for most mailbox servers. Four cores and 16GB of RAM will easily handle a few thousand active mailboxes in all but the most severe corner cases. But you need multiple disks or performance will suffer. I suggest the following for a server with a 2TB mailstore on AWS:
- 50GB EBS volume for root (will be built on AWS from a snapshot, so best to keep it small)
- 500GB EBS volume for /opt
- 2TB EBS volume for /opt/zimbra/backup
- 500GB EBS volume for /opt/zimbra/store
- S3 bucket for the Secondary (HSM) volume
There are two features within Zimbra 8.8 Network Edition that really impact disk consumption in a positive way. First, experienced Zimbra users will recall that in 8.7 and earlier versions of Zimbra, backups would typically consume about 3.5x the amount of storage consumed by /opt/zimbra/store. In 8.8, the new BackupNG module incorporates compression and deduplication to make a 30-day set of backups about 70% of the mailstore size. That’s right… 30 days of backups are now smaller than the original emails themselves.
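To put numbers on that, here is a minimal sketch of the arithmetic for the 2TB example mailstore. The 3.5x and 0.7x ratios are the rules of thumb quoted above; the function and variable names are made up for illustration, and your real ratios will depend on your mail mix.

```python
def backup_storage_gb(mailstore_gb: float, ratio: float) -> float:
    """Estimate backup storage as a multiple of the mailstore size."""
    return mailstore_gb * ratio


# Rules of thumb from the text above, not measured values.
LEGACY_RATIO = 3.5    # Zimbra 8.7 and earlier
BACKUPNG_RATIO = 0.7  # Zimbra 8.8 BackupNG, 30-day backup set

mailstore_gb = 2000  # the 2TB example mailstore

legacy_gb = backup_storage_gb(mailstore_gb, LEGACY_RATIO)
backupng_gb = backup_storage_gb(mailstore_gb, BACKUPNG_RATIO)

print(f"8.7 and earlier: about {legacy_gb:.0f} GB of backups")
print(f"8.8 BackupNG:    about {backupng_gb:.0f} GB of backups")
```

On these assumptions the same mailstore needs roughly 7TB of backup space on 8.7 and roughly 1.4TB on 8.8, which fits on the 2TB backup volume in the layout above.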
The second way to save on disk space (and disk space costs) is to leverage an S3 storage repository for Secondary mail volumes (Hierarchical Storage Management), using the new HSM-NG module in Zimbra 8.8. HSM is a feature that enables you to move mail blobs and other objects older than a certain number of days off to a different storage tier. Historically, ISPs and hosting providers used this to move older mail blobs off of expensive, high-performance primary SSD SAN storage to older, SATA spinning disk storage. Older mail blobs involve only read operations (after the initial move), so this was, and remains, a great way to leverage lower cost storage without compromising performance. On Amazon, S3 storage is incredibly inexpensive already, so using this feature when hosting on AWS (or on-premises if you have an S3-capable storage appliance) is the reason why, in our example above, we have a 2TB mailstore but only 500GB of local disk. The HSM-NG module also supports S3-IA (Infrequent Access) for emails with attachments, lowering your storage costs even further.
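The age-based idea behind HSM can be sketched in a few lines. This is only an illustration of the selection rule, not the HSM-NG module’s actual interface; the function name, the blob IDs, and the 30-day cutoff are all hypothetical.

```python
from datetime import datetime, timedelta


def split_by_age(blobs, max_age_days, now):
    """Partition (blob_id, stored_at) pairs into (keep_on_primary, move_to_secondary)."""
    cutoff = now - timedelta(days=max_age_days)
    keep = [b for b in blobs if b[1] >= cutoff]
    move = [b for b in blobs if b[1] < cutoff]
    return keep, move


# Hypothetical sample data: three mail blobs of different ages.
now = datetime(2018, 4, 21)
blobs = [
    ("msg-1", datetime(2018, 4, 20)),
    ("msg-2", datetime(2018, 1, 5)),
    ("msg-3", datetime(2017, 11, 30)),
]

keep, move = split_by_age(blobs, max_age_days=30, now=now)
print("stay on primary:", [blob_id for blob_id, _ in keep])
print("move to secondary:", [blob_id for blob_id, _ in move])
```

Only the most recent blob stays on the fast primary volume; everything older than the cutoff is a candidate for the cheaper secondary store.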
Zimbra 8.8 Performance Tip #2: MariaDB Tuning
Zimbra uses MariaDB’s InnoDB database engine for mapping the locations of mail blobs on disk to each user’s folder tree (among other things). InnoDB tries to put the entire database in RAM, subject to the innodb_buffer_pool_size limit in my.cnf. The innodb_buffer_pool_size parameter gets set the first time you run the Zimbra installer, and it’s set as a percentage of RAM at the time of installation.
If you installed Zimbra on a small instance and then changed the instance size, the InnoDB buffer pool will have been set too small. As you increase the number of mail blobs on disk, the InnoDB database size will increase. The InnoDB engine needs, ideally, enough RAM (innodb_buffer_pool_size) to hold the entire database, plus 20%-25% more for temporary tables and other housekeeping tasks.
When your InnoDB database has grown to, say, 7GB, but your innodb_buffer_pool_size is set to 5GB, MariaDB is going to hammer your disks, continuously swapping portions of your database out of RAM and then back in from disk when those portions of the tables are requested. Your users will notice the performance degradation, trust me.
The solution? Run a utility like mysqltuner.pl to compare the size of the buffer pool against the database, and increase the buffer pool accordingly (database + 25% minimum). Note that you may need to change your instance size if you are going to claw back more than 2GB – 3GB of RAM for MariaDB’s InnoDB buffer pool.
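As a rough sketch of that sizing rule, the arithmetic for the 7GB database and 5GB buffer pool example looks like this. The helper names are made up for illustration, the 25% headroom is the minimum suggested above, and the code treats GB as GiB for simplicity. In practice you would take the real database size from a tool like mysqltuner.pl.

```python
GIB = 1024 ** 3


def recommended_buffer_pool_bytes(innodb_data_bytes, headroom=0.25):
    """Database size plus headroom for temporary tables and housekeeping."""
    return int(innodb_data_bytes * (1 + headroom))


def buffer_pool_shortfall_bytes(innodb_data_bytes, current_pool_bytes, headroom=0.25):
    """How much larger the buffer pool should be; 0 if it is already big enough."""
    target = recommended_buffer_pool_bytes(innodb_data_bytes, headroom)
    return max(0, target - current_pool_bytes)


# The example from the text: a 7GB database with a 5GB buffer pool.
db_bytes = 7 * GIB
pool_bytes = 5 * GIB

target = recommended_buffer_pool_bytes(db_bytes)
shortfall = buffer_pool_shortfall_bytes(db_bytes, pool_bytes)

print(f"target: {target / GIB:.2f} GiB, shortfall: {shortfall / GIB:.2f} GiB")
```

Here the target works out to 8.75 and the shortfall to 3.75, which is above the 2GB – 3GB range mentioned above, so in this example a larger instance is probably the safer route.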
When you have a Zimbra server that’s been running for a year or so, or which has grown in mailbox numbers or size, and you are seeing performance drop, this is the first thing to check.
Zimbra 8.8 Performance Tip #3: Configure a RAM Disk for Amavis
Amavis, when it calls SpamAssassin and ClamAV, is very disk intensive. It’s not unusual for emails with large attachments to take 10 seconds to pass through analysis. If you allow larger attachments (like 100MB or so), you can see emails with those large attachments take up to 30 seconds to be processed. Sure, there are ten amavisd worker processes running, but if your server has to deal with an email blast, performance will suffer unless you configure a RAM disk for Amavis’s temporary directory. Once you do so, emails that took 10 seconds to process will typically be processed in under a second.
Most Linux system administrators are wary of RAM disks due to the risk of data loss during a shutdown/reboot event (intentional or unintentional). This is simply not a risk at all in Zimbra due to the way Postfix and Amavis collaborate. Here’s why there’s zero risk:
Postfix gets an email that it wants to hand off to Amavis for screening. It does so on port 10024, but when it does so, Amavis plays a little game with Postfix: Amavis doesn’t acknowledge that it actually received the email right away, so Postfix keeps a copy until Amavis confirms reception.
Sneaky bugger that Amavis is, it actually screens the whole email and attachments, and if clean, gives the email back to Postfix on port 10025. Postfix has a multiple personality disorder, so the Postfix that gave the unscreened email to Amavis on port 10024 has no clue that Amavis is handing the deemed-clean email back to the other Postfix “personality” on port 10025. Once Amavis gets the nod from Postfix on port 10025 that it received the email OK (Postfix then delivers it to mailboxd so the user can actually see it in their Inbox), only then does Amavis tell Postfix on port 10024 that it finally received the email to be processed.
In other words, the system could crash while Amavis is still processing a number of emails, but upon reboot Postfix still has the original emails and will resubmit them to Amavis for processing.
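The handoff described above can be modeled with a toy simulation. This is a deliberately simplified sketch, not real Postfix or Amavis code: the class and method names are invented, and it models only a crash that happens while Amavis is still scanning.

```python
class SystemCrash(Exception):
    """Stands in for an unplanned shutdown while Amavis is scanning."""


class Postfix:
    def __init__(self):
        self.queue = {}      # on-disk copies held until Amavis acknowledges (port 10024 side)
        self.delivered = []  # messages accepted back on the re-injection side (port 10025)

    def accept(self, msg_id, body):
        self.queue[msg_id] = body

    def reinject(self, msg_id, body):
        self.delivered.append(msg_id)
        return True  # the "nod" back to Amavis

    def submit_all(self, amavis):
        for msg_id in list(self.queue):
            if amavis.scan(msg_id, self.queue[msg_id], self):
                del self.queue[msg_id]  # only now is the queued copy released


class Amavis:
    def __init__(self, crash_on=None):
        self.crash_on = crash_on
        self.tmp = {}  # stands in for the RAM disk

    def scan(self, msg_id, body, postfix):
        self.tmp[msg_id] = body
        if msg_id == self.crash_on:
            self.tmp.clear()  # RAM disk contents vanish with the crash
            raise SystemCrash(msg_id)
        accepted = postfix.reinject(msg_id, body)
        del self.tmp[msg_id]
        return accepted  # the late acknowledgement to the sending side


postfix = Postfix()
postfix.accept("a", "hello")
postfix.accept("b", "world")

try:
    postfix.submit_all(Amavis(crash_on="b"))
except SystemCrash:
    pass
queued_after_crash = sorted(postfix.queue)

postfix.submit_all(Amavis())  # after reboot, Postfix resubmits what it still holds
print("still queued after crash:", queued_after_crash)
print("delivered after restart:", postfix.delivered)
```

Message "b" is lost from the simulated RAM disk when the crash hits, but it is still sitting in the Postfix queue, so it is scanned and delivered once everything comes back up.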
Unfortunately, the configuration of a RAM disk varies both by Linux distribution and the version number, so I’ll leave it to you to sort that out. Having said that, Ralf Hildebrandt’s Postfix Shrine book, portions of which are now online, has some guidance.
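As a starting point only, here is a rough sketch for sizing such a RAM disk and formatting a tmpfs line in /etc/fstab style. Everything in it is an assumption to verify on your own system: the 2x expansion factor for unpacked attachments, the mount point (commonly /opt/zimbra/data/amavisd/tmp, but check your install), the placeholder uid and gid for the zimbra user, and whether an fstab tmpfs entry is the right mechanism for your distribution.

```python
def amavis_tmpfs_size_mb(workers, max_message_mb, expansion_factor=2.0):
    """Worst case: every worker unpacking a maximum-size message at the same time."""
    return int(workers * max_message_mb * expansion_factor)


def fstab_tmpfs_line(mount_point, size_mb, uid, gid):
    """Format a tmpfs entry in /etc/fstab style (standard tmpfs mount options)."""
    options = (
        f"defaults,noexec,nodev,nosuid,size={size_mb}m,"
        f"mode=750,uid={uid},gid={gid}"
    )
    return f"tmpfs {mount_point} tmpfs {options} 0 0"


# Ten workers and 100MB messages come from the text; the rest are placeholders.
size_mb = amavis_tmpfs_size_mb(workers=10, max_message_mb=100)
line = fstab_tmpfs_line(
    "/opt/zimbra/data/amavisd/tmp",  # assumed Amavis temp directory; verify first
    size_mb,
    uid=1001,  # hypothetical numeric uid of the zimbra user
    gid=1001,  # hypothetical numeric gid of the zimbra group
)
print(line)
```

The worst-case figure is deliberately pessimistic. Since tmpfs only consumes RAM for what is actually stored in it, a generous size limit costs little when mail flow is light.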
Zimbra 8.8 Performance Tip #4: Relax… You Are Done!
Yes, Zimbra 8.8 has incorporated so many of the tips in the Performance Tuning Guidelines for Large Deployments wiki that there really are only three things you need to do to maximize performance for most Zimbra servers.
Yet one more reason to get yourself upgraded to 8.8 sooner rather than later!
Hope that helps,
L. Mark Stone
Mission Critical Email
21 April 2018
The information provided in this blog is intended for informational and educational purposes only. The views expressed herein are those of Mr. Stone personally. The contents of this site are not intended as advice for any purpose and are subject to change without notice. Mission Critical Email makes no warranties of any kind regarding the accuracy or completeness of any information on this site, and we make no representations regarding whether such information is up-to-date or applicable to any particular situation. All copyrights are reserved by Mr. Stone. Any portion of the material on this site may be used for personal or educational purposes provided appropriate attribution is given to Mr. Stone and this blog.