Zimbra Linux Kernel Tuning To Reduce Swap File Usage

Zimbra mailbox servers on Ubuntu 18/20 can benefit from Linux kernel tuning to reduce swap file usage.  This post will show you how.

Note: this article is based in large part on information kindly provided by John Holder of Zimbra a week before his untimely passing.  Over the years John was nothing short of incredibly helpful and generous to us and our customers. His contributions to the Zimbra community are legendary, and he continues to be deeply missed.  Additional information for this article is based on testing we have been performing over the past three years.

To be clear, our testing has been limited to Ubuntu 16, 18 and 20 systems running Zimbra 8.8.12-8.8.15 and Zimbra 9.  The kernel settings suggested below are likely not accurate for Red Hat-based Linux distros, as they use different kernels. For RH-based distros, we recommend testing on a clone of your production system with test workloads that simulate your production workloads.

 

Challenge
We build our hosting farm’s and most of our customers’ mailbox servers with 32GB of RAM and four CPU cores. We have a somewhat firm rule that once the InnoDB databases reach 25GB in size, it’s time to add another mailbox server.  Sometimes this means there can be 15,000 mailboxes on the mailbox server; sometimes fewer than a thousand (when most mailboxes have lots of folders and are several hundred GB each on average, with “power users” accessing Zimbra simultaneously from multiple devices).  We used to build much larger mailbox servers, but (thanks to John’s influence) have migrated to a “larger number of smaller mailbox servers” approach to multi-server farms.

Starting back in 2019, we noticed that mailbox servers were swapping a lot more than expected; a lot more than we felt they should. In the past six months, on a typical 32GB server, with Java heap memory set to 6GB and the MariaDB InnoDB buffer pool set to ~18GB, we noticed Java would consume an expected ~11GB of memory, and MariaDB would consume an expected ~19GB of memory.  But…

…we also consistently noticed that Java had swapped out about 4GB of memory to the swapfiles, and MariaDB had swapped out ~13GB as well!

What we also noticed from the “free” command was that the Linux kernel was allocating ~16GB for buffers and cache, and was largely prioritizing this usage over keeping Java and the InnoDB databases in RAM.
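To see which processes account for the swapped-out memory, one option is to read the VmSwap field the kernel exposes in each /proc/<pid>/status file. The sketch below is one way to do that; the function names are our own illustrative choices, not part of Zimbra or any standard tool.

```shell
#!/bin/sh
# Print the VmSwap value (in kB) recorded in one /proc/<pid>/status-style file.
# Prints 0 when the field is absent (kernel threads have no VmSwap line).
swap_kb_from_status() {
    awk '/^VmSwap:/ { kb = $2 } END { print kb + 0 }' "$1"
}

# List every process with swapped-out memory, largest first,
# one "<kB> <pid> <name>" line per process.
# (Hypothetical helper name; run as root to be sure all processes are visible.)
list_swap_users() {
    for status in /proc/[0-9]*/status; do
        [ -r "$status" ] || continue
        # A process may exit while we scan; skip it if the read fails.
        kb=$(swap_kb_from_status "$status" 2>/dev/null) || continue
        [ "${kb:-0}" -gt 0 ] || continue
        pid=${status#/proc/}
        pid=${pid%/status}
        name=$(awk '/^Name:/ { print $2 }' "$status" 2>/dev/null)
        printf '%s %s %s\n' "$kb" "$pid" "$name"
    done | sort -rn
}
```

Running `list_swap_users | head` on a mailbox server should put java and the MariaDB server process near the top if they are the main swap consumers.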

This article describes what kernel parameters we changed to address this.  We’ll summarize implementing the solutions at the very end, to keep all of our changes in one place.  Before we deal with tuning the Linux kernel, we need to make sure a few other tuning settings are in tip-top shape.

 

Solutions – Swapfiles
In years past, it was recommended to set vm.swappiness = 0, to keep Linux from using swap unless it really had to.  These days, that’s no longer correct.  In modern kernels, vm.swappiness = 1 is the new vm.swappiness = 0.  With vm.swappiness = 0 on a modern kernel, we have seen the OOM Killer kill Zimbra processes to make more RAM available.  In other words, vm.swappiness = 0 now means something like “Don’t use swap unless you really have to, because using the OOM Killer isn’t sufficient.”  We filed a bug report with Zimbra, as a number of their wikis and documents still reference vm.swappiness = 0 as the recommended setting.

The total size of the swapfiles should equal RAM, because according to John Holder, the Java 17 currently shipping with Zimbra (at this writing) will want to try to use swap.  Further, Zimbra’s Java is sensitive to changes in swap.  Zimbra Support has made clear to us and customers that you should never run something like “swapoff -a && swapon -a” to clear the swapfiles after making configuration changes.

Lastly as regards swap, it’s a good idea to have two 16GB swap files (on a 32GB RAM system), located on entirely separate, high performance disks.  Modern hypervisors allocate one thread per disk to manage disk I/O, so if you have separate disks (not partitions!) for / (the root partition), /opt, /opt/zimbra/store and /opt/zimbra/backup, you will get a performance boost.
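As a rough sketch of the sizing arithmetic and of what creating one such swap file might involve, consider the helpers below. The function names and the /swap1/swapfile path are hypothetical, and using dd to create the file is our assumption (it is a common approach). The commands are only printed, never executed, so you can review them and run them by hand as root.

```shell
#!/bin/sh
# Size of each swap file in GB when total swap should equal RAM,
# split evenly across a number of files (rounded up).
per_file_swap_gb() {
    ram_gb=$1
    files=$2
    echo $(( (ram_gb + files - 1) / files ))
}

# Report MemTotal and SwapTotal (kB) from a /proc/meminfo-style file,
# to check whether total swap currently matches RAM.
mem_and_swap_kb() {
    awk '/^MemTotal:/ { m = $2 } /^SwapTotal:/ { s = $2 }
         END { printf "mem_kb=%d swap_kb=%d\n", m, s }' "${1:-/proc/meminfo}"
}

# Print (do not run) the commands for creating one swap file.
# $1 = path on a dedicated disk (hypothetical, adjust to your layout)
# $2 = size in GB
print_swapfile_commands() {
    path=$1
    size_gb=$2
    cat <<EOF
dd if=/dev/zero of=$path bs=1M count=$(( size_gb * 1024 ))
chmod 600 $path
mkswap $path
swapon $path
EOF
}

print_swapfile_commands /swap1/swapfile "$(per_file_swap_gb 32 2)"
```

To make a new swap file persist across reboots, it also needs an entry in /etc/fstab. Given how sensitive Zimbra’s Java is to swap changes, we would treat this as maintenance-window work.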

One other thing as regards separate disks and disaster recovery…  We always recommend that if you have two SANs, put /opt/zimbra/backup on one SAN and everything else on the other SAN.  In this way, if you lose the SAN hosting /opt/zimbra/backup, Zimbra will still run and you will have a comfortable window in which to replace/fix that storage frame.  If you lose the other SAN and Zimbra is destroyed, you still have the /opt/zimbra/backup disk on the surviving SAN that you can use for disaster recovery.

 

Solutions – MariaDB
As regards disk IOPS, MariaDB’s innodb_io_capacity setting limits the IOPS InnoDB will use for its background disk work, and the stock default is low (200). If you are running old-style spinning disks in a RAID array, a low limit is probably sensible.  But if you are running a 10Gb or faster fibre-channel SSD/Flash SAN, you can increase this setting.

The benefit is that, even if MariaDB continues to use the swap file somewhat, its disk activity can draw on more IOPS and so complete much faster.

We recommend limiting MariaDB’s IOPS to about two-thirds of the available IOPS.  On Amazon Web Services, gp3 disks have a default of 3,000 IOPS, so we limit MariaDB to 2,000 IOPS, well above the default.
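The two-thirds rule is simple integer arithmetic. A minimal sketch, with function names of our own choosing:

```shell
#!/bin/sh
# Two thirds of a disk's IOPS budget, rounded down to a whole number.
two_thirds_iops() {
    echo $(( $1 * 2 / 3 ))
}

# Print the my.cnf line for a given total disk IOPS capacity.
io_capacity_line() {
    echo "innodb_io_capacity = $(two_thirds_iops "$1")"
}

io_capacity_line 3000   # prints: innodb_io_capacity = 2000
```

Substitute your own storage’s rated IOPS for the 3000 used here.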

As regards MariaDB memory usage, once Zimbra is installed, patches and upgrades won’t make any changes to my.cnf.  So, if you installed Zimbra originally on a virtual server with less than 32GB RAM, or if your InnoDB databases have grown, you’ll want to increase the innodb_buffer_pool_size parameter in my.cnf.
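Before changing anything, it helps to confirm what my.cnf currently says. The sketch below pulls out the active (uncommented) innodb_buffer_pool_size line. The function name is hypothetical, and the path in the usage note assumes the zimbra user’s home is /opt/zimbra.

```shell
#!/bin/sh
# Print the active (uncommented) innodb_buffer_pool_size value from a
# my.cnf-style file. Commented-out history lines are ignored. If the setting
# appears more than once, the last one is printed, since a later value in an
# option file overrides an earlier one.
active_buffer_pool_size() {
    awk -F= '/^[[:space:]]*innodb_buffer_pool_size[[:space:]]*=/ {
                 v = $2
                 gsub(/[[:space:]]/, "", v)
             }
             END { if (v != "") print v }' "$1"
}
```

For example, `active_buffer_pool_size /opt/zimbra/conf/my.cnf` prints the value as written in the file, such as a byte count or a figure with an M suffix.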

Most MySQL/MariaDB tuning guides we have seen recommend setting innodb_buffer_pool_size to 110% to 125% of the total size of your InnoDB databases.  In practice, we have found that setting this to ~80% of the total size of your InnoDB databases, combined with increasing MariaDB’s maximum disk IOPS, works well and conserves some RAM at the cost of some swapfile usage.  If you are stuck with slow disks, then to preserve performance you may want to stick with the 110% to 125% recommendation, and just add more mailbox servers when your InnoDB databases exceed about 20GB on a 32GB RAM system.

Either way, if you are not comfortable with MariaDB’s CLI, running the mysqltuner.pl tool is an easy way to get the total size of all of Zimbra’s InnoDB databases.
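If you are comfortable with the CLI, the same total can be read from information_schema, and the 80% (or 110% to 125%) figure worked out from it. This is a sketch under assumptions: the function names are ours, and innodb_total_mb assumes it runs as the zimbra user, where the mysql command on the PATH connects to Zimbra’s MariaDB instance.

```shell
#!/bin/sh
# SQL that totals data plus index size (in MB) across all InnoDB tables.
INNODB_SIZE_SQL="SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024) FROM information_schema.tables WHERE engine = 'InnoDB';"

# Run the query and print just the number.
# Assumption: executed as the zimbra user so that `mysql` reaches Zimbra's database.
innodb_total_mb() {
    mysql -N -B -e "$INNODB_SIZE_SQL"
}

# Buffer pool size in MB as a percentage of the InnoDB total.
# $1 = total InnoDB size in MB, $2 = percentage (e.g. 80 or 110)
buffer_pool_mb() {
    total_mb=$1
    percent=$2
    echo $(( total_mb * percent / 100 ))
}

buffer_pool_mb 21760 80   # prints: 17408
```

On a live server, something like `buffer_pool_mb "$(innodb_total_mb)" 80` would chain the two steps.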

 

Solutions – Java Heap Memory
Java includes a “Garbage Collector” engine, which is responsible for defragging the Java Heap memory space, as well as deleting objects discarded by Java.  The garbage collector is a feature which frees developers from having to code lower-level memory cleanup actions, helping Java attain that “write once; run anywhere” claim.

The Zimbra installer sets the Java heap memory, and upgrades and patches don’t change this, so the same issues described above with MariaDB’s InnoDB buffer pool can impact the size of the Java Heap.

The garbage collector normally works behind the scenes, but if things get really bad, the garbage collector will pause mailboxd, to give itself breathing room to do its job.  Garbage collector activities are logged in ~/log/zmmailboxd.out.

Basically, if the garbage collector is running every few seconds, you are OK.  If the garbage collector is running about every second, and the pause time is greater than about 100ms, it’s likely time to increase the Java heap size.  In most cases, 6GB-8GB is adequate. For mailbox servers supporting mostly IMAP clients with large mailboxes and lots of folders, sometimes you may need to increase the Java Heap to 10GB.  Zimbra Support says that this is unusual, even for heavy workloads.  (Fun Fact: To defend against a DDoS attack we once had to increase the Java Heap memory temporarily to 26GB to keep Zimbra running…)

Although there will be lots of other stuff in zmmailboxd.out, look for lines like:

[313905.682s][info][gc] GC(172536) Pause Young (Normal) (G1 Evacuation Pause) 4770M->1512M(6144M) 40.579ms
[313913.513s][info][gc] GC(172537) Pause Young (Normal) (G1 Evacuation Pause) 4812M->1473M(6144M) 35.362ms
[313921.516s][info][gc] GC(172538) Pause Young (Normal) (G1 Evacuation Pause) 4913M->1477M(6144M) 29.879ms

The timestamps on the left show that these cleanups are running about 8 seconds apart (good!). Moving towards the right, we see that the garbage collector (programmed to fire off when the heap memory gets ~75% full) started processing when the heap had 4.7GB to 4.9GB of objects in it.  After discarding unused objects and defragging the heap (which process took between 29ms and 40ms), Java heap consumption was down to about 1.5GB, with the maximum heap size set to 6GB.
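Reading these lines by eye gets tedious on a busy server. The sketch below uses awk to pull out the same fields discussed above (the gap since the previous pause, heap before and after, maximum heap, and pause time) and flags pauses over 100ms. It assumes log lines in the format shown above; the function name and output layout are our own.

```shell
#!/bin/sh
# Summarize GC pause lines from a zmmailboxd.out-style log.
# Reads the named file(s), or standard input if none are given.
gc_pause_summary() {
    awk '/\[gc\].* Pause / {
        # Leading field looks like [313905.682s][info][gc]; keep the seconds.
        ts = $1
        sub(/^\[/, "", ts)
        sub(/s\].*/, "", ts)
        # Last field is the pause duration, e.g. 40.579ms.
        ms = $NF
        sub(/ms$/, "", ms)
        # Second-to-last field looks like 4770M->1512M(6144M).
        split($(NF - 1), h, /->|\(|\)/)
        # Seconds since the previous pause; "-" for the first line seen.
        gap = (n++ ? sprintf("%.1f", ts - prev) : "-")
        flag = (ms + 0 > 100) ? " SLOW" : ""
        printf "gap_s=%s before=%s after=%s max=%s pause_ms=%s%s\n", gap, h[1], h[2], h[3], ms, flag
        prev = ts
    }' "$@"
}
```

For example, `tail -n 2000 /opt/zimbra/log/zmmailboxd.out | gc_pause_summary` gives a quick view of recent pauses. Small gap_s values combined with SLOW flags, or a consistently high after= figure, are the patterns to watch for.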

If zmmailboxd.out showed that garbage collection operations were happening a few times every few seconds, and/or that the heap still held more like 4GB of objects after processing instead of 1.5GB, and/or that processing time was greater than 100ms, then I’d increase the Java heap size, which is set via Zimbra’s localconfig.  You can see the current setting like this:

zimbra@mailbox2:~$ zmlocalconfig mailboxd_java_heap_size
mailboxd_java_heap_size = 6144
zimbra@mailbox2:~$

 

Solutions – Kernel Tuning
Now that we’ve made sure that everything else is set appropriately, assuming we still see the kernel doing more swapping than we want/expect, we can tackle changing some of the kernel tuning parameters, by adding our changes to the bottom of /etc/sysctl.conf.

Likely you have already added a setting to change swappiness here. We suggest adding two more variables:

# Reduce Swappiness and Other Tuning
vm.swappiness = 1
vm.min_free_kbytes = 524288
vm.vfs_cache_pressure = 200
# To effect changes after updating this file, execute "sysctl -p" as root

You can Google each of these kernel parameters on your own to dive as deeply down those rabbit holes as you wish. In overly simplified language, what we have found through a few years of testing, trial and error is that setting vm.vfs_cache_pressure to double the default setting of 100 reduces the use of kernel buffers/cache (as reported by top), thus leaving more memory readily available and lessening swap file usage.  And yes, we know that the kernel can and does reclaim a good portion of the memory used for buffers/cache if needed. But we have found, and others have reported with different applications, that increasing the cache pressure tends to reduce swap file usage.

As regards vm.min_free_kbytes, the default values vary widely depending on where you get your Ubuntu distro. AWS sets this very low, but we have found that Zimbra performance benefits from keeping ~500MB of memory truly free.
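To check what a running server is actually using, the values can be read straight from /proc/sys and compared with the three settings suggested above. A minimal sketch, with hypothetical function names:

```shell
#!/bin/sh
# Read a sysctl key such as vm.swappiness from the /proc/sys tree.
# The optional second argument overrides the tree root (useful for testing).
sysctl_value() {
    key_path=$(printf '%s' "$1" | tr . /)
    cat "${2:-/proc/sys}/$key_path"
}

# Compare the running values against the ones suggested in this article.
check_vm_tuning() {
    root=${1:-/proc/sys}
    for pair in vm.swappiness=1 vm.min_free_kbytes=524288 vm.vfs_cache_pressure=200; do
        key=${pair%%=*}
        want=${pair#*=}
        have=$(sysctl_value "$key" "$root")
        if [ "$have" = "$want" ]; then
            echo "OK       $key = $have"
        else
            echo "DIFFERS  $key = $have (suggested: $want)"
        fi
    done
}
```

Running `check_vm_tuning` before and after `sysctl -p` shows whether the new values took effect.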

 

Summary Implementation Steps

  1. Use at least two swap files, each on different disks (not partitions).
    1. Total swap should equal RAM.
  2. As a baseline, we like 32GB RAM mailbox servers with 4 CPU cores.
  3. Move mailboxes to a new/different mailbox server when the InnoDB databases total 25GB in size.  With spinning disks, consider reducing this threshold to 20GB.
  4. To find out how big your InnoDB databases total, become root, get mysqltuner.pl, like so (and then run it as the Zimbra user):
    1. wget http://mysqltuner.pl/ -O /opt/zimbra/mysqltuner.pl && chmod +x /opt/zimbra/mysqltuner.pl && chown zimbra:zimbra /opt/zimbra/mysqltuner.pl && su - zimbra
    2. ./mysqltuner.pl
    3. Important Note: Do not make any of the other changes mysqltuner may recommend!
  5. Change ~/conf/my.cnf to adjust innodb_buffer_pool_size to not less than 80% of the actual total sizes of your InnoDB databases. As you add more mailboxes, or users’ mailboxes grow, you’ll likely need to update this setting:
    1. # innodb_buffer_pool_size = 9954159820
      # Updated 18 April 2022
      # innodb_buffer_pool_size = 12288M
      # Updated 26 April 2022
      # innodb_buffer_pool_size = 14336M
      # Updated 21 July 2022
      # innodb_buffer_pool_size = 15360M
      # Updated 7 December 2022
      innodb_buffer_pool_size = 17408M
  6. Add a line to ~/conf/my.cnf in the [mysqld] section to increase the maximum IOPs MariaDB may use to no more than 2/3rds of your disks’ capacity. For AWS gp3 disks with 3,000 IOPS total capacity, you can use:
    1. innodb_io_capacity = 2000
  7. Carefully analyze Java’s Garbage Collector’s usage from ~/log/zmmailboxd.out and increase (only if needed) the Java Heap size to not more than 10GB; typically 6GB to 8GB, like so:
    1. zmlocalconfig -e mailboxd_java_heap_size=6144
  8. Add the lines in the kernel tuning box just above this summary section to the end of /etc/sysctl.conf.
  9. To implement the kernel tuning changes, you can reboot the server, or instead run the following, starting as root:
    1. sysctl -p
    2. su - zimbra
    3. zmmailboxdctl restart

 

Conclusions
Changing Kernel Tuning Parameters can help reduce swap file usage and increase Zimbra performance, but should only be done after Zimbra is properly tuned.

We’ve tried to provide a basic guide above to essential Zimbra tuning considerations, with updates to selected kernel tuning parameters being the icing on the cake.

As always, YMMV.  It’s always best to test in a separate lab environment first.  A wise system administrator once told me:

“Mark… everyone has a lab environment.  It’s just that for some people, their lab environment is also their production environment…”

If you’d like help with your Zimbra performance tuning challenges, please start the conversation by filling out this form!

 

Hope that helps,
L. Mark Stone
Mission Critical Email LLC
6 February 2023

The information provided in this blog is intended for informational and educational purposes only. The views expressed herein are those of Mr. Stone personally. The contents of this site are not intended as advice for any purpose and are subject to change without notice. Mission Critical Email makes no warranties of any kind regarding the accuracy or completeness of any information on this site, and we make no representations regarding whether such information is up-to-date or applicable to any particular situation. All copyrights are reserved by Mr. Stone. Any portion of the material on this site may be used for personal or educational purposes provided appropriate attribution is given to Mr. Stone and this blog.