Optimizing Zimbra on Amazon Web Services – Applicable to Ubuntu Server 20/22 and Rocky Linux 8/9
Overview
We have been deploying Zimbra on Amazon Web Services (“AWS”) for a number of years. At the Americas Zimbra Partners’ Conference a few years ago, we were asked to present on “AWS Best Hosting Practices”. We’ve updated and refined our best practices since then, so it was time for a new blog post!
Hosting Zimbra on AWS brings a unique mix of challenges:
- Variable EBS performance (especially with st1/sc1 throughput-optimized volumes).
- Mixed workloads — high IOPS, low-latency mailbox traffic, and long sequential archive or backup writes.
- A need to balance durability, latency, and throughput across several mount points.
This guide provides AWS-specific kernel and filesystem tuning optimized for Zimbra workloads on instances with 32 GB RAM and ≥4 vCPUs, applicable to Ubuntu Server 20/22 and Rocky Linux 8/9.
The tunings here assume:
- All storage is EBS-backed (gp3, gp2, st1, or sc1, as available on m7i and r7i instances, for example).
- Separate disks are provisioned for the root and boot volumes, as well as for /opt, /opt/zimbra/store, /opt/zimbra/db, /opt/zimbra/index, /opt/zimbra/hsm and /opt/zimbra/backup — recommended to improve I/O concurrency.
- No direct-attached NVMe instance storage.
- The instance runs production Zimbra services, either as a single server or as a beefy mailbox server in a multi-server environment. In either case, the server has 32 GB of RAM and four (or more) CPU cores, hosting ~1,000 active accounts with larger mailboxes (50 GB – 200 GB) and users connecting with multiple email clients, including IMAP clients.
Let’s get started!
1. /etc/sysctl.d/99-zimbra-aws.conf
This drop-in file defines AWS-aware kernel tuning for memory management, writeback, and swap control.
It is intended to reduce I/O stalls and swap activity while maintaining steady writeback behavior — crucial when your server mixes fast SSD (gp3) and slower HDD (st1/sc1) EBS volumes.
```
# /etc/sysctl.d/99-zimbra-aws.conf
# Zimbra Collaboration Server on AWS
# For use on Ubuntu Server 20/22 and Rocky Linux 8/9
#
# Copyright 2025 Mission Critical Email LLC. All rights reserved.
#
# DISCLAIMER:
# This file is provided "AS IS" without warranty of any kind, either express
# or implied, including but not limited to the implied warranties of
# merchantability and fitness for a particular purpose. Use at your own risk.
# In no event shall Mission Critical Email LLC be liable for any damages
# whatsoever arising out of the use of or inability to use this file.
#
# -------------------------------------------------------------------
# Objectives:
# - Stable mailbox latency (gp3/gp2 volumes)
# - Prevent global stalls from slow EBS (st1/sc1)
# - Minimize swap usage without disabling it
# - Consistent writeback behavior across mixed-speed disks

############################
# Memory and Swap Behavior
############################

# Reduce swap usage, but leave a safety margin.
# Swappiness = 1 means "avoid swapping unless absolutely necessary".
vm.swappiness = 1

# Prefer reclaiming metadata over file data cache. Default = 100.
# 150 is slightly more aggressive in freeing inode/dentry caches.
vm.vfs_cache_pressure = 150

# Reserve 256 MB of memory for kernel allocations under pressure. Default = 64 MB.
# This is intended to prevent low-level OOM conditions in heavy I/O workloads.
vm.min_free_kbytes = 262144

############################
# Writeback Tuning (critical for AWS EBS)
############################

# Start background writeback early — flush dirty pages gradually.
vm.dirty_background_ratio = 2

# Limit total dirty memory to 5% before blocking writers.
# Keeps slow st1/sc1 volumes from stalling the system.
vm.dirty_ratio = 5

# Pages older than 30 seconds are considered for writeback.
vm.dirty_expire_centisecs = 3000

# Kernel flusher threads wake every 5 seconds.
# Ensures consistent I/O instead of bursty writes.
vm.dirty_writeback_centisecs = 500

############################
# Advanced Memory Management
############################

# Increase distance between memory watermarks (min/low/high).
# Default = 10 (0.1% of memory between thresholds)
# 100 = 1% of memory between thresholds (~320 MB on a 32 GB system)
# Effect: Kernel starts reclaiming memory earlier and more gradually,
# reducing sudden memory pressure and emergency reclaim.
vm.watermark_scale_factor = 100

# Control swap readahead behavior (pages read per swap fault).
# Default = 3 (reads 2^3 = 8 pages = 32 KB per fault)
# 0 = read only 1 page (4 KB) at a time
# Effect: Reduces "swap thrashing" when processes access swapped memory.
# Important for Zimbra where MySQL/Java may have old pages in swap
# that shouldn't trigger large readahead operations.
vm.page-cluster = 0

# Why these matter for Zimbra on AWS:
# - watermark_scale_factor=100 gives the kernel more "breathing room"
#   to manage memory without hitting emergency reclaim during Java GC
#   or MySQL buffer pool operations.
# - page-cluster=0 prevents swap storms when touching old swapped pages,
#   keeping mailboxd responsive even with some historical swap usage.

############################
# Notes:
############################
# We consider these values to be somewhat conservative.
# - Lower dirty ratios force smaller, continuous flushes.
# - Helpful when mixing SSD (gp3/gp2) and HDD (st1/sc1) volumes.
# - Keeps mailboxd and MariaDB responsive during HSM or backups.
```
```
# To apply immediately:
#   sudo sysctl --system
#
# Verify:
#   cat /proc/sys/vm/{swappiness,vfs_cache_pressure,dirty_ratio,dirty_background_ratio}
```
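Beyond that quick verification, it can be reassuring to watch writeback behavior live while the server is busy. A minimal check using the kernel's own counters in /proc/meminfo:

```bash
# Watch dirty pages and active writeback while Zimbra is under load.
# With dirty_background_ratio=2 on a 32 GB instance, the "Dirty:" figure
# should hover well below ~650 MB rather than ballooning before a flush.
watch -n 1 "grep -E '^(Dirty|Writeback):' /proc/meminfo"
```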
Why it Matters
Zimbra workloads include:
- Latency-sensitive writes (MariaDB transactions, mailbox updates)
- Bulk sequential writes (HSM migrations, nightly backups)
- Java-based services (mailboxd) that can suffer from kernel-level I/O stalls
On AWS, EBS write performance is network-bound, and slow disks can backpressure the entire kernel write queue.
The small dirty ratios here (2% / 5%) ensure the kernel never buffers too much data before flushing — helping to keep the system responsive even under heavy HSM or backup operations.
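As a rough worked example (assuming, for simplicity, that the ratios apply to total RAM; the kernel actually computes them against available memory):

```bash
# Approximate dirty-page ceilings on a 32 GiB instance:
#   vm.dirty_background_ratio = 2  ->  2% of 32 GiB ≈ 655 MiB  (background flusher starts)
#   vm.dirty_ratio            = 5  ->  5% of 32 GiB ≈ 1.6 GiB  (writers begin to block)
#
# Compare those ceilings against st1's baseline of roughly 40 MB/s per TiB:
# flushing a full 1.6 GiB backlog to a 1 TiB st1 volume could take on the
# order of 40 seconds, which is why the ceilings are kept deliberately small.
```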
2. Optimized /etc/fstab for ext4 Volumes
If all your Zimbra data volumes are ext4, the following layout and mount options are tuned for AWS EBS and Zimbra’s workload distribution. You may have allocated swap differently, and you may have different mount points, so adjust this file accordingly. And of course, please do not simply replace your existing /etc/fstab file with this one!
```
# /etc/fstab - Zimbra on AWS (ext4)
#
# Copyright 2025 Mission Critical Email LLC. All rights reserved.
#
# DISCLAIMER:
# This file is provided "AS IS" without warranty of any kind, either express
# or implied, including but not limited to the implied warranties of
# merchantability and fitness for a particular purpose. Use at your own risk.
# In no event shall Mission Critical Email LLC be liable for any damages
# whatsoever arising out of the use of or inability to use this file.
#
# ----------------------------------------------------------
# HDD volumes (st1/sc1): enable larger commit intervals

LABEL=cloudimg-rootfs  /          ext4  defaults    0 1
LABEL=UEFI             /boot/efi  vfat  umask=0077  0 1
/swapfile              none       swap  sw          0 0

# Core Zimbra paths on SSD (gp3)
UUID=08c34f30-...  /opt               ext4  defaults,noatime,nodiratime  0 0
UUID=4e571035-...  /opt/zimbra/db     ext4  defaults,noatime,nodiratime  0 0
UUID=c0790103-...  /opt/zimbra/index  ext4  defaults,noatime,nodiratime  0 0
UUID=479833d6-...  /opt/zimbra/store  ext4  defaults,noatime,nodiratime  0 0

# HSM (secondary mail volume) on st1/sc1
UUID=b16a49eb-...  /opt/zimbra/hsm     ext4  defaults,noatime,nodiratime,commit=60   0 0

# Backup volume (st1/sc1)
UUID=d9733e55-...  /opt/zimbra/backup  ext4  defaults,noatime,nodiratime,commit=120  0 0
```
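Before rebooting after any /etc/fstab change, it's worth letting findmnt lint the file and then confirming each volume's live options. A quick sketch, using the mount points from the example above:

```bash
# Lint /etc/fstab for typos, unknown options, and missing mount points.
sudo findmnt --verify

# Confirm each Zimbra volume is mounted with the options you expect.
for m in /opt /opt/zimbra/db /opt/zimbra/index /opt/zimbra/store \
         /opt/zimbra/hsm /opt/zimbra/backup; do
  findmnt -no TARGET,OPTIONS "$m"
done
```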
Rationale
Setting | Effect | Trade-off |
---|---|---|
noatime,nodiratime | Disables access-time updates on read | Saves metadata writes, ideal for mailstores |
commit=60 (HSM) | Flush journal every 60 seconds | Slightly higher risk (<60 second data loss) but smoother HDD I/O |
commit=120 (Backup) | Flush every 120 seconds | Prioritizes throughput; suitable for sequential backups |
No commit= on gp3 | Defaults to 5 seconds | Prioritizes durability and responsiveness |
Design Goals
- SSD volumes: Fast, transactional, durable. Use defaults or commit=5.
- HSM/Backup (HDD): Slow, sequential. Increase commit= to reduce metadata churn.
- System-wide effect: Balanced I/O flow, minimal cross-volume contention.
3. Optimized /etc/fstab for XFS Volumes
When using XFS, ext4’s commit= mount option doesn’t exist.
Instead, you influence journal frequency and metadata update behavior with mount options like logbufs and logbsize (the related lazy-count setting is applied at mkfs time rather than at mount time, and has been the mkfs.xfs default for years). The same comments regarding how and where you mount swap, and what separate disk volumes you may or may not have (mentioned above for the ext4 version of this file), of course apply here as well. Again, please do not simply replace your existing /etc/fstab file with this one!
```
# /etc/fstab - Zimbra on AWS (XFS)
#
# Copyright 2025 Mission Critical Email LLC. All rights reserved.
#
# DISCLAIMER:
# This file is provided "AS IS" without warranty of any kind, either express
# or implied, including but not limited to the implied warranties of
# merchantability and fitness for a particular purpose. Use at your own risk.
# In no event shall Mission Critical Email LLC be liable for any damages
# whatsoever arising out of the use of or inability to use this file.
#
# ----------------------------------------------------------
# st1/sc1: slow HDDs, tuned for sequential writes
# Note: lazy-count=1 is set at mkfs time (the modern mkfs.xfs default),
# not as a mount option, so it does not appear on the lines below.

LABEL=cloudimg-rootfs  /          ext4  defaults    0 1
LABEL=UEFI             /boot/efi  vfat  umask=0077  0 1
/swapfile              none       swap  sw          0 0

# Core Zimbra volumes (gp3/gp2)
UUID=08c34f30-...  /opt               xfs  defaults,noatime,nodiratime  0 0
UUID=4e571035-...  /opt/zimbra/db     xfs  defaults,noatime,nodiratime  0 0
UUID=c0790103-...  /opt/zimbra/index  xfs  defaults,noatime,nodiratime  0 0
UUID=479833d6-...  /opt/zimbra/store  xfs  defaults,noatime,nodiratime  0 0

# HSM (st1/sc1) - sequential archival writes
UUID=b16a49eb-...  /opt/zimbra/hsm     xfs  defaults,noatime,nodiratime,logbufs=8,logbsize=262144  0 0

# Backups (st1/sc1) - large sequential throughput
UUID=d9733e55-...  /opt/zimbra/backup  xfs  defaults,noatime,nodiratime,logbufs=8,logbsize=262144  0 0
```
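To double-check how an XFS volume was actually formatted and mounted, two standard commands are enough (shown here against the example mount points above):

```bash
# Show the on-disk format, including the log section and the lazy-count flag.
xfs_info /opt/zimbra/hsm

# Show the live mount options (logbufs/logbsize appear here when set).
findmnt -no OPTIONS /opt/zimbra/hsm
```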
Rationale
Option | Purpose | Notes |
---|---|---|
lazy-count=1 | Defers updates to filesystem counters (e.g., free inode counts) | Set at mkfs time (modern mkfs.xfs default); reduces metadata churn on HDDs |
logbufs=8 | Increases in-memory log buffers | Improves concurrency, especially under parallel writes |
logbsize=262144 | Enlarges the in-memory log buffer size | Fewer, larger journal writes = smoother sequential I/O |
What Replaces ext4’s commit= Option on XFS-Formatted Disks?
On XFS, journal flush timing is governed by kernel writeback tunables (vm.dirty_*) rather than per-mount intervals. The 99-zimbra-aws.conf sysctl file above serves that function, controlling when dirty pages are written back globally.
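To read back the writeback knobs that are effectively standing in for commit= on XFS:

```bash
# The writeback settings from 99-zimbra-aws.conf govern XFS flush timing globally.
sysctl vm.dirty_background_ratio vm.dirty_ratio \
       vm.dirty_expire_centisecs vm.dirty_writeback_centisecs
```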
4. Rationale and Trade-offs
Why Small dirty_ratio Values?
EBS write throughput is network-bound. If the kernel buffers too many dirty pages, then flushes them all at once, you can hit:
- High EBS queue depth → throttling → I/O stalls
- Spikes in iowait and JVM pauses (mailboxd responsiveness drops)
Setting vm.dirty_background_ratio=2 and vm.dirty_ratio=5 ensures the flusher keeps up, maintaining smooth, predictable latency.
Why Not Use The discard Mount Option for the SSD Disks, To Get The Benefits of Running TRIM?
AWS documentation states that only locally attached NVMe instance-store disks support TRIM, and that new EBS volumes you create and attach to your instance come fully trimmed already.
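You can confirm this on your own instance; lsblk reports zeroed discard limits for devices that don't advertise TRIM support:

```bash
# DISC-GRAN and DISC-MAX of 0 mean the device does not support TRIM/discard.
lsblk --discard
```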
Why Not Use The nofail Mount Option, and End Each Line in /etc/fstab with “0 2” Instead of “0 0” As AWS Recommends?
It is true that AWS recommends using the nofail mount option and ending lines in /etc/fstab with "0 2". The nofail option allows an instance to boot even when a line in /etc/fstab is defective. The "0 2" setting marks the filesystem for checking by fsck at boot (in pass 2, after the root filesystem).
Since we mount different disks for specific Zimbra directories, if the /opt disk didn’t get mounted due to a typo in /etc/fstab but we allowed the boot process to continue, Zimbra wasn’t going to run (properly, if at all) anyway. Besides, after making edits to /etc/fstab it’s much safer to just stop Zimbra, then unmount and remount the affected disk to confirm that you didn’t fat-finger anything. Worst case, there’s a straightforward recovery process: shut down the instance; detach the root volume and attach and mount it to another running instance; correct /etc/fstab from that other instance; detach the corrected root volume and reattach it to the original instance; and then reboot the original instance. Doing this takes less than 15 minutes (but don’t ask me how I know this, please…), as sketched below.
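For reference, that recovery procedure looks roughly like this with the AWS CLI. This is a sketch only; all instance IDs, volume IDs, and device names below are placeholders you would replace with your own:

```bash
# 1. Stop the broken instance (hypothetical IDs throughout).
aws ec2 stop-instances --instance-ids i-0badfstab111

# 2. Detach its root volume...
aws ec2 detach-volume --volume-id vol-0rootdisk222

# 3. ...and attach it to a healthy helper instance as a secondary disk.
aws ec2 attach-volume --volume-id vol-0rootdisk222 \
    --instance-id i-0helper333 --device /dev/sdf

# On the helper instance: mount the volume, fix the typo, unmount.
#   sudo mount /dev/nvme1n1p1 /mnt
#   sudo vi /mnt/etc/fstab
#   sudo umount /mnt

# 4. Detach from the helper, reattach as the original root device, and boot.
aws ec2 detach-volume --volume-id vol-0rootdisk222
aws ec2 attach-volume --volume-id vol-0rootdisk222 \
    --instance-id i-0badfstab111 --device /dev/sda1
aws ec2 start-instances --instance-ids i-0badfstab111
```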
As regards the “0 2” option, EBS volumes are pretty stout, with very high durability rates, and AWS is frequently and transparently rotating storage frames behind the scenes anyway, so the need to run fsck on reboot like we would on a bare metal server is significantly reduced. Plus, do you really want to slow down the boot process to allow fsck to run against 3 TB of HSM disks and a 9 TB /opt/zimbra/backup disk every time you reboot to deploy, for example, a new kernel, or to move your instance to different AWS hardware?
Why Different commit= Values for HSM and Backups?
The ext4 journal commit= parameter affects metadata flush frequency, not file data.
HSM and backups generate predictable, sequential writes and can tolerate longer metadata windows (60–120 seconds).
This minimizes journal overhead and keeps EBS I/O sequential — far more efficient on st1/sc1.
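Because commit= can be changed on a live mount, you can experiment with different intervals without a reboot (mount points as in the example fstab above):

```bash
# Raise the journal commit interval on the backup volume to 120 seconds.
sudo mount -o remount,commit=120 /opt/zimbra/backup

# Confirm the active option took effect.
findmnt -no OPTIONS /opt/zimbra/backup | tr ',' '\n' | grep commit
```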
Why Use lazy-count and logbufs on XFS?
XFS maintains counters for free inodes and blocks. With lazy-count=1 (an mkfs-time setting, and the default on modern mkfs.xfs), it defers updating these until needed, reducing unnecessary log activity. logbufs=8 and a large logbsize help coalesce writes — a significant improvement on high-latency EBS volumes.
When a file is created, deleted, or extended, XFS doesn’t immediately update the inode or directory tables directly on disk. Instead, it writes those changes into a transaction log. logbsize is XFS’s way of making its journal write in bigger, smoother chunks — the XFS equivalent of “commit less often, but more efficiently.”
When XFS records metadata updates (e.g., new files, directory entries, inode changes), it stores them in in-memory log buffers before writing to the on-disk log (journal). The more buffers it has (logbufs), the more metadata transactions it can accumulate and flush concurrently. Depending on kernel version and system memory, XFS allocates between 2 and 8 buffers by default. Setting logbufs=8 explicitly pins it at the maximum, which is especially useful for systems that do a lot of parallel file operations (like Zimbra’s LMTP deliveries, indexing, and HSM processes).
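Putting that together, formatting and mounting an HSM volume might look like this. A sketch only: the device name is a placeholder, and lazy-count=1 is shown explicitly even though current mkfs.xfs enables it by default:

```bash
# Format the (hypothetical) HSM device with lazy superblock counters.
sudo mkfs.xfs -l lazy-count=1 /dev/nvme2n1

# Mount with the maximum number of log buffers and the largest log buffer
# size, so journal writes land in fewer, bigger chunks.
sudo mount -o noatime,nodiratime,logbufs=8,logbsize=262144 \
    /dev/nvme2n1 /opt/zimbra/hsm
```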
Durability vs. Performance
Component | Priority | Trade-off | Setting |
---|---|---|---|
Mailstore (gp3) | Durability | Minimal latency increase | Default (commit=5)
Database (gp3) | Durability | Must fsync frequently | Default
Index (gp3) | Responsiveness | Low risk of loss | Default
HSM (st1/sc1) | Throughput | Minor metadata risk | commit=60 / large logbsize |
Backup (st1/sc1) | Maximum throughput | Minimal risk | commit=120 / large logbsize |
Summary Table
Area or Item | Parameter | Recommended | Notes |
---|---|---|---|
Swap | vm.swappiness | 1 | Prevent unnecessary swap I/O |
Memory reclaim | vm.vfs_cache_pressure | 150 | Balance metadata vs. data cache |
Writeback | vm.dirty_ratio | 5 | Prevent EBS saturation |
Background writeback | vm.dirty_background_ratio | 2 | Start flushing early |
ext4 HSM | commit | 60 | Balanced metadata flush interval
ext4 Backup | commit | 120 | Maximize sequential throughput
XFS HSM | logbufs / logbsize | 8 / 262144 | Reduce metadata overhead
XFS Backup | logbufs / logbsize | 8 / 262144 | Optimize sequential I/O
Closing Thoughts:
These tunings are based on our experiences with hosting Zimbra on AWS and Zimbra’s typical workload characteristics on busy mailstores or busy single-server deployments:
- Mail latency and I/O stability > raw throughput.
- Avoid global kernel stalls from slow disks.
- Favor predictability and data safety for primary volumes, efficiency for secondary ones.
In our experience, Zimbra’s performance on AWS improves when:
- I/O patterns are smoothed by kernel writeback;
- Metadata chatter is minimized, and;
- EBS volumes aren’t flooded with bursty writes.
By combining the above sysctl and fstab strategies, you’ll have a system that stays responsive during HSM moves, nightly backups, and busy disk activity peaks — without trading away reliability.
Concluding Remarks:
Always validate under realistic mail load and monitor vmstat and iostat latency metrics. Every Zimbra system is different — some more than others. The settings above may not be optimal for your system’s unique workloads. Nevertheless, now that you know a bit more about tuning the Linux kernel and ext4 and XFS file systems, you can benchmark and derive optimal settings for your Zimbra environment.
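As a starting point for that validation, the standard sysstat and procps tools are enough to watch latency and writeback side by side under load:

```bash
# Per-device latency, queue depth, and utilization in MB/s, every 5 seconds,
# with timestamps; watch the await and %util columns on each EBS volume.
iostat -xmt 5

# Memory, swap, and I/O-wait at the same cadence; watch the 'wa' column.
vmstat 5
```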
Of course, if you’d like help with this or any other Zimbra challenges, please just fill out the form below and we’ll be back in touch!
Hope that helps,
L. Mark Stone
Mission Critical Email LLC
16 October 2025
A sincere “Thank you!” to Matthew Francis at In-Tuition Networks for your considered review and terrific suggestions for improving this blog post.
The information provided in this blog is intended for informational and educational purposes only. The views expressed herein are those of Mr. Stone personally. The contents of this site are not intended as advice for any purpose and are subject to change without notice. Mission Critical Email makes no warranties of any kind regarding the accuracy or completeness of any information on this site, and we make no representations regarding whether such information is up-to-date or applicable to any particular situation. All copyrights are reserved by Mr. Stone. Any portion of the material on this site may be used for personal or educational purposes provided appropriate attribution is given to Mr. Stone and this blog.