
ZFS vdevs, ashift & recordsize explained

Servnet Storage Team · Storage & Data Protection · 8 min read

Three ZFS settings make or break a pool's capacity and performance: how you lay out vdevs, ashift (sector size) and recordsize. Get them wrong and you waste capacity or IOPS. See the effect live in the ZFS calculator.

ZFS layout & tuning layers
  • Layer 4, pool layout (vdevs): IOPS scale with vdev count
  • Layer 3, RAIDZ level: Z1 / Z2 / Z3 parity per vdev
  • Layer 2, recordsize / volblocksize: 128 KiB default; small = padding
  • Layer 1, ashift (sector size): ashift=12 (4 KiB), set at creation

vdev layout sets capacity and IOPS

A pool is striped across its vdevs, so layout is the biggest lever. A RAIDZ vdev delivers roughly the random IOPS of a single drive, so pool IOPS scale with the number of vdevs: choose many narrow vdevs (or mirrors) for IOPS, fewer wide vdevs for capacity. Redundancy is per vdev, and losing any one vdev loses the whole pool. Wide vdevs built from large drives take longer to resilver, which lengthens the window for a further failure, so they should use RAIDZ2 or RAIDZ3.
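As a rough illustration of that scaling rule, the sketch below compares three ways of arranging twelve drives. The function name and the 150 IOPS per drive figure are assumptions made up for this example, not measurements, and the "one drive's IOPS per vdev" estimate is only the approximation described above (mirror vdevs can serve reads faster than this suggests).

```python
def estimate_pool_random_iops(vdev_count: int, drive_iops: float) -> float:
    """Rough estimate: each vdev contributes about one drive's random IOPS."""
    if vdev_count < 1:
        raise ValueError("a pool needs at least one vdev")
    return vdev_count * drive_iops


# Hypothetical per-drive figure, chosen only to make the comparison concrete.
ASSUMED_DRIVE_IOPS = 150

# Three ways to lay out the same twelve drives (layout name -> vdev count).
layouts = {
    "1 x 12-wide RAIDZ2": 1,
    "2 x 6-wide RAIDZ2": 2,
    "6 x 2-way mirror": 6,
}

for name, vdevs in layouts.items():
    iops = estimate_pool_random_iops(vdevs, ASSUMED_DRIVE_IOPS)
    print(f"{name}: ~{iops:.0f} random IOPS")
```

Same drive count, very different estimates: the number that matters is how many vdevs the pool stripes across.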

You can't shrink a vdev, and changing its width was not possible until RAIDZ expansion arrived in OpenZFS 2.3 (which can only add disks), so plan layout up front. See RAIDZ1 vs Z2 vs Z3.

ashift — match the physical sector size

ashift sets ZFS's smallest allocation unit as a power of two: ashift=12 means 2^12 = 4096-byte (4 KiB) sectors, as on modern drives, and ashift=9 means 512-byte sectors (legacy). Set it to match the drive's physical sector size at pool creation; it is fixed per vdev and can't be changed later. Too small an ashift on a 4 KiB drive wrecks performance, because every sub-sector write becomes a read-modify-write inside the drive. ashift=12 is the safe modern default.
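The power-of-two relationship is easy to check by hand. This is a minimal sketch with hypothetical helper names; it only does the arithmetic and does not query a real drive or pool.

```python
def ashift_to_bytes(ashift: int) -> int:
    """Sector size in bytes implied by an ashift value (2 ** ashift)."""
    return 1 << ashift


def ashift_for_sector_size(sector_bytes: int) -> int:
    """ashift that matches a physical sector size, which must be a power of two."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


print(ashift_to_bytes(9))            # 512
print(ashift_to_bytes(12))           # 4096
print(ashift_for_sector_size(4096))  # 12
```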

ashift also interacts with capacity: RAIDZ rounds allocations to a multiple of (parity+1) sectors, so larger sectors mean coarser rounding and more potential padding on small blocks.
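The rounding rule can be sketched in a few lines. The function names are hypothetical and the sector counts are illustrative; the point is that the same number of padding sectors costs eight times as many bytes at ashift=12 as at ashift=9.

```python
def round_up_allocation(total_sectors: int, parity: int) -> int:
    """Round a data+parity sector count up to a multiple of (parity + 1)."""
    multiple = parity + 1
    return -(-total_sectors // multiple) * multiple  # ceiling division


def padding_bytes(total_sectors: int, parity: int, ashift: int) -> int:
    """Bytes spent on padding sectors for one allocation."""
    padded = round_up_allocation(total_sectors, parity)
    return (padded - total_sectors) << ashift


# RAIDZ2 example: 2 data sectors + 2 parity sectors = 4, rounded up to 6.
print(round_up_allocation(4, parity=2))       # 6
print(padding_bytes(4, parity=2, ashift=9))   # 1024
print(padding_bytes(4, parity=2, ashift=12))  # 8192
```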

recordsize effect on a wide RAIDZ vdev
                   128 KiB         32 KiB    8 KiB (zvol)
Padding overhead   Negligible      Some      Large
Good for           Files / media   Mixed     VM / DB zvols
Better on          RAIDZ2          RAIDZ2    Mirror vdevs

recordsize — the capacity/perf dial

recordsize (datasets) is the maximum logical block size; volblocksize (zvols) is the fixed block size of the volume. The 128 KiB recordsize default is great for general files and minimises RAIDZ padding. But small fixed blocks, such as an 8 or 16 KiB volblocksize for VM/database zvols, combined with RAIDZ's (parity+1) rounding can waste a large fraction of capacity to padding. A block size mismatched to the workload's I/O size also costs performance: a block larger than the typical request causes read amplification.
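To put numbers on the padding effect, here is a simplified sketch of how much space one block occupies on a RAIDZ vdev. It assumes uncompressed blocks, counts one set of parity sectors per stripe row, then applies the (parity+1) rounding. The function names and the 10-wide RAIDZ2 example are assumptions for illustration, and real pools add metadata and compression effects that this ignores.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def raidz_allocated_bytes(block_bytes: int, width: int, parity: int,
                          ashift: int = 12) -> int:
    """Approximate on-disk bytes for one uncompressed block on a RAIDZ vdev."""
    sector = 1 << ashift
    data_cols = width - parity
    if data_cols < 1:
        raise ValueError("vdev width must exceed parity level")
    data_sectors = ceil_div(block_bytes, sector)
    # Each stripe row of up to data_cols data sectors carries `parity` sectors.
    parity_sectors = parity * ceil_div(data_sectors, data_cols)
    total = data_sectors + parity_sectors
    # RAIDZ rounds every allocation up to a multiple of (parity + 1) sectors.
    total = ceil_div(total, parity + 1) * (parity + 1)
    return total * sector


KIB = 1024
# Hypothetical vdev: 10 drives wide, RAIDZ2, ashift=12.
for size_kib in (128, 32, 8):
    allocated = raidz_allocated_bytes(size_kib * KIB, width=10, parity=2)
    efficiency = size_kib * KIB / allocated
    print(f"{size_kib:>3} KiB block -> {allocated // KIB:>3} KiB on disk, "
          f"{efficiency:.0%} usable")
```

Under these assumptions the 128 KiB block stores at about 76% efficiency, close to the 80% ideal for eight data drives out of ten, while the 8 KiB block drops to about 33%, which is worse than the 50% of a two-way mirror.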

Rule of thumb: large recordsize for big sequential files, smaller for databases/VMs (and prefer mirror vdevs there). The calculator lets you change ashift and recordsize to watch the padding overhead move.

Key takeaways
  • vdev layout is the biggest lever: more vdevs = more IOPS; redundancy is per vdev.
  • Set ashift to the drive's sector size at creation (ashift=12 / 4 KiB is the modern default) — it's permanent.
  • recordsize 128 KiB default minimises RAIDZ padding; small volblocksize on RAIDZ wastes capacity.
  • For VM/DB random IOPS, prefer mirror vdevs over wide RAIDZ.
Frequently asked

What ashift should I use?

ashift=12 (4 KiB) for virtually all modern drives — it matches their physical sector size. Use ashift=9 only for genuine 512-byte-sector legacy drives. It's set at pool creation and cannot be changed, so get it right up front.

Does recordsize affect ZFS capacity?

Yes, on RAIDZ. ZFS rounds allocations to a multiple of (parity+1) sectors, so small recordsize/volblocksize (e.g. 8 KiB zvols) on a wide RAIDZ vdev can waste significant capacity to padding. The default 128 KiB minimises it.

How do I get more IOPS from ZFS?

Add more vdevs (each adds ~one drive's random IOPS) or use mirror vdevs instead of wide RAIDZ. Pool IOPS scale with vdev count, not total drive count.
