UK’s trusted IT infrastructure partner since 2003
Servnet
Storage · RAID

RAID rebuild & ZFS resilver explained (times & risk)

Servnet Storage Team · Storage & Data Protection · 7 min read

A rebuild (or ZFS resilver) is when the array reconstructs a failed drive — and it's the riskiest moment in an array's life. Here's how long it takes, why it's risky, and how to shrink both. Estimate it in the RAID calculator.

Rebuild risk by level
                        RAID 5 / Z1         RAID 6 / Z2         RAID 10
Redundancy in rebuild   None left           1 parity left       Mirror intact
URE mid-rebuild         Data loss           Recoverable         Recoverable
Rebuild method          Read all + parity   Read all + parity   Mirror copy
Rebuild speed           Slow                Slow                Fast

What happens during a rebuild

When a drive fails, the array reconstructs its contents onto a replacement (or hot spare): mirrors copy from the surviving half, parity arrays read all surviving members and recompute the missing data. Until that finishes, the array runs degraded — and for single-parity levels, with no redundancy left.

Rebuild time depends mainly on drive capacity and the rebuild rate, which is throttled by ongoing workload. As a rough guide, a large nearline HDD rebuilds at tens of MB/s under load, so a multi-TB drive can take many hours to days. The calculator gives an indicative figure (clearly labelled an estimate).
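As a rough illustration of that relationship, the sketch below divides drive capacity by a sustained rebuild rate. The function name, the 16 TB drive and the three rates are assumptions chosen for the example, not measurements or the calculator's own method, and a real rebuild rate varies with workload throughout the rebuild.

```python
def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours needed to rewrite one whole drive at a sustained rebuild rate."""
    if capacity_tb <= 0 or rate_mb_s <= 0:
        raise ValueError("capacity and rate must be positive")
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return capacity_mb / rate_mb_s / 3600  # seconds -> hours


# Hypothetical 16 TB drive at three illustrative rebuild rates (MB/s).
for rate in (30, 100, 200):
    print(f"{rate:>3} MB/s: {rebuild_hours(16, rate):.1f} h")
```

At the assumed 30 MB/s this works out to roughly six days for a full-disk rebuild, against under a day at 200 MB/s, which is why a busy array stays degraded for so much longer than an idle one.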

Why rebuilds are risky

Two things can go wrong during the window. First, an unrecoverable read error (URE) on a surviving drive: in single-parity RAID 5 / RAIDZ1 there's no redundancy left to reconstruct the unreadable sector, so that's data loss (see "Is RAID 5 dead?"). Second, a second drive failure: the rebuild stresses every drive, and a same-batch sibling may fail too. Dual parity (RAID 6 / RAIDZ2) survives either a URE or one extra drive failure during a single-drive rebuild.

The bigger the drives, the longer the window and the more bits that must be read, so both risks rise. This is why dual parity is the default on large-capacity arrays.
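To see how the URE risk grows with the amount of data read, here is a minimal sketch that treats every bit read as an independent trial. The rates of 1e-14 and 1e-15 errors per bit are assumptions based on commonly quoted datasheet worst-case figures, the function name is hypothetical, and real drives often behave better than the datasheet bound, so read the result as an indication only.

```python
import math


def ure_probability(tb_read: float, ure_rate_per_bit: float = 1e-14) -> float:
    """Chance of at least one URE while reading tb_read terabytes.

    Assumes independent bit errors at a constant rate (a simplification).
    """
    bits = tb_read * 1e12 * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way for tiny p.
    return -math.expm1(bits * math.log1p(-ure_rate_per_bit))


# Hypothetical 6 x 4 TB single-parity group: a rebuild reads the 5 survivors.
tb_read = 5 * 4
for rate in (1e-14, 1e-15):
    print(f"rate {rate:.0e}: {ure_probability(tb_read, rate):.0%}")
```

Under these assumptions, reading 20 TB gives roughly an 80% chance of at least one URE at 1e-14 and about 15% at 1e-15. Doubling the drive size doubles the bits read, which is the effect the paragraph above describes.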

Lower your rebuild risk
  • Large drives? Yes: dual parity (6/Z2)
  • Slow rebuilds: distributed / nested RAID
  • Downtime: add a hot spare

Shrinking rebuild time and risk

Distributed RAID (Dell ADAPT, HPE distributed RAID) spreads the rebuild work across many drives, and a ZFS resilver rebuilds only the used data, so both typically finish far faster than a classic rebuild onto a single dedicated spare. Nesting (RAID 50/60) splits a big pool into smaller, faster-rebuilding groups. A hot spare removes the wait for a human.

The strongest combination on big drives: dual (or triple) parity + distributed rebuild + a hot spare — fast reconstruction with redundancy still in reserve.
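The effect of nesting and of used-data-only rebuilds can be sketched with simple arithmetic. The function names, the 24 x 16 TB layout and the 40% fill level below are assumptions for illustration, and the model ignores controller behaviour and workload.

```python
def survivors_read_tb(group_drives: int, capacity_tb: float) -> float:
    """Data read from the surviving members of one parity group
    during a full-disk rebuild of a single failed drive."""
    if group_drives < 2:
        raise ValueError("a parity group needs at least two drives")
    return (group_drives - 1) * capacity_tb


def used_only_tb(capacity_tb: float, used_fraction: float) -> float:
    """Data to reconstruct when only allocated blocks are rebuilt."""
    if not 0 <= used_fraction <= 1:
        raise ValueError("used_fraction must be between 0 and 1")
    return capacity_tb * used_fraction


# Hypothetical 24 x 16 TB pool: one wide group vs three nested groups of 8.
print("single group:", survivors_read_tb(24, 16), "TB read")
print("nested (3x8):", survivors_read_tb(8, 16), "TB read")
# Hypothetical 40% full pool with a used-data-only rebuild.
print("used-only   :", round(used_only_tb(16, 0.4), 1), "TB to reconstruct")
```

In this example the nested layout reads 112 TB from the affected group instead of 368 TB from the whole pool, and a used-data-only rebuild reconstructs 6.4 TB rather than the full 16 TB. Less data read also means fewer chances to hit a URE during the window.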

Key takeaways
  • Rebuild/resilver reconstructs a failed drive; the array runs degraded until it finishes.
  • Time scales with drive capacity and is throttled by live workload — hours to days on big HDDs.
  • Risk = a URE (fatal for single parity) or a second failure during the window.
  • Shrink both with dual parity, distributed/RAIDZ rebuilds, nesting and a hot spare.
Frequently asked


How long does a RAID rebuild take?

It depends on drive capacity and the rebuild rate (throttled by live workload). Large nearline HDDs rebuild at tens of MB/s under load, so a multi-TB drive can take many hours to days. SSDs and distributed/RAIDZ rebuilds are much faster. The calculator gives an indicative estimate.

What is a resilver in ZFS?

ZFS's term for a rebuild. Because ZFS knows which blocks are used, a resilver only reconstructs actual data (not empty space) and verifies checksums as it goes — often faster than a full-disk rebuild.

Why is a rebuild dangerous?

The array runs degraded while it rebuilds. A URE on a surviving drive is fatal for single-parity levels, and the stress of the rebuild can trigger a second drive failure. Dual parity survives either one during a single-drive rebuild.
