UK’s trusted IT infrastructure partner since 2003
Servnet
Servnet Research · Data Study

AI Servers: The 2026 Data Study

What GPU/AI servers are, how they are built, what they cost, and how to buy them — a cited, UK-focused reference.

Last updated 5 July 2026 · every figure is sourced and dated · specifications verified against vendor datasheets.

~10.2 kW: power drawn by one air-cooled 8× H100 server (NVIDIA DGX H100, 8U)[src]
~120 kW: power per GB200 NVL72 rack, which is 100% liquid-cooled and beyond any air-cooled rack[src]
1.8 TB/s: per-GPU NVLink bandwidth on Blackwell (5th-gen), 2× Hopper’s 900 GB/s[src]
~$6.7tn: global data-centre capex needed by 2030 to meet compute demand (McKinsey)[src]
Architecture

What an AI server actually is

Almost every high-end AI server is built around NVIDIA’s HGX 8-GPU SXM baseboard: eight GPUs on one carrier, wired all-to-all by NVLink and NVSwitch rather than PCIe, so they behave like one large accelerator.[src] SXM is what enables that mesh; cheaper PCIe add-in cards are used for inference and smaller deployments but can only bridge GPUs in pairs.[src] Around the GPUs sits a dual-socket CPU host with 1–2 TB of RAM,[src] and a high-speed east-west fabric — InfiniBand (deterministic latency, in-network reductions) or 400/800GbE Ethernet (broader ecosystem) — to scale training across many nodes.
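
To make the difference between an all-to-all mesh and pairwise bridging concrete, here is a minimal Python sketch that counts how many GPU pairs have a direct NVLink path in each layout. It is a toy model, not a description of any vendor's wiring: the function names `nvswitch_mesh` and `bridged_pairs` are hypothetical, and the bridged layout assumes cards are joined 0-1, 2-3 and so on.

```python
from itertools import combinations


def nvswitch_mesh(n_gpus: int) -> set[tuple[int, int]]:
    """All-to-all: every GPU pair can talk directly over NVLink/NVSwitch."""
    return set(combinations(range(n_gpus), 2))


def bridged_pairs(n_gpus: int) -> set[tuple[int, int]]:
    """Pairwise bridging (assumed layout): GPUs are joined two at a time."""
    return {(i, i + 1) for i in range(0, n_gpus - 1, 2)}


mesh = nvswitch_mesh(8)
bridged = bridged_pairs(8)

print(f"HGX-style mesh: {len(mesh)} directly linked GPU pairs")
print(f"Bridged PCIe cards: {len(bridged)} directly linked GPU pairs")
print(f"Pairs left to communicate over PCIe: {len(mesh - bridged)}")
```

In an eight-GPU box the mesh links every pair, while bridging covers only a small fraction of them, which is one way to see why the SXM baseboard behaves more like a single large accelerator.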

NVLink bandwidth is the defining metric of the platform, and it has doubled with Blackwell:

Per-GPU NVLink bandwidth by generation (GB/s)

| GPU (NVLink generation) | Per-GPU bandwidth (GB/s) |
| --- | --- |
| A100 (NVLink 3) | 600 |
| H100 / H200 (NVLink 4) | 900 |
| B200 (NVLink 5) | 1,800 |
Source: NVIDIA NVLink & NVLink Switch.[src] Blackwell reaches 1.8 TB/s per GPU via 18 links × 100 GB/s.
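
The per-GPU figures are simply link count multiplied by per-link rate. The sketch below reproduces that arithmetic in Python. Only the Blackwell breakdown (18 links × 100 GB/s) is stated in this study; the A100 and Hopper link counts and per-link rates are assumptions drawn from NVIDIA's published NVLink generations and should be checked against the datasheets before reuse.

```python
# (links per GPU, GB/s per link). Only the B200 row is stated in this study;
# the A100 and H100/H200 rows are assumptions to verify against datasheets.
NVLINK_GENERATIONS = {
    "A100 (NVLink 3)": (12, 50),
    "H100 / H200 (NVLink 4)": (18, 50),
    "B200 (NVLink 5)": (18, 100),
}


def per_gpu_bandwidth_gbs(links: int, gbs_per_link: int) -> int:
    """Aggregate NVLink bandwidth for one GPU, in GB/s."""
    return links * gbs_per_link


bandwidth = {
    name: per_gpu_bandwidth_gbs(links, rate)
    for name, (links, rate) in NVLINK_GENERATIONS.items()
}

for name, gbs in bandwidth.items():
    print(f"{name}: {gbs} GB/s")

# Generation-on-generation ratio between Blackwell and Hopper
blackwell_vs_hopper = bandwidth["B200 (NVLink 5)"] / bandwidth["H100 / H200 (NVLink 4)"]
print(f"Blackwell vs Hopper: {blackwell_vs_hopper:.1f}x")
```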
Workloads

Training vs inference — one box can’t do both well

Training

Big, synchronised GPU clusters advancing in lockstep. Wants maximum HBM capacity, top FP8/FP4 compute, the full 8-GPU NVLink mesh and 400/800G scale-out for all-reduce. Runs hot and near-continuously — the case for liquid cooling and owned/colo hardware.

Inference / serving

Latency-bound and often memory-bandwidth-bound: token throughput tracks HBM bandwidth, not raw FLOPS. Can run on lower-power PCIe GPUs, smaller nodes and commodity Ethernet, distributed closer to users. Inference accounts for the highest cumulative volume of AI compute.

Workload/infrastructure split per industry analysis of training vs inference server design.
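
Why token throughput tracks HBM bandwidth can be seen with a rough ceiling: when generating one token at a time, the GPU has to stream roughly the whole set of model weights from memory per token. The Python sketch below applies that rule of thumb to the memory bandwidths listed in this study's spec comparison. The model (70 billion parameters at 1 byte per parameter) is a hypothetical example, and the estimate ignores batching, KV-cache traffic and multi-GPU sharding, so treat the output as an upper bound for comparison, not a benchmark.

```python
# Memory bandwidth in GB/s, from the spec comparison in this study.
HBM_BANDWIDTH_GBS = {
    "H100 SXM5": 3350,
    "H200 SXM": 4800,
    "B200 SXM": 8000,
}

# HYPOTHETICAL model used only for illustration.
PARAMS_BILLION = 70
BYTES_PER_PARAM = 1  # e.g. an 8-bit weight format


def decode_tokens_per_s_ceiling(
    bandwidth_gbs: float, params_billion: float, bytes_per_param: float
) -> float:
    """Single-stream ceiling: bandwidth divided by bytes read per token."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params x bytes = GB
    return bandwidth_gbs / weights_gb


ceilings = {
    gpu: decode_tokens_per_s_ceiling(bw, PARAMS_BILLION, BYTES_PER_PARAM)
    for gpu, bw in HBM_BANDWIDTH_GBS.items()
}

for gpu, tps in ceilings.items():
    print(f"{gpu}: at most ~{tps:.0f} tokens/s per stream")
```

Under these assumptions the ranking follows memory bandwidth exactly, which is the point the section makes: FLOPS never enter the calculation.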

The GPUs

Data-centre GPU spec comparison

Every figure is from the manufacturer’s datasheet. These are static hardware specs (no pricing), so they don’t go stale.

| GPU | Arch | VRAM | Memory bandwidth | TDP | Interconnect |
| --- | --- | --- | --- | --- | --- |
| NVIDIA H100 SXM5[src] | Hopper | 80 GB HBM3 | 3.35 TB/s | up to 700 W | NVLink 4th-gen 900 GB/s |
| NVIDIA H100 PCIe[src] | Hopper | 80 GB HBM2e | 2.0 TB/s | 350 W | NVLink bridge 900 GB/s (2-GPU) |
| NVIDIA H200 SXM[src] | Hopper | 141 GB HBM3e | 4.8 TB/s | up to 700 W | NVLink 4th-gen 900 GB/s |
| NVIDIA B200 SXM[src] | Blackwell | 180 GB HBM3e (DGX/HGX) | 8 TB/s | ~1,000 W (config. max) | NVLink 5th-gen 1.8 TB/s |
| NVIDIA GB200 (superchip)[src] | Grace + 2× Blackwell | 372 GB HBM3e + 480 GB LPDDR5X | 16 TB/s (HBM) | superchip | NVLink-C2C 3.6 TB/s per superchip |
| NVIDIA L40S[src] | Ada Lovelace | 48 GB GDDR6 (ECC) | 864 GB/s | 350 W | No NVLink; PCIe Gen4 x16 |
| NVIDIA A100 80 GB SXM[src] | Ampere | 80 GB HBM2e | 2.04 TB/s | 400 W | NVLink 3rd-gen 600 GB/s |
| AMD Instinct MI300X[src] | CDNA 3 | 192 GB HBM3 | 5.3 TB/s | 750 W (TBP) | Infinity Fabric; PCIe Gen5 x16 |
| AMD Instinct MI325X[src] | CDNA 3 | 256 GB HBM3e | 6.0 TB/s | 1,000 W (TBP) | Infinity Fabric; PCIe Gen5 |

Notes:
NVIDIA H100 PCIe: NVLink bridge is 900 GB/s (reuses A100 bridges); the 600 GB/s figure belongs to the dual-GPU H100 NVL card.
NVIDIA B200 SXM: 192 GB is the raw dual-die capacity; DGX/HGX B200 configures 180 GB/GPU. TDP is a configurable maximum.
NVIDIA L40S: No NVLink. It is an inference/graphics part, not a training GPU.
AMD Instinct MI300X: The leading non-NVIDIA alternative.

GPU memory (VRAM) compared (GB)

| GPU | VRAM (GB) |
| --- | --- |
| L40S | 48 |
| A100 80GB | 80 |
| H100 | 80 |
| H200 | 141 |
| B200 (DGX/HGX) | 180 |
| AMD MI300X | 192 |
| AMD MI325X | 256 |
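
One practical use of the VRAM figures above is a first-pass estimate of how many GPUs are needed just to hold a model's weights. The Python sketch below does that division. The model size (405 billion parameters at 2 bytes per parameter) is a hypothetical example, and the estimate deliberately ignores KV cache, activations and framework overhead, so real deployments need headroom beyond these counts.

```python
import math

# VRAM per GPU in GB, from the comparison above.
VRAM_GB = {
    "L40S": 48,
    "A100 80GB": 80,
    "H100": 80,
    "H200": 141,
    "B200 (DGX/HGX)": 180,
    "AMD MI300X": 192,
    "AMD MI325X": 256,
}

# HYPOTHETICAL model used only for illustration.
PARAMS_BILLION = 405
BYTES_PER_PARAM = 2  # 16-bit weights


def min_gpus_for_weights(
    params_billion: float, bytes_per_param: float, vram_gb: float
) -> int:
    """Smallest GPU count whose combined VRAM holds the weights alone."""
    weights_gb = params_billion * bytes_per_param
    return math.ceil(weights_gb / vram_gb)


gpus_needed = {
    gpu: min_gpus_for_weights(PARAMS_BILLION, BYTES_PER_PARAM, vram)
    for gpu, vram in VRAM_GB.items()
}

for gpu, count in gpus_needed.items():
    print(f"{gpu}: at least {count} GPUs for weights only")
```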
The landscape

The AI-server platforms

| Model | Vendor | GPUs | Form | Cooling | Servnet supplies |
| --- | --- | --- | --- | --- | --- |
| PowerEdge XE9680[src] | Dell | 8× HGX H100/H200 SXM (or MI300X / Gaudi3 OAM) | 6U | Air | Yes |
| PowerEdge XE9640[src] | Dell | 4× H100 SXM (NVLink) or Intel Max OAM | 2U | Direct liquid (facility water) | Yes |
| PowerEdge XE8640[src] | Dell | 4× HGX H100 SXM5 | 4U | Closed-loop liquid + fans | Yes |
| ProLiant DL380a Gen12[src] | HPE | up to 8× double-wide PCIe (RTX PRO 6000, H200 NVL, H100 NVL, L40S) | 4U | Air or direct liquid | Yes |
| Cray XD670[src] | HPE | 8× HGX H100/H200 SXM5 | 5U | Air, with liquid option | Yes |
| ThinkSystem SR675 V3[src] | Lenovo | up to 8× PCIe, or 4× HGX H200 SXM (NVLink) | 3U | Neptune hybrid (HGX variant) | Yes |
| ThinkSystem SR780a V3[src] | Lenovo | 8× HGX H100/H200/B200 (NVLink 900 GB/s) | 5U | Neptune direct liquid + air | Yes |
| SYS-821GE-TNHR[src] | Supermicro | 8× HGX H100/H200 (NVLink + NVSwitch) | 8U | Air (liquid variants exist) | |
| DGX H100 / H200[src] | NVIDIA | 8× H100 (640 GB) / H200 (1,128 GB) SXM | 8U | Air | |
| DGX B200[src] | NVIDIA | 8× Blackwell (1,440 GB HBM3e) | 10U | Air | |
| GB200 NVL72[src] | NVIDIA | 72× Blackwell + 36 Grace (rack-scale) | Rack | 100% liquid | |
| G593-SD0[src] | Gigabyte | 8× HGX H100 SXM5 | 5U | Air | |
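
The aggregate memory figures quoted for the DGX systems in the platform table follow directly from the per-GPU capacities in the spec comparison. As a quick consistency check, the Python sketch below multiplies them out. The GB200 NVL72 total is derived here from this study's own numbers (36 Grace CPUs implies 36 superchips at 372 GB HBM3e each); it is arithmetic on those figures, not a value quoted from NVIDIA.

```python
GPUS_PER_NODE = 8

# HBM per GPU in GB, from the spec comparison in this study.
HBM_PER_GPU_GB = {
    "H100": 80,
    "H200": 141,
    "B200 (DGX/HGX)": 180,
}

node_hbm_gb = {gpu: GPUS_PER_NODE * gb for gpu, gb in HBM_PER_GPU_GB.items()}

# GB200 NVL72: one Grace CPU per superchip, so 36 Grace CPUs = 36 superchips.
NVL72_SUPERCHIPS = 36
SUPERCHIP_HBM_GB = 372
nvl72_hbm_gb = NVL72_SUPERCHIPS * SUPERCHIP_HBM_GB

for gpu, total in node_hbm_gb.items():
    print(f"8x {gpu}: {total:,} GB HBM per node")
print(f"GB200 NVL72: {nvl72_hbm_gb:,} GB HBM per rack (derived)")
```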

Servnet supplies and configures the Dell, HPE and Lenovo platforms above — including the 8-GPU Lenovo ThinkSystem SR780a V3. Build one to your spec in the Lenovo configurator, or explore the ThinkSystem range.

Facilities

Power & cooling: why AI breaks the rack

The single biggest practical shock for buyers is power density. A traditional enterprise rack is provisioned for perhaps 6–15 kW. A single 8-GPU AI node can consume that on its own, and a rack-scale Blackwell system is an order of magnitude beyond it — which is why NVIDIA’s Blackwell generation is designed around direct-liquid cooling.[src]

Power draw: where AI hardware lands vs an air-cooled rack (kW)

| Item | Power (kW) |
| --- | --- |
| Typical air-cooled rack | ~15 |
| 1× 8-GPU H100 node (DGX) | ~10.2 |
| B200 GPU (liquid TDP) | 1.2 (per GPU) |
| GB200 NVL72 rack | ~120 |
Sources: NVIDIA DGX H100 (~10.2 kW/8U node)[src]; NVIDIA GB200 NVL72 (~120 kW rack, 100% liquid-cooled)[src]; B200 defined at 1,000 W air / 1,200 W liquid per GPU.[src]

Practical read: many Hopper (H100/H200) servers still run air-cooled and fit high-density enterprise or colo racks; Blackwell largely does not. Plan power, cooling and rack space before the hardware.
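
A quick way to act on that advice is to check how many 8-GPU nodes a given rack power budget can actually carry. The Python sketch below uses the ~10.2 kW DGX H100 figure and the rack budgets mentioned in this section. The `headroom` parameter is a hypothetical safety margin, not a figure from this study, and the calculation covers power only, not cooling capacity or rack units.

```python
import math

NODE_KW = 10.2          # one air-cooled 8x H100 node (DGX H100)
NVL72_RACK_KW = 120.0   # one GB200 NVL72 rack
TYPICAL_RACK_KW = 15.0  # typical air-cooled rack used in the chart above


def nodes_per_rack(
    rack_budget_kw: float, node_kw: float = NODE_KW, headroom: float = 0.0
) -> int:
    """Whole nodes that fit in a rack's power budget after reserving headroom."""
    usable_kw = rack_budget_kw * (1.0 - headroom)
    return math.floor(usable_kw / node_kw)


for budget in (6.0, 15.0, 25.0):
    print(f"{budget:>5.1f} kW rack: {nodes_per_rack(budget)} node(s)")

# With a hypothetical 20% safety margin on a 25 kW rack
print(f"25 kW rack, 20% headroom: {nodes_per_rack(25.0, headroom=0.2)} node(s)")

# How many typical air-cooled racks of power one NVL72 rack represents
rack_equivalents = NVL72_RACK_KW / TYPICAL_RACK_KW
print(f"One GB200 NVL72 rack ~ {rack_equivalents:.0f} typical racks of power")
```

A rack at the low end of the traditional range cannot power even one node, which is the density shock the section describes.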

Economics

What it costs: own vs cloud

Acquisition prices for AI servers and data-centre GPUs are almost entirely quote-gated and volatile, so we don’t publish a headline “street price” — anyone who does is guessing. What is knowable is the shape of the decision and the current cloud rental rates.

Indicative cloud GPU rental (on-demand, mid-2025)

| Provider (instance, GPU) | Rate |
| --- | --- |
| AWS (p5.48xlarge, H100) | ~$4.1 /GPU-hr |
| Google Cloud (A3-high, H100) | ~$3.00 /GPU-hr |
| Microsoft Azure (NC H100 v5) | ~$6.98 /GPU-hr |

AWS cut on-demand NVIDIA GPU EC2 pricing by up to ~45% (44% off P5/H100), effective 1 June 2025.[src] On-demand list rates move frequently and vary by region — treat as indicative and verify live before deciding.

The durable rule: rent for short, bursty or uncertain workloads; own (or colocate) when utilisation is high and sustained over years. At continuous use, cumulative on-demand rental for an H100 reaches roughly the purchase price of one within a year, so the more steadily you run GPUs, the more owning pays. The exact break-even depends on your utilisation, GPU choice and current cloud pricing. Model your own case in the AI GPU calculator (it carries its own cited cloud-vs-own comparison) and, if you buy, finance the capex over 2–5 years with the IT finance calculator.
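
The shape of that decision can be sketched in a few lines of Python. The hourly rates are the indicative mid-2025 figures quoted above. The owned cost per GPU is a HYPOTHETICAL placeholder chosen only to make the arithmetic run; it is not a quoted or street price, and the sketch ignores power, cooling, hosting, support and resale value, all of which move the real break-even.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

# Indicative on-demand rates in USD per GPU-hour (mid-2025), from this study.
CLOUD_RATE_USD = {
    "AWS p5.48xlarge": 4.10,
    "Google Cloud A3-high": 3.00,
    "Azure NC H100 v5": 6.98,
}

# HYPOTHETICAL placeholder, not a real price. Replace with your own quote.
OWNED_COST_PER_GPU_USD = 30_000.0


def annual_rental_usd(rate_per_hour: float, utilisation: float = 1.0) -> float:
    """Rental spend per GPU per year at a given utilisation (0.0 to 1.0)."""
    return rate_per_hour * HOURS_PER_YEAR * utilisation


def breakeven_months(
    owned_cost_usd: float, rate_per_hour: float, utilisation: float = 1.0
) -> float:
    """Months of rental after which cumulative rent equals the owned cost."""
    monthly_rental = annual_rental_usd(rate_per_hour, utilisation) / 12
    return owned_cost_usd / monthly_rental


for provider, rate in CLOUD_RATE_USD.items():
    for utilisation in (1.0, 0.25):
        months = breakeven_months(OWNED_COST_PER_GPU_USD, rate, utilisation)
        print(
            f"{provider} at {utilisation:.0%} utilisation: "
            f"${annual_rental_usd(rate, utilisation):,.0f}/yr, "
            f"break-even ~{months:.1f} months"
        )
```

Whatever price is plugged in, halving utilisation doubles the break-even time, which is why steady workloads favour owning and bursty ones favour renting.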

Context

The market backdrop

~$6.7 trillion: global data-centre capex needed by 2030 to meet compute demand (~$5.2tn of it AI-capable)[src]
~219 GW: projected global data-centre capacity by 2030 (nearly 3×), ~70% of new demand from AI[src]
61%: Western-European CIOs who say geopolitics will raise reliance on local/regional cloud providers[src]
>75%: enterprises outside the US expected to have a digital-sovereignty strategy by 2030 (Gartner)[src]

Figures from named analysts (McKinsey, Gartner). Enterprise “AI repatriation” to on-prem/colo is a real trend but is largely evidenced by vendor-sponsored surveys, so we cite only the named-analyst anchors here and treat the sponsored numbers as directional.

FAQ

AI servers — common questions

What is an AI server?

An AI server is a GPU-dense server built to train or run AI models. Most are built on NVIDIA’s HGX 8-GPU SXM baseboard, where eight GPUs are linked all-to-all by NVLink and NVSwitch (far faster than PCIe). A CPU host, large system memory and high-speed east-west networking (InfiniBand or 400/800GbE Ethernet) surround the GPUs. Examples include the Dell PowerEdge XE9680, HPE Cray XD670, Lenovo ThinkSystem SR780a V3 and NVIDIA DGX.

How much power does an AI server use?

A lot. One air-cooled 8× H100 server (NVIDIA DGX H100, 8U) draws about 10.2 kW, so a single node can use more power than many UK enterprise racks are provisioned for. NVIDIA’s Blackwell B200 is defined at 1,000 W air-cooled or 1,200 W liquid-cooled per GPU, and a rack-scale GB200 NVL72 draws roughly 120 kW and must be liquid-cooled.
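
To turn those power figures into annual energy, multiply by hours run. The Python sketch below does this for the ~10.2 kW node and the ~120 kW rack. The tariff and PUE values in the example are HYPOTHETICAL placeholders, not figures from this study; substitute your own contract rate and facility overhead.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_energy_kwh(power_kw: float, load_factor: float = 1.0) -> float:
    """IT energy per year for hardware drawing power_kw at a given load factor."""
    return power_kw * HOURS_PER_YEAR * load_factor


def annual_energy_cost(
    power_kw: float,
    tariff_per_kwh: float,
    pue: float = 1.0,
    load_factor: float = 1.0,
) -> float:
    """Yearly electricity cost, scaling IT energy by facility overhead (PUE)."""
    return annual_energy_kwh(power_kw, load_factor) * pue * tariff_per_kwh


node_kwh = annual_energy_kwh(10.2)    # one 8x H100 node, continuous
rack_kwh = annual_energy_kwh(120.0)   # one GB200 NVL72 rack, continuous
print(f"8x H100 node: {node_kwh:,.0f} kWh/yr")
print(f"GB200 NVL72 rack: {rack_kwh:,.0f} kWh/yr")

# HYPOTHETICAL inputs for illustration only.
EXAMPLE_TARIFF_GBP = 0.25
EXAMPLE_PUE = 1.4
node_cost = annual_energy_cost(10.2, EXAMPLE_TARIFF_GBP, pue=EXAMPLE_PUE)
print(f"Node cost at placeholder tariff and PUE: £{node_cost:,.0f}/yr")
```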

Which GPU is best for AI — H100, H200 or B200?

It depends on the workload. H100 (80 GB HBM3) is the established training GPU; H200 raises that to 141 GB HBM3e with 4.8 TB/s of bandwidth, which helps memory-bound inference; B200 (Blackwell, 180 GB HBM3e in DGX/HGX, 8 TB/s) is the current top training part. For lighter inference, PCIe parts like the L40S (48 GB, no NVLink) are cheaper and lower-power. AMD’s Instinct MI300X (192 GB) and MI325X (256 GB) are the main non-NVIDIA alternatives.

Is it cheaper to buy an AI server or rent GPUs in the cloud?

Renting is cheaper for short, bursty or uncertain workloads; owning wins at sustained high utilisation over multiple years. The tipping point depends on your utilisation, GPU choice and cloud rates (which move — AWS cut on-demand GPU pricing by up to ~45% in June 2025). The durable rule is: the higher and steadier your GPU utilisation, the more owning or colocating pays. Model your own case with our AI GPU calculator.

Why do AI servers need liquid cooling?

Because air can only remove so much heat from a rack — typically up to ~8–25 kW. A single Blackwell node or a GB200 NVL72 rack (~120 kW) is far past that ceiling, so NVIDIA’s Blackwell generation is designed around direct-liquid cooling (DLC). Many Hopper (H100/H200) servers can still run air-cooled, which is why they remain practical for enterprises without liquid-ready facilities.

Do I need special data-centre facilities for AI servers?

Usually yes. Rack power density is the first constraint — one 8-GPU node can exceed a whole traditional rack’s power budget, and Blackwell adds liquid-cooling plumbing. Options are upgrading your own facility, using a colocation provider with high-density/liquid-ready halls, or renting cloud GPUs. Lead times and GPU availability are also real planning factors.

Transparency

Methodology, sources & how to cite

Hardware specifications are taken directly from manufacturer datasheets (NVIDIA, AMD) and OEM product pages (Dell, HPE, Lenovo, Supermicro, NVIDIA), and were cross-checked in an adversarial fact-check pass — during which one figure (the H100 PCIe NVLink bridge) was corrected to 900 GB/s. Market figures are attributed to named analysts (McKinsey, Gartner). Cloud prices are dated and labelled indicative because they change frequently; we publish no hardware “street prices” because those are quote-gated and volatile. Last updated 5 July 2026.

Cite this study
Servnet (2026) “AI Servers: The 2026 Data Study.” servnetuk.com/research/ai-servers, accessed {date}.
Journalists & researchers: this study is free to cite with a link to servnetuk.com/research/ai-servers. Structured data: download JSON.
Full source list (24)

Planning an AI server or GPU cluster?

Servnet specs, supplies and supports Dell, HPE and Lenovo AI servers for UK organisations — with the power, cooling, networking and finance worked out. Tell us the workload; we’ll design the build.
