AI infrastructure has imported a vocabulary that did not exist in mainstream server rooms a few years ago. Vendors now talk about direct liquid cooling, immersion, PUE, RoCE, RDMA and lossless Ethernet as if everyone already knows what they mean. This is the decoder. We will define the cooling terms (air versus direct liquid versus immersion) and explain what each trades against the others, then do the same for the networking jargon (RDMA and RoCE) and explain why an AI fabric is engineered never to drop a packet. It is a glossary with enough engineering behind it to make the deeper decisions readable, and it links to the cooling article where those decisions are actually weighed.
Why cooling suddenly matters
A conventional rack of business servers might draw a handful of kilowatts and shed its heat into room air without much thought. An AI rack of dense GPUs is a different animal: power densities climb into the tens of kilowatts per rack, far beyond what moving room air can practically remove. At that point cooling stops being a facilities afterthought and becomes a design constraint that shapes the whole platform, which is why GPU selection and cooling are decided together in our GPU accelerators work.
The single number that summarises how efficiently a facility cools is PUE, power usage effectiveness: the ratio of total facility power to the power actually delivered to the IT equipment. A PUE of 2.0 means you spend a watt on cooling and overhead for every watt of compute; a PUE near 1.1 means almost all the power reaches the servers. Cooling method is the biggest lever on that ratio, which is why it is worth understanding the three approaches.
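The ratio is simple enough to work out by hand, but a few lines of Python make the two reference points above concrete. This is a minimal sketch. The facility and IT power figures are hypothetical round numbers, not measurements from any real site.

```python
# Minimal PUE sketch: PUE = total facility power / power delivered to IT equipment.
# The kW figures used below are hypothetical examples, not measurements.

def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts spent on cooling and other overhead for every watt delivered to IT."""
    return pue_value - 1.0


def it_share(pue_value: float) -> float:
    """Fraction of total facility power that actually reaches the IT equipment."""
    return 1.0 / pue_value


if __name__ == "__main__":
    # Two hypothetical facilities, each delivering 1,000 kW to its servers.
    for total_kw, it_kw in [(2000.0, 1000.0), (1100.0, 1000.0)]:
        p = pue(total_kw, it_kw)
        print(f"PUE {p:.2f}: {overhead_per_it_watt(p):.2f} W overhead per IT watt, "
              f"{it_share(p):.0%} of power reaches IT")
```

A PUE of 1.1 works out to about 0.1 W of overhead per watt of compute, with roughly 91% of facility power (1 / 1.1 ≈ 0.91) reaching the servers, against 50% at a PUE of 2.0.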
Air, direct-liquid and immersion
Air cooling is the familiar baseline: fans and chilled room air. It is simple, universally supported and entirely adequate for standard servers, but it runs out of headroom as rack density climbs, and the energy spent moving and chilling air pushes PUE up.
Direct liquid cooling, often called DLC or cold-plate cooling, runs coolant through plates clamped directly onto the hot components, the GPUs and CPUs. Liquid carries heat far more effectively than air, so DLC handles dense racks while improving efficiency, and it has become the mainstream answer for high-density AI without reinventing the entire data hall.
Immersion cooling goes further still: whole servers are submerged in a non-conductive fluid that absorbs heat from every component at once. It enables the highest densities and excellent efficiency, but it is the most operationally different, changing how hardware is serviced, racked and maintained.
- Air: simplest and universal, but limited density and higher PUE at scale
- Direct liquid (cold plate): high density and efficiency, mainstream for AI, moderate plumbing
- Immersion: highest density and efficiency, but a fundamentally different operating model
- PUE generally falls as you move from air toward direct liquid and immersion
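The claim that liquid carries heat far more effectively than air can be checked with the basic heat-transport relation Q = ρ · V̇ · c_p · ΔT: the heat a coolant stream removes equals its density times its volume flow times its specific heat times its temperature rise. The sketch below assumes approximate room-temperature properties for air and water, a hypothetical 40 kW rack and a hypothetical 10 K coolant temperature rise, and computes the volume flow each fluid would need. It deliberately ignores cold-plate contact resistance, fan and pump power, and the water-glycol mixes many real loops use.

```python
# Sketch: coolant volume flow needed to remove a rack's heat, from Q = rho * V_dot * c_p * dT.
# Fluid properties are approximate textbook values near room temperature.
# The 40 kW rack and 10 K temperature rise are hypothetical inputs.

AIR = {"name": "air", "density_kg_m3": 1.2, "cp_j_kgk": 1005.0}
WATER = {"name": "water", "density_kg_m3": 997.0, "cp_j_kgk": 4186.0}


def volumetric_flow_m3_s(heat_w: float, fluid: dict, delta_t_k: float) -> float:
    """Volume flow (m^3/s) a fluid needs to absorb heat_w watts with a delta_t_k rise."""
    return heat_w / (fluid["density_kg_m3"] * fluid["cp_j_kgk"] * delta_t_k)


if __name__ == "__main__":
    rack_heat_w = 40_000.0  # hypothetical dense AI rack
    delta_t_k = 10.0        # hypothetical coolant temperature rise
    air = volumetric_flow_m3_s(rack_heat_w, AIR, delta_t_k)
    water = volumetric_flow_m3_s(rack_heat_w, WATER, delta_t_k)
    print(f"air:   {air:.2f} m^3/s")
    print(f"water: {water * 1000:.2f} L/s")
    print(f"air needs ~{air / water:,.0f}x the volume flow of water")
```

With these inputs air needs roughly 3.3 m³/s against just under 1 L/s of water, about 3,500 times the volume flow. That gap in volumetric heat capacity is the physical reason dense racks move to liquid.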
The networking half: RDMA and RoCE
Cooling keeps dense GPUs alive; the network keeps them fed. In a multi-GPU training job, GPUs across many servers exchange data constantly, and ordinary networking adds too much delay because every transfer is copied through the operating system and CPU on both ends.
RDMA, remote direct memory access, removes that overhead. It lets one machine read or write another machine's memory directly, bypassing the CPU and operating system on both sides, so data moves with very low latency and minimal CPU cost. RoCE, RDMA over Converged Ethernet, is the technology that runs RDMA across standard Ethernet networks rather than requiring InfiniBand. It is why an AI cluster can use Ethernet switching and still get the near-zero-overhead, memory-to-memory transfers that GPUs need, and it is a key reason high-speed Ethernet has become viable for AI fabrics built on modern network cards.
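RoCE comes in two wire formats, and the difference shows concretely how RDMA rides on ordinary Ethernet. RoCE v1 carries InfiniBand transport headers directly in an Ethernet frame with EtherType 0x8915. RoCE v2 wraps them in UDP/IP with destination port 4791, which makes the traffic routable across IP subnets. The classifier below is an illustrative sketch, not production packet-parsing code. It handles at most one 802.1Q VLAN tag and does not follow IPv6 extension headers, and `build_ipv4_udp_frame` is a hypothetical test helper that builds a minimal zero-filled frame.

```python
import struct

ETH_P_8021Q = 0x8100      # 802.1Q VLAN tag
ETH_P_IPV4 = 0x0800
ETH_P_IPV6 = 0x86DD
ETH_P_ROCE_V1 = 0x8915    # RoCE v1: InfiniBand headers directly in Ethernet
IPPROTO_UDP = 17
ROCE_V2_UDP_PORT = 4791   # RoCE v2: InfiniBand transport inside UDP/IP


def classify_frame(frame: bytes) -> str:
    """Return 'roce_v1', 'roce_v2' or 'other' for a raw Ethernet frame."""
    if len(frame) < 14:
        return "other"
    (ethertype,) = struct.unpack_from("!H", frame, 12)  # after dst and src MACs
    offset = 14
    if ethertype == ETH_P_8021Q:  # skip one VLAN tag: 2-byte TCI, then inner EtherType
        if len(frame) < offset + 4:
            return "other"
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype == ETH_P_ROCE_V1:
        return "roce_v1"
    if ethertype == ETH_P_IPV4:
        if len(frame) < offset + 20:
            return "other"
        proto = frame[offset + 9]
        udp_offset = offset + (frame[offset] & 0x0F) * 4  # IHL counts 32-bit words
    elif ethertype == ETH_P_IPV6:
        if len(frame) < offset + 40:
            return "other"
        proto = frame[offset + 6]  # Next Header; extension headers are not followed
        udp_offset = offset + 40
    else:
        return "other"
    if proto != IPPROTO_UDP or len(frame) < udp_offset + 8:
        return "other"
    (dst_port,) = struct.unpack_from("!H", frame, udp_offset + 2)
    return "roce_v2" if dst_port == ROCE_V2_UDP_PORT else "other"


def build_ipv4_udp_frame(dst_port: int) -> bytes:
    """Hypothetical test helper: zeroed MACs and addresses, minimal IPv4 + UDP headers."""
    eth = bytes(12) + struct.pack("!H", ETH_P_IPV4)
    ipv4 = bytes([0x45, 0]) + bytes(7) + bytes([IPPROTO_UDP]) + bytes(10)  # 20 bytes
    udp = struct.pack("!HHHH", 49152, dst_port, 8, 0)
    return eth + ipv4 + udp


if __name__ == "__main__":
    print(classify_frame(build_ipv4_udp_frame(ROCE_V2_UDP_PORT)))  # roce_v2
    print(classify_frame(build_ipv4_udp_frame(53)))                 # other
    print(classify_frame(bytes(12) + struct.pack("!H", ETH_P_ROCE_V1) + bytes(40)))  # roce_v1
```

Because v2 is ordinary UDP/IP on the wire, routers and switches can forward it like any other IP traffic, which is part of why RoCE fits into existing Ethernet designs.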
Lossless Ethernet: why a dropped packet is the enemy
There is a catch that explains a lot of AI networking jargon. RDMA performance collapses if packets are dropped, because a lost packet forces retransmission and stalls the tightly synchronised GPUs waiting on that data. Standard Ethernet, by design, drops packets when congested; that is normal, and TCP simply resends them. For RoCE, it is unacceptable.
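Why is one drop so expensive? InfiniBand's reliable transport, which RoCE carries, has traditionally recovered from loss with a go-back-N style scheme: the receiver discards everything after a missing packet, so the sender must resend from the gap onward. The toy model below assumes the sender transmits a full window before learning of a loss, and compares that with selective repeat, which resends only the lost packet. The packet count, window size and drop positions are hypothetical, and real NICs differ in detail.

```python
def go_back_n_transmissions(n: int, window: int, dropped: set) -> int:
    """Transmissions needed to deliver packets 0..n-1 under a simplified go-back-N model.

    Each sequence number in `dropped` is lost on its first transmission only.
    The sender sends a whole window, learns of the first loss in it, and
    restarts from that packet; the receiver discards anything after a gap.
    """
    sent = 0
    base = 0
    already_lost = set()
    while base < n:
        burst = range(base, min(base + window, n))
        first_loss = None
        for seq in burst:
            sent += 1
            if seq in dropped and seq not in already_lost:
                already_lost.add(seq)
                if first_loss is None:
                    first_loss = seq
        # No loss: the whole window is delivered. Loss: go back to the first gap.
        base = burst.stop if first_loss is None else first_loss
    return sent


def selective_repeat_transmissions(n: int, dropped: set) -> int:
    """Under selective repeat, only the lost packets themselves are resent."""
    return n + len({seq for seq in dropped if 0 <= seq < n})


if __name__ == "__main__":
    n, window, dropped = 100, 16, {0}  # hypothetical values
    print("go-back-N:       ", go_back_n_transmissions(n, window, dropped))
    print("selective repeat:", selective_repeat_transmissions(n, dropped))
```

With 100 packets, a 16-packet window and one drop at the first packet, this model needs 116 transmissions under go-back-N against 101 under selective repeat. In a real fabric, every resend and the wait for loss detection is time the synchronised GPUs spend idle.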
So an AI fabric is engineered to be lossless. Two mechanisms do most of the work. Priority flow control (PFC) lets a switch whose buffer is filling send pause frames to the upstream device for a specific traffic class. Explicit congestion notification (ECN) lets switches mark packets under congestion so that senders reduce their rate. Both act before buffers overflow rather than by discarding packets. This is what people mean by a lossless or congestion-managed network: an Ethernet fabric specially configured so that under load it pauses and paces traffic instead of dropping it. Designing that fabric (the cards, the switching and the congestion settings) is a deliberate exercise rather than a default, and it sits within our network cards and fabric design work.
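The difference between a tail-drop queue and a PFC/ECN-managed one can be sketched as a single switch queue that is fed faster than it drains. This simulation is a toy built on loudly simplified assumptions. It models one traffic class. Pauses take effect instantly; real PFC needs buffer headroom for data already on the wire, which the `xoff` headroom check stands in for. ECN marks are only counted, because the sender's rate response (DCQCN in many RoCE deployments) is not modelled. All rates and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueStats:
    delivered: int = 0
    dropped: int = 0
    ecn_marked: int = 0
    pause_ticks: int = 0
    max_depth: int = 0


def simulate(ticks: int, arrival_per_tick: int, drain_per_tick: int,
             capacity: int, lossless: bool,
             ecn_threshold: Optional[int] = None,
             xoff: Optional[int] = None, xon: Optional[int] = None) -> QueueStats:
    """Toy single-queue switch: tail drop, or PFC-style pause plus ECN marking."""
    if lossless:
        if xoff is None or xon is None or xon >= xoff:
            raise ValueError("lossless mode needs xon < xoff")
        if xoff + arrival_per_tick > capacity:
            raise ValueError("xoff leaves no headroom for packets already sent")
    stats = QueueStats()
    depth = 0
    paused = False
    for _tick in range(ticks):
        if lossless:
            # PFC-style hysteresis: pause upstream at xoff, resume at or below xon.
            if depth >= xoff:
                paused = True
            elif depth <= xon:
                paused = False
        if paused:
            stats.pause_ticks += 1
        arrivals = 0 if paused else arrival_per_tick  # paused traffic waits upstream
        for _pkt in range(arrivals):
            if depth < capacity:
                depth += 1
                # ECN: mark (not drop) packets enqueued above the threshold.
                if ecn_threshold is not None and depth > ecn_threshold:
                    stats.ecn_marked += 1
            else:
                stats.dropped += 1  # tail drop: the buffer is full
        stats.max_depth = max(stats.max_depth, depth)
        out = min(depth, drain_per_tick)
        depth -= out
        stats.delivered += out
    return stats


if __name__ == "__main__":
    print("tail drop:", simulate(1000, 12, 10, 100, lossless=False))
    print("lossless: ", simulate(1000, 12, 10, 100, lossless=True,
                                 ecn_threshold=40, xoff=80, xon=60))
```

Fed 12 packets per tick against a drain of 10, the tail-drop queue fills and then discards packets every tick. The lossless configuration never drops: it marks packets once the queue passes the ECN threshold and pauses the sender between the `xoff` and `xon` watermarks.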
How the glossary fits together
The two halves are connected. You adopt dense GPUs because the workload demands them. That density forces a cooling decision (air, direct liquid or immersion) that shapes power efficiency and the physical build. The same workload also demands a low-latency, lossless RDMA fabric, usually RoCE on high-speed Ethernet, so that the GPUs are never starved of data. Cooling and networking are the two sides of making dense AI compute actually usable.
This article is the decoder; the engineering trade-offs of the cooling side, where the air-versus-liquid call is actually made for a specific build, are worked through in depth in AI server cooling, air vs liquid in 2026. Read this to understand the terms, then that to make the decision, and bring the GPU and networking choices together with our GPU accelerators practice.