When you build an AI cluster, the network between the GPUs matters almost as much as the GPUs themselves - a fast accelerator starved of interconnect bandwidth sits idle waiting for its peers. For years that fabric meant InfiniBand, largely from one vendor. Now two open standards are mounting a serious challenge: Ultra Ethernet, which re-engineers Ethernet for AI, and UALink, which targets tight accelerator-to-accelerator links. For UK teams planning AI infrastructure, this is a standards-watch moment - the choices are no longer just InfiniBand or nothing, and the fabric you pick shapes scale, cost and vendor lock-in for years. Here is how the three compare.
Why the interconnect decides AI cluster performance
Training and large-scale inference spread a single job across many GPUs that must constantly exchange gradients and activations. The collective operations behind that - all-reduce, all-to-all - are intensely network-bound, and any latency or congestion in the fabric directly idles expensive accelerators. As clusters grow from a handful of nodes to hundreds, the interconnect becomes the dominant factor in how much useful work the GPUs actually do, which is why fabric choice is a first-order decision, not an afterthought.
Two different jobs are involved, and the new standards split along that line. There is scale-up - linking accelerators tightly within and across a few nodes so they behave almost like one big device - and scale-out - connecting many nodes into a large cluster. Historically InfiniBand stretched across both; the new standards are more specialised, and understanding which job each targets is the key to reading the landscape. The fundamentals of building such a cluster are in building your first UK on-prem AI cluster.
Ultra Ethernet: re-engineering Ethernet for AI scale-out
Ultra Ethernet, developed by the Ultra Ethernet Consortium, takes the ubiquitous, multi-vendor Ethernet ecosystem and re-works it specifically for AI and HPC scale-out. Standard Ethernet was not designed for the bursty, loss-sensitive collective traffic of large training jobs; Ultra Ethernet adds modern congestion control, more efficient packet handling and better support for the collective patterns AI workloads generate, aiming to match the performance characteristics that made InfiniBand attractive while keeping Ethernet's open, competitive supplier base.
The strategic appeal for buyers is the ecosystem. Ethernet has many switch and NIC vendors, deep operational familiarity in most organisations, and pricing shaped by competition rather than a single supplier. If Ultra Ethernet delivers on AI-scale performance, it offers a path to large clusters without committing to one vendor's fabric - which is exactly why so much of the industry has lined up behind it. Choose AI-capable NICs with our network cards guidance.
- •Ultra Ethernet: open multi-vendor scale-out fabric, AI-tuned Ethernet - broad ecosystem, competitive pricing
- •UALink: open scale-up accelerator-to-accelerator interconnect - tight, low-latency GPU linking
- •InfiniBand: mature, high-performance, largely single-vendor - proven but more locked-in
UALink: open scale-up for accelerator-to-accelerator
UALink addresses the other half of the problem. Backed by a broad industry consortium, it is an open standard for the tight, low-latency, high-bandwidth links between accelerators - the scale-up domain where GPUs need to share memory and results almost as if they were a single device. This is the space proprietary accelerator interconnects have owned, and UALink's pitch is to provide an open, multi-vendor alternative so customers are not locked into one accelerator vendor's fabric for the most performance-critical links.
The two standards are therefore complementary rather than competing: UALink for scale-up within the accelerator pod, Ultra Ethernet for scale-out across the cluster. A future AI build could plausibly use both - UALink binding the GPUs tightly together and Ultra Ethernet stitching the pods into a large fabric - which is a very different picture from one vendor's InfiniBand spanning everything. Plan the accelerators alongside the fabric with our GPU and accelerators guidance.
InfiniBand: the incumbent it has to beat
InfiniBand is not going away - it is mature, very high performance, and has the deepest software and operational track record for large-scale training, which is exactly why it remains the default for the biggest clusters today. Its weaknesses from a buyer's standpoint are commercial rather than technical: it is largely a single-vendor ecosystem, which means pricing power, supply dependence and a smaller pool of operational expertise outside specialist shops.
So the standards-watch question is not whether InfiniBand works - it plainly does - but whether the open alternatives mature fast enough, and perform close enough, to make the openness worth it. For most UK organisations building their first or second AI cluster, that maturity timeline is the thing to track, because committing to a fabric is a multi-year, hard-to-reverse decision. This is distinct from the separate question of switch port speeds; here the issue is the fabric architecture and its ecosystem, not whether the ports are 400G or 800G.
What a UK buyer should do now
Do not bet a near-term cluster on a standard that is not yet shipping in the form you need - but do build with the transition in mind. For an AI cluster being procured today, weigh InfiniBand's proven maturity against the open standards' trajectory, and favour designs and suppliers that keep an Ethernet-based path open, because the momentum and ecosystem behind Ultra Ethernet make it the likeliest long-term mainstream. For scale-up links, watch UALink's maturation as the open alternative to proprietary accelerator fabrics.
The honest near-term answer is workload- and scale-dependent, and the safest posture is to avoid unnecessary lock-in while the standards settle. We design AI fabrics with that openness in mind - read the build fundamentals in building your first UK on-prem AI cluster and choose the NICs with our network cards guidance.