UK’s trusted IT infrastructure partner since 2003
sales@servnetuk.com
0800 987 4111
Servnet

NICs — 25GbE to 400GbE, InfiniBand & RDMA
Complete buyer guide & technology reference

Choosing the wrong NIC is one of the most common causes of unexplained performance bottlenecks in AI and storage infrastructure. RDMA, GPUDirect, SR-IOV, and InfiniBand each solve different problems. This guide explains what each technology actually does and which NIC to choose for your workload before looking at part numbers.

400GbE: maximum Ethernet speed (ConnectX-7, single port)
~1–3µs: RDMA latency, versus ~50µs for kernel TCP
PCIe 5.0: ConnectX-7 host interface, ~64 GB/s per direction on an x16 slot (~128 GB/s bidirectional)
GPUDirect: direct GPU memory ↔ NIC transfers
SR-IOV: up to 256 virtual functions per physical NIC
NVMe-oF: sub-20µs remote NVMe storage over RDMA
Key Technologies

NIC Technologies Explained

These are the technologies that separate an AI infrastructure NIC from a standard Ethernet card — understanding them is essential before selecting a model.

RDMA / RoCE
Remote Direct Memory Access over Converged Ethernet

RDMA allows one server to directly read from or write to the memory of another server over the network, completely bypassing the OS kernel and CPU on both ends. The receiving CPU is not involved in the data transfer at all. This eliminates kernel processing overhead and reduces latency from ~50µs (kernel TCP) to ~1–3µs (RDMA). RoCEv2 runs RDMA over standard Ethernet, requiring only a RoCE-capable NIC on each end and a lossless Ethernet fabric (PFC/ECN enabled on switches). GPUDirect RDMA extends this to allow direct GPU-to-GPU transfers without copying data through system RAM.
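To see why the latency difference matters most for small messages, the figures above can be put into a simple model: total time is a fixed one-way latency plus the time to serialise the message onto the wire. This is a rough sketch under stated assumptions; the function name, the 4 KiB message size, and the 2µs RDMA midpoint are illustrative choices, not measurements.

```python
def transfer_time_us(message_bytes: int, link_gbps: float, latency_us: float) -> float:
    """Simplified model: fixed latency plus serialisation time on the link."""
    bytes_per_us = link_gbps * 1e9 / 8 / 1e6  # Gb/s converted to bytes per microsecond
    return latency_us + message_bytes / bytes_per_us


# Assumed example: a 4 KiB message on a 100GbE link.
tcp_us = transfer_time_us(4096, 100, 50.0)   # ~50µs kernel TCP latency (from the text)
rdma_us = transfer_time_us(4096, 100, 2.0)   # ~1-3µs RDMA latency, midpoint assumed

print(f"kernel TCP: {tcp_us:.2f} µs, RDMA: {rdma_us:.2f} µs, ratio: {tcp_us / rdma_us:.1f}x")
```

For a message this small the wire time is well under a microsecond, so almost all of the difference comes from the software path rather than the link speed.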

InfiniBand vs Ethernet
Two distinct fabrics for HPC and AI

InfiniBand (IB) is a purpose-built high-performance fabric historically used in HPC supercomputers. It provides inherently lossless transport, native RDMA, and very low latency: switch port-to-port latency is typically under 200ns, with end-to-end latency of roughly 1µs. NVIDIA ConnectX-7 supports InfiniBand NDR (400 Gb/s) as well as Ethernet. Modern Ethernet with RoCEv2 and PFC/ECN approaches InfiniBand's performance for AI workloads, and offers lower switch costs, easier integration with existing network infrastructure, and more flexible topology options. Most enterprise AI deployments use RoCEv2 over Ethernet; pure HPC supercomputers often use InfiniBand for the lowest possible MPI latency.

SR-IOV
Single Root I/O Virtualisation

SR-IOV allows a single physical NIC to present multiple "virtual functions" (VFs) directly to virtual machines or containers, each with dedicated hardware queues. Without SR-IOV, all VMs share the physical NIC through the hypervisor, and the extra copy through the hypervisor adds latency. With SR-IOV, a VM with an assigned VF can achieve near-native NIC performance. This makes SR-IOV essential for latency-sensitive VM networking (NFV, telecom, real-time databases). It is supported on the NVIDIA ConnectX series, Intel E810, and Broadcom P225P.
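On Linux, VFs are usually created by writing a count to the NIC's `sriov_numvfs` file in sysfs, bounded by `sriov_totalvfs`. The sketch below shows that flow; the function name and the `sysfs_root` parameter are assumptions added for illustration (the parameter lets the logic be exercised against a temporary directory), and the interface name in the usage note is hypothetical. Writing to the real sysfs needs root privileges and SR-IOV enabled in firmware.

```python
from pathlib import Path


def set_vf_count(iface: str, num_vfs: int, sysfs_root: str = "/sys") -> int:
    """Request num_vfs virtual functions on iface via the standard sysfs files."""
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{iface} supports at most {total} VFs, requested {num_vfs}")

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    # The kernel rejects changing one non-zero VF count to another,
    # so reset to 0 before applying a new non-zero value.
    if current != 0 and num_vfs != 0 and current != num_vfs:
        numvfs.write_text("0")
    numvfs.write_text(str(num_vfs))
    return num_vfs
```

A call such as `set_vf_count("ens1f0", 8)` would request eight VFs on a hypothetical interface `ens1f0`; each VF then appears as its own PCIe function that can be passed through to a VM.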

GPUDirect RDMA
GPU-to-GPU network transfers

GPUDirect RDMA (GDR) allows the NIC to transfer data directly between GPU memory and the network, bypassing system RAM entirely. In a GPU training cluster, GDR means gradient updates can be sent directly from GPU memory to the network buffer without the CPU first copying them into RAM. This significantly reduces latency for distributed training (NCCL all-reduce operations). GDR requires an NVIDIA ConnectX NIC and NVIDIA GPU on the same PCIe segment, plus the GDR kernel module. All NVIDIA DGX systems include GDR-capable ConnectX NICs.
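A quick way to check the kernel-module prerequisite on a Linux host is to look for the module in `/proc/modules`. The sketch below assumes the module is named `nvidia_peermem` (shipped with recent NVIDIA drivers) or `nv_peer_mem` (the older standalone module); the text above only says "the GDR kernel module", so treat these names as assumptions to confirm against your driver version.

```python
# Assumed module names; verify against the installed NVIDIA driver and OFED release.
GDR_MODULE_NAMES = {"nvidia_peermem", "nv_peer_mem"}


def gdr_module_loaded(proc_modules_text: str) -> bool:
    """Return True if a GPUDirect RDMA peer-memory module appears in /proc/modules content."""
    loaded = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    return bool(loaded & GDR_MODULE_NAMES)


sample = "nvidia_peermem 16384 0 - Live 0x0\nmlx5_core 1634304 1 mlx5_ib, Live 0x0\n"
print(gdr_module_loaded(sample))
```

On a real host the input would come from `open("/proc/modules").read()`. A loaded module is necessary but not sufficient: PCIe placement of the GPU and NIC still determines whether transfers take the direct path.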

DPDK
Data Plane Development Kit

DPDK is a set of libraries that allow network applications to process packets directly in userspace, bypassing the Linux kernel network stack. Instead of the kernel handling each packet interrupt, the application polls the NIC directly. This eliminates kernel overhead and enables millions of packets per second (Mpps) on commodity servers. DPDK is widely used for telco NFV/vRAN, software-defined networking (for example Open vSwitch with its DPDK datapath), and high-frequency trading applications. Supported on Intel E810, NVIDIA ConnectX, and Broadcom NICs.
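The polling model can be illustrated with a toy loop. This is not the DPDK API (DPDK is a C library); it is a conceptual sketch in which a `deque` stands in for a NIC receive ring and the hypothetical `rx_burst` function mimics the idea of fetching a batch of packets without blocking or waiting for an interrupt.

```python
from collections import deque


def rx_burst(ring: deque, max_burst: int = 32) -> list:
    """Take up to max_burst packets from the ring without blocking; may return []."""
    burst = []
    while ring and len(burst) < max_burst:
        burst.append(ring.popleft())
    return burst


def poll_loop(ring: deque, handle, max_polls: int) -> int:
    """Poll the ring max_polls times, handing each received packet to handle()."""
    processed = 0
    for _ in range(max_polls):
        for pkt in rx_burst(ring):  # an empty burst simply means nothing arrived
            handle(pkt)
            processed += 1
    return processed


ring = deque(range(100))  # 100 fake packets waiting in the receive ring
seen = []
count = poll_loop(ring, seen.append, max_polls=5)
print(count)
```

The trade-off is that a real poll-mode core spins at 100% CPU even when idle, which is why DPDK deployments dedicate cores to packet processing.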

NVMe-oF over RDMA
Remote NVMe storage at local latency

NVMe-oF (NVMe over Fabrics) extends the NVMe block storage protocol over a network, here using RDMA as the transport. The server NIC acts as both an NVMe-oF initiator (connecting to remote storage) and optionally a target (serving local NVMe drives to other servers). This enables disaggregated storage: NVMe SSDs in storage servers are accessed by compute servers at latencies of 10–20µs, compared to 1–3ms for iSCSI or FC-SCSI. Pure Storage FlashArray, NetApp AFF, and Dell PowerStore support NVMe-oF. NVMe-oF over RDMA requires RoCEv2-capable NICs on both initiator and target, such as ConnectX-5, ConnectX-6 Dx, or newer.
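On a Linux initiator, attaching a remote subsystem over RDMA is typically done with the `nvme connect` command from nvme-cli. The sketch below only builds the argument list; the helper name, the target address (taken from the documentation range 192.0.2.0/24), and the NQN are placeholders, and 4420 is the IANA-registered NVMe-oF port used as a default.

```python
def nvme_connect_argv(target_ip: str, subsystem_nqn: str, port: int = 4420) -> list:
    """Build an nvme-cli command line for connecting to an NVMe-oF target over RDMA."""
    return [
        "nvme", "connect",
        "--transport=rdma",           # use the RDMA transport (RoCEv2 or InfiniBand)
        f"--traddr={target_ip}",      # IP address of the storage target
        f"--trsvcid={port}",          # transport service ID, i.e. the port
        f"--nqn={subsystem_nqn}",     # NVMe Qualified Name of the remote subsystem
    ]


# Placeholder values for illustration only.
argv = nvme_connect_argv("192.0.2.10", "nqn.2014-08.org.example:subsys1")
print(" ".join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` as root would perform the connection, after which the remote namespace shows up as a local `/dev/nvmeXnY` block device.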

Figure: Server NIC speed ladder, 1 GbE through 800 GbE, with use cases and physical interfaces per speed tier.

Ethernet Speed Tiers — When to Upgrade

Each tier serves different workloads. Jumping directly to 400GbE when 25GbE is adequate wastes substantial budget — and conversely, under-specifying creates bottlenecks that no amount of CPU or GPU can overcome.

Speed | Bandwidth | Typical Deployment | Primary Use Cases | Recommendation
10GbE | ~1.25 GB/s | Legacy servers (pre-2019) | Basic connectivity; insufficient for all-flash NVMe storage or AI networking. Upgrade candidate. | Migrate to 25GbE
25GbE | ~3.1 GB/s per port | Mainstream application servers | VMware NSX, web/app tier, containerised workloads, edge compute. Cost-effective with SFP28 cabling. | Standard choice
100GbE | ~12.5 GB/s per port | Storage, AI inference, HPC | NVMe-oF storage networking, GPU inference servers, vSAN backbone, high-throughput east-west traffic. | AI/storage servers
200GbE / NDR200 IB | ~25 GB/s per port | Dense GPU clusters | Multi-GPU training cluster interconnect; NVIDIA DGX A100 uses 200Gb/s adapters for its compute fabric. | Training clusters
400GbE / NDR IB | ~50 GB/s per port | Hyperscale AI, HPC | Top-tier AI fabric; NVIDIA DGX H100 and DGX B200 use 400Gb/s ConnectX-7 adapters for their compute fabric. Highest bandwidth covered in this guide. | Future-proof AI
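The bandwidth column is simply the line rate divided by eight. The sketch below applies that conversion and estimates how long a bulk transfer would take at each tier; the 1,000 GB dataset size is an arbitrary example, and the model ignores protocol overhead and assumes the link is the only bottleneck.

```python
def gigabytes_per_second(line_rate_gbps: float) -> float:
    """Convert a line rate in Gb/s to GB/s (8 bits per byte, no protocol overhead)."""
    return line_rate_gbps / 8


def seconds_to_transfer(dataset_gb: float, line_rate_gbps: float) -> float:
    """Idealised time to move dataset_gb gigabytes over a single port."""
    return dataset_gb / gigabytes_per_second(line_rate_gbps)


for rate in (10, 25, 100, 200, 400):
    print(f"{rate:>3}GbE: {gigabytes_per_second(rate):6.3f} GB/s, "
          f"1000 GB in {seconds_to_transfer(1000, rate):6.1f} s")
```

Real throughput will be lower once Ethernet, IP, and transport headers are accounted for, so these figures are best read as upper bounds.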
Selection Guide

Which NIC for Which Workload?

NIC selection depends on network technology requirements, not just speed. AI training and NVMe-oF storage have fundamentally different requirements from standard virtualisation networking.

Workload | Recommended | Reasoning | NIC Model
VMware ESXi / NSX-T Virtualisation | 25GbE (dual-port) | vMotion, vSAN, and NSX overlay (VXLAN) traffic is well-served by 25GbE. SR-IOV VF assignment for latency-sensitive VMs. Dual-port for path redundancy (active-active LACP). | Intel E810 or Broadcom P225P
AI Training (GPU cluster, NCCL) | 400GbE or 200GbE | NCCL all-reduce bandwidth is the primary bottleneck in distributed training: more NIC bandwidth means faster gradient synchronisation across GPUs. GPUDirect RDMA essential. | NVIDIA ConnectX-7
AI Inference Server | 100GbE (dual-port) | Inference serving needs sufficient ingress bandwidth for request batching and egress for responses. 2× 100GbE with RoCEv2 is ample for most inference throughput requirements. | NVIDIA ConnectX-6 Dx
NVMe-oF Storage Networking | 100GbE RDMA | NVMe-oF over RDMA requires a lossless RDMA fabric. ConnectX-6 Dx supports NVMe-oF initiator and target simultaneously. 100GbE bandwidth matches NVMe SSD throughput aggregates. | NVIDIA ConnectX-6 Dx
Telco NFV / 5G vRAN | 25GbE with DPDK | Telco packet processing requires kernel-bypass DPDK and precision PTP hardware timestamps (IEEE 1588). Intel E810 ADQ (Application Device Queues) provides deterministic low latency for RAN workloads. | Intel E810
HPC Cluster (MPI) | InfiniBand (ConnectX-7 IB mode) | MPI latency determines cluster efficiency. InfiniBand provides lower end-to-end latency than RoCE (roughly 1µs, with sub-200ns switch hops, versus ~1–3µs), which is significant for tightly-coupled HPC codes with frequent MPI barriers. | NVIDIA ConnectX-7 (NDR200 mode)
Product Specifications

Enterprise NIC Specifications

All specifications from official vendor datasheets. Contact Servnet for UK stock availability, pricing, and platform compatibility confirmation.

NVIDIA (Mellanox) · 400GbE / NDR200 IB
MCX75310AAS-NEAT

NVIDIA ConnectX-7 Single-Port 400GbE / NDR200 InfiniBand

Speed: 400GbE single-port or NDR200 InfiniBand 200Gb/s
Ports: 1× QSFP112
Interface: PCIe 5.0 x16
RDMA: RoCEv2 · InfiniBand RDMA
Features: SR-IOV · DPDK · GPUDirect RDMA · NVMe-oF Target
Offloads: VXLAN · NVGRE · GRE · TCP/UDP checksum · TSO/LRO
Crypto: IPsec · TLS 1.3 inline at line rate
OS Support: Linux (MLNX_OFED) · Windows Server · VMware ESXi
Best for: AI fabric for GPU servers — GPUDirect RDMA for direct GPU-to-GPU memory transfers. NDR200 IB mode for HPC clusters. Standard NIC in NVIDIA DGX and Supermicro GPU SuperServers.
NVIDIA (Mellanox) · 2× 100GbE
MCX623106AN-CDAT

NVIDIA ConnectX-6 Dx Dual-Port 100GbE

Speed: 2× 100GbE
Ports: 2× QSFP56
Interface: PCIe 4.0 x16
RDMA: RoCEv2
Features: SR-IOV · DPDK · GPUDirect RDMA · NVMe-oF Initiator/Target
Offloads: VXLAN · NVGRE · TCP/UDP checksum · TSO/LRO · IPsec
Crypto: Inline IPsec and TLS 1.3 crypto acceleration
Compatible: Dell PowerEdge R760 · HPE DL380 Gen11 · Lenovo SR650 V3
Best for: Standard 100GbE for AI/ML servers — dual 100GbE with RDMA and GPUDirect for GPU-accelerated storage access and AI inference networking.
NVIDIA (Mellanox) · 2× 100GbE
MCX516A-CCAT

NVIDIA ConnectX-5 Dual-Port 100GbE

Speed: 2× 100GbE
Ports: 2× QSFP28
Interface: PCIe 3.0 x16
RDMA: RoCEv2 · InfiniBand EDR
Features: SR-IOV · DPDK · GPUDirect RDMA · NVMe-oF
Offloads: VXLAN · NVGRE · TCP/UDP checksum · TSO/LRO
Compatible: Existing PCIe 3.0 server estates · Dell/HPE/Lenovo broad support
Best for: Previous-generation 100GbE — widely deployed in existing estates for vSAN, storage replication, and east-west traffic. Cost-effective upgrade from 25GbE.
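When pairing a NIC with an older server, it is worth checking whether the PCIe slot can actually feed every port at line rate. The sketch below compares aggregate port speed with raw PCIe link bandwidth, using the per-lane transfer rates and 128b/130b encoding of PCIe 3.0 to 5.0. The function names are illustrative, and the model ignores PCIe packet overhead, so usable bandwidth is somewhat lower than shown.

```python
# Per-lane transfer rate in GT/s for each PCIe generation (all use 128b/130b encoding).
PCIE_GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}


def pcie_gbps(generation: int, lanes: int) -> float:
    """Raw one-direction PCIe bandwidth in Gb/s after 128b/130b encoding."""
    return PCIE_GT_PER_LANE[generation] * lanes * 128 / 130


def slot_limited(generation: int, lanes: int, ports: int, port_gbps: float) -> bool:
    """True if the combined port speed exceeds what the slot can carry in one direction."""
    return ports * port_gbps > pcie_gbps(generation, lanes)


print(f"PCIe 3.0 x16: {pcie_gbps(3, 16):.0f} Gb/s, "
      f"dual 100GbE limited: {slot_limited(3, 16, 2, 100)}")
print(f"PCIe 4.0 x16: {pcie_gbps(4, 16):.0f} Gb/s, "
      f"dual 100GbE limited: {slot_limited(4, 16, 2, 100)}")
```

By this estimate a dual-port 100GbE card in a PCIe 3.0 x16 slot cannot drive both ports at full rate simultaneously, whereas a PCIe 4.0 x16 slot has headroom for the same port count.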
Intel · 4× 25GbE
E810-XXVDA4

Intel E810-XXVDA4 Quad-Port 25GbE

Speed: 4× 25GbE
Ports: 4× SFP28
Interface: PCIe 4.0 x16
Features: DPDK · SR-IOV · Flow Director · ADQ (Application Device Queues)
Offloads: VXLAN · NVGRE · GRE · timestamp · checksum · TSO
Precision Timing: IEEE 1588 PTP hardware timestamping
OS Support: Linux · Windows Server · DPDK · RDMA/iWARP
Compatible: Dell PowerEdge · HPE ProLiant · Lenovo ThinkSystem
Best for: Telco / edge 5G deployments and high-density Kubernetes networking — ADQ and PTP hardware timestamping for time-sensitive applications. 4 ports in a single slot for multi-tenant environments.
Broadcom · 2× 25GbE
BCM957412A4120AC

Broadcom P225P Dual-Port 25GbE

Speed: 2× 25GbE
Ports: 2× SFP28
Interface: PCIe 4.0 x8
Features: SR-IOV · DPDK · RoCEv2
Offloads: TCP/UDP checksum · TSO · LRO · VXLAN
VLAN: Hardware VLAN tagging · 4094 VLANs
OS Support: Linux · Windows Server · VMware ESXi
Compatible: Dell PowerEdge · HPE ProLiant (OEM-qualified Broadcom NIC)
Best for: Cost-effective 25GbE for mainstream servers — HPE and Dell OEM-qualified option for standard virtualisation and application server networking.
Common Questions

Frequently Asked Questions

Q: What is the difference between RoCEv1 and RoCEv2?
RoCEv1 runs RDMA directly over Ethernet Layer 2, so it cannot be routed between VLANs or subnets and is limited to a single broadcast domain. RoCEv2 encapsulates RDMA inside UDP/IP, allowing it to be routed across Layer 3 networks. Modern RoCE deployments almost universally use RoCEv2. Both are normally deployed on a lossless Ethernet fabric, with Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) configured on the switches, because packet loss severely degrades RDMA throughput.
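The difference is visible in the packet headers: RoCEv1 frames carry their own EtherType (0x8915), while RoCEv2 traffic is ordinary IPv4 or IPv6 carrying UDP to destination port 4791. The sketch below classifies a frame from already-parsed header fields; the function name and signature are illustrative rather than part of any library.

```python
from typing import Optional

ROCE_V1_ETHERTYPE = 0x8915   # RoCEv1 sits directly on Ethernet
ROCE_V2_UDP_PORT = 4791      # IANA-assigned UDP destination port for RoCEv2
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
IP_PROTO_UDP = 17


def classify_roce(ethertype: int,
                  ip_proto: Optional[int] = None,
                  udp_dst_port: Optional[int] = None) -> str:
    """Classify a frame as RoCEv1, RoCEv2, or other from its parsed header fields."""
    if ethertype == ROCE_V1_ETHERTYPE:
        return "RoCEv1"
    if (ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6)
            and ip_proto == IP_PROTO_UDP
            and udp_dst_port == ROCE_V2_UDP_PORT):
        return "RoCEv2"
    return "other"


print(classify_roce(0x8915))               # Layer 2 only, not routable
print(classify_roce(0x0800, 17, 4791))     # UDP/IP encapsulated, routable
```

Because RoCEv2 is plain UDP/IP on the wire, routers and firewalls can forward it like any other traffic, which is what makes Layer 3 RDMA fabrics possible.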
Q: Does my switch need to support RDMA for RoCE to work?
Not directly: switches do not need to understand RDMA. However, RoCE needs a lossless Ethernet fabric because RDMA throughput collapses under packet loss (RoCE NICs recover from drops far less gracefully than TCP). This means PFC (Priority Flow Control) and ECN (Explicit Congestion Notification) must be configured on the switches to signal and prevent congestion before packet drops occur. Standard unmanaged or consumer switches are not suitable for RDMA networking. Enterprise switches (Arista, Cisco Nexus, NVIDIA Spectrum) support PFC/ECN configuration.
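ECN works by marking packets rather than dropping them: the two low-order bits of the IP TOS / Traffic Class byte carry the ECN codepoint, and a congested switch rewrites them to "CE" (congestion experienced). The sketch below splits a TOS byte into its DSCP and ECN parts as defined in RFC 3168; the function name and the example byte values are arbitrary illustrations.

```python
# ECN codepoints from RFC 3168 (low two bits of the TOS / Traffic Class byte).
ECN_NAMES = {
    0b00: "Not-ECT",   # sender does not support ECN
    0b01: "ECT(1)",    # ECN-capable transport
    0b10: "ECT(0)",    # ECN-capable transport
    0b11: "CE",        # congestion experienced, set by a congested switch
}


def split_tos(tos: int) -> tuple:
    """Split a TOS byte into (DSCP value, ECN codepoint name)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    return tos >> 2, ECN_NAMES[tos & 0b11]


print(split_tos(0x6A))   # ECN-capable packet as sent
print(split_tos(0x6B))   # same packet after a switch marks congestion
```

When the receiver sees CE-marked packets it notifies the sender, which slows down before switch buffers overflow; PFC acts as the last-resort backstop if marking alone is not enough.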
Q: When should I use InfiniBand instead of Ethernet?
InfiniBand NDR is worth considering when MPI latency is critical — specifically for tightly-coupled HPC simulations where every MPI_Barrier adds to wall-clock time. InfiniBand's hardware-managed flow control provides inherently lossless transport without PFC/ECN tuning. For AI training (NCCL workloads), well-configured 400GbE RoCEv2 achieves comparable bandwidth with lower switch cost and more flexible topology options. Most enterprise AI deployments choose 400GbE over InfiniBand due to total cost of ownership.
Q: What is a SmartNIC / DPU and when do I need one?
A SmartNIC (or DPU — Data Processing Unit) is a NIC with an embedded ARM/RISC-V processor and local memory that can offload network functions from the host CPU. Examples include NVIDIA BlueField-3 and Marvell OCTEON. Use cases: offloading IPsec encryption/decryption (freeing host CPU cycles), NFV packet processing, hardware-enforced security isolation in multi-tenant cloud environments, and network storage (NVMe-oF target offload). For most enterprise workloads, a standard ConnectX NIC with hardware offloads is sufficient — SmartNICs are primarily for cloud providers and telco.
Q: How many NICs does a GPU server need?
It depends on the GPU count and network topology. An NVIDIA DGX H100 has 8× H100 GPUs and includes 8× single-port ConnectX-7 400Gb/s adapters for the compute fabric (one per GPU), plus 2× dual-port ConnectX-7 adapters for storage and in-band management. A Supermicro 8-GPU server for training typically uses 2× ConnectX-7 for the NCCL compute fabric and 2× 100GbE for storage access. For single-node inference servers without multi-node training, 1× dual-port 100GbE NIC is generally sufficient for most throughput requirements.
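A simple way to compare designs is to divide the compute-fabric bandwidth by the GPU count. The sketch below does that arithmetic; the function name is illustrative, the two configurations are hypothetical examples rather than vendor specifications, and storage NICs are deliberately left out of the calculation.

```python
def fabric_gbps_per_gpu(gpus: int, compute_nics: int, nic_gbps: float) -> float:
    """Compute-fabric bandwidth available per GPU, in Gb/s, with NICs shared evenly."""
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    return compute_nics * nic_gbps / gpus


# Hypothetical 8-GPU servers with different compute-fabric NIC counts.
shared = fabric_gbps_per_gpu(gpus=8, compute_nics=2, nic_gbps=400)     # 4 GPUs share each NIC
dedicated = fabric_gbps_per_gpu(gpus=8, compute_nics=8, nic_gbps=400)  # one NIC per GPU

print(f"2 NICs: {shared:.0f} Gb/s per GPU, 8 NICs: {dedicated:.0f} Gb/s per GPU")
```

Whether the lower ratio is acceptable depends on how communication-heavy the training job is; models that spend a large share of each step in all-reduce benefit most from a dedicated NIC per GPU.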


Need NIC recommendations for your infrastructure?

We advise on NIC selection for AI training clusters, NVMe-oF storage fabrics, and virtualisation networking — matching speed, RDMA capability, and switch requirements to your workload.
