A VDI host is not a general virtualisation host with desktops bolted on. The economics live and die on users-per-host, and that number is set by three things working together: how much frame buffer each session needs, how much RAM you can pack in, and whether you put a GPU in the box at all. Spec one dimension wrong and the host fills up early on that dimension, which can easily double your cost-per-seat. This is the build framework our engineers use when we configure a VDI host for UK customers.
Start from the user profile, not the hardware
Every VDI sizing exercise begins with honest user profiling. A task worker living in a browser and Office is a completely different load to a CAD engineer in SolidWorks or a data analyst in Power BI. Citrix and Omnissa (formerly VMware Horizon) both publish profile bands - knowledge, power, and designer - and the gap between them is enormous: a task worker might need 2-3GB RAM and no dedicated GPU, while a designer wants 16-32GB and a healthy slice of a professional GPU.
Get the profile right and everything else follows. Get it wrong - by averaging a mixed estate into a single number - and you either starve your power users or massively over-buy for your task workers. Segment the estate into two or three profiles and size pools separately.
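A minimal sketch of why averaging goes wrong, assuming a hypothetical estate of 160 task workers and 40 designers. The per-session RAM figures sit inside the bands quoted above; the user counts and the `Profile` type are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    users: int
    ram_gb: float  # per-session RAM, illustrative value only


def averaged_ram_gb(profiles):
    """Per-session RAM if the whole estate is sized as one blended number."""
    total_users = sum(p.users for p in profiles)
    return sum(p.users * p.ram_gb for p in profiles) / total_users


# Hypothetical mixed estate: mostly task workers, a minority of designers.
estate = [
    Profile("task", users=160, ram_gb=3),
    Profile("designer", users=40, ram_gb=24),
]

blended = averaged_ram_gb(estate)
for p in estate:
    gap = blended - p.ram_gb
    verdict = "over-provisioned" if gap > 0 else "starved"
    print(f"{p.name}: needs {p.ram_gb} GB, blended gives {blended:.1f} GB "
          f"({verdict} by {abs(gap):.1f} GB)")
```

The blended figure suits neither group, which is the case for sizing each pool against its own profile.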
Decide whether you need a GPU at all
Modern hypervisors handle basic 2D and video rendering reasonably well in software on the CPU, so a pure task-worker estate can often run CPU-only. The moment you add Teams or Zoom video, multi-monitor 4K, or any graphical application, a GPU stops being a luxury and becomes the thing that makes sessions feel like a real PC.
The mainstream choice for density VDI in 2026 is the NVIDIA L40S - a dual-slot, passively cooled datacentre GPU with 48GB of GDDR6 that vGPU software carves into per-user profiles. You buy the GPU once and amortise it across every session on the host, which is why GPU-backed VDI often beats CPU-only on cost-per-seat once graphics enter the picture.
Frame buffer is the real density lever
With vGPU, each session is allocated a fixed slice of the GPU's frame buffer (its VRAM). That allocation - not raw GPU horsepower - is usually what caps users-per-card. A 48GB L40S split into 2GB profiles theoretically hosts up to 24 sessions; split into 4GB profiles for heavier graphical work, you get 12. The profile size must match the application, the monitor count and the resolution.
Over-allocate frame buffer and you waste density; under-allocate and applications crash or fall back to software rendering. Size the vGPU profile to the heaviest realistic session in the pool, validate it with a pilot group, then standardise.
- Task/knowledge worker: 1-2GB vGPU profile, 24 or more sessions per L40S by frame buffer
- Power worker (multi-monitor, video): 2-4GB profile, 12-24 sessions per L40S
- Designer/CAD: 4-8GB profile, 6-12 sessions per L40S, validate per application
- Match profile to resolution and monitor count - 4K and multi-monitor eat frame buffer fast
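The frame-buffer arithmetic can be sketched as a simple integer division. This assumes every session on the card uses the same profile size; `sessions_per_gpu` is a hypothetical helper, not part of any vGPU tooling.

```python
def sessions_per_gpu(gpu_frame_buffer_gb: int, profile_gb: int) -> int:
    """Frame-buffer ceiling on sessions for one GPU with equal-size profiles."""
    if profile_gb <= 0:
        raise ValueError("profile_gb must be positive")
    return gpu_frame_buffer_gb // profile_gb


L40S_FRAME_BUFFER_GB = 48  # capacity quoted in the text

for profile_gb in (1, 2, 4, 8):
    ceiling = sessions_per_gpu(L40S_FRAME_BUFFER_GB, profile_gb)
    print(f"{profile_gb} GB profile -> up to {ceiling} sessions per card")
```

Treat the result as an upper bound only: the vGPU software may impose its own per-GPU session limit, and CPU, RAM or network will often cap the host before frame buffer does.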
RAM density: the other hard ceiling
VDI is memory-hungry in aggregate. Multiply your per-session RAM by the sessions you want on the host, add 20-25% for the hypervisor and overhead, and you quickly land on a host needing 1TB or more. That is what pushes VDI hosts towards high-capacity DDR5 configurations and, often, the larger memory footprint of a dual-socket platform.
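As a rough worked example of that multiplication, assuming a hypothetical pool of 100 sessions at 8GB each and the upper 25% overhead figure:

```python
def host_ram_gb(sessions: int, ram_per_session_gb: float,
                overhead: float = 0.25) -> float:
    """Session RAM plus a fractional allowance for hypervisor and overhead."""
    if not 0 <= overhead < 1:
        raise ValueError("overhead is a fraction, e.g. 0.25 for 25%")
    return sessions * ram_per_session_gb * (1 + overhead)


# Hypothetical pool: 100 sessions at 8 GB each, 25% overhead.
required = host_ram_gb(sessions=100, ram_per_session_gb=8, overhead=0.25)
print(f"Host needs roughly {required:.0f} GB of RAM")
```

Even this modest per-session figure lands at about 1TB, before any headroom for growth.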
Populate every memory channel evenly so you keep full DDR5 bandwidth - an unbalanced config silently throttles every session on the box. Where capacity forces two DIMMs per channel, accept the small clock drop knowingly; on a density VDI host, capacity usually wins over the last few percent of latency. Plan the population carefully with our server memory guidance.
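One way to reason about balanced population is to try one DIMM per channel first and only fall back to two when capacity demands it. The sketch below assumes identical DIMMs in every channel. The channel counts and DIMM sizes are placeholders, so check the actual platform and memory options before relying on them.

```python
def dimm_plan(target_gb, sockets, channels_per_socket,
              dimm_sizes_gb=(32, 64, 96, 128)):
    """Smallest balanced config meeting target_gb, preferring 1 DIMM per channel.

    Returns (dimms_per_channel, dimm_size_gb, total_gb), or None if even
    two DIMMs per channel of the largest size fall short.
    """
    channels = sockets * channels_per_socket
    for dimms_per_channel in (1, 2):          # 1DPC keeps full memory clock
        for size in sorted(dimm_sizes_gb):    # smallest DIMM that fits
            total = channels * dimms_per_channel * size
            if total >= target_gb:
                return dimms_per_channel, size, total
    return None


# Placeholder platforms: 12 and 8 memory channels per socket, dual socket.
print(dimm_plan(1000, sockets=2, channels_per_socket=12))
print(dimm_plan(2500, sockets=2, channels_per_socket=8))
```

When the plan comes back with two DIMMs per channel, that is the point at which the clock drop described above applies.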
Balance the host: GPU, CPU, RAM and network together
A VDI host is only as fast as its slowest dimension. There is no point packing 48 sessions onto a GPU if the CPU cannot encode that many display streams or the host runs out of RAM at 30. Aim for a balanced build where GPU frame buffer, CPU core count, memory capacity and network all top out at roughly the same user count.
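A balanced build can be checked by taking the minimum across the per-dimension ceilings. In this sketch the GPU and RAM figures echo the example above, while the CPU and network figures are invented purely for illustration.

```python
def balanced_capacity(limits: dict[str, int]) -> tuple[str, int]:
    """Return the limiting dimension and the user count it allows."""
    if not limits:
        raise ValueError("limits must not be empty")
    bottleneck = min(limits, key=limits.get)
    return bottleneck, limits[bottleneck]


# Per-dimension user ceilings for one host (CPU and network are hypothetical).
limits = {
    "gpu_frame_buffer": 48,
    "cpu_encode": 40,
    "ram": 30,
    "network": 60,
}

dimension, users = balanced_capacity(limits)
print(f"Host tops out at {users} users, limited by {dimension}")
for name, ceiling in limits.items():
    print(f"  {name}: {ceiling - users} users of unused headroom")
```

Large headroom on any one dimension is money spent on capacity the host can never use, so the aim is to bring the ceilings close together.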
Networking matters more than people expect: every session is a live, latency-sensitive display protocol stream, so 2x 25GbE is a sensible mainstream baseline. Dual PSUs and out-of-band management are non-negotiable for a host whose failure logs off a hundred users at once. Choose the L40S and the GPU-capable chassis with our GPU accelerators guidance, read the detail on the NVIDIA L40S, and build the exact host in our configurator.