Renting GPUs by the hour is the obvious way to start with AI, and often the right one. But once a workload runs steadily, the meter never stops - and owning the same hardware, financed and powered, can cost markedly less while leaving you with an asset. This guide sets out the real comparison: what each side actually costs, where the break-even falls, and how utilisation decides the answer. Model your own numbers in the AI/GPU calculator.
The two cost models, honestly stated
Cloud GPUs are pure operating cost: you pay a per-GPU-hour rate for exactly the time you use, and when you stop, you own nothing. Self-hosting is a capital cost - the servers, GPUs, networking and rack - usually financed into a monthly payment, plus the power and cooling to run them. The fair comparison is therefore owned monthly (finance plus power) against cloud monthly (rate times hours times how much of the time you actually run).
That last factor, utilisation, is what most cloud-is-cheaper and on-prem-is-cheaper arguments quietly assume. Renting wins when GPUs would sit idle; owning wins when they run.
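The comparison can be written as two small functions. This is a minimal sketch: the function names, the 730-hour month and the round sample figures are illustrative assumptions, chosen only to show the verdict flipping with utilisation, and are not how the calculator itself works.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours a year divided by 12


def owned_monthly(finance_payment: float, power_cost: float) -> float:
    """Owned cost per month: finance payment plus power and cooling."""
    return finance_payment + power_cost


def cloud_monthly(rate_per_gpu_hour: float, gpus: int, utilisation: float) -> float:
    """Cloud cost per month: rate x GPUs x hours x fraction of time running."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    return rate_per_gpu_hour * gpus * HOURS_PER_MONTH * utilisation


# Hypothetical round numbers, not the figures used later in this guide.
owned = owned_monthly(finance_payment=700.0, power_cost=300.0)
for utilisation in (0.25, 0.90):
    cloud = cloud_monthly(rate_per_gpu_hour=2.0, gpus=1, utilisation=utilisation)
    cheaper = "cloud" if cloud < owned else "owned"
    print(f"{utilisation:.0%} utilisation: cloud £{cloud:,.0f}, owned £{owned:,.0f} -> {cheaper}")
```

The owned figure does not move with utilisation, while the cloud figure scales with it; that asymmetry is the whole argument.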
What owning actually costs
Take a Llama-70B-class build: four H100s serving a busy internal assistant. Bought outright that is roughly £158,000 of hardware; financed over five years on hire purchase it is about £3,400 a month, after which the kit is yours. Running it draws around 5.4 kW at the servers - call it 7.6 kW once the data-centre cooling overhead is included - which at UK commercial rates adds roughly £1,400 a month in electricity.
So the owned position is about £4,800 a month all-in, for hardware you keep. Every figure here is indicative; the IT finance calculator prices the monthly for your term and structure, and the AI/GPU calculator derives the power and cooling from the exact build.
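The arithmetic behind the owned figure can be checked in a few lines. This is a sketch under stated assumptions: the 730-hour month and the £0.25 per kWh electricity price are not given in the text but back-derived from the roughly £1,400 monthly power cost, and the variable names are illustrative.

```python
HOURS_PER_MONTH = 730          # assumption: 8,760 hours a year divided by 12
PRICE_PER_KWH = 0.25           # assumption: GBP, back-derived from ~£1,400/month

it_load_kw = 5.4               # draw at the servers (from the text)
facility_load_kw = 7.6         # draw including cooling overhead (from the text)
finance_monthly = 3_400        # five-year hire purchase payment, GBP (from the text)
hardware_cost = 158_000        # cash price, GBP (from the text)
term_months = 60

# Ratio of total facility power to IT power (a PUE-style overhead factor).
cooling_overhead = facility_load_kw / it_load_kw

# Energy and electricity cost for a month of continuous running.
monthly_kwh = facility_load_kw * HOURS_PER_MONTH
electricity_monthly = monthly_kwh * PRICE_PER_KWH

owned_monthly = finance_monthly + electricity_monthly

# Total paid over the term, and the implied cost of finance.
total_payments = finance_monthly * term_months
finance_charge = total_payments - hardware_cost

print(f"Cooling overhead factor: {cooling_overhead:.2f}")
print(f"Electricity: £{electricity_monthly:,.0f} a month")
print(f"Owned, all-in: £{owned_monthly:,.0f} a month")
print(f"Implied finance charge over the term: £{finance_charge:,.0f}")
```

The result lands within rounding of the £4,800 quoted above; changing the electricity price or the overhead factor shows how sensitive the owned position is to each.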
What the cloud actually costs
The same four H100s on-demand, at a representative UK rate of a little over two pounds per GPU-hour, cost about £6,500 a month running around the clock, or roughly £3,900 a month at 60% utilisation. A one-year reserved commitment cuts the on-demand rate by around 40%, which is the cloud's strongest answer - but it locks you in and you still own nothing at the end.
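The cloud figures follow from rate times GPUs times hours times utilisation. The sketch below assumes a rate of £2.23 per GPU-hour (the text says only "a little over two pounds") and a 730-hour month; both are assumptions chosen to reproduce the rounded figures quoted above, and the function name is illustrative.

```python
HOURS_PER_MONTH = 730     # assumption: 8,760 hours a year divided by 12
ON_DEMAND_RATE = 2.23     # assumption: GBP per GPU-hour, "a little over two pounds"
GPUS = 4
RESERVED_DISCOUNT = 0.40  # one-year reserved commitment, from the text


def cloud_monthly(utilisation: float, discount: float = 0.0) -> float:
    """Monthly cloud bill for the four-GPU build at a given utilisation."""
    rate = ON_DEMAND_RATE * (1.0 - discount)
    return rate * GPUS * HOURS_PER_MONTH * utilisation


scenarios = {
    "on-demand, 100% utilisation": cloud_monthly(1.0),
    "on-demand, 60% utilisation": cloud_monthly(0.6),
    "reserved, 100% utilisation": cloud_monthly(1.0, RESERVED_DISCOUNT),
}
for label, cost in scenarios.items():
    print(f"{label}: £{cost:,.0f} a month")
```

A reserved commitment is normally billed whether or not the GPUs are busy, so the sketch prices it at full utilisation only.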
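This is why the honest verdict depends on how hard you run the GPUs. At low, bursty utilisation the cloud is genuinely cheaper and far more flexible. At high, sustained utilisation - a production service running most of the day - owning pulls clearly ahead. On the indicative figures above, the financed owned cost of about £4,800 a month drops below on-demand cloud once utilisation passes roughly 74% (£4,800 divided by £6,500).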
Where the break-even falls
For a steadily used build, a cash purchase pays back against on-demand cloud in roughly two to three years, after which the owned hardware runs for the price of its power alone while the cloud bill keeps arriving. Financed, the owned monthly sits below on-demand cloud from the first month at high utilisation, so you are cash-flow positive immediately and own an asset at the end.
The crossover moves with utilisation, electricity price and the cloud rate you can secure - all of which you can vary in the calculator. Our cloud vs on-premise TCO calculator takes the same question across your wider infrastructure.
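The two break-even measures in this section can be sketched as follows, using the indicative figures from this guide. The function names are illustrative, and the payback calculation makes a simplifying assumption: power cost is held constant regardless of utilisation, and maintenance, resale value and the time value of money are ignored.

```python
import math

# Indicative figures from this guide, in GBP; replace with your own.
HARDWARE_COST = 158_000      # cash price of the four-H100 build
FINANCE_MONTHLY = 3_400      # five-year hire purchase payment
POWER_MONTHLY = 1_400        # electricity including cooling
CLOUD_FULL_MONTHLY = 6_500   # on-demand, running around the clock


def break_even_utilisation(owned_monthly: float, cloud_full_monthly: float) -> float:
    """Utilisation above which financed ownership is cheaper than on-demand cloud."""
    return owned_monthly / cloud_full_monthly


def cash_payback_months(utilisation: float) -> float:
    """Months for a cash purchase to pay back against on-demand cloud.

    Assumes power cost stays constant whatever the utilisation.
    """
    monthly_saving = CLOUD_FULL_MONTHLY * utilisation - POWER_MONTHLY
    if monthly_saving <= 0:
        return math.inf  # the cloud is cheaper than the power bill alone
    return HARDWARE_COST / monthly_saving


crossover = break_even_utilisation(FINANCE_MONTHLY + POWER_MONTHLY, CLOUD_FULL_MONTHLY)
print(f"Financed ownership wins above about {crossover:.0%} utilisation")
for utilisation in (0.9, 1.0):
    years = cash_payback_months(utilisation) / 12
    print(f"Cash payback at {utilisation:.0%} utilisation: {years:.1f} years")
```

Lowering the utilisation, raising the electricity cost or applying a reserved-rate discount to the cloud figure all push the crossover later, which is the sensitivity described above.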
A practical way to decide
A sensible pattern is to prototype and burst in the cloud, then repatriate the steady baseline onto owned hardware once utilisation is predictable and high - keeping the cloud for spikes. That captures the cloud's flexibility where it matters and the ownership saving where it counts.
When you are ready to price the owned option, the build the calculator produces - GPUs, servers, networking, power and cooling - becomes a real quotation, financed if you wish, through our NVIDIA DGX and GPU server ranges. To size the memory side first, see how much VRAM an LLM needs.