
HPE Apollo as an HPC cluster building block: cores, interconnect and density

Servnet Editorial · Server Infrastructure Practice · 13 January 2026 · 12 min read

Before the Apollo name became associated with AI and GPUs, its purpose was high-performance computing: packing as many CPU cores as possible into a rack and wiring them together with a fast interconnect so they could work as one large machine. That HPC role is alive and well for UK research groups, engineering and simulation teams, and anyone running tightly-coupled parallel workloads. This article looks at the Apollo as an HPC compute building block rather than as an AI or GPU platform: the cores, the fabric that joins them, and the power and cooling density that determines how much you can actually fit in a rack.

Apollo HPC — chassis to fabric to nodes

HPC is about cores per rack, joined as one

High-performance computing solves large problems by splitting them across many CPU cores running in parallel, from weather and fluid-dynamics simulation to molecular modelling and engineering analysis. The unit of value is therefore aggregate cores, and how efficiently they can be packed and powered. The Apollo HPC design exists to maximise that: a shared multi-node chassis holds several densely-built compute nodes, sharing power and cooling infrastructure to fit far more cores per rack than discrete 1U servers would.
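To make "cores per rack" concrete, here is a minimal Python sketch of the arithmetic. Every figure in it (usable rack units, a four-node 2U chassis, dual 64-core sockets) is an assumption chosen for illustration, not a specification of any particular Apollo model, and the `RackLayout` name is our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackLayout:
    """Hypothetical layout figures; substitute values from real spec sheets."""
    usable_rack_units: int    # rack units (U) available for compute
    enclosure_units: int      # height in U of one chassis or one standalone server
    nodes_per_enclosure: int  # compute nodes housed in each enclosure
    sockets_per_node: int
    cores_per_socket: int


def cores_per_rack(layout: RackLayout) -> int:
    """Aggregate cores that fit by space alone (power and cooling are ignored)."""
    enclosures = layout.usable_rack_units // layout.enclosure_units
    nodes = enclosures * layout.nodes_per_enclosure
    return nodes * layout.sockets_per_node * layout.cores_per_socket


# Assumed example: four nodes in a 2U shared chassis versus one node per 1U server.
shared_chassis = RackLayout(36, 2, 4, 2, 64)
discrete_1u = RackLayout(36, 1, 1, 2, 64)

print(cores_per_rack(shared_chassis))  # 9216
print(cores_per_rack(discrete_1u))     # 4608
```

This counts space only. The power and cooling limits covered later in the article can cut the achievable figure well below what the rack units allow.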

That density is the dividing line from a general-purpose box. A standard rack server is provisioned to stand alone; an Apollo HPC node is one of many in a chassis, stripped to compute and engineered to be deployed in large numbers. It is the same family logic as the rest of the HPE Apollo range, applied to CPU cores rather than disks or GPUs, and distinct from how a standalone HPE server is specified.

Choosing the processors

In an HPC node the processor choice is the heart of the decision, and it is not always more cores at any cost. Tightly-coupled parallel codes care about the balance of core count, per-core clock and, very often, memory bandwidth, because many scientific workloads are limited by how fast data reaches the cores rather than by compute alone. High-core-count parts from modern Intel Xeon and AMD EPYC lines suit throughput-heavy, embarrassingly-parallel work; higher-clock parts can win for codes sensitive to per-core speed.

Memory bandwidth deserves particular attention, because an HPC node that is starved of memory throughput wastes the cores you paid for. Populating all memory channels evenly to reach the platform's full bandwidth is as important as the core count itself. Matching the processor and memory layout to the specific class of code is exactly the kind of decision we work through using our processors guidance rather than defaulting to the biggest part on the list.

•Aggregate cores per rack is the HPC unit of value, not single-node spec
•Balance core count against per-core clock for the workload class
•Memory bandwidth often limits scientific codes more than raw cores
•Populate every memory channel evenly to reach full platform bandwidth
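The point about populating every channel can be shown with the usual theoretical-peak formula: channels multiplied by transfer rate multiplied by bytes per transfer. The Python sketch below assumes a hypothetical 64-core socket with 12 channels at 4800 MT/s and an 8-byte data path per channel. Treat those numbers as placeholders for your own platform's figures, and the result as an upper bound that real codes will not reach.

```python
def peak_memory_bandwidth_gbs(channels_populated: int,
                              transfer_rate_mts: int,
                              bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth per socket in GB/s.

    MT/s x bytes per transfer gives MB/s per channel; divide by 1000 for GB/s.
    """
    return channels_populated * transfer_rate_mts * bytes_per_transfer / 1000


def bandwidth_per_core_gbs(socket_bandwidth_gbs: float, cores: int) -> float:
    """Share of the socket's peak bandwidth available to each core."""
    return socket_bandwidth_gbs / cores


CORES = 64  # assumed core count for the example socket

full = peak_memory_bandwidth_gbs(channels_populated=12, transfer_rate_mts=4800)
half = peak_memory_bandwidth_gbs(channels_populated=6, transfer_rate_mts=4800)

print(f"all channels:  {full:.1f} GB/s, {bandwidth_per_core_gbs(full, CORES):.2f} GB/s per core")
print(f"half channels: {half:.1f} GB/s, {bandwidth_per_core_gbs(half, CORES):.2f} GB/s per core")
```

In this example, leaving half the channels empty halves the bandwidth available to each core, which is why channel population sits alongside core count in the specification.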

The interconnect is half the machine

A cluster is only as good as the network joining its nodes, because parallel codes constantly exchange intermediate results, and the speed of that exchange often determines overall performance more than any single node's power. This is the interconnect, and for HPC it generally takes one of two forms. InfiniBand offers very high bandwidth and very low latency and has long been the default for tightly-coupled, latency-sensitive parallel work. High-speed Ethernet, increasingly with RDMA, is a capable and often more familiar alternative, particularly where workloads are less latency-bound or where an organisation prefers a single network technology.

The choice depends on how tightly coupled the codes are: latency-critical MPI workloads lean toward InfiniBand, while more loosely-coupled or throughput-oriented jobs can run very well on RDMA Ethernet. Either way, the fabric is a first-class design decision, not an add-on, and the head node and storage nodes hang off that same fabric. We size the interconnect alongside the compute, drawing on our processors and platform expertise so the network does not become the bottleneck the cores then wait on.
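A simple way to reason about coupling is the common latency-plus-bandwidth cost model, where the time to send one message is a fixed latency plus the message size divided by bandwidth. The Python sketch below applies it to two hypothetical fabrics with the same bandwidth but different latency. The latency and bandwidth values are invented for illustration and are not measurements of InfiniBand or of any Ethernet product.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Time in microseconds to move one message: latency + size / bandwidth."""
    bytes_per_us = bandwidth_gbps * 1e9 / 8 / 1e6  # Gbit/s -> bytes per microsecond
    return latency_us + message_bytes / bytes_per_us


# Two hypothetical fabrics: same bandwidth, different per-message latency.
FABRIC_A = {"latency_us": 1.0, "bandwidth_gbps": 200.0}
FABRIC_B = {"latency_us": 5.0, "bandwidth_gbps": 200.0}

for size in (64, 64 * 1024, 64 * 1024 * 1024):  # 64 B, 64 KiB, 64 MiB
    t_a = transfer_time_us(size, **FABRIC_A)
    t_b = transfer_time_us(size, **FABRIC_B)
    print(f"{size:>10} bytes: A {t_a:10.2f} us, B {t_b:10.2f} us, B/A = {t_b / t_a:.2f}")
```

Under these assumptions the small-message times differ by almost the full latency ratio, while the large-message times are nearly identical. That is the pattern behind the guidance above: codes that exchange many small messages feel latency, and codes that move large blocks mostly feel bandwidth.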

Node-in-chassis-in-rack power envelope

Layer	Detail
Compute node	Cores + balanced memory channels
Shared chassis	Pooled power + cooling
Rack budget	kW supply + heat removal cap
Cooling method	High airflow or liquid at density

Power and cooling set the real limit

The reason density is an engineering decision and not just a marketing number is that power and cooling, not physical space, are usually what cap how many nodes you can run. Packing many high-core-count CPUs into a chassis concentrates a great deal of power, and therefore heat, into a small footprint, and a rack can only deliver and remove so much.

So an honest HPC design works within a power-and-cooling envelope: how many kilowatts the rack can supply, and how that heat is removed, whether by high-airflow cooling or, at higher densities, by liquid cooling. The shared-infrastructure chassis is efficient precisely because it pools power and cooling across nodes, but the rack-level budget still sets the ceiling. Planning the node-in-chassis-in-rack power envelope up front is part of designing a workable Apollo cluster rather than discovering the limit after installation.
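The envelope check itself is simple arithmetic, sketched here in Python. The wattages, the chassis overhead and the whole-chassis-only rule are all assumptions made for illustration; real planning would use measured or vendor-quoted draw and the site's actual cooling capacity.

```python
def nodes_within_envelope(rack_budget_w: int,
                          node_w: int,
                          chassis_overhead_w: int,
                          nodes_per_chassis: int,
                          max_chassis_by_space: int) -> int:
    """Nodes a rack can run, taking the lower of the power limit and the space limit.

    Simplification: only fully populated chassis are counted.
    """
    chassis_w = nodes_per_chassis * node_w + chassis_overhead_w
    chassis_by_power = rack_budget_w // chassis_w
    return min(chassis_by_power, max_chassis_by_space) * nodes_per_chassis


# Hypothetical figures: 800 W nodes, four per chassis, 200 W shared overhead,
# and physical room for 18 chassis.
power_limited = nodes_within_envelope(20_000, 800, 200, 4, 18)
space_limited = nodes_within_envelope(80_000, 800, 200, 4, 18)

print(power_limited)  # 20
print(space_limited)  # 72
```

With the assumed 20 kW budget the rack runs 20 nodes even though there is room for 72, so the power figure, not the rack units, is the number to establish first.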

Building the cluster

A complete HPC cluster is more than compute nodes: it is the dense Apollo compute, joined by the chosen interconnect, served by a head or login node that schedules and manages jobs, and backed by storage nodes that feed data to the computation. The Apollo provides the compute building block at the centre of that picture, and the design exercise is sizing each part so they are in proportion, with enough storage bandwidth and network capacity to keep the cores busy.

For UK research and engineering teams, that makes the Apollo a natural foundation for on-premises HPC where data residency, sustained utilisation or cost favour owning the cluster over renting cloud capacity. We design the whole system (compute, fabric, head and storage nodes) within the power envelope, drawing the compute from the HPE Apollo range and the processors from our processors guidance, alongside the rest of our HPE servers portfolio.

Key takeaways

✓Apollo HPC nodes maximise cores per rack via a shared multi-node chassis, not standalone servers.
✓Processor choice balances cores, clock and memory bandwidth to the code class - not just maximum cores.
✓The interconnect (InfiniBand or RDMA Ethernet) is half the machine for tightly-coupled parallel work.
✓Power and cooling, not floor space, usually cap how many dense nodes a rack can run.
✓A full cluster pairs Apollo compute with a head node and storage nodes, all sized in proportion.

Frequently asked

FAQs — HPE Apollo as an HPC cluster building block

Compute

How does an Apollo HPC node differ from a standard server?

An Apollo HPC node is one of several in a shared chassis, stripped to compute and engineered to pack maximum cores per rack with pooled power and cooling. A standard server stands alone. It is the same density logic as the wider HPE Apollo family applied to CPU cores.

How do I choose processors for an HPC node?

Balance core count, per-core clock and memory bandwidth to the workload - many scientific codes are bandwidth-limited rather than compute-limited, so populate every memory channel. We match the processor to the code class using our processors guidance.

Cluster design

InfiniBand or Ethernet for an Apollo HPC cluster?

InfiniBand suits latency-critical, tightly-coupled MPI work with its very low latency; high-speed RDMA Ethernet suits more loosely-coupled or throughput-oriented jobs and single-network estates. The fabric is a first-class decision we size alongside the compute and the rest of the HPE servers build.

What limits how many Apollo nodes fit in a rack?

Power and cooling, not space. Dense CPUs concentrate heat, so the rack power budget and cooling method (high airflow or liquid) set the ceiling. We plan the node-in-chassis-in-rack envelope up front when designing an Apollo cluster.
