Before the Apollo name became associated with AI and GPUs, its purpose was high-performance computing: packing as many CPU cores as possible into a rack and wiring them together with a fast interconnect so they could work as one large machine. That HPC role is alive and well for UK research groups, engineering and simulation teams, and anyone running tightly-coupled parallel workloads. This is a look at the Apollo as an HPC compute building block rather than as an AI or GPU platform: the cores, the fabric that joins them, and the power and cooling density that determines how much you can actually fit in a rack.
HPC is about cores per rack, joined as one
High-performance computing solves large problems by splitting them across many CPU cores running in parallel, from weather and fluid-dynamics simulation to molecular modelling and engineering analysis. The unit of value is therefore aggregate cores, and how efficiently they can be packed and powered. The Apollo HPC design exists to maximise that: a shared multi-node chassis holds several densely-built compute nodes, sharing power and cooling infrastructure to fit far more cores per rack than discrete 1U servers would.
That density is the dividing line from a general-purpose box. A standard rack server is provisioned to stand alone; an Apollo HPC node is one of many in a chassis, stripped to compute and engineered to be deployed in quantity. It is the same family logic as the rest of the HPE Apollo range, applied to CPU cores rather than disks or GPUs, and distinct from how a standalone HPE server is specified.
Choosing the processors
In an HPC node the processor choice is the heart of the decision, and it is not always more cores at any cost. Tightly-coupled parallel codes care about the balance of core count, per-core clock and, very often, memory bandwidth, because many scientific workloads are limited by how fast data reaches the cores rather than by compute alone. High-core-count parts from modern Intel Xeon and AMD EPYC lines suit throughput-heavy, embarrassingly-parallel work; higher-clock parts can win for codes sensitive to per-core speed.
Memory bandwidth deserves particular attention, because an HPC node that is starved of memory throughput wastes the cores you paid for. Populating all memory channels evenly to reach the platform's full bandwidth is as important as the core count itself. Matching the processor and memory layout to the specific class of code is exactly the kind of decision we work through using our processors guidance rather than defaulting to the biggest part on the list.
- Aggregate cores per rack is the HPC unit of value, not single-node spec
- Balance core count against per-core clock for the workload class
- Memory bandwidth often limits scientific codes more than raw cores
- Populate every memory channel evenly to reach full platform bandwidth
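To make the memory-channel point concrete, the short Python sketch below estimates theoretical peak memory bandwidth for one socket as populated channels × transfer rate × 8 bytes per transfer. The core count, channel count and DDR speed are hypothetical figures chosen for illustration, not the specification of any particular Apollo node or processor, and sustained bandwidth in practice is lower than this theoretical peak.

```python
# Theoretical peak memory bandwidth for one CPU socket.
# All platform figures used here are hypothetical examples, not vendor data.

BYTES_PER_TRANSFER = 8  # a standard DDR memory channel moves 64 data bits per transfer


def peak_bandwidth_gbs(populated_channels: int, transfer_rate_mts: int) -> float:
    """Return theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_second = (
        populated_channels * transfer_rate_mts * 1_000_000 * BYTES_PER_TRANSFER
    )
    return bytes_per_second / 1e9


def bandwidth_per_core_gbs(
    populated_channels: int, transfer_rate_mts: int, cores: int
) -> float:
    """Return the share of peak bandwidth available to each core."""
    return peak_bandwidth_gbs(populated_channels, transfer_rate_mts) / cores


if __name__ == "__main__":
    cores = 64          # hypothetical core count per socket
    rate_mts = 3200     # hypothetical DDR transfer rate in MT/s
    for channels in (8, 4):  # fully populated vs. half populated
        total = peak_bandwidth_gbs(channels, rate_mts)
        per_core = bandwidth_per_core_gbs(channels, rate_mts, cores)
        print(f"{channels} channels: {total:.1f} GB/s peak, {per_core:.2f} GB/s per core")
```

Under these assumed figures, leaving half the channels empty halves both the peak bandwidth and the bandwidth available to each core, while the core count you paid for stays the same.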
The interconnect is half the machine
A cluster is only as good as the network joining its nodes, because parallel codes constantly exchange intermediate results, and the speed of that exchange often determines overall performance more than any single node's power. This is the interconnect, and for HPC it generally takes one of two forms. InfiniBand offers very high bandwidth and very low latency and has long been the default for tightly-coupled, latency-sensitive parallel work. High-speed Ethernet, increasingly with RDMA, is a capable and often more familiar alternative, particularly where workloads are less latency-bound or where an organisation prefers a single network technology.
The choice depends on how tightly coupled the codes are: latency-critical MPI workloads lean toward InfiniBand, while more loosely-coupled or throughput-oriented jobs can run very well on RDMA Ethernet. Either way, the fabric is a first-class design decision, not an add-on, and the head node and storage nodes hang off that same fabric. We size the interconnect alongside the compute, drawing on our processors and platform expertise so the network does not become the bottleneck the cores then wait on.
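One way to reason about why coupling matters is the simple latency-plus-bandwidth cost model often used for message passing: transfer time ≈ latency + message size ÷ bandwidth. The Python sketch below applies it to two hypothetical fabrics with the same bandwidth but different latency. The latency and bandwidth values are illustrative assumptions, not measurements of InfiniBand or Ethernet products, and the model ignores congestion, topology and software overhead.

```python
# Simple latency + size/bandwidth model of a single message transfer.
# Fabric figures are hypothetical and chosen only to show the shape of the trade-off.


def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Estimated time in microseconds to send one message."""
    # 1 Gbit/s = 1e9 bits/s = 125 bytes per microsecond.
    bytes_per_us = bandwidth_gbps * 125.0
    return latency_us + message_bytes / bytes_per_us


def latency_fraction(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Fraction of the transfer time that is spent on fixed latency."""
    return latency_us / transfer_time_us(message_bytes, latency_us, bandwidth_gbps)


if __name__ == "__main__":
    fabrics = {
        "fabric A (hypothetical, lower latency)": (1.0, 100.0),
        "fabric B (hypothetical, higher latency)": (5.0, 100.0),
    }
    for size in (8, 1024 * 1024):  # a tiny message vs. a 1 MiB message
        for name, (lat_us, bw_gbps) in fabrics.items():
            t = transfer_time_us(size, lat_us, bw_gbps)
            frac = latency_fraction(size, lat_us, bw_gbps)
            print(f"{size:>8} B on {name}: {t:8.2f} us, {frac:.1%} latency")
```

With these assumed numbers, a tiny message spends almost all of its time on latency, so the lower-latency fabric is several times faster, whereas for a 1 MiB message the two fabrics finish within a few percent of each other. That is the sense in which frequent small exchanges favour a low-latency fabric and bulk, loosely-coupled transfers are more forgiving.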
Power and cooling set the real limit
The reason density is an engineering decision and not just a marketing number is that power and cooling, not physical space, are usually what cap how many nodes you can run. Packing many high-core-count CPUs into a chassis concentrates a great deal of power, and therefore heat, into a small footprint, and a rack can only deliver and remove so much.
So an honest HPC design works within a power-and-cooling envelope: how many kilowatts the rack can supply, and how that heat is removed, whether by high-airflow cooling or, at higher densities, by liquid cooling. The shared-infrastructure chassis is efficient precisely because it pools power and cooling across nodes, but the rack-level budget still sets the ceiling. Planning the node-in-chassis-in-rack power envelope up front is part of designing a workable Apollo cluster rather than discovering the limit after installation.
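The rack-level budget can be sketched as a small calculation: the number of nodes a rack can run is the smaller of what fits physically and what the power feed can supply. The Python example below is a minimal sketch under stated assumptions. The rack size, chassis height, nodes per chassis, per-node power draw and rack power budget are all hypothetical values, and a real design would also account for switch power, chassis overhead and cooling capacity.

```python
# Rough nodes-per-rack estimate: the lower of the space limit and the power limit.
# Every figure used in the example run is a hypothetical planning number.


def nodes_by_space(rack_u: int, reserved_u: int, chassis_u: int, nodes_per_chassis: int) -> int:
    """Nodes that fit physically once space for switches etc. is reserved."""
    usable_u = rack_u - reserved_u
    return (usable_u // chassis_u) * nodes_per_chassis


def nodes_by_power(rack_budget_kw: float, node_watts: float) -> int:
    """Nodes the rack power budget can supply at the assumed per-node draw."""
    return int((rack_budget_kw * 1000) // node_watts)


def rack_capacity(rack_u, reserved_u, chassis_u, nodes_per_chassis, rack_budget_kw, node_watts):
    """Return (node limit, which constraint binds)."""
    space = nodes_by_space(rack_u, reserved_u, chassis_u, nodes_per_chassis)
    power = nodes_by_power(rack_budget_kw, node_watts)
    limit = min(space, power)
    binding = "power" if power < space else "space"
    return limit, binding


if __name__ == "__main__":
    limit, binding = rack_capacity(
        rack_u=42,            # hypothetical rack height
        reserved_u=4,         # hypothetical space kept for switches and cabling
        chassis_u=2,          # hypothetical chassis height
        nodes_per_chassis=4,  # hypothetical node count per chassis
        rack_budget_kw=20.0,  # hypothetical power available to the rack
        node_watts=800.0,     # hypothetical draw per node under load
    )
    print(f"Rack can run {limit} nodes; limited by {binding}")
```

With these made-up numbers the rack has physical space for 76 nodes but power for only 25, which illustrates why the power-and-cooling envelope, rather than rack units, usually sets the ceiling.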
Building the cluster
A complete HPC cluster is more than compute nodes: it is the dense Apollo compute, joined by the chosen interconnect, served by a head or login node that schedules and manages jobs, and backed by storage nodes that feed data to the computation. The Apollo provides the compute building block at the centre of that picture, and the design exercise is sizing each part so they are in proportion, with enough storage bandwidth and network capacity to keep the cores busy.
For UK research and engineering teams, that makes the Apollo a natural foundation for on-premises HPC where data residency, sustained utilisation or cost favour owning the cluster over renting cloud capacity. We design the whole system (compute, fabric, head and storage nodes) within the power envelope, drawing the compute from the HPE Apollo range and the processors from our processors guidance, alongside the rest of our HPE servers portfolio.