Chips & cluster

GPUs, the blueprint & the fabric.

GPUs become a cluster through the reference design and the network fabric that join them into one machine. We work on all three fronts: securing the allocation, fitting the validated architecture to your site, and engineering the fabric.

ALLOCATED SILICON · NON-BLOCKING FABRIC

Map your cluster →

What this layer is

What joins GPUs into one machine.

Securing the accelerators is the visible challenge: current-generation allocation is rationed. Two things are just as consequential. The reference design is the validated architecture for how chips, memory, power and cooling fit together. The fabric is the network that lets every GPU reach every other one.

Together they determine what the silicon delivers. An oversubscribed fabric leaves expensive GPUs waiting on each other, and an unvalidated design fails the diligence a lender or tenant runs before signing.

The scale of a modern cluster

What one rack now holds.

72 GPUs

fused into a single rack-scale system by NVLink in a current Nvidia reference design, addressed by software as one accelerator.

400 Gb/s

per-port bandwidth on NDR InfiniBand, the speed that keeps thousands of GPUs from stalling on each other.

1:1

the non-blocking ratio a training fabric is built to hold: every GPU reaching every other at full line rate.
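
As a rough illustration of that ratio: on a single leaf switch, oversubscription is the GPU-facing bandwidth divided by the spine-facing bandwidth. The sketch below assumes a hypothetical leaf with equal-speed ports. The function names and port counts are illustrative and not taken from any vendor design.

```python
from fractions import Fraction


def oversubscription(down_ports: int, up_ports: int, port_gbps: int = 400) -> Fraction:
    """Downlink (GPU-facing) to uplink (spine-facing) bandwidth on one leaf.

    Assumes every port runs at the same speed (hypothetical 400 Gb/s default).
    """
    if down_ports <= 0 or up_ports <= 0:
        raise ValueError("port counts must be positive")
    return Fraction(down_ports * port_gbps, up_ports * port_gbps)


def as_ratio(r: Fraction) -> str:
    """Format a ratio the way fabric designs usually state it, e.g. '3:1'."""
    return f"{r.numerator}:{r.denominator}"


def is_non_blocking(down_ports: int, up_ports: int) -> bool:
    """A leaf is non-blocking when uplink bandwidth covers downlink bandwidth."""
    return oversubscription(down_ports, up_ports) <= 1


if __name__ == "__main__":
    # Illustrative port splits on a hypothetical 64-port leaf.
    print(as_ratio(oversubscription(32, 32)))  # 1:1
    print(as_ratio(oversubscription(48, 16)))  # 3:1
```

A 1:1 result at each leaf is necessary but not sufficient: whether the whole fabric holds line rate also depends on topology, routing and congestion control.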

After the allocation

What the silicon still needs.

Between purchased GPUs and a running cluster sit four more things to get right.

A validated reference design

Vendors publish reference architectures: the tested recipe for how GPUs, NVLink, fabric, power and cooling fit at rack and row scale. Untested deviations surface as problems under full load.

A fabric matched to the workload

InfiniBand or Ethernet with RoCE, non-blocking or oversubscribed, rail-optimized or not: the right fabric depends on whether you are training frontier models or serving inference.

An allocation you can count on

Quantities, generations and delivery timing have to be pinned down in writing before the rest of the build commits to a date. Getting that commitment usually takes an existing vendor relationship.

The envelope the design assumes

A rack-scale system can draw over 100 kW and increasingly expects liquid cooling. The reference design assumes an envelope your site has to meet, which ties this layer straight back to power and retrofit.

Why it is hard to assemble alone

Securing GPUs is hard. Making them one machine is harder.

Allocation is gated

Current-generation allocation is rationed and relationship-driven. Vendors prioritize buyers they know, and a first-time buyer rarely commands the volumes or timelines alone.

The fabric is a design problem

Topology, cabling, rail assignment and congestion control decide whether your GPUs run near peak or wait on the network. The box vendors do not do this engineering for you.

Every layer has to agree

The reference design only holds if power, cooling, floor loading and the fabric all match its assumptions. A single mismatch invalidates it, and tends to surface late in the build.
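
One way to catch a mismatch early is to write the design's assumptions down and compare them with the site, field by field. The sketch below is a minimal example of that idea. The `Envelope` fields, the `mismatches` helper and every number are hypothetical placeholders, not figures from a real reference design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    rack_power_kw: float         # power available (site) or required (design) per rack
    liquid_cooling: bool         # liquid cooling available / required
    floor_load_kg_per_m2: float  # floor rating (site) or rack loading (design)
    fabric_ratio: float          # downlink:uplink, 1.0 means non-blocking


def mismatches(design: Envelope, site: Envelope) -> list[str]:
    """Return every point where the site falls short of the design's assumptions."""
    issues = []
    if site.rack_power_kw < design.rack_power_kw:
        issues.append(
            f"power: site offers {site.rack_power_kw} kW per rack, "
            f"design assumes {design.rack_power_kw} kW"
        )
    if design.liquid_cooling and not site.liquid_cooling:
        issues.append("cooling: design assumes liquid cooling, site has none")
    if site.floor_load_kg_per_m2 < design.floor_load_kg_per_m2:
        issues.append("floor loading: site rating is below the design's rack load")
    if site.fabric_ratio > design.fabric_ratio:
        issues.append("fabric: site fabric is more oversubscribed than the design assumes")
    return issues


if __name__ == "__main__":
    # Placeholder values for illustration only.
    design = Envelope(120.0, True, 1500.0, 1.0)
    site = Envelope(80.0, True, 1800.0, 2.0)
    for issue in mismatches(design, site):
        print(issue)
```

An empty list means the site meets every assumption modeled here. A real diligence checklist would carry many more fields than these four.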

What we bring

An allocation that becomes a running cluster.

Access to the silicon: the vendor relationships that move you up the allocation queue for the generation you need.
A reference design fit to your site: the validated architecture, adapted to the power, cooling and space you hold.
A fabric matched to the workload: InfiniBand or RoCE Ethernet, sized non-blocking where training demands it, costed and sequenced with the rest of the build.
One coherent team across chips, design and fabric, sequenced so the cluster comes up on schedule instead of stalling in the gaps between vendors.

Pressure-test your build →

ALLOCATED · VALIDATED · WIRED

Who we bring to this layer

The names that build the cluster.

Silicon, reference designs and the fabric are partner work. We hold the vendor relationships and keep their pieces aligned with the rest of the build.