The Conversation Is Shifting from GPUs to Memory
For the past several years, the narrative around AI infrastructure costs has been dominated by a single topic: Nvidia GPUs. The scarcity, pricing, and allocation of these processors have driven headlines, investment decisions, and corporate strategy across the technology industry. But a quieter shift is underway in how the industry thinks about AI infrastructure economics. Increasingly, memory, not processing power, is emerging as the binding constraint on AI system performance and cost.
The dynamic makes intuitive sense when you examine how modern AI models actually operate. A large language model does not simply compute answers. It must hold vast amounts of data in active memory, accessible at extremely high speeds, to process each request. The model's weights, the numerical parameters that encode its knowledge and capabilities, must be loaded into memory before inference can begin. For frontier models with hundreds of billions or even trillions of parameters, the memory required to hold these weights dwarfs what conventional computing systems were designed to provide.
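To make that concrete, here is a back-of-the-envelope sketch of the memory footprint of model weights alone. The parameter counts and precisions below are illustrative assumptions, not figures for any particular model:

```python
# Back-of-the-envelope estimate of the memory needed just to hold model
# weights, before activations, KV cache, or framework overhead.
# Parameter counts and byte widths are illustrative assumptions,
# not figures from any specific model.

BYTES_PER_PARAM = {
    "fp32": 4,  # full precision
    "fp16": 2,  # half precision, common for inference
    "int8": 1,  # 8-bit quantized
}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Gigabytes required to hold the weights alone."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for params in (70e9, 405e9, 1e12):  # hypothetical 70B, 405B, 1T models
    print(f"{params / 1e9:>6.0f}B params @ fp16: "
          f"{weight_memory_gb(params, 'fp16'):>6.0f} GB")

# Prints:
#     70B params @ fp16:    140 GB
#    405B params @ fp16:    810 GB
#   1000B params @ fp16:   2000 GB
```

Since a single high-end accelerator today carries on the order of 80 to 141 GB of onboard memory, even the smallest of these hypothetical models cannot fit on one device at half precision, which is why weights are routinely sharded across multiple GPUs.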
High Bandwidth Memory: The Critical Component
The specific type of memory that has become central to AI infrastructure is High Bandwidth Memory, known as HBM. Unlike the standard DRAM found in consumer computers, HBM stacks multiple layers of memory dies vertically and connects them through an extremely wide data bus, yielding aggregate transfer rates measured in terabytes per second, well beyond what conventional memory can sustain. This speed is essential because AI accelerators like Nvidia's H100 and H200 GPUs can consume data far faster than standard memory can deliver it. Without HBM, these processors would spend most of their time stalled waiting for data, leaving their computational capability largely idle.
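A rough way to see why bandwidth, not raw compute, sets the ceiling: during autoregressive decoding, generating each token requires streaming essentially the entire set of weights from memory once, so tokens per second can never exceed memory bandwidth divided by weight bytes. The sketch below applies this bound using bandwidth figures that are assumptions in the ballpark of published specs:

```python
# Rough ceiling on single-stream decode throughput for a memory-
# bandwidth-bound model: each generated token streams (roughly) the
# full weight set from memory once, so
#     tokens/sec <= memory_bandwidth / weight_bytes
# The bandwidth figures are assumptions in the ballpark of published
# specs, not authoritative numbers.

WEIGHT_BYTES = 70e9 * 2  # hypothetical 70B-parameter model at fp16

MEMORY_BANDWIDTH_GB_S = {
    "dual-channel DDR5 (typical server)": 90,
    "HBM3 (H100-class accelerator)": 3350,
    "HBM3e (H200-class accelerator)": 4800,
}

for name, gb_s in MEMORY_BANDWIDTH_GB_S.items():
    ceiling = gb_s * 1e9 / WEIGHT_BYTES
    print(f"{name:<38} ~{ceiling:5.1f} tokens/sec ceiling")

# Approximate output:
# dual-channel DDR5 (typical server)    ~  0.6 tokens/sec ceiling
# HBM3 (H100-class accelerator)         ~ 23.9 tokens/sec ceiling
# HBM3e (H200-class accelerator)        ~ 34.3 tokens/sec ceiling
```

The roughly fortyfold gap between commodity DRAM and HBM in this sketch is the difference between an accelerator that is usable for interactive inference and one that is not.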
HBM is physically bonded to the AI accelerator using advanced packaging techniques, typically a silicon interposer that places memory and processor on the same package, creating an integrated module where the two are tightly coupled. This integration provides the bandwidth AI workloads demand, but it also creates a supply chain dependency: every AI accelerator shipped requires a corresponding allocation of HBM, and global production capacity for HBM is concentrated among just three manufacturers, SK Hynix, Samsung, and Micron.