
Inside the AI Compute Gold Rush: How Compute Power is Shaping AI’s Future

In the past 12-18 months, we’ve witnessed remarkable advancements in AI performance as new models have been released [1]. However, what’s less visible is the staggering amount of compute power required to create and operate these models, which has been growing at an exponential rate.

It is always hard to wrap your head around exponential growth, so here are a few figures and anecdotes to understand this better.

By 2021, only a few models had surpassed the milestone of 10^23 FLOP of estimated training compute (GPT-3 being one of them). Fast forward three years, and now over 77 models exceed that threshold [2]. These are very big numbers; only a handful of data centers in the world can deliver that much compute. Both US and EU regulators have established reporting requirements around similar compute thresholds to regulate the development of the most consequential models [3][4].
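For a sense of scale, here is a rough back-of-envelope conversion of that 10^23 FLOP threshold into GPU-time. The per-accelerator throughput and utilisation figures below are illustrative assumptions, not measured values:

```python
# Back-of-envelope: how long does 1e23 FLOP of training take?
# Per-GPU throughput and utilisation are rough illustrative assumptions.
TRAINING_FLOP = 1e23
GPU_PEAK_FLOPS = 1e15        # a ~1 PFLOPS-class accelerator (order of magnitude)
UTILISATION = 0.4            # fraction of peak actually sustained during training

effective_flops = GPU_PEAK_FLOPS * UTILISATION
seconds_on_one_gpu = TRAINING_FLOP / effective_flops

print(f"One GPU: ~{seconds_on_one_gpu / 86400 / 365:.1f} years")     # ~7.9 years
print(f"1,000 GPUs: ~{seconds_on_one_gpu / 1000 / 86400:.1f} days")  # ~2.9 days
```

Even with generous assumptions, a single accelerator would need years; clearing the threshold in days requires on the order of a thousand accelerators running in concert, which is exactly what only a handful of data centers can offer.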

How did we get here so quickly? As with many paradigm shifts, a combination of factors converged to produce this swift consolidation of effort and resources.

The pivotal 2017 paper ‘Attention Is All You Need’ [5] introduced the Transformer architecture, which offered far better parallelisation than previous architectures and followed compelling scaling laws [6], making it perfectly suited to hardware accelerators like GPUs. For a given compute budget, larger models yield better results, and as the compute budget grows, performance improves with it. Great!
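As a rough illustration of what “performance improves with compute” means, here is a toy version of the power-law relationship reported in [6]. The constants below are placeholders, not the paper’s fitted values; the point is the shape of the curve, where every extra order of magnitude of compute buys a steady, predictable reduction in loss:

```python
# Illustrative power-law scaling of loss vs. training compute, in the spirit of [6].
# The exponent and scale constant are placeholder assumptions, not fitted values.

def loss_from_compute(compute_flop: float, c_scale: float = 1e21, alpha: float = 0.05) -> float:
    """Toy scaling law: loss ~ (c_scale / compute)^alpha."""
    return (c_scale / compute_flop) ** alpha

# Each 10x increase in compute gives a steady, predictable drop in loss.
for exponent in range(21, 26):
    compute = 10.0 ** exponent
    print(f"compute = 1e{exponent} FLOP -> loss ≈ {loss_from_compute(compute):.3f}")
```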

Additionally, the decreasing cost of compute [7] has made creating and training very large models economically viable for the first time. This factor is often understated; a recent survey among AI practitioners [8] identified the falling cost of compute as one of the primary drivers behind the shift towards large language models. Coupled with the explosive launch of ChatGPT, which reached 100 million users in just two months, making it the fastest-growing consumer application in history, the race was on.

Let’s look at a specific example of the compute growth: 

Microsoft Azure AI Datacenter Growth in the Last Four Years

Microsoft Azure, the cloud service provider (CSP) behind OpenAI, launched an AI datacenter in 2020 with a capacity of around 160 petaflops (1 petaflop = 10^15 FLOPS) to train GPT-3.5. Within three years, Microsoft climbed to the third spot among the world’s largest AI datacenters [9], boasting 560 petaflops.

Although the 2024 figures haven’t been disclosed yet, Azure CTO Mark Russinovich recently mentioned that their growth trajectory is accelerating:

“Today we are deploying the equivalent of five supercomputers of that scale every single month.” [10]

Microsoft CTO Kevin Scott confirmed at Build that this isn’t a one-off:

“We are nowhere near the point of diminishing returns on how powerful we can make AI models as we increase the scale of compute.” [11]

As in any gold rush, there has to be gold once we get there, and estimates point to a very large shift in technology spend over the coming five years. Most developers and enterprises are considering how to integrate neural networks into the way they build their products, services, and experiences, as well as how they run their own companies more effectively. A recent survey by a16z [12] found that enterprises expect a 2.5x increase in AI spending in 2024 alone, with use cases ranging from internal operations and customer service to value-added product features. The generative AI market, as measured by technology spend, is forecast to surpass a trillion dollars in the next five years, reaching over 10% of enterprises’ total technology spend [13].

This means that a significant share of our software-based experiences will be running on neural networks in just five years!

The Trials of the Compute Gold Rush

While the direction (more compute → better models; better and cheaper models → more demand) seems clear, the journey to get there isn’t as straightforward.

Design and production of chips and high-bandwidth memory are concentrated in a handful of companies. NVIDIA dominates design with its consumer and server offerings, while TSMC, the chip manufacturer, fabricates chips for most chip designers in the world. Likewise, SK Hynix and Micron produce most of the high-bandwidth memory used in consumer and server units. Their production capacity is booked for months or even years ahead.

As computers, tablets, phones, cars, and servers all race to add AI compute capabilities, supply remains physically constrained to the handful of companies capable of satisfying it, making it very inelastic.

Access to the compute that is produced is not distributed equally either. As foundation model companies amass the majority of the production of state-of-the-art training hardware (famously, Meta acquired 70% of the H2 2023 production of H100 [14]), only a minority is put back into circulation via CSPs for developers’ ML workloads. According to NVIDIA’s latest earnings, only around 45% of its production ended up with CSPs for consumption by developers in Q1, and this is partly explained by some large buyers holding off purchases until the H200 (NVIDIA’s next state-of-the-art product) ships later in the year [15].

The growth and deployment of large-scale datacenters, as in the Azure example, is bottlenecked by an obvious next hurdle: electricity. Annoyingly, physics is getting in the way of our software ambitions! The electric grid supporting existing data centers cannot keep up with supercomputer-scale deployments arriving every month. The AI gold rush, like chip manufacturing, isn’t happening in isolation: many other industries and sectors are being electrified, competing for energy production and stressing the distribution infrastructure.

To accelerate this gold rush, we need innovative ways to make compute more broadly available.

The Opportunities of the Compute Gold Rush

If we consider the total FLOPS capacity of the high-end consumer and server-grade AI chipsets sold in the last four years, the outlook is less daunting [16].


The sum of all consumer gaming PCs and laptops sold in this period could collectively achieve an astounding 13 exaflops (1 exaflop = 10^18 FLOPS). Networked together, this would represent a 13x increase over the world’s largest supercomputer today [17].

Qualcomm’s recent announcement of the Snapdragon X family, a System on a Chip (SoC) comparable to Apple’s M series, showcases NPU energy efficiency that is highly competitive and could make such a compute grid even more cost-effective [18].

This long tail of supply is an incredible opportunity to scale the availability of compute and lower its cost, unleashing further innovation. These chips are ubiquitous, come with their own thermal solutions and distributed energy access, are only used sporadically [19], and are all internet-connected, with bandwidth improving year over year.

The missing piece is a way to network supply over the internet effectively to perform ML jobs. 

The challenge kluster.ai is tackling is how to make heterogeneous, unreliable, distributed compute nodes work as an aggregate (a grid) to perform large distributed ML jobs at scale. In other words, how can we ‘feed’ orders of magnitude more compute to our exponential ‘hunger’ curve using a distributed protocol, instead of or alongside co-located data centers, and unlock as much as possible of the compute that already exists today?

While decentralized physical infrastructure is not a new idea, our emphasis is on software that creates a developer-friendly abstraction layer rather than merely aggregating physical GPUs. The aim is to deliver the competitive cost-to-performance and reliable service-level goals that current hardware-centric DePIN solutions cannot match.

The core technology kluster.ai is developing to do this is “Adaptive pipelines”. A series of post-training enhancements enables open models to run efficiently on a distributed compute grid. A compute scheduling algorithm that is both model-aware and environment-aware can then make a large enough group of unreliable nodes dependable. Compute tasks are scheduled only when doing so improves performance and aligns with the grid’s health and status.
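To make the idea concrete, here is a minimal, hypothetical sketch of what a model-aware, environment-aware scheduler over unreliable nodes could look like. This is not kluster.ai’s implementation; every data structure, node profile, and threshold below is an assumption made purely for illustration. A pipeline stage is placed on a node only if the node can hold it in memory and its reliability-discounted throughput clears a minimum bar; otherwise the stage stays unscheduled, mirroring the idea that compute is undertaken only when it actually helps.

```python
# Illustrative sketch only: a toy "model-aware, environment-aware" scheduler for a
# grid of heterogeneous, unreliable nodes. Not kluster.ai's implementation; all
# names, fields, and thresholds are assumptions made for illustration.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    tflops: float        # sustained throughput (environment-aware input)
    mem_gb: float        # available accelerator memory
    availability: float  # observed probability the node stays online for a task

@dataclass
class Stage:
    name: str
    gflop_per_token: float  # compute cost of this pipeline stage (model-aware input)
    mem_gb: float           # memory needed to host the stage's weights/activations

def expected_throughput(node: Node, stage: Stage) -> float:
    """Tokens/s we expect from a node, discounted by its chance of dropping out."""
    raw = node.tflops * 1e3 / stage.gflop_per_token  # TFLOPS -> GFLOP/s, over cost
    return raw * node.availability

def schedule(stages: list[Stage], nodes: list[Node], min_tokens_per_s: float = 1.0) -> dict[str, str]:
    """Greedily place each pipeline stage on the best remaining node.

    A node is used only if it fits the stage in memory AND its reliability-discounted
    throughput clears a minimum bar, i.e. compute is scheduled only if it helps.
    """
    assignment: dict[str, str] = {}
    free_nodes = sorted(nodes, key=lambda n: n.tflops * n.availability, reverse=True)
    for stage in stages:
        for node in free_nodes:
            if node.mem_gb >= stage.mem_gb and expected_throughput(node, stage) >= min_tokens_per_s:
                assignment[stage.name] = node.name
                free_nodes.remove(node)
                break
        else:
            raise RuntimeError(f"no healthy node can host {stage.name}; stage stays unscheduled")
    return assignment

if __name__ == "__main__":
    nodes = [
        Node("gaming-pc-berlin", tflops=80, mem_gb=24, availability=0.7),
        Node("laptop-madrid", tflops=30, mem_gb=16, availability=0.4),
        Node("workstation-austin", tflops=120, mem_gb=48, availability=0.9),
    ]
    stages = [
        Stage("layers-0-15", gflop_per_token=40, mem_gb=20),
        Stage("layers-16-31", gflop_per_token=40, mem_gb=20),
    ]
    print(schedule(stages, nodes))
```

In this toy version, the unreliable laptop never receives a stage because its reliability-discounted throughput and memory budget are too low; a real scheduler would also have to handle replication, re-assignment on node failure, and network bandwidth between stages.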

There’s much more to share in the coming months, but in the meantime, you can find more information at kluster.ai.

About Moonsong Labs 

Moonsong Labs is a Web3-focused company providing venture studio and engineering services that come together to shape today’s innovations while forecasting tomorrow’s breakthroughs. Every endeavor, be it in-house ventures or external partnerships, is anchored in engineering expertise. Moonsong’s vision is that Web3-based software will have a democratizing effect, opening new and more efficient ways for people to have value-based interactions with each other. The team’s mission is to make that vision a reality by supporting the creation of decentralized infrastructure protocols.

Sources

  1. Artificial Analysis. (n.d.). Quality, speed, price. Retrieved from https://artificialanalysis.ai/models
  2. Epoch AI. (2024, April 5). Tracking compute-intensive models. Retrieved from https://epochai.org/blog/tracking-compute-intensive-ai-models
  3. The White House. (2023, October 30). Executive order on the safe, secure, and trustworthy development and use of artificial intelligence. Retrieved from https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
  4. European Commission. (2021). EU AI Act. Retrieved from https://ec.europa.eu/commission/presscorner/detail/en/qanda_21_1683
  5. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Retrieved from: https://arxiv.org/abs/1706.03762
  6. Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., … & Amodei, D. (2020). Scaling laws for neural language models. arXiv preprint arXiv:2001.08361. Retrieved from https://arxiv.org/pdf/2001.08361
  7. Epoch AI. (n.d.). Trends in machine learning hardware. Retrieved from https://epochai.org/blog/trends-in-machine-learning-hardware
  8. Grace, K., Stewart, H., Sandkühler, J. F., Thomas, S., Weinstein-Raun, B., & Brauner, J. (2024, January). Thousands of AI authors on the future of AI. Retrieved from https://arxiv.org/abs/2401.02843
  9. TOP500. (2024, June). June 2024 edition of TOP500. Retrieved from https://top500.org/lists/top500/2024/06/highs/
  10. Russinovich, M. (2024). What runs GPT-4o and Microsoft Copilot? Inside the 2024 AI supercomputer. Retrieved from https://youtu.be/DlX3QVFUtQI?si=z9UJf28jUKBEpvUs&t=63
  11. Microsoft Build. (2024). Retrieved from https://www.youtube.com/live/2bnayWpTpW8?si=LWhN9U3XdyZH1RMw&t=8065
  12. a16z. (2024). Enterprise AI spending survey. Retrieved from https://x.com/chiefaioffice/status/1771884750589841823
  13. Bloomberg. (2023). Generative AI spending: Generative AI races toward $1.3 trillion in revenue by 2032. Retrieved from https://www.bloomberg.com/professional/insights/data/generative-ai-races-toward-1-3-trillion-in-revenue-by-2032/
  14. PCMag. (2023). Meta buying 350k H100. Retrieved from https://www.pcmag.com/news/zuckerbergs-meta-is-spending-billions-to-buy-350000-nvidia-h100-gpus
  15. CNBC. (2024, May 22). NVIDIA Q1 earnings report. Retrieved from https://www.cnbc.com/2024/05/22/nvidia-nvda-earnings-report-q1-2025-.html
  16. Business Wire. (2021, March 29). Global gaming PC and monitor market hit new record high in 2020. Retrieved from https://www.businesswire.com/news/home/20210329005150/en/Global-Gaming-PC-and-Monitor-Market-Hit-New-Record-High-in-2020-According-to-IDC
  17. This extrapolates sales of gaming computers and laptops over the last four years [16] at around 160 million units and assumes RTX 4090 performance for simplicity.
  18. Qualcomm. (2023). Snapdragon X series. Retrieved from https://www.qualcomm.com/products/mobile/snapdragon/pcs-and-tablets/snapdragon-x-elite
  19. Greening the Beast. (n.d.). Taming the energy use of gaming computers. Retrieved from https://sites.google.com/site/greeningthebeast/energy/taming-the-energy-use-of-gaming-computers