Posted On 05-16-2024 06:31 AM

kluster.ai: the distributed platform for large AI models

Posted by Derek Yoo

The AI Compute Problem
Solution
Core Technology – Adaptive Pipelines
Who benefits?
A Bright Future for kluster.ai and AI x Web3

I’ve had a personal interest in AI since I first encountered the program Eliza running on an Apple II. It was far from impressive by today’s standards, yet it offered me an early glimpse of what might be possible in the future, and the kinds of interactions computers would one day be capable of. I soon forgot about that experience and became consumed by Linux, the Internet, and building other kinds of software. But that feeling of wonder and possibility is something I’ve come full circle to with the latest project coming out of the Moonsong Labs Venture Studio: kluster.ai.

kluster.ai sits right at the intersection of Web3 and AI, which is one of the most exciting places to be building software today. It is a next-generation distributed GPU compute grid that will enable devs to run large open models, free from computational constraints. It bears some similarities to existing GPU marketplaces like Akash and io.net, but adds a unique software layer, designed specifically for large AI jobs, that creates a hardware abstraction so devs can run their AI workloads more easily and efficiently.

Born in Moonsong Labs, the project originated through a process of studying the latest trends in AI, researching problems potentially solvable with decentralized techniques, and engaging in thoughtful analysis. We wanted to approach this project in the same way we have other projects – by talking to customers (devs) and building something practical, needed, and useful.

Blockchains are good at organizing and coordinating many different participants and allowing them to have value based interactions. It turns out that this is a perfect match for problems that devs are facing today in developing and using open source AI-based services.

The AI Compute Problem

Development of AI models is currently constrained by available GPU supply. Since 2015, the amount of compute used in large-scale models has been doubling roughly every 9.9 months¹. Nvidia, a longtime provider of GPU chips, has a virtual monopoly on the market. Inundated with demand, the company has seen its valuation surge to $1 trillion² and reported record financial results in FY24³.
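To put that doubling time in more familiar terms, here is a minimal sketch that converts it into a yearly growth multiple. The function names are illustrative only, and the calculation assumes smooth exponential growth, which real training-compute data only approximates.

```python
import math

def annual_growth_factor(doubling_months: float) -> float:
    """Growth multiple over 12 months for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 / doubling_months)

def months_to_multiply(factor: float, doubling_months: float) -> float:
    """Months needed to grow by `factor` at the given doubling time."""
    return doubling_months * math.log2(factor)

# 9.9 months is the doubling time quoted in the post.
print(f"{annual_growth_factor(9.9):.2f}x per year")
print(f"{months_to_multiply(10, 9.9):.1f} months to grow 10x")
```

Under that assumption, a 9.9 month doubling time works out to roughly 2.3x more compute per year, or a tenfold increase in a little under three years.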

Big players have cornered the GPU compute market, enabling them to train and operate the most extensive and precise proprietary models. In contrast, getting access to GPU capacity to train, fine-tune and run inference against large open models is practically out of reach for most teams and organizations. The required GPU capacity is either not available at all, or if it is, it is very costly and requires complex hardware setups. Many devs default to using proprietary services like OpenAI due to ease of getting started and performance. However, these services become very costly when scaled up and lack the inherent flexibility and transparency found in open-source models.

The development of open source approaches to AI and models is extremely active right now, with open models rapidly improving their performance and accuracy. I saw this dynamic with the rise of Linux in the 90s and 2000s, when performance and feature gaps relative to proprietary offerings closed over time. I expect the same to happen with AI: a large and growing part of the market will be served by open approaches.

Demand for GPU-based compute continues to grow, with GPU data centers set to grow 70% annually and expected to reach a $400B market by 2027. Frontier model parameter counts also continue to grow, with a 4x increase in the last 4 years, driven by a desire for ever better accuracy. Demand will outstrip available supply for the foreseeable future, as AI continues to undergo widespread integration, reshaping entire domains and embedding itself within global knowledge workflows. Just as with the dotcom era of the late 90s, we are at the beginning of a transformative phase in which the market is figuring out how AI can be used to gain efficiencies and create new possibilities in many different domains.

Solution

Given this supply constrained environment, anything that makes more productive compute supply available to devs at a competitive cost creates a big unlock. This is the thread that we started pulling on over the last 6+ months. The question was: how could a blockchain facilitate a solution to this AI Compute supply problem?

We believe that a decentralized AI Compute marketplace offers many attractive benefits to devs facing this existing supply challenge. The objective for kluster.ai is to create a distributed compute grid where suppliers can seamlessly provide their GPUs to the network. Suppliers just need AI-capable GPUs (e.g. RTX 4090, H100) with an internet connection; the platform harnesses them to distribute and schedule ML jobs efficiently and reliably. This opens the door to making many currently isolated or unproductive GPUs productive for high-end AI use cases. And devs that need GPU capacity can pay to consume this aggregate capacity for AI training, fine-tuning and inference jobs.

First generation GPU marketplaces are just coming online now, and these operate by providing devs with raw access to the underlying hardware. In these marketplaces, devs must specify the quantity and type of GPU hardware they want to rent, if available. The commodity being transacted is GPU time on a rental basis. This leaves it entirely up to the devs to size, provision, and manage that raw hardware into productive use. The challenge only becomes more complex for large models since developers must split and distribute the execution across multiple GPUs. In short, while these platforms have the potential to bootstrap GPU supply, they make it difficult for devs to utilize it efficiently for large model AI workloads, leading to low effective utilization.
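A back-of-the-envelope sizing exercise shows why large models force this split. The sketch below counts only the memory needed to hold the weights, assuming 2 bytes per parameter (fp16/bf16) and the 24 GB and 80 GB memory configurations of the two cards named earlier. The 70B model size is a hypothetical example, and real deployments also need room for activations, KV caches, and framework overhead, so actual GPU counts are higher.

```python
import math

# Memory capacity in GB (24 GB RTX 4090, 80 GB H100 variant).
GPU_MEMORY_GB = {"RTX 4090": 24, "H100": 80}

def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone; 2 bytes per parameter corresponds to fp16/bf16."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB
    return params_billions * bytes_per_param

def min_gpus_for_weights(params_billions: float, gpu: str, bytes_per_param: int = 2) -> int:
    """Lower bound on the number of GPUs of one type needed just to hold the weights."""
    needed = weight_memory_gb(params_billions, bytes_per_param)
    return math.ceil(needed / GPU_MEMORY_GB[gpu])

# Hypothetical 70B-parameter model, chosen only for illustration.
for gpu in GPU_MEMORY_GB:
    print(gpu, min_gpus_for_weights(70, gpu))
```

Even this optimistic lower bound means a dev renting raw hardware has to decide how to partition the model before any work gets done.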

The idea for kluster.ai is to provide a higher level of abstraction so devs don’t need to think about hardware. Instead, they work at the model level, and the system maps what they want to accomplish against a distributed grid of underlying GPUs. The core project innovation that enables this abstraction is a technology that we are calling Adaptive Pipelines.

Core Technology – Adaptive Pipelines

Existing GPU marketplaces primarily aggregate hardware resources from suppliers and rent them to developers, passing along all the associated management and reliability challenges intrinsic to small-scale p2p networks.

kluster.ai’s main innovation and key differentiator is a technology called Adaptive Pipelines, which improves upon existing approaches by optimizing large open models to run natively over a globally distributed compute grid while guaranteeing consistent performance at competitive cost. This setup creates an abstraction layer that allows developers to focus on running workloads directly on AI models without having to worry about the underlying hardware, while suppliers maximize productivity by rapidly onboarding GPUs of diverse specifications and architectures without having to worry about compatibility or size.

Adaptive pipelines consist of three pillars that are specifically designed to enable AI use cases in kluster.ai’s distributed and decentralized compute environment:

Tensor Fragments: a specialized way to dynamically re-package and optimize AI models so they can be run efficiently in kluster.ai’s distributed grid. Optimizations include minimizing network communications and memory requirements through a combination of industry standard and in-house strategies designed specifically for distributed environment challenges.

Selective Activation: a mechanism to perform sparse activation of the Tensor Fragments to only use a subset of the network for each inbound request. This approach reduces cost and latency while rendering similar accuracy to full network activation.

Compute Scheduler: the orchestrator that distributes model operations across a collection of internet connected GPUs. This includes navigating peer-to-peer pipelines in real-time, dynamically adapting to peer availability and achieving stable and reliable model interactions across a geographically distributed, heterogeneous, and potentially unreliable network of compute hardware.
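The post does not describe how these pillars are implemented, so the following is only a toy sketch of the general ideas and not kluster.ai's actual design. It assumes fragments are opaque units with a known memory size, uses a simple top-k pick as a stand-in for sparse activation, and uses greedy placement with re-placement on peer failure as a stand-in for the scheduler. All names (`Peer`, `select_fragments`, `place`, `reschedule_after_failure`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    name: str
    free_gb: float
    online: bool = True
    fragments: list = field(default_factory=list)

def select_fragments(scores: dict[str, float], k: int) -> list[str]:
    """Sparse activation stand-in: keep only the k highest-scoring fragments."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def place(fragments: dict[str, float], peers: list[Peer]) -> dict[str, str]:
    """Greedy placement: largest fragment first, onto the online peer with the most free memory."""
    placement = {}
    for frag, size in sorted(fragments.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [p for p in peers if p.online and p.free_gb >= size]
        if not candidates:
            raise RuntimeError(f"no online peer can hold {frag}")
        best = max(candidates, key=lambda p: p.free_gb)
        best.free_gb -= size
        best.fragments.append(frag)
        placement[frag] = best.name
    return placement

def reschedule_after_failure(failed: Peer, fragments: dict[str, float],
                             peers: list[Peer]) -> dict[str, str]:
    """Mark a peer offline and re-place the fragments it was holding."""
    failed.online = False
    orphaned = {f: fragments[f] for f in failed.fragments}
    failed.fragments = []
    return place(orphaned, peers)

# Hypothetical fragment sizes (GB) and a small heterogeneous grid.
fragments = {"f0": 10, "f1": 8, "f2": 6, "f3": 4}
peers = [Peer("a", 24), Peer("b", 24), Peer("c", 80)]
print(place(fragments, peers))
print(reschedule_after_failure(peers[2], fragments, peers))
print(select_fragments({"f0": 0.1, "f1": 0.7, "f2": 0.2}, k=2))
```

A production scheduler would also have to weigh network latency between peers, pipeline ordering, and pricing, none of which this sketch models.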

Who benefits?

Benefits for Devs

kluster.ai allows devs to work with AI models as a service, where models exist as first class objects. This is one level of abstraction higher than interacting directly with the hardware, which is what most providers currently offer. It relieves devs of the burden of managing individual GPU hardware. kluster.ai also helps devs achieve improved accuracy through the ability to run and tune larger and more precise models. At the same time, it offers lower cost than competing centralized or hardware-centric options, thanks to more efficient hardware use and the ability to collectively leverage unproductive, commodity, and previous-gen hardware.

Benefits for Suppliers

GPU suppliers also realize benefits from kluster.ai’s Adaptive Pipelines. They get access to higher margin ML runs with diverse spec GPUs, maximizing their GPU utilization and enabling rapid onboarding/offboarding for optimal hardware portfolio utilization. You don’t have to win or dedicate your hardware for scheduled rental windows. The protocol will automatically schedule work for your supplied GPU.

kluster.ai allows suppliers to optimize and generate more revenue with their GPUs and extend the productive lifespan of their hardware, which will make it an attractive protocol to supply capacity to vs other hardware and rental centric options.

A Bright Future for kluster.ai and AI x Web3

The more I learn about AI, the more bullish I become on the inevitability of the technology to drive meaningful efficiencies and change in the world. It’s the same feeling I had when I first understood how permissionless blockchains can coordinate people in novel ways. I started my career during the emergence of the first internet wave, an opportunity that few encounter more than once in their lifetime. Now, as Web3 and AI revolutionize the world, I feel very fortunate to have a front-row seat and contribute to shaping these technologies for the betterment of society.

kluster.ai is our first project in the decentralized AI space. One of the reasons I like kluster.ai is that it aims to solve real problems faced by large numbers of devs. Unlike many Web3 projects, it uses decentralized technology as a means to an end: to aggregate and orchestrate GPU capacity all over the world in a way that makes it possible for devs to do things they couldn’t do before. This practicality fits Moonsong Labs’ vision and values very well. We want to help create projects that truly make a difference in the world.

I have known Julio Viera, the CEO and co-founder of the project, for many years, and he is one of the best leaders I have ever worked with. Julio is building an amazing team of AI and Web3 experts to realize the kluster.ai vision. I can’t wait to see what they are able to accomplish.

You can join the kluster.ai waitlist here to be part of the journey and follow on X for updates.