- At the Open Compute Project (OCP) Global Summit 2024, we're showcasing our latest open AI hardware designs with the OCP community.
- These innovations include a new AI platform, cutting-edge open rack designs, and advanced network fabrics and components.
- By sharing our designs, we hope to encourage collaboration and foster innovation. If you're passionate about building the future of AI, we invite you to engage with us and OCP to help shape the next generation of open hardware for AI.
AI has been at the core of the experiences Meta has been delivering to people and businesses for years, including AI modeling innovations to optimize and improve on features like Feed and our ads system. As we develop and release new, advanced AI models, we are also driven to advance our infrastructure to support our new and emerging AI workloads.
For example, Llama 3.1 405B, Meta's largest model, is a dense transformer with 405B parameters and a context window of up to 128k tokens. To train a large language model (LLM) of this magnitude, with over 15 trillion tokens, we had to make substantial optimizations to our entire training stack. This effort pushed our infrastructure to run across more than 16,000 NVIDIA H100 GPUs, making Llama 3.1 405B the first model in the Llama series to be trained at such a massive scale.
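For a sense of scale, here is a back-of-envelope estimate using the widely cited ~6 · N · D FLOPs heuristic for dense transformers; the sustained per-GPU throughput below is an illustrative assumption, not a measured figure:

```python
# Back-of-envelope training-compute estimate using the common
# ~6 * parameters * tokens FLOPs heuristic for dense transformers.
params = 405e9       # Llama 3.1 405B parameters
tokens = 15e12       # ~15 trillion training tokens

total_flops = 6 * params * tokens          # ~3.6e25 FLOPs
print(f"estimated training compute: {total_flops:.2e} FLOPs")

# Rough wall-clock time on 16,000 H100s, assuming ~400 TFLOP/s sustained
# per GPU (an illustrative utilization assumption, not a measured number).
gpus, sustained = 16_000, 400e12
seconds = total_flops / (gpus * sustained)
print(f"~{seconds / 86_400:.0f} days at that sustained rate")
```

Even under generous utilization assumptions, a run of this size occupies the whole cluster for months, which is why the stack-wide optimizations mattered.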
Prior to Llama, our largest AI jobs ran on 128 NVIDIA A100 GPUs. But things have rapidly accelerated. Over the course of 2023, we rapidly scaled up our training clusters from 1K, 2K, 4K, to eventually 16K GPUs to support our AI workloads. Today, we're training our models on two 24K-GPU clusters.
We don't expect this upward trajectory for AI clusters to slow down any time soon. In fact, we expect the amount of compute needed for AI training will grow significantly from where we are today.
Building AI clusters requires more than just GPUs. Networking and bandwidth play an important role in ensuring the clusters' performance. Our systems consist of a tightly integrated HPC compute system and an isolated high-bandwidth compute network that connects all of our GPUs and domain-specific accelerators. This design is crucial to meet our injection needs and address the challenges posed by our need for bisection bandwidth.
In the next few years, we expect greater injection bandwidth on the order of a terabyte per second, per accelerator, with equivalent normalized bisection bandwidth. This represents a growth of more than an order of magnitude compared to today's networks!
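To make that claim concrete, compare a ~1 TB/s injection target against a typical 400 Gbps NIC; the baseline is our assumption for illustration:

```python
# Illustrative comparison of projected vs. typical current injection
# bandwidth per accelerator (a 400 Gbps NIC assumed as today's baseline).
current_nic_gbps = 400                  # 400 Gbps Ethernet/RoCE NIC
current_gbytes = current_nic_gbps / 8   # = 50 GB/s
projected_gbytes = 1_000                # ~1 TB/s = 1,000 GB/s

print(f"growth factor: {projected_gbytes / current_gbytes:.0f}x")  # 20x
```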
To support this growth, we need a high-performance, multi-tier, non-blocking network fabric that can employ modern congestion control to behave predictably under heavy load. This will allow us to fully leverage the power of our AI clusters and ensure they continue to perform optimally as we push the boundaries of what is possible with AI.
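As a minimal sketch of what "non-blocking" implies for fabric sizing, the snippet below sizes a generic two-tier folded Clos from a single switch radix; the topology and numbers are illustrative, not Meta's actual design:

```python
# Minimal sketch: sizing a two-tier non-blocking (folded Clos) fabric.
# A leaf switch is non-blocking when half its ports face hosts and half
# face spines, so no host link oversubscribes the fabric.

def two_tier_clos(radix: int) -> dict:
    """Max hosts for a non-blocking two-tier Clos built from one switch radix."""
    down = up = radix // 2      # half the leaf ports down, half up to spines
    max_leaves = radix          # each spine port connects one distinct leaf
    return {
        "hosts_per_leaf": down,
        "spines": up,
        "max_leaves": max_leaves,
        "max_hosts": down * max_leaves,
    }

print(two_tier_clos(radix=64))  # 64-port switches -> 2,048 non-blocked hosts
```

Growing beyond that host count forces a third tier, which is why scale, radix, and congestion control are coupled decisions.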
Scaling AI at this speed requires open hardware solutions. Developing new architectures, network fabrics, and system designs is most efficient and impactful when we can build on principles of openness. By investing in open hardware, we unlock AI's full potential and propel ongoing innovation in the field.
Introducing Catalina: Open Architecture for AI Infra
Today, we announced the upcoming release of Catalina, our new high-powered rack designed for AI workloads, to the OCP community. Catalina is based on the NVIDIA Blackwell platform full rack-scale solution, with a focus on modularity and flexibility. It is built to support the latest NVIDIA GB200 Grace Blackwell Superchip, ensuring it meets the growing demands of modern AI infrastructure.
The growing power demands of GPUs mean open rack solutions need to support higher power capability. With Catalina, we're introducing the Orv3, a high-power rack (HPR) capable of supporting up to 140kW.
The full solution is liquid cooled and consists of a power shelf that supports a compute tray, switch tray, the Orv3 HPR, the Wedge 400 fabric switch, a management switch, battery backup unit, and a rack management controller.
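To illustrate why a 140kW budget matters, here is a toy rack power budget; the per-tray and overhead figures are assumptions made up for the sketch, not Catalina specifications:

```python
# Toy rack power budgeting against a 140 kW Orv3 HPR envelope.
# Per-tray and overhead figures are illustrative assumptions only.
rack_budget_kw = 140
compute_tray_kw = 10       # assumed draw per compute tray (GPUs + CPUs)
overhead_kw = 20           # assumed switches, management, pumps, losses

trays = (rack_budget_kw - overhead_kw) // compute_tray_kw
print(f"~{trays} compute trays fit within the {rack_budget_kw} kW budget")
```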
We aim for Catalina's modular design to empower others to customize the rack to meet their specific AI workloads while leveraging both existing and emerging industry standards.
The Grand Teton Platform now supports AMD accelerators
In 2022, we announced Grand Teton, our next-generation AI platform (the follow-up to our Zion-EX platform). Grand Teton is designed with compute capacity to support the demands of memory-bandwidth-bound workloads, such as Meta's deep learning recommendation models (DLRMs), as well as compute-bound workloads like content understanding.
Now, we have expanded the Grand Teton platform to support the AMD Instinct MI300X and will be contributing this new version to OCP. Like its predecessors, this new version of Grand Teton features a single monolithic system design with fully integrated power, control, compute, and fabric interfaces. This high level of integration simplifies system deployment, enabling rapid scaling with increased reliability for large-scale AI inference workloads.
In addition to supporting a range of accelerator designs, now including the AMD Instinct MI300X, Grand Teton offers significantly greater compute capacity, allowing faster convergence on a larger set of weights. This is complemented by expanded memory to store and run larger models locally, along with increased network bandwidth to scale up training cluster sizes efficiently.
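A quick weights-only calculation shows why expanded accelerator memory lets larger models run locally; it ignores activations and KV cache, and assumes the publicly stated 192 GB HBM per MI300X on an eight-GPU node:

```python
# Rough check: model-weights footprint vs. one eight-GPU node's HBM.
params = 405e9                 # Llama 3.1 405B parameters
bytes_per_param = 2            # bf16/fp16 weights

weights_gb = params * bytes_per_param / 1e9     # ~810 GB of weights
node_hbm_gb = 8 * 192          # 8x AMD Instinct MI300X @ 192 GB HBM each

print(f"weights: {weights_gb:.0f} GB vs node HBM: {node_hbm_gb} GB")
# ~810 GB of weights fit in ~1,536 GB of HBM, leaving headroom for
# activations and KV cache during inference.
```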
Open Disaggregated Scheduled Fabric
Developing open, vendor-agnostic networking backends is going to play an important role going forward as we continue to push the performance of our AI training clusters. Disaggregating our network allows us to work with vendors from across the industry to design systems that are innovative as well as scalable, flexible, and efficient.
Our new Disaggregated Scheduled Fabric (DSF) for our next-generation AI clusters offers several advantages over our existing switches. By opening up our network fabric we can overcome limitations in scale, component supply options, and power density. DSF is powered by the open OCP-SAI standard and FBOSS, Meta's own network operating system for controlling network switches. It also supports an open and standard Ethernet-based RoCE interface to endpoints and accelerators across several GPUs and NICs from several different vendors, including our partners at NVIDIA, Broadcom, and AMD.
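Conceptually, a scheduled fabric keeps links evenly loaded by slicing traffic into fixed-size cells and spraying them across all fabric planes rather than pinning whole flows to one path. The toy sketch below illustrates that idea only; it is not FBOSS, SAI, or actual DSF code:

```python
# Toy sketch of the scheduled-fabric idea: packets are cut into
# fixed-size cells and sprayed round-robin across all fabric planes,
# so load stays balanced regardless of individual flow sizes.
# Conceptual illustration only -- not FBOSS, SAI, or real DSF code.
from itertools import cycle

CELL_BYTES = 256

def spray(packet: bytes, planes: int) -> list[tuple[int, bytes]]:
    """Assign each cell of a packet to a fabric plane, round-robin."""
    plane_iter = cycle(range(planes))
    cells = [packet[i:i + CELL_BYTES]
             for i in range(0, len(packet), CELL_BYTES)]
    return [(next(plane_iter), cell) for cell in cells]

# A 1 KiB packet over 4 planes: each plane carries exactly one cell.
for plane, cell in spray(b"\x00" * 1024, planes=4):
    print(f"plane {plane}: {len(cell)} bytes")
```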
In addition to DSF, we have also developed and built new 51T fabric switches based on Broadcom and Cisco ASICs. Finally, we are sharing our new FBNIC, a new NIC module that includes our first Meta-designed network ASIC, built to help meet the growing needs of our AI infrastructure.
Meta and Microsoft: Driving Open Innovation Together
Meta and Microsoft have a long-standing partnership within OCP, beginning with the development of the Switch Abstraction Interface (SAI) for data centers in 2018. Over the years together, we've contributed to key initiatives such as the Open Accelerator Module (OAM) standard and SSD standardization, showcasing our shared commitment to advancing open innovation.
Our current collaboration focuses on Mount Diablo, a new disaggregated power rack. It's a cutting-edge solution featuring a scalable 400 VDC unit that increases efficiency and scalability. This innovative design allows more AI accelerators per IT rack, significantly advancing AI infrastructure. We're excited to continue our collaboration through this contribution.
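The appeal of 400 VDC distribution is basic physics: for a fixed power draw, higher voltage means proportionally lower current (I = P/V), so thinner conductors and lower resistive losses. A quick illustration with assumed numbers:

```python
# Why higher-voltage DC distribution helps: same power, far less current.
# The rack power figure is an illustrative assumption.
power_w = 140_000                        # e.g. one high-power rack

for volts in (48, 400):
    amps = power_w / volts               # I = P / V
    print(f"{volts:>3} V -> {amps:,.0f} A")  # 48 V: ~2,917 A; 400 V: 350 A
```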
The open future of AI infra
Meta is committed to open source AI. We believe that open source will put the benefits and opportunities of AI into the hands of people all over the world.
AI won't realize its full potential without collaboration. We need open software frameworks to drive model innovation, ensure portability, and promote transparency in AI development. We must also prioritize open and standardized models so we can leverage collective expertise, make AI more accessible, and work towards minimizing biases in our systems.
Just as important, we also need open AI hardware systems. These systems are essential for delivering the kind of high-performance, cost-effective, and adaptable infrastructure necessary for AI advancement.
We encourage anyone who wants to help advance the future of AI hardware systems to engage with the OCP community. By addressing AI's infrastructure needs together, we can unlock the true promise of open AI for everyone.