AI researchers at Stanford and the University of Washington were able to train an AI “reasoning” model for under $50 in cloud compute credits, according to a new research paper released last Friday.
The model, known as s1, performs similarly to cutting-edge reasoning models, such as OpenAI’s o1 and DeepSeek’s R1, on tests measuring math and coding abilities. The s1 model is available on GitHub, along with the data and code used to train it.
The team behind s1 said they created the AI model through distillation, a process that extracts the “reasoning” capabilities from another AI model by training on its answers. The researchers said s1 is distilled from one of Google’s reasoning models, Gemini 2.0 Flash Thinking Experimental. Distillation is the same approach Berkeley researchers used to create an AI reasoning model for around $450 last month.
To some, the idea that a few researchers without millions of dollars behind them can still innovate in the AI space is exciting. But s1 raises real questions about the commoditization of AI models. Where’s the moat if someone can closely replicate a multi-million-dollar model with relative pocket change?
Unsurprisingly, big AI labs aren’t pleased. OpenAI has accused DeepSeek of improperly harvesting data from its API for the purposes of model distillation.
The researchers behind s1 were looking to find the simplest approach to achieving strong reasoning performance and “test-time scaling,” or allowing an AI model to think more before it answers a question. These were a few of the breakthroughs in OpenAI’s o1, which DeepSeek and other AI labs have tried to replicate through various techniques.
The s1 paper suggests that reasoning models can be distilled with a relatively small dataset using a process called supervised fine-tuning (SFT), in which an AI model is explicitly taught to mimic certain behaviors in a dataset. SFT tends to be cheaper than the large-scale reinforcement learning method that DeepSeek employed to train R1, its answer to OpenAI’s o1.
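At its core, supervised fine-tuning on a teacher’s outputs reduces to ordinary cross-entropy: the student is trained to assign high probability to the exact tokens the teacher produced. The sketch below illustrates that objective on toy data; the function name, shapes, and numbers are illustrative inventions, not anything from the s1 codebase.

```python
import numpy as np

def sft_loss(logits, target_ids):
    """Supervised fine-tuning objective: average cross-entropy between the
    student's next-token distribution and the teacher's actual tokens."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    # Negative log-likelihood of each teacher token at its position.
    nll = -np.log(probs[np.arange(len(target_ids)), target_ids])
    return nll.mean()

# Toy data: 3 sequence positions, vocabulary of 5 tokens.
rng = np.random.default_rng(0)
student_logits = rng.normal(size=(3, 5))
teacher_tokens = np.array([1, 4, 2])  # tokens copied from the teacher's answer

loss = sft_loss(student_logits, teacher_tokens)

# Training nudges the student's logits toward the teacher's tokens,
# which lowers the loss.
better_logits = student_logits.copy()
better_logits[np.arange(3), teacher_tokens] += 2.0
improved = sft_loss(better_logits, teacher_tokens)
```

Because the supervision signal is just “match these tokens,” a small but carefully chosen set of teacher answers can go a long way, which is what makes the approach so cheap relative to reinforcement learning.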
Google offers free access to Gemini 2.0 Flash Thinking Experimental, albeit with daily rate limits, via its Google AI Studio platform. Its terms prohibit reverse-engineering its models to develop services that compete with Google’s own AI offerings, however. We’ve reached out to Google for comment.
S1 is based on a small, off-the-shelf AI model from Qwen, an Alibaba-owned Chinese AI lab, which is available to download for free. To train s1, the researchers created a dataset of just 1,000 carefully curated questions, paired with answers to those questions as well as the “thinking” process behind each answer from Google’s Gemini 2.0 Flash Thinking Experimental.
After training s1, which took less than 30 minutes using 16 Nvidia H100 GPUs, the model achieved strong performance on certain AI benchmarks, according to the researchers. Niklas Muennighoff, a Stanford researcher who worked on the project, told TechCrunch he could rent the necessary compute today for about $20.
The researchers used a nifty trick to get s1 to double-check its work and extend its “thinking” time: they told it to wait. Adding the word “wait” during s1’s reasoning helped the model arrive at slightly more accurate answers, per the paper.
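The mechanics of that trick can be sketched in a few lines: whenever the model tries to end its reasoning before a token budget is spent, the stop marker is swapped for “Wait” so generation continues. This is a simplified illustration, assuming a hypothetical `next_token` callback standing in for the real model and a made-up `</think>` marker; real tokenizers and the s1 implementation differ in detail.

```python
END_THINK = "</think>"  # assumed end-of-thinking marker, for illustration only

def budget_force(next_token, min_think_tokens):
    """Sketch of the 'wait' trick: if the model tries to stop thinking
    before the token budget is spent, replace its stop marker with
    'Wait' so it keeps reasoning."""
    tokens = []
    while True:
        tok = next_token(tokens)
        if tok == END_THINK and len(tokens) < min_think_tokens:
            tok = "Wait"  # suppress the early stop, force more thinking
        tokens.append(tok)
        if tok == END_THINK:
            return tokens

# Hypothetical model that always wants to stop immediately: the wrapper
# forces it to emit three "Wait" tokens before letting it finish.
trace = budget_force(lambda toks: END_THINK, min_think_tokens=3)
```

The appeal of the approach is that it needs no retraining at all; the extra reasoning is coaxed out of the model purely at inference time.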
In 2025, Meta, Google, and Microsoft plan to invest hundreds of billions of dollars in AI infrastructure, which will partially go toward training next-generation AI models. That level of investment may still be necessary to push the envelope of AI innovation. Distillation has proven to be a good method for cheaply re-creating an AI model’s capabilities, but it doesn’t create new AI models that are vastly better than what’s available today.