Expect the next UX battle in AI to be over what "thinking tokens" to show
DeepSeek found that producing less pleasant (mixed languages, etc) thoughts might get better outcomes, but people didn't like them. They trained r1 to show nicer thoughts, which makes DeepSeek nice to use
— Ethan Mollick (@emollick) February 1, 2025
A research team from the University of California, Berkeley, has developed a small-scale reproduction of DeepSeek R1-Zero, an AI language model originally created in China, for approximately $30. The project, known as TinyZero, is led by campus graduate researcher Jiayi Pan and three other researchers, supervised by Professor Alane Suhr of UC Berkeley and Assistant Professor Hao Peng of the University of Illinois at Urbana-Champaign. Pan and his team took advantage of DeepSeek’s R1 model weights and code repositories, which are publicly available under an MIT license, to create a significantly smaller model.
Surprised we haven't seen more about Deepseek r1-zero (no one seems to host it?)
Unlike r1, which was trained to "think" in a readable, kinda charming way, r1-zero is the self-trained reasoner that had the *aha moment* about math & produces "thoughts" that are not human readable
— Ethan Mollick (@emollick) January 31, 2025
TinyZero has also been open-sourced, providing public access to its code and allowing anyone to experiment with training and modifying the model. “Small-scale reproduction is very accessible and very cheap even for people as a side project to experiment with,” Pan explained. He emphasized that the aim was to demystify the process of training such models and to better understand the science and design decisions behind them.
The $30 expense primarily covered server costs for running the experiments.
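Under the hood, an R1-Zero-style experiment boils down to reinforcement learning against automatically verifiable rewards: the model writes out a chain of thought and a final answer, and a simple rule-based function scores that answer with no human labeling or learned reward model involved. As a minimal sketch of that idea (this is not TinyZero's actual code; the <think>/<answer> tag format and the score values are illustrative assumptions), such a reward function might look like this:

```python
import re

def compute_reward(completion: str, ground_truth: str) -> float:
    """Rule-based reward for R1-Zero-style RL (illustrative sketch only).

    Assumes the model was prompted to wrap its reasoning in <think> tags
    and its final answer in <answer> tags; the tags and score values are
    assumptions for illustration, not TinyZero's exact scheme.
    """
    # Require the expected output format before scoring correctness.
    match = re.search(r"<think>.*?</think>\s*<answer>(.*?)</answer>",
                      completion, re.DOTALL)
    if match is None:
        return 0.0  # no reward for malformed output
    # Full reward only when the extracted answer matches the known
    # solution; a small partial reward for just following the format.
    answer = match.group(1).strip()
    return 1.0 if answer == ground_truth.strip() else 0.1

# A well-formed, correct completion earns the full reward.
print(compute_reward("<think>7 * 6 = 42</think> <answer>42</answer>", "42"))  # 1.0
print(compute_reward("the answer is 42", "42"))                               # 0.0
```

Because the reward comes from cheap string checks against known solutions rather than from human annotators or a separate reward model, the training loop can run on modest hardware, which is part of what keeps an experiment like this in the tens-of-dollars range.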
The buzz over DeepSeek this week crystallized, for many people, a few important trends that have been happening in plain sight: (i) China is catching up to the U.S. in generative AI, with implications for the AI supply chain. (ii) Open weight models are commoditizing the…
— Andrew Ng (@AndrewYNg) January 30, 2025
Berkeley team makes AI more accessible
Genevieve Smith, founding director of the Responsible AI Initiative and interim co-director of the AI Policy Hub at UC Berkeley, noted that more cost-effective language models have already impacted the market.
She pointed out that DeepSeek’s R1 model, which requires significantly less computing power, has already influenced the stock market, particularly affecting companies like NVIDIA that supply processing chips. “The creation of more efficient and cost-effective AI technology could potentially amplify demand and adoption, leading to greater value creation,” Smith said. However, she also warned about potential long-term implications for the market and for geopolitical dynamics, as the development of these new language models has stirred a competitive spirit between the U.S. and China.
The successful recreation of the AI model at a fraction of the cost has significant implications for the AI community. It underscores a shift from an era of extensive computation and vast data centers toward more efficient and accessible approaches. This development raises questions about the large investments made by major AI players such as OpenAI, Meta, Google, and Microsoft.
The release of this cost-effective model has already triggered discussions among investors and technologists about the current strategies of big tech companies. If models like TinyZero can be developed cheaply and within a short timeframe, it suggests that more streamlined approaches might have been viable all along. This breakthrough could serve as a bellwether for open-source AI development, potentially democratizing access to advanced AI technologies and altering the landscape of artificial intelligence research.