The Chinese artificial intelligence laboratory DeepSeek has used innovative techniques to develop an AI model trained with limited human intervention, producing an "aha moment" that could change the cost of building killer applications on top of the technology.
Research papers on DeepSeek's R1 "reasoning" model reveal how the group, led by hedge fund billionaire Liang Wenfeng, achieved strong results by removing bottlenecks in AI development.
The paper shows how DeepSeek used a series of more efficient techniques to develop R1, which, like OpenAI's rival o1 model, generates accurate answers by "thinking" step by step about its responses for longer than most large language models.
DeepSeek's breakthroughs come from its use of "reinforcement learning" to reduce the human involvement needed to produce good responses to prompts.
The company also built smaller models with fewer parameters (the number of variables used to train an AI system and shape its output) that have powerful reasoning capabilities, by fine-tuning large models trained by competitors such as Meta and Alibaba.
These developments have sent shockwaves through Silicon Valley, because R1 outperforms recently released models from OpenAI, Anthropic and Meta on some tasks while having cost a fraction of the money to develop.
OpenAI said on Tuesday that it had found technical evidence that DeepSeek used its models to train its own LLM at lower cost, an approach common among academics and well-funded start-ups.
Despite the controversy, experts said DeepSeek had shown real innovation. AI researchers also praised the company for releasing a detailed technical report outlining how it built its reasoning model.
"I think this is just the tip of the iceberg," said Neil Lawrence, a professor of DeepMind at the University of Cambridge. Innovation, what we see from these large companies is the replacement of intellectual efforts to calculate investment. "
The thumbs up that triggered an "aha moment"
Large language models are built in two stages. The first, called "pre-training", has developers use huge data sets to help the model predict the next word in a sentence. The second stage, called "post-training", is where developers teach the model to follow instructions, such as solving maths problems or coding.
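As a rough illustration of that first stage, the toy sketch below shows the next-word-prediction idea behind pre-training; the tiny corpus and helper names are invented, and real models learn this with neural networks over vast data sets rather than simple word counts.

```python
# A toy, illustrative sketch (not DeepSeek's code) of the "predict the next
# word" objective used in pre-training: count which word follows which in a
# tiny corpus, then pick the most likely continuation.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat slept on the sofa .".split()

# Build next-word counts for each word (a crude stand-in for a neural LM).
next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent continuation seen during 'pre-training'."""
    counts = next_word_counts.get(word)
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("the"))   # 'cat'
print(predict_next("sat"))   # 'on'
```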
One method of getting chatbots to generate more useful responses is called "reinforcement learning from human feedback" (RLHF), a technique that OpenAI pioneered to improve ChatGPT.
In RLHF, human data labellers mark the AI model's responses to prompts and select the best ones. This step is usually laborious, expensive and time-consuming, and is typically carried out by small teams of human data labellers.
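As a minimal sketch of that preference step, the hypothetical example below fits a scalar "reward" score to each response so that the ones labellers preferred score higher; real systems train a neural reward model on such comparisons, and the responses and numbers here are invented.

```python
# A minimal, hypothetical sketch of the RLHF idea described above: human
# labellers pick the better of two responses, and a scalar "reward" score per
# response is fitted so preferred responses score higher.
import math

responses = ["A", "B", "C"]                 # candidate model responses
scores = {r: 0.0 for r in responses}        # learnable reward scores

# Pairs (chosen, rejected) as a human labeller might rank them.
preferences = [("A", "B"), ("A", "C"), ("C", "B")]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

learning_rate = 0.5
for _ in range(200):
    for chosen, rejected in preferences:
        # Bradley-Terry style loss: -log sigmoid(score_chosen - score_rejected)
        p = sigmoid(scores[chosen] - scores[rejected])
        grad = p - 1.0                      # gradient w.r.t. the chosen score
        scores[chosen] -= learning_rate * grad
        scores[rejected] += learning_rate * grad

print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'C', 'B']
```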
DeepSeek's key innovation was to automate this final step using a technique called reinforcement learning (RL), in which the AI model is rewarded for doing the right thing.
DeepSeek first developed a powerful text-predicting model called V3. It then used RL to "reward" the model, such as giving it a thumbs up for generating the correct answer.
The Chinese company found that by doing this enough times, the model learned to solve problems spontaneously, without human supervision.
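The toy sketch below loosely illustrates that automated "thumbs up": an answer is sampled, a simple checker rewards it only if it is correct, and the policy shifts probability towards rewarded answers. It is an invented illustration, not DeepSeek's method, whose actual RL pipeline is far more sophisticated.

```python
# A toy, hypothetical sketch of replacing the human "thumbs up" with an
# automatic one: a checker gives reward 1 whenever the sampled answer is
# correct, and the policy reinforces answers that earned a reward.
import random

question, correct_answer = "What is 7 * 6?", "42"
candidate_answers = ["42", "36", "48"]
weights = {a: 1.0 for a in candidate_answers}   # unnormalised policy

def sample(weights: dict) -> str:
    total = sum(weights.values())
    return random.choices(list(weights), [w / total for w in weights.values()])[0]

def reward(answer: str) -> float:
    return 1.0 if answer == correct_answer else 0.0   # automatic "thumbs up"

random.seed(0)
for _ in range(500):
    answer = sample(weights)
    weights[answer] += reward(answer)           # reinforce rewarded behaviour

print(max(weights, key=weights.get))            # '42' dominates over time
```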
Google DeepMind also used this technique to build AlphaGo, the AI system that beat human players at the ancient board game Go and kicked off the current boom in deep learning about a decade ago.
DeepSeek said it discovered the model had a so-called "aha moment", when it re-evaluated its answers and adjusted its processing time to solve different problems.
DeepSeek's creators wrote in their research paper: "The 'aha moment' serves as a powerful reminder of the potential of [RL] to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future."
Lewis Tunstall, a researcher at AI research company Hugging Face, said: "The secret sauce to making this work is having a very, very strong pre-trained model, and then having a very, very good infrastructure to do this reinforcement learning process."
Smaller models built from large ones
While OpenAI and Google have invested billions of dollars in building large language models, DeepSeek has also built smaller models that can run on phones or web browsers by "distilling" the reasoning capabilities of larger models.
DeepSeek used its R1 model to generate a relatively small data set of 800,000 examples, and then used those AI-generated data to fine-tune models made by competitors, such as Alibaba's Qwen and Meta's Llama.
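The sketch below gives a schematic, hypothetical picture of that distillation recipe, assuming the teacher's answers are simply collected into a supervised data set on which a smaller student model is then fine-tuned; the prompts, file name and teacher stub are invented.

```python
# A schematic, hypothetical sketch of distillation: a strong "teacher" model's
# answers become a supervised data set for fine-tuning a smaller "student".
import json

def teacher_answer(prompt: str) -> str:
    """Stand-in for querying the large reasoning model (e.g. R1)."""
    return f"<think>step-by-step reasoning for: {prompt}</think> final answer"

prompts = ["Prove that the sum of two even numbers is even.",
           "Solve x^2 - 5x + 6 = 0."]

# Build the distillation data set: each record pairs a prompt with the
# teacher's full reasoning trace, which becomes the student's training target.
with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        record = {"prompt": prompt, "target": teacher_answer(prompt)}
        f.write(json.dumps(record) + "\n")

# The student model would then be fine-tuned with ordinary supervised
# learning (next-token prediction) on these prompt/target pairs.
print(open("distillation_data.jsonl").read())
```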
DeepSeek found that these distilled models were especially strong at reasoning, in some cases performing better than flagship models such as Anthropic's Claude. "It can basically solve most of the maths problems I did at undergraduate level," Tunstall said.
For app developers, this development could be a boon, giving them a cheap and efficient way to build products. Lennart Heim, a think-tank researcher, said that teaching AI models to reason as they generate answers, during so-called "inference", is far more efficient than the training process, which requires huge amounts of computing power.
He added that this new paradigm allows rivals to build competitive models with far less computing power and money. But without money for chips, "they just can't deploy them at the same scale".
DeepSeek has not said how much it spent building R1, but it claimed to have trained the V3 model, on which R1 is based, for just $5.6mn.
Heim said the figure does not include other costs, such as the likely acquisition of thousands of graphics processing units to train the model, or salaries, experiments, training and deployment.
While DeepSeek was the first to use its particular techniques, other AI labs are expected to follow suit, and Hugging Face is already working to replicate R1.
American AI companies have also been working to pack the capabilities of their most advanced models into smaller, more agile ones. Google launched Gemma last year, a lighter-weight model based on Gemini.
Thomas Wolf, co-founder and chief science officer of Hugging Face, said: "The secret recipe of intelligence turns out to be pretty simple. That's why we expect many teams can redo this."
Murgia is in London