Google's AI R&D lab DeepMind says it has developed a new AI system to tackle problems with "machine-gradable" solutions.
In experiments, the system, called AlphaEvolve, could help optimize some of the infrastructure Google uses to train its AI models, DeepMind said. The company said it is building a user interface for interacting with AlphaEvolve and plans to launch an early access program for selected academics before a possible wider rollout.
Most AI models hallucinate. Because of their probabilistic architectures, they sometimes confidently make things up. In fact, newer AI models like OpenAI's o3 hallucinate more than their predecessors, which illustrates how challenging the problem is.
AlphaEvolve introduces a clever mechanism to reduce hallucinations: an automatic evaluation system. The system uses models to generate, critique, and arrive at a pool of possible answers to a question, and it automatically evaluates and scores those answers for accuracy.
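The generate, score, and select cycle described above can be sketched in a few lines of Python. This is a toy illustration only, not DeepMind's implementation: the names `score`, `propose`, and `evolve` are hypothetical, the objective is an arbitrary one-dimensional function, and a random perturbation stands in for the language model that would propose candidates.

```python
import random

def score(candidate: float) -> float:
    """Automatic evaluator: higher is better. Toy objective peaking at x = 3."""
    return -(candidate - 3.0) ** 2

def propose(parent: float, rng: random.Random) -> float:
    """Stand-in for a generative model: perturb an existing candidate."""
    return parent + rng.gauss(0.0, 0.5)

def evolve(generations: int = 200, pool_size: int = 8, seed: int = 0) -> float:
    rng = random.Random(seed)
    # Start from a pool of random candidate answers.
    pool = [rng.uniform(-10.0, 10.0) for _ in range(pool_size)]
    for _ in range(generations):
        # Generate new candidates from members of the current pool.
        children = [propose(rng.choice(pool), rng) for _ in range(pool_size)]
        # Score every candidate automatically and keep only the best ones.
        pool = sorted(pool + children, key=score, reverse=True)[:pool_size]
    return pool[0]

best = evolve()
```

Because each candidate is checked by `score` rather than trusted, a poor proposal simply drops out of the pool instead of surfacing as an answer.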
AlphaEvolve is not the first system to take this tack. Researchers, including a team at DeepMind a few years ago, have applied similar techniques in various fields of mathematics. However, DeepMind claims that AlphaEvolve's use of "state-of-the-art" models (specifically Gemini models) makes it more capable than earlier instances of AI.
To use AlphaEvolve, users must prompt the system with a problem, including details such as instructions, equations, code snippets, and relevant literature. They must also provide a mechanism for automatically evaluating the system's answers, in the form of a formula.
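To make the idea of a user-supplied evaluation mechanism concrete, here is a minimal sketch. It assumes the evaluator is a Python function rather than a formula, and `ProblemSpec`, `evaluate`, and the bin-packing task are all hypothetical choices made for illustration; none of them reflect AlphaEvolve's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List

Packing = List[List[int]]
Solver = Callable[[List[int], int], Packing]

@dataclass
class ProblemSpec:
    """Hypothetical bundle of what a user submits: instructions plus an evaluator."""
    instructions: str
    evaluate: Callable[[Solver], float]

ITEMS = [4, 8, 1, 4, 2, 1]
CAPACITY = 10

def evaluate(solver: Solver) -> float:
    """Score a candidate solver: fewer bins is better, invalid packings get -inf."""
    bins = solver(list(ITEMS), CAPACITY)
    packed = sorted(x for b in bins for x in b)
    if packed != sorted(ITEMS) or any(sum(b) > CAPACITY for b in bins):
        return float("-inf")
    return -float(len(bins))

def first_fit_decreasing(items: List[int], capacity: int) -> Packing:
    """One candidate answer: place each item, largest first, in the first bin that fits."""
    bins: Packing = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

spec = ProblemSpec(
    instructions="Pack the items into as few bins as possible.",
    evaluate=evaluate,
)
ffd_score = spec.evaluate(first_fit_decreasing)  # -2.0: a valid packing into two bins
```

The evaluator rejects any answer that loses items or overfills a bin, so a candidate can only score well by being a valid solution.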
Because AlphaEvolve can only solve problems that it can self-evaluate, the system can only be used with certain types of problems, specifically those in fields such as computer science and system optimization. In another major limitation, AlphaEvolve can only describe solutions as algorithms, which makes it poorly suited to problems that are not numerical.
To benchmark AlphaEvolve, DeepMind had the system attempt a curated set of about 50 mathematical problems covering branches from geometry to combinatorics. AlphaEvolve managed to "rediscover" the best-known answers 75% of the time and found improved solutions in 20% of the cases.
DeepMind also evaluated AlphaEvolve on practical problems, such as improving the efficiency of Google's data centers and speeding up model training runs. According to the lab, AlphaEvolve produced an algorithm that recovers 0.7% of Google's global computing resources on average. The system also proposed an optimization that reduced the total time Google needs to train its Gemini models by 1%.
To be clear, AlphaEvolve is not making breakthrough discoveries. In one experiment, the system was able to find an improvement to Google's TPU AI accelerator chip design that had already been flagged by other tools.
However, DeepMind is making the same case that many AI labs make for their systems: that AlphaEvolve can save time while freeing up experts to focus on other, more important tasks.