A recently released Google AI model scores worse on certain safety tests than its predecessor, according to the company's internal benchmarking.
In a technical report published this week, Google revealed that its Gemini 2.5 Flash model is more likely to generate text that violates its safety guidelines than Gemini 2.0 Flash. On two metrics, "text-to-text safety" and "image-to-text safety," Gemini 2.5 Flash regresses 4.1% and 9.6%, respectively.
Text-to-text safety measures how frequently a model violates Google's guidelines given a prompt, while image-to-text safety evaluates how closely the model adheres to those boundaries when prompted with an image. Both tests are automated, not human-supervised.
In an emailed statement, a Google spokesperson confirmed that Gemini 2.5 Flash "performs worse on text-to-text and image-to-text safety."
These surprising benchmark results come as AI companies move to make their models more permissive, in other words, less likely to refuse to respond to controversial or sensitive topics. Meta said it tuned its latest Llama models not to endorse "certain points of view over others" and to reply to more "debated" political prompts. OpenAI said earlier this year that it would tweak future models to avoid taking an editorial stance and to offer multiple perspectives on controversial topics.
Sometimes, those efforts backfire. TechCrunch reported Monday that the default model powering OpenAI's ChatGPT allowed minors to generate erotic conversations. OpenAI blamed the behavior on a "bug."
According to Google's technical report, Gemini 2.5 Flash, which is still in preview, follows instructions more faithfully than Gemini 2.0 Flash, including instructions that cross problematic lines. The company claims the regressions can be attributed partly to false positives, but it also admits that Gemini 2.5 Flash sometimes generates "violative content" when explicitly asked.
"Naturally, there is tension between [instruction following] on sensitive topics and safety policy violations, which is reflected in our evaluations," the report reads.
Scores on SpeechMap, a benchmark that tests how models respond to sensitive and controversial prompts, also suggest that Gemini 2.5 Flash is less likely to refuse to answer contentious questions than Gemini 2.0 Flash. TechCrunch's testing of the model via the AI platform OpenRouter found that it will readily write essays in support of replacing human judges with AI, weakening due process protections in the United States, and implementing widespread warrantless government surveillance programs.
Thomas Woodside, co-founder of the Secure AI Project, said the limited details Google provides in its technical report show the need for more transparency in model testing.
"There is a trade-off between following the teaching and policy, as some users may ask for content that violates the policy," Woodside told TechCrunch. "In this case, Google's latest Flash model is more in line with the instructions while also violating the policy. Google doesn't provide much detail about the specific situation of a policy violation, although they say they are not serious. Without more knowledge, it's hard for independent analysts to know if there is a problem."
Google has come under fire before for its model safety reporting practices.
It took the company weeks to publish a technical report for its most capable model, Gemini 2.5 Pro, and when that report was finally released, it initially omitted key safety testing details.
On Monday, Google released a more detailed report containing additional safety information.