Anthropic and Google score a win by landing OpenAI-backed Harvey as a user

Harvey announced in a blog post on Tuesday that the popular legal AI tool will now use Anthropic's and Google's leading foundation models, moving beyond exclusive reliance on OpenAI's.

This is noteworthy because Harvey is one of the most successful early-stage portfolio companies of the OpenAI Startup Fund, the OpenAI-affiliated fund that backs companies building products with AI technology (mostly OpenAI's own). Although Harvey says it isn't abandoning OpenAI, just adding more models and clouds, it's still a coup for OpenAI's biggest competitors.

Harvey was one of the first four startups backed by the OpenAI Startup Fund in December 2022, back when OpenAI CEO Sam Altman was still running the fund. (Others in that first cohort included Descript, Mem, and Speak.)

Since then, Harvey has grown like crazy and is now a $3 billion startup, a valuation it reached when it announced a $300 million Series D led by Sequoia, with other big names such as Coatue, Kleiner Perkins, and the OpenAI Startup Fund piling in.

Interestingly, Google's venture firm GV led Harvey's $100 million Series C in July 2024 (the OpenAI fund also participated in that round). But even after adding Google's corporate venture capital arm to its cap table, Harvey did not immediately adopt Google's AI models. (GV also participated in Harvey's Series D.)

So what convinced Harvey to look beyond OpenAI's models now? The startup's internally developed benchmark, called BigLaw Bench, shows that a wide variety of foundation models are becoming more proficient across a range of legal tasks, and that some are better than others at specific ones.

Harvey believes that instead of putting in the hard work of training its own models, it is better to simply take high-performing foundation models from other providers, such as Google and Amazon-backed Anthropic, and apply them to the legal market.

The company also said that using a variety of models will help as Harvey builds AI agents.

"In less than a year, seven models, including three non-OAI models, are now better than the Harvey system originally owned by the benchmark on Biglaw Bench."

Harvey's benchmark also shows that certain foundation models are better than others at specific legal tasks. For example, it says Google's Gemini 2.5 Pro is "good at" legal drafting but "struggles" with pre-trial tasks such as writing oral arguments, because the model does not fully grasp "complex rules of evidence like hearsay."

According to Harvey's testing, OpenAI's o3 handles such pre-trial tasks well, with Anthropic's Claude 3.7 Sonnet close behind.

Results of Harvey's internal BigLaw Bench testing. Image credits: Harvey

Harvey said in its blog post that it will now join the growing ranks of companies publicly sharing model benchmark performance. Its leaderboard will rank how the major reasoning models perform on legal tasks. And rather than boiling each model's ranking down to a single number, the company will also publish research in which "top lawyers provide nuanced insights into model performance not captured by single-score benchmarks."

So not only has OpenAI-backed Harvey adopted competitors' models, it is also adding pressure, with backers including Google, for those model makers to keep proving themselves. Not that OpenAI should lose too much sleep over this scorecard: as complicated and political as AI benchmarks are becoming, this is still a world where OpenAI shines.

"We are fortunate that Harvey's investors are also major collaborators," Harvey CEO Winston Weinberg told TechCrunch in a statement. "And we are energized to keep giving our customers choice as we continue to meet their needs around the world."