Rockfish is helping enterprises leverage synthetic data

For years, Vyas Sekar would call Muckai Girish, an old friend from undergrad, to discuss potential entrepreneurial ideas and get Girish’s input. The two would usually talk an idea through and then end the conversation there. When Sekar called Girish in early 2022 with an idea involving synthetic data, the conversation didn’t end after they hung up.

Sekar and his colleague Giulia Fanti at Carnegie Mellon University had been working on building synthetic data to address academia's reproducibility crisis, or the inability to replicate data. While Sekar saw a need for a solution in academia, Girish knew his clients at the time were facing the same problem. After conversations with several companies, the thesis was further validated.

"At the time, I felt like this was very real and there was an opportunity," CEO Girish told TechCrunch. "That's why we started, and over the next few months we talked to a number of investors, people we knew and more importantly businesses and realized this was a major problem that was worth a lifetime's worth of effort."

The result was Rockfish, a startup that uses generative AI to create synthetic data for operational workflows to help enterprises break down data silos. Rockfish integrates with database providers such as AWS and Azure to help users choose the best data configuration based on company policies or data usage.

Synthetic data is an increasingly hot topic in artificial intelligence, but it was already gaining momentum when the company launched in June 2022. Girish said Rockfish wanted to make sure the product it was building was different from those of its peers and was a solution businesses would use every day, not just occasionally.

That’s why the company’s products are designed to continuously ingest data and focus on operational data, which includes things like financial transactions, cybersecurity and supply chains. These areas are constantly generating data for businesses and are constantly changing. Girish believes that focusing on this will help Rockfish stand out from other competitors.

Girish said the company now works with a handful of enterprise customers, including streaming analytics platform Conviva, as well as government agencies such as the U.S. Army and the U.S. Department of Defense.

Rockfish announced that it has raised a $4 million seed round led by Emergent Ventures, with participation from Foster Ventures, TEN13 and Dallas VC. This brings the company's total funding to approximately $6 million.

Anupam Rastogi, managing partner at Emergent Ventures, told TechCrunch that he has been tracking Sekar since before Rockfish was founded. He said the factors that prompted the company to invest were "the team, the market, the product, in that order." Additionally, Rockfish's focus on building for the enterprise makes it a better fit for Emergent than some other players in the space.

“The team is extremely high-quality data scientists, with multiple Ph.D.s,” Rastogi said. "We think this is a very technically complex area and having that technical prowess is critical. They've done a lot of groundwork in this area, not just at the company but across the industry."

While Rockfish hopes its focus will help build a moat against its competitors, that doesn't change the fact that synthetic data is likely to become an increasingly crowded market. AI companies are turning to synthetic data as multiple players believe the market has exhausted other sources of AI training data.

There are already a number of startups looking to tap into the market, including Tonic AI, which has raised more than $45 million in venture capital; Mostly AI, which has raised $31 million in venture capital; and Hazy, which raised $14.5 million before being acquired by SAS in 2024, to name a few.

Girish said the company hopes to enhance its approach to synthesizing data by incorporating other types of models, such as state-space models, which are mathematical models that describe a system using state variables. The company also wants to improve its end-to-end functionality.

"It's not like you're taking random data on the Internet and generating synthetic data," Girish said. "There's no guarantee it's going to do well. But if you put it all together, it's actually very relevant and realistic for businesses. So that's key, and we've found that being able to do that consistently is useful."