Reddit proposes to train Claude for allegedly “scratch” user comments

Social media platform Reddit sued artificial intelligence company Anthropic on Wednesday, accusing it of illegally "scraping" comments from millions of Reddit users to train its chatbot Claude.

Reddit claims that despite being asked not to do so, Anthropic has used an automated robot to access Reddit's content and "intentionally trained Reddit users' personal data without request for consent."

Anthropic said in a statement that it disagrees with Reddit's claim that "will defend itself."

Reddit filed a lawsuit Wednesday in the California Superior Court in San Francisco, where both companies are there.

"AI companies should not be allowed to scrape information and content from people without explicit restrictions on how they use the data," Reddit chief legal officer said in a statement Wednesday.

Reddit License Agreement

Reddit has previously signed license agreements with Google, OpenAI and other companies that are able to train their AI systems in public comments of Reddit's more than 100 million daily users.

These protocols “enable us to perform meaningful protections for users, including deletion of your content, user privacy protections, and the right to prevent users from spam using this content.”

The licensing agreement also helped the 20-year-old online platform raise funds ahead of Wall Street's debut last year.

Claude and Alex

Anthropomorphism was founded in 2021 by former Openai executives, and its flagship Claude Chatbot remains a key competitor to Openai Chatgpt. Despite Openai's close ties with Microsoft, Anthropic's main business partner is Amazon, which uses Claude to improve its widely used Alexa voice assistant.

Like other AI companies, Anthropic relies heavily on sites like Wikipedia and Reddit, which are the depth of written materials that can help teach AI assistants’ human language patterns.

The company, in a 2021 paper co-authored by Human CEO Dario Amodei, cited in the lawsuit - the company's researchers identified consultants who belong to gardening, history, relationships, relationship suggestions or ideas that people are bathing, which contains the highest quality AI training data.

In 2023, human opponents argued in a letter to the U.S. Copyright Office that “Claude was trained in a way that qualifies for typical legal use of materials” by making copies of information to conduct statistical analysis of large amounts of data. It has been fighting lawsuits from major music publishers that accused Claude of refuting the lyrics of the copyrighted song.

But Reddit's lawsuit is different from other lawsuits against AI companies because it does not claim copyright infringement. Instead, it focuses on alleged violations of Reddit's terms of use and creating unfair competition.

The Associated Press and OpenAI have licenses and technical agreements that allow OpenAI to access a portion of the AP's text archives.