ChatGPT Data Violates Copyrights And Compromises Privacy Of Millions, Alleges Latest Class-Action Lawsuit Against OpenAI

OpenAI unleashed a Generative AI revolution but its ChatGPT chatbot needs large amounts of data. A class-action lawsuit has alleged that this data violates the copyrights and privacy of millions of users.

Most Generative AI chatbots are built on Large Language Models, which are trained on vast amounts of text collected from multiple sources. The new lawsuit could set a precedent and have a deep impact on the world of Artificial Intelligence.


OpenAI Stole Information Of Real People And Commercially Misappropriated It, Alleges New Lawsuit

A class-action lawsuit was filed against ChatGPT creator OpenAI in a San Francisco federal court this week. It alleges that OpenAI's technology violates the copyrights and privacy of millions of users.

The complaint states that ChatGPT's Machine Learning tech was trained on texts "copied by OpenAI without consent, without credit, and without compensation." Speaking about the case, Ryan Clarkson, the managing partner of the Clarkson law firm that filed the case, said:

"The firm wants to represent real people whose information was stolen and commercially misappropriated to create this very powerful technology. All of that information is being taken at scale when it was never intended to be utilized by a large language model."

The law firm wants courts to place safeguards on how AI algorithms are trained. Additionally, the firm intends to ensure that people get compensated if their work is used.

Generative AI Draws Content From The Internet But Is It Liable For Compensation?

Massachusetts-based authors Paul Tremblay and Mona Awad claim that ChatGPT generates "very accurate" summaries of their works. Hence, they strongly believe their books were part of the scraped material that ChatGPT was trained on.

The primary defense for OpenAI is that it makes "Fair Use" of copyrighted work. However, there's one more aspect that might bolster OpenAI's defense.

The models behind platforms like ChatGPT, Midjourney, DALL-E, Google's Bard and several other Generative AI platforms are mostly trained on publicly available information, argued Katherine Gardner, an intellectual property lawyer at the law firm Gunderson Dettmer:

"When you put content on a social media site or any site, you're generally granting a very broad license to the site to be able to use your content in any way."

The lawsuit could reveal what the datasets contain and how they are accessed and used to train Generative AI platforms. Moreover, it might help set safeguards that ensure copyrights are respected and that copyright holders are compensated if or when their work is used.