AI-Generated Data Building Up On The Internet May Ruin Future Iterations Of Generative Chatbots
Machine Learning (ML) and Artificial Intelligence (AI) experts have cautioned that the quality of today's AI chatbots and the content they generate may be as good as it will ever get. Future iterations of these models could spew garbage, or even gibberish, as they ingest their own content.
A new research paper has concluded what several mathematicians and AI experts have been saying for quite some time: future generations of AI models may have to base their answers on data created by past iterations, which could ultimately spiral into incomprehensible content.

AI-Generated Data Flooding The Internet Isn't Good, Caution Scientists
"Garbage in, garbage out" is an old computing adage: a system fed inferior input will produce inferior output. As an increasing amount of AI-generated content is published online, future AIs trained on this material may put out garbage and incomprehensible content, a group of scientists has warned.
A group of British and Canadian scientists recently released a research paper. Through their study, they attempted to understand what happens after several generations of AIs are trained on one another's output. The results were quite harrowing.
In one of several instances, a ninth-generation AI chatbot ended up babbling about jackrabbits when the original source material had been about medieval architecture. In other words, an original text eventually morphed into garbage, concluded Prof. Ross Anderson of the University of Cambridge.
"The math shows that within a few generations, text becomes garbage," said Anderson, one of the authors of the research paper. He added that the same fate awaits images: graphical information loses its intelligibility too.
The research paper has yet to be peer-reviewed. The researchers call the phenomenon "model collapse".
Are The Current Versions Of ChatGPT, Bard, And Others The Best Ever?
Any generative AI needs large amounts of data. Text platforms like ChatGPT and Bard are built on Large Language Models, or LLMs, while image generators such as DALL-E are trained on similarly vast datasets.
Platforms like ChatGPT and Bard are generally understood to have been trained on vast amounts of data pulled from the internet. This data, so far, was largely generated by actual humans, not computers or AI.
But moving forward, the ratio of human-generated to AI-generated data will most probably change, as AI-generated content is being rapidly added, in large amounts, to the online pool of data from which future iterations of LLMs will learn.
This would allow errors and nonsensical data generated by one or more generative AI models to enter the training datasets of the next generation of LLMs, which could easily make it impossible for later AIs to distinguish fact from fiction. The AIs will "start misinterpreting what they believe to be real, by reinforcing their own beliefs," concluded the researchers.
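The compounding described above can be illustrated with a toy simulation (this is a simplified sketch, not the researchers' actual method): stand in for "model training" with fitting a normal distribution to data, and for "content generation" with sampling from that fit. When each generation learns only from the previous generation's output, the spread of the data drifts away from the original human distribution.

```python
import random
import statistics

def train_next_generation(data, n):
    """'Train' a model by fitting a normal distribution to data,
    then 'generate' n new samples from that fitted model."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [random.gauss(mu, sigma) for _ in range(n)]

random.seed(0)
# Original human-written "content": samples from a standard normal distribution.
human_data = [random.gauss(0.0, 1.0) for _ in range(100)]

data = human_data
for generation in range(2000):
    # Each model is trained only on its predecessor's output.
    data = train_next_generation(data, 100)

print(statistics.stdev(human_data), statistics.stdev(data))
# The spread of later generations drifts toward zero: diversity is lost,
# which is the statistical signature of "model collapse".
```

Because sampling errors are re-fitted and re-sampled at every step, small estimation errors accumulate instead of averaging out, and the distribution collapses toward a narrow spike rather than staying faithful to the original data.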
It is possible that companies training generative AI models will develop ways to spot content that has been generated by an AI. That ability could also help businesses and organizations that demand original content from actual humans, especially because generative AI platforms are getting exceptionally good at creating content, sometimes even better than humans.

