ChatGPT Was a Happy Accident — Now OpenAI Wants to Develop AI That Understands You Without Being Told

OpenAI, a leading force in the AI industry, ignited the current AI boom with its chatbot, ChatGPT. Since then, the company has continued to push boundaries with innovations in large language models.

Recently, OpenAI researcher and engineer Hunter Lightman offered a glimpse into the early days of MathGen, a team once focused on training models to solve high school math competition problems, and shared insights into what lies ahead for AI's reasoning capabilities.


Advancements in Mathematical Reasoning

In 2022, shortly after joining OpenAI as a researcher, Hunter Lightman witnessed the launch of ChatGPT. Meanwhile, he worked quietly behind the scenes with a team focused on improving mathematical reasoning in AI, teaching models to solve high school-level math competition problems.

"We were trying to make the models better at mathematical reasoning, which at the time they weren't very good at," Lightman told TechCrunch, describing the early days.

While OpenAI's AI agents still face challenges with complex tasks, progress in mathematical reasoning has been impressive. One of OpenAI's models recently clinched a gold medal at the International Mathematical Olympiad (IMO), a competition for the world's top high school math students. OpenAI believes these advanced reasoning skills will extend to other domains as well.

Reinforcement Learning and AI Agents

The rise of OpenAI's reasoning models is linked to reinforcement learning (RL), a machine learning technique that provides feedback on an AI model's choices in simulated environments. RL has been around for decades; for instance, Google DeepMind's AlphaGo used RL to defeat a world champion in Go in 2016.

In 2018, OpenAI introduced its first large language model in the GPT series. These models excelled at text processing but struggled with basic math. It wasn't until 2023 that OpenAI achieved a breakthrough by combining LLMs, RL, and "test-time computation," which gives a model more time and computing power to work through a problem before answering.
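
To make the idea of test-time computation concrete, here is a toy sketch in Python. It is not OpenAI's method: it uses majority voting over repeated samples, one publicly known way of spending more compute per question. The `noisy_solver` function, the answer 42, and the 60% success rate are all hypothetical stand-ins for a real model.

```python
import random
from collections import Counter

CORRECT = 42  # hypothetical ground-truth answer for the toy problem


def noisy_solver(rng, p_correct=0.6):
    """Hypothetical stand-in for one sampled model answer.

    Returns the right answer with probability p_correct,
    otherwise a nearby wrong answer.
    """
    if rng.random() < p_correct:
        return CORRECT
    return CORRECT + rng.choice([-3, -2, -1, 1, 2, 3])


def majority_vote(rng, n_samples):
    """Spend more compute: sample n answers and keep the most common one."""
    answers = [noisy_solver(rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, n_samples) == CORRECT for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(f"{n:2d} samples per question -> accuracy {accuracy(n):.2f}")
```

In this toy setup, accuracy rises as more samples are drawn per question, because wrong answers are scattered while correct ones agree. Real reasoning models spend the extra compute differently, for example on longer chains of intermediate steps.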

This combination led to "chain-of-thought" (CoT), enhancing AI performance on unseen math questions. "I could see the model starting to reason," said OpenAI researcher Ahmed El-Kishky. "It would notice mistakes and backtrack; it really felt like reading the thoughts of a person."

Scaling AI Reasoning Models

OpenAI identified two new axes for improving AI models: using more computational power during post-training and providing more time and processing power when answering questions. "OpenAI thinks a lot about not just how things are but how they will scale," said Lightman.

The Future of AI Reasoning

The goal of AI research is often seen as recreating human intelligence with computers. Since the launch of o1, OpenAI's first reasoning model, ChatGPT's interface has described its models as "thinking" and "reasoning." When asked whether OpenAI's models truly reason, El-Kishky answered from a computer science perspective.

"We're teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning," said El-Kishky.

The Deliberate Path to Real AI Agents

The report adds that ChatGPT was a happy accident, a viral hit born from a research experiment, but OpenAI's agents are no accident at all. They're the result of years of deliberate, methodical work: solving math problems, applying reinforcement learning, scaling compute, and refining reasoning.

At OpenAI's first developer conference in 2023, CEO Sam Altman made the company's vision crystal clear: "Eventually, you'll just ask the computer for what you need and it'll do all of these tasks for you."

OpenAI's next big goal is building AI agents that can handle more subjective, real-world tasks. While current agents work well for clearly defined tasks like coding, they often struggle with nuance and personal preferences.

Researchers are exploring new training methods, like using multiple agents to brainstorm and pick the best answer. These improvements may show up in the upcoming GPT-5, bringing OpenAI closer to creating an AI that truly understands what you want and does it for you.
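
The "several agents propose, one answer is picked" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three agents are hard-coded functions rather than models, and `keyword_score` is a hypothetical scoring rule that counts how many words of the request appear in a suggestion. The article does not describe how OpenAI's researchers actually generate or rank candidates.

```python
from typing import Callable, Sequence

# An "agent" here is any function that maps a request to a proposed answer.
Agent = Callable[[str], str]


def keyword_score(task: str, candidate: str) -> int:
    """Hypothetical scorer: how many words of the request the candidate mentions."""
    wanted = set(task.lower().split())
    offered = set(candidate.lower().split())
    return len(wanted & offered)


def pick_best(task: str, agents: Sequence[Agent],
              score: Callable[[str, str], int] = keyword_score) -> str:
    """Collect one proposal per agent and return the highest-scoring one."""
    candidates = [agent(task) for agent in agents]
    return max(candidates, key=lambda c: score(task, c))


# Toy agents with fixed suggestions (stand-ins for separate model calls).
def agent_a(task: str) -> str:
    return "A loud steakhouse downtown"


def agent_b(task: str) -> str:
    return "A vegetarian cafe"


def agent_c(task: str) -> str:
    return "A quiet vegetarian bistro for dinner"


if __name__ == "__main__":
    request = "quiet vegetarian dinner"
    print(pick_best(request, [agent_a, agent_b, agent_c]))
```

The hard part for subjective tasks is the scorer: counting keywords is trivial, whereas judging whether a suggestion fits someone's unstated preferences is exactly the open problem the article describes.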
