OpenAI Releases Expanded Model Specification Focusing On Transparency And User Customisation
OpenAI has unveiled a comprehensive update to its Model Spec, a document detailing the expected behaviour of its AI models. This expanded version runs to 63 pages, up from roughly 10, and is freely available for use and modification. It outlines guidelines on handling sensitive topics and user customisation, focusing on customisability, transparency, and "intellectual freedom."
The release coincides with CEO Sam Altman's announcement of the upcoming GPT-4.5 model, codenamed Orion. The updated specification incorporates recent debates in AI ethics and controversies from the past year. For instance, it tackles thorny hypotheticals such as whether a model should ever misgender someone if doing so would prevent a catastrophic event.

Focus on Transparency and Customisability
Joanne Jang from OpenAI's model behaviour team highlighted that while safety measures are in place, users can customise many aspects of the model's behaviour. "We can't create one model with the exact same set of behaviour standards that everyone in the world will love," she said. This approach allows for flexibility while maintaining essential safety protocols.
The updated Model Spec also introduces a hierarchy for instructions: platform-level rules from OpenAI take precedence, followed by developer guidelines and user preferences. This structure clarifies which behavioural aspects can be adjusted and which remain fixed.
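To make the chain of command concrete, here is a minimal illustrative sketch, not OpenAI's actual implementation, of how conflicting instructions could be resolved by precedence level. The `resolve` function and the example rules are hypothetical; only the ordering (platform over developer over user) comes from the spec.

```python
# Illustrative sketch of the Model Spec's instruction hierarchy:
# platform-level rules outrank developer guidelines, which outrank
# user preferences. Lower number = higher precedence.
PRECEDENCE = {"platform": 0, "developer": 1, "user": 2}

def resolve(instructions):
    """Given (level, topic, directive) tuples, keep for each topic
    the directive issued at the highest-precedence level."""
    resolved = {}
    for level, topic, directive in instructions:
        current = resolved.get(topic)
        if current is None or PRECEDENCE[level] < PRECEDENCE[current[0]]:
            resolved[topic] = (level, directive)
    return {topic: directive for topic, (_, directive) in resolved.items()}

rules = [
    ("user", "tone", "always agree with me"),
    ("developer", "tone", "be candid and professional"),
    ("platform", "safety", "refuse clearly harmful requests"),
]
# The developer's tone directive overrides the user's; the
# platform's safety rule stands because nothing outranks it.
print(resolve(rules))
```

In this toy model, the user's "always agree with me" loses to the developer's directive, mirroring how the spec lets developers and users adjust behaviour only within the bounds OpenAI's platform rules leave open.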
Handling Controversial Topics
A significant change is how models address controversial issues. Instead of avoiding these topics, the spec encourages models to engage with users in a shared pursuit of truth, while remaining firm on clear cases of misinformation or potential harm. For example, a question about taxing the wealthy should receive reasoned analysis rather than avoidance.
The document also addresses the handling of mature content. Following user requests for a "grown-up mode," OpenAI is exploring ways to allow certain adult content in appropriate contexts, while strictly banning harmful material such as revenge porn and deepfakes.
Addressing AI Sycophancy
The team is tackling "AI sycophancy," where models overly agree instead of providing critical feedback. Under new guidelines, ChatGPT should offer factual answers regardless of question phrasing and provide honest feedback rather than empty praise.
Laurentia Romaniuk from the model behaviour team expressed excitement about sharing internal discussions with the public for feedback. She acknowledged concerns about the document's length but emphasised its importance in refining AI behaviour.
Public Feedback and Industry Collaboration
OpenAI is inviting public input on this specification through its website. The company has released it under a Creative Commons Zero (CC0) license, allowing others in the industry to adopt or modify these guidelines freely.
This update arrives amid ongoing debates about AI safety and behaviour standards. Although it doesn't immediately alter ChatGPT's functionality, it signifies progress towards consistent adherence to these principles across OpenAI's models.
OpenAI remains committed to refining its models based on feedback accumulated since the first version launched last May. As part of its transparency efforts, the company also continues to open-source the prompts it uses to test model compliance with these guidelines.