What is Sora? We Answer All Your Questions About OpenAI’s New Text-to-Video Model

Microsoft-backed OpenAI aims to pull ahead of the competition with its latest innovation. Sora is a cutting-edge video-generation model that could dispel skepticism about the company's future.

OpenAI's website explains, "We're teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction."

What is Sora?

Sora is a text-to-video model that can generate photorealistic videos up to 60 seconds long using text prompts. The company blog claims, "Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user's prompt."

The company says the model understands how objects exist in the physical world. It can also accurately interpret prompts and create compelling characters that express vibrant emotions. Notably, the model can generate a video from a still image, fill in missing frames in an existing video, or extend a video's duration.

How Does Sora Work?

Imagine starting with a TV screen full of static and slowly removing the fuzziness until a clear, moving video emerges. This is essentially how Sora works. It is a diffusion model that uses a transformer architecture to progressively remove noise and produce videos.
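
The idea of removing noise step by step can be illustrated with a toy sketch. This is not Sora's implementation: the function names, the five-value "frame", and the step count are all hypothetical, and the noise estimate here cheats by looking at the clean target, whereas a real diffusion model must learn to predict the noise from training data.

```python
import random


def toy_denoise(clean, steps=10, seed=0):
    """Start from pure noise and remove part of the remaining noise at each step.

    ASSUMPTION: `clean` stands in for what a trained model would predict.
    A real diffusion model never sees the target at generation time.
    """
    rng = random.Random(seed)
    # Begin with pure Gaussian noise, like a screen full of static.
    frame = [rng.gauss(0.0, 1.0) for _ in clean]
    history = [frame]
    for step in range(steps):
        remaining = steps - step
        # Hypothetical "oracle" noise estimate: the gap to the clean frame.
        noise_estimate = [x - c for x, c in zip(frame, clean)]
        # Remove 1/remaining of the estimated noise, so the final step removes the rest.
        frame = [x - n / remaining for x, n in zip(frame, noise_estimate)]
        history.append(frame)
    return history


def mean_abs_error(frame, clean):
    """Average distance between a noisy frame and the clean one."""
    return sum(abs(x - c) for x, c in zip(frame, clean)) / len(clean)


clean_pixels = [0.0, 0.25, 0.5, 0.75, 1.0]
history = toy_denoise(clean_pixels, steps=10)
errors = [mean_abs_error(f, clean_pixels) for f in history]

# The error shrinks at every step as the "static" clears.
for step, err in enumerate(errors):
    print(step, round(err, 4))
```

Each pass through the loop plays the role of one denoising step: the frame starts as random static and ends up matching the clean values.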

Rather than generating videos frame by frame, Sora can generate an entire video in one go. Because the model works on many frames at a time, a subject can stay consistent even when it temporarily moves out of view. Users guide the video's content by providing text descriptions.

Just as GPT models work with text broken into tokens, Sora works with images and videos broken into smaller units of data known as patches.
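
To make the idea of patches concrete, here is a minimal sketch that cuts a tiny video into blocks spanning a few frames and a few pixels each. It is an illustration under stated assumptions, not OpenAI's code: the function name `video_to_patches` and the patch sizes are hypothetical, and the sketch slices raw pixel values, whereas OpenAI describes forming patches from a compressed representation of the video.

```python
def video_to_patches(video, pt, ph, pw):
    """Split a video (frames x height x width) into flattened spacetime patches.

    ASSUMPTION: the patch sizes pt, ph, pw (frames, rows, columns per patch)
    are illustrative values, not Sora's actual settings.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Collect every pixel inside this block of frames, rows, and columns.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


# A tiny 2-frame, 4x4 grayscale "video" whose pixel values encode their position.
video = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(4)] for t in range(2)]
patches = video_to_patches(video, pt=2, ph=2, pw=2)
print(len(patches), len(patches[0]))  # prints: 4 8
```

Each flattened patch plays a role comparable to a token in a text model: a sequence of patches is what a transformer would take as input.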

The official blog post said, "Sora builds on past research in DALL·E and GPT models. It uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model is able to follow the user's text instructions in the generated video more faithfully."

Is Sora Available Publicly?

Sora was announced on February 15, 2024, but most of us will have to wait before getting our hands on OpenAI's new AI model. For now, the company is making it available to red teamers to assess potential harms and risks. The company said, "We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals."

Are There Any Limitations to Sora?

The existing model exhibits certain limitations. It may face challenges in accurately simulating the physics of intricate scenes and understanding specific cause-and-effect instances. One example provided by the company was, "a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark."

Additionally, the model may encounter issues with spatial details, such as confusing left and right orientations. It may also struggle with providing precise descriptions of events unfolding over time, such as accurately following a specific camera trajectory.

Is Sora Safe?

OpenAI says on its website that it will take several safety steps before making Sora available in its products. The company is working with a team of domain experts in areas such as misinformation, hateful content, and bias.

These experts will conduct adversarial testing on Sora. OpenAI is also developing tools to identify misleading content, including a detection classifier that can tell whether a video was generated by Sora. The company added, "We'll be engaging policymakers, educators, and artists around the world to understand their concerns and to identify positive use cases for this new technology."

OpenAI's introduction of Sora comes at a time when AI video generation, exemplified by models from companies such as Stability AI, has demonstrated remarkable capabilities in text-to-video conversion. Initial observations suggest that Sora significantly outpaces current generative AI video models.
