OpenAI launches Sora, a revolutionary video tool that uses artificial intelligence

The text-to-video model is capable of creating high-quality videos, although the company said it is taking precautions before making it available to the public

Screenshot of a video published by OpenAI and generated with the generative AI tool Sora.

The fascination with ChatGPT and generative artificial intelligence language models has not yet worn off, but OpenAI has just introduced a dazzling and revolutionary video creation tool called Sora. With it, users simply provide a description of what they want to see on screen and there it is, created by artificial intelligence. Some videos look more polished than others, and some have a video-game feel that sets them apart from reality, but they are all surprising in their own way.

The CEO of OpenAI, Sam Altman, announced the launch on the social media platform X, and the tool was immediately flooded with requests for new creations. Realistic, futuristic, crazy, cartoonish... the videos include all kinds of creations automatically produced with generative artificial intelligence. Sora is capable of creating entire videos in one go, or of extending generated videos to make them longer.

Sora can generate complex scenes with multiple characters, specific types of movement, and precise subject and background details. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world, according to OpenAI. The model has a deep understanding of language, allowing it to accurately interpret cues and generate compelling characters that express vibrant emotions, the company said.

“Here is Sora, our video generation model,” Altman wrote. “Today we are starting red-teaming and offering access to a limited number of creators,” he added, alluding to the cybersecurity experts known as “red teamers” who evaluate the product for harms and risks. OpenAI is also giving early access to a number of visual artists, designers and filmmakers, who will provide feedback on how to make the model more useful to creative professionals.

Prompts can be more or less detailed. One of the examples offered by OpenAI responds to the following prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.” And the result is surprising.

Another video was created with the following prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”

In addition to generating a video solely from text instructions, the model can take an existing still image and generate a video from it, animating the image’s content with precision and attention to small details. The model can also take an existing video and extend it or fill in missing frames.

Altman has been publishing new videos requested by X users, demonstrating how quickly the results are produced. Sora can also create multiple shots within a single generated video while accurately maintaining characters and visual style.

“We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction,” explains OpenAI’s website introducing the new text-to-video tool. “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.”

Room for improvement

The artificial intelligence firm acknowledges that Sora still has some very obvious flaws: “It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”

Before making the tool available to the public, OpenAI promises to take some safety precautions. In addition to having the model checked by the “red teamers,” the company said it is also building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora.

“We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology. Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it,” the company warned. “That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.”
