
How Kling beat Sora in the AI race

Their new text-to-video model could well revolutionise the film industry


Last February, OpenAI unveiled Sora, a model that works much like ChatGPT except that it turns a text prompt into highly realistic videos up to one minute long, well beyond earlier models, which were limited to a few seconds. Then in May, at its Google I/O 2024 conference, Google unveiled Veo, which pushes video generation past Sora's one-minute mark. Today these two models, still unavailable to the public, face a serious competitor: Kling, developed by Kuaishou Technology, which promises two-minute videos.

Kuaishou, best known for its short-video sharing platform, has grown quickly since its launch in 2011, becoming China's second-largest short-video platform behind Douyin, the Chinese version of TikTok, and establishing itself internationally under the name Kwai. Alongside the app, which hosts everything from entertainment videos to tutorials and personal vlogs, the company has been building up its AI strategy. In August 2023 it presented KwaiYii, its family of large language models, and more recently Kolors, a text-to-image model comparable to DALL-E from its competitor OpenAI.

Kling, its latest innovation and still in testing, converts text into two-minute videos at 1080p resolution and 30 frames per second, thanks, according to the company, "to an efficient training infrastructure, extreme inference optimization, and scalable infrastructure." The model also stands out for the flexibility of its output: because it was trained at variable resolutions, it can generate videos in a range of aspect ratios, adapting to different framing and distribution needs.

Like Sora, Kling combines a transformer-based diffusion model with a 3D spatio-temporal attention mechanism, which lets it model complex motion. Kuaishou also cites a 3D VAE and a 3D face and body reconstruction technology, the latter generating full facial and body expressions from a single image. Users can then animate the resulting 3D model, finely controlling its expressions and movements to make it dance or sing. Models like Kling could well transform the film industry: the upcoming screening at the Tribeca Film Festival of "Sora Shorts", a series of short films made with Sora, hints at how deeply these technologies could reshape the seventh art. One may wonder whether, a few decades from now, cinema will still need actors.