Hands-on with Vidu: the Chinese AI video generator that reignited our excitement in a post-Sora world
7 incredible video showcases
Three months ago, ShengShu Technology and Tsinghua University unveiled Vidu, China's first long-duration, highly consistent, and highly dynamic large video model. Its debut video quality rivaled that of Sora, earning it the nickname "China's Strongest Sora" from netizens.
After a period of anticipation, Vidu finally launched on July 30th, offering global users immediate access through email registration, without the need for queuing.
According to official information, Vidu now offers text-to-video and image-to-video capabilities, with options for 4-second and 8-second durations, and resolutions up to 1080P. In terms of speed, it can generate a 4-second clip in just 30 seconds - currently the fastest among similar products worldwide.
Let's dive into our comprehensive first-hand experience with Vidu.
Video test cases:
1. Paris Olympics Opening Ceremony
Input image:
Prompt 1:
Paris 2024 Olympics opening ceremony, set along the Seine River. The Eiffel Tower stands majestically in the background, bathed in the warm light of the setting sun. The riverbanks are lined with enthusiastic spectators, seated in grandstands adorned with colorful Olympic banners.
Prompt 2:
Paris 2024 Olympics opening ceremony. Several boats, decorated with national flags and athletes, navigate the Seine, adding to the festive atmosphere. The bridges spanning the river are also crowded with onlookers.
Using the same reference image for both prompts, the first video shows consistent, dynamic water ripples as the camera pans. When the focus shifts to the boats, the lighting remains accurate and the boat movements are realistic. Apart from the blurred people, it is hard to distinguish from real footage at a glance.
2. First-person immersive gaming experience
Prompt:
Experience an immersive first-person perspective in a shooting game set within a millennium-themed corridor, adorned with neon lights and futuristic decorations reminiscent of Cyberpunk 2077. The player's hands extend forward, gripping a sleek, high-tech firearm aimed at an adversary.
The name tags, health bars, and blurred UI make it look like live gameplay. Fast-moving scenes and cover actions follow physical laws, with realistic visuals and movements. Ammo count decreases after shooting. The only drawback is some screen distortion.
3. Anime style
Prompt:
Tom Cat, wearing a chef’s hat, is busy making a cake. Jerry Mouse sneaks into the kitchen, tiptoes, and secretly puts a bomb in the cake. Tom Cat, unaware, continues his work. Suddenly, the bomb explodes, turning the cake into a cloud of smoke.
The video plays smoothly with no noticeable distortion, apart from slight distortion in Tom Cat’s eyes in some frames. Both characters remain consistent. The detail of Jerry placing the bomb is missing, but everything else aligns with the prompt. Jerry’s sudden appearance during the cake explosion is the only unrealistic detail.
4. Funny Plot
Prompt:
An elderly man in a black cotton jacket is making popcorn in a black pot at the village entrance. The pot suddenly explodes, and an Ultraman emerges from it.
The video has high character consistency and matches the prompt. Some frames are choppy, and the old man’s coat deforms in one frame. Movements, shadows, and popcorn falling are mostly accurate, but the explosion sparks are slightly unrealistic.
5. Rapid Movement
Prompt:
Under the sunset, a silver car speeds along a winding mountain road. The headlights pierce through the twilight like swords, and the car’s body gleams with a metallic sheen in the light and shadows. The driver grips the steering wheel firmly, eyes determined, as the speedometer needle climbs higher and higher.
The video is stable, smooth, and consistent, with a cinematic shooting style. Prompt adherence is only average: some close-up details are missing, possibly because of the limited video length. The car’s movement and the changes in the headlights’ lighting follow the laws of physics.
6. Artistic Recreation
Prompt:
The character in the picture takes out a sunflower and starts eating the seeds.
The video character is 89% similar to the original Van Gogh-style Trump, with high consistency and stability. It accurately replicates Van Gogh’s style and presents the prompt content well.
7. Classic Anime
Prompt:
On a desolate battlefield, Kakarot from Dragon Ball stands on the shattered ground, surrounded by a scene of devastation. He clenches his fists, his face showing a determined expression, as a brilliant golden aura bursts around him.
The video follows the prompt accurately and renders Kakarot’s expressions in detail. Movements are smooth, visuals are stable, and the character remains consistent. The only drawback is slightly low image quality.
Key features of Vidu:
Lightning-fast generation: True to its claim, Vidu consistently produced 4-second videos in about 30 seconds.
Stylistic versatility: Beyond realistic and anime styles, users can specify various video styles in prompts. The anime style particularly impressed us, suggesting specific optimization in this area.
Dual image-to-video modes: "Reference starting frame" and "Reference character" modes add to the playfulness of the platform.
High character consistency: Both in text-to-video and image-to-video modes, character consistency remained impressively high.
Strong semantic understanding: Vidu accurately rendered most elements and actions from the prompts.
Aesthetic appeal and dynamic movement: In a sea of "Sora-like" models, Vidu stood out with its pleasing aesthetics and notably larger range of motion for subjects like race cars, while maintaining fluidity.
How to access:
Our team found Vidu addictively fun to use. Now, everyone can experience it at www.vidu.studio.