Digital Media Net - Your Gateway To Digital media Creation. News and information on Digital Video, VR, Animation, Visual Effects, Mac Based media. Post Production, CAD, Sound and Music

Day 1/5: SkyReels-A3: The Art of Natural Speech for Digital Humans

SINGAPORE, Aug. 11, 2025 /PRNewswire/ — The Skywork AI Technology Release Week officially kicked off on August 11. From August 11 to August 15, a new model will be unveiled each day, covering cutting-edge models for multimodal AI scenarios.

On August 11, Skywork officially launched the SkyReels-A3 model. Combining a Diffusion Transformer (DiT) model, frame interpolation for extended video generation, reinforcement learning-based motion refinement, and controllable camera techniques, SkyReels-A3 supports full-modality, audio-driven digital human synthesis with unrestricted duration.

The SkyReels-A3 model is now live! Visit the SkyReels official website to try it out:

Links
SkyReels-A3 homepage:

https://skyworkai.github.io/skyreels-a3.github.io/

SkyReels official website (After logging in, select the “Talking Avatar” tool from the left navigation bar):

https://www.skyreels.ai/home

SkyReels open-source model repository:

https://huggingface.co/Skywork 

SkyReels-A3 is an audio-driven portrait video generation model that acts like an “AI vocal cord” for any photo or video:

  • Bring photos to life: Upload a portrait image and a voice clip – the person in the photo will lip-sync and speak or sing naturally;
  • Generate custom videos: Upload a portrait, add a voice clip, and provide a text prompt – the character will perform with directed expressions and motions;
  • Re-dub existing videos: Replace the original audio, and the model will automatically adjust lip movements, facial expressions, and gestures while preserving visual continuity.

The SkyReels-A3 model delivers innovative experiences across four key dimensions:

  1. Text-Prompt-Driven Scene Control – Text prompts enable dynamic scene modification;
  2. Enhanced Natural Movements – More lifelike interactions, including object handling and natural hand gestures during speech;
  3. Advanced Cinematic Control – Sophisticated camera work for artistic scenes (music videos and similar formats) with elevated aesthetic quality;
  4. Extended Video Generation – Single-shot videos up to 60 seconds; multi-shot sequences with unlimited duration potential.

Through analysis of real-world applications (e.g., advertising, live-stream commerce), we identified two key requirements: longer-duration videos with consistent quality, and more natural and precise interactive motions. To address these, we developed specialized training datasets for live-stream scenarios and implemented targeted optimizations in video generation.

Moreover, in scenarios requiring high artistic fidelity—such as music videos, film clips, or professional presentations—traditional digital humans are limited to generating “static shots,” producing rigid and visually flat results.

To enable dynamic cinematography, we developed a ControlNet-based camera control module. By processing precise camera parameters, the system achieves frame-accurate camera motion control. Specifically, the module extracts depth data from reference images, and integrates user-defined camera parameters to render trajectory-guided reference videos. It uses these videos as explicit motion priors to reconstruct professional-grade camera movements frame-by-frame. The output is digital human videos with cinematic-quality camera work.

Currently, we offer eight preset camera movement parameters: static shot, push in, push out, pan left, pan right, crane up, crane down, and handheld swing shot. Each movement type supports continuous intensity adjustment from 0-100%, allowing users to achieve precisely tailored cinematographic effects for diverse needs.
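As a rough illustration of how such presets and the 0–100% intensity slider could parameterize a per-frame camera trajectory, here is a minimal sketch; the names, the offset formulas, and the parameterization are assumptions for illustration, not SkyReels-A3's actual API.

```python
import math
from dataclasses import dataclass

# The eight preset movements named above (identifiers are illustrative).
PRESETS = ["static", "push_in", "push_out", "pan_left", "pan_right",
           "crane_up", "crane_down", "handheld_swing"]

@dataclass
class CameraMove:
    preset: str        # one of PRESETS
    intensity: float   # 0.0-1.0, mapped from the 0-100% slider

    def offset_at(self, t: float) -> tuple[float, float, float]:
        """Camera-position offset (x, y, z) at normalized time t in [0, 1].

        Per-frame offsets like these would drive the trajectory-guided
        reference video that serves as the explicit motion prior."""
        s = self.intensity * t
        if self.preset == "static":
            return (0.0, 0.0, 0.0)
        if self.preset == "push_in":
            return (0.0, 0.0, -s)   # move toward the subject
        if self.preset == "push_out":
            return (0.0, 0.0, s)
        if self.preset == "pan_left":
            return (-s, 0.0, 0.0)
        if self.preset == "pan_right":
            return (s, 0.0, 0.0)
        if self.preset == "crane_up":
            return (0.0, s, 0.0)
        if self.preset == "crane_down":
            return (0.0, -s, 0.0)
        # handheld_swing: small oscillation scaled by intensity
        return (0.05 * self.intensity * math.sin(8 * math.pi * t), 0.0, 0.0)

# Sample a 24-frame trajectory for a half-intensity push-in.
move = CameraMove("push_in", intensity=0.5)
offsets = [move.offset_at(i / 23) for i in range(24)]
```

Rendering these offsets against depth extracted from the reference image would yield the trajectory video described above.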

SkyReels-A3 is built upon a Diffusion Transformer (DiT) video diffusion model framework.

The DiT model has garnered significant attention for its exceptional performance in image and video generation. By replacing traditional U-Net architectures with a Transformer structure, it demonstrates superior capability in capturing long-range dependencies. In SkyReels-A3, we employ a 3D Variational Autoencoder (3D-VAE) to process video data in latent space representation. The 3D-VAE compresses video data across both spatial and temporal dimensions, transforming high-dimensional raw video data into compact latent representations. This latent-space processing approach substantially reduces the computational load for subsequent diffusion models while preserving critical visual information.
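To make the compression concrete, the sketch below works out how a spatio-temporal VAE shrinks a video tensor; the specific strides and channel counts are typical values for video VAEs, assumed here for illustration, since SkyReels-A3's exact ratios are not stated.

```python
# Map a raw video tensor shape (T, 3, H, W) to a plausible latent shape.
# t_stride/s_stride/latent_channels are assumptions, not published figures.
def latent_shape(frames, height, width,
                 t_stride=4, s_stride=8, latent_channels=16):
    """A video VAE typically downsamples time by t_stride and each spatial
    dimension by s_stride, trading 3 RGB channels for latent_channels."""
    return (frames // t_stride, latent_channels,
            height // s_stride, width // s_stride)

def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n

raw = (48, 3, 512, 512)            # 48 frames of 512x512 RGB video
lat = latent_shape(48, 512, 512)   # (12, 16, 64, 64)

# The diffusion model operates on ~48x fewer values than the raw pixels.
compression = numel(raw) / numel(lat)
```

Under these assumed strides, the diffusion model sees a tensor with 48x fewer elements, which is what makes long-duration generation computationally tractable.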

SkyReels-A3’s performance has been rigorously validated through extensive experimentation, including both quantitative and qualitative comparisons against state-of-the-art models (both open-source and proprietary). The results comprehensively demonstrate its capabilities in audio-driven video generation.

In addition, through step distillation techniques, we reduced the required inference steps from 40 to just 4 while maintaining comparable output quality.
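Since a diffusion sampler's cost is dominated by one denoiser forward pass per step, cutting 40 steps to 4 is roughly a 10x inference speedup. The toy loop below shows the accounting; the denoiser is a stand-in placeholder, not the actual SkyReels-A3 network.

```python
calls = {"n": 0}

def denoiser(x, t):
    calls["n"] += 1   # each step costs one expensive network forward pass
    return x * 0.9    # placeholder update in place of the real model

def sample(x, num_steps):
    # Evenly spaced timesteps from 1.0 down toward 0. A distilled student
    # is trained so a few large steps approximate the teacher's many small ones.
    for i in range(num_steps):
        t = 1.0 - i / num_steps
        x = denoiser(x, t)
    return x

sample(1.0, 40)
teacher_calls = calls["n"]   # 40 denoiser evaluations
calls["n"] = 0
sample(1.0, 4)
student_calls = calls["n"]   # 4 denoiser evaluations
speedup = teacher_calls / student_calls
```

The distillation objective itself (training the 4-step student to match the 40-step teacher's outputs) is omitted here; the point is only where the runtime goes.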

From celluloid to digital, 2D to 3D – each imaging revolution has redrawn the boundaries of content creation.

SkyReels-A3 pioneers democratized voice-to-video synthesis, delivering studio-quality animation from just a single image and audio clip – no specialized hardware or production expertise required.

SkyReels-A3 animates static photos into lifelike talking portraits, overdubs speech in existing videos without face replacement, and delivers flawlessly smooth digital human livestreams. By offering an accessible, cost-effective, and high-fidelity AI solution, it serves diverse fields—from film production and virtual streaming to game development and educational content creation. With SkyReels-A3, personalized and interactive content has never been easier to produce.

SkyReels-A3 brings the “voice as vision” paradigm to life—where your inspiration could spark the next viral sensation.

View original content: https://www.prnewswire.com/news-releases/day15-skyreels-a3-the-art-of-natural-speech-for-digital-humans-302526394.html

SOURCE Skywork AI Pte. Ltd.
