
Research Paper • #Computer Vision, Generative Models, Talking Heads • 🔬 Research • Analyzed: Jan 3, 2026 09:30

Real-time Dyadic Talking Head Generation with Low Latency

Published: Dec 30, 2025 18:43 • 1 min read • ArXiv

Analysis

This paper tackles latency in dyadic (two-person) talking head video generation, where a fast response is essential for realistic listener feedback. The authors propose DyStream, a flow matching-based autoregressive model that generates video in real time from both speaker and listener audio. Its key innovations are a stream-friendly autoregressive framework and a causal encoder with a lookahead module, which together balance quality against latency. The approach could enable more natural and interactive virtual communication.
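The summary does not say how the lookahead module is built, so the following is only a generic illustration of the trade-off it names, not DyStream's architecture. It sketches an attention-style visibility mask in which each frame sees all past frames plus a small number of future frames. The names `lookahead_mask` and `added_latency_ms`, and the 25 Hz frame rate in the usage lines, are assumptions made for this example.

```python
# Illustrative sketch only: a causal visibility mask with a bounded lookahead.
# This is NOT taken from the DyStream paper; names and values are hypothetical.

def lookahead_mask(num_frames: int, lookahead: int) -> list[list[bool]]:
    """Return mask[i][j] == True when frame i may use information from frame j.

    lookahead == 0 gives a strictly causal mask (past and present only).
    lookahead == k additionally exposes the next k frames.
    """
    if num_frames < 0 or lookahead < 0:
        raise ValueError("num_frames and lookahead must be non-negative")
    return [
        [j <= i + lookahead for j in range(num_frames)]
        for i in range(num_frames)
    ]


def added_latency_ms(lookahead: int, frame_rate_hz: float) -> float:
    """Waiting time introduced by needing `lookahead` future frames of input."""
    if frame_rate_hz <= 0:
        raise ValueError("frame_rate_hz must be positive")
    return 1000.0 * lookahead / frame_rate_hz


if __name__ == "__main__":
    # Print a small mask: '#' marks visible frames, '.' marks hidden ones.
    for row in lookahead_mask(num_frames=5, lookahead=1):
        print("".join("#" if visible else "." for visible in row))
    # Assumed 25 Hz input rate, chosen only to make the arithmetic concrete.
    print(added_latency_ms(lookahead=2, frame_rate_hz=25.0), "ms of extra wait")
```

A larger lookahead gives the encoder more future context, which can help quality, but every extra frame of lookahead must be waited for before output can be produced. That is the quality versus latency balance the analysis describes.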

Key Takeaways

• Addresses the high latency problem in dyadic talking head generation.
• Proposes DyStream, a flow matching-based autoregressive model.
• Employs a stream-friendly autoregressive framework and a causal encoder with a lookahead module.
• Achieves real-time video generation: 34 ms per frame, with total system latency under 100 ms.
• Reports state-of-the-art lip-sync quality, with LipSync Confidence scores of 8.13 (offline) and 7.61 (online) on HDTF.

Reference

“DyStream could generate video within 34 ms per frame, guaranteeing the entire system latency remains under 100 ms. Besides, it achieves state-of-the-art lip-sync quality, with offline and online LipSync Confidence scores of 8.13 and 7.61 on HDTF, respectively.”
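The figures in the quote can be checked with simple arithmetic. The sketch below assumes frames are generated one at a time with no batching, and treats the 100 ms figure as an upper bound on end-to-end latency. The quote does not break down how the rest of the budget is spent, so the "headroom" value is only what is left after generation, not a measured number. Function names are hypothetical.

```python
# Back-of-the-envelope check of the quoted latency figures.
# Assumption: one frame is generated at a time, with no batching or pipelining.

FRAME_TIME_MS = 34.0      # per-frame generation time, from the quote
SYSTEM_BUDGET_MS = 100.0  # stated upper bound on total system latency


def max_sustained_fps(frame_time_ms: float) -> float:
    """Highest frame rate achievable if each frame takes frame_time_ms."""
    if frame_time_ms <= 0:
        raise ValueError("frame_time_ms must be positive")
    return 1000.0 / frame_time_ms


def remaining_budget_ms(frame_time_ms: float, budget_ms: float) -> float:
    """Headroom left for everything other than frame generation."""
    return budget_ms - frame_time_ms


if __name__ == "__main__":
    fps = max_sustained_fps(FRAME_TIME_MS)
    headroom = remaining_budget_ms(FRAME_TIME_MS, SYSTEM_BUDGET_MS)
    print(f"max sustained rate: {fps:.1f} fps")
    print(f"headroom within the budget: {headroom:.0f} ms")
```

Under these assumptions, 34 ms per frame supports roughly 29 frames per second and leaves at most 66 ms for all other stages of the system.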
