Real-time Dyadic Talking Head Generation with Low Latency
Analysis
Key Takeaways
- Addresses the high-latency problem in dyadic talking head generation.
- Proposes DyStream, a flow matching-based autoregressive model.
- Employs a stream-friendly autoregressive framework and a causal encoder with a lookahead module (see the sketch after this list).
- Achieves real-time video generation with low latency (under 100 ms).
- Demonstrates state-of-the-art lip-sync quality.
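To make the "causal encoder with a lookahead module" idea concrete, here is a minimal sketch, not DyStream's actual code, of a causal attention mask extended by a small lookahead window: each frame attends to all past frames plus a bounded number of future frames, trading a fixed amount of buffering delay for extra context. The function name and window size are illustrative assumptions.

```python
# Minimal sketch of a causal attention mask with a bounded lookahead window.
# Illustrates the general idea only; `causal_lookahead_mask` and the window
# size are hypothetical, not DyStream's implementation.
import torch

def causal_lookahead_mask(num_frames: int, lookahead: int) -> torch.Tensor:
    """Return a boolean mask where mask[i, j] is True iff frame i may attend to frame j."""
    idx = torch.arange(num_frames)
    # Frame i may attend to every past frame and up to `lookahead` future frames.
    return idx.unsqueeze(1) + lookahead >= idx.unsqueeze(0)

# Example: a 6-frame window with a 1-frame lookahead. Each frame of lookahead
# adds one frame period of buffering, which is why the window must stay small
# to keep end-to-end latency under a tight budget such as 100 ms.
print(causal_lookahead_mask(num_frames=6, lookahead=1).int())
```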
“DyStream could generate video within 34 ms per frame, guaranteeing the entire system latency remains under 100 ms. Besides, it achieves state-of-the-art lip-sync quality, with offline and online LipSync Confidence scores of 8.13 and 7.61 on HDTF, respectively.”
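As a rough reading of the quoted numbers, the sketch below shows the headroom a 34 ms per-frame generation time leaves for the rest of the pipeline and for keeping up with playback. The 100 ms budget and 34 ms figure come from the quote; the 25 fps playback rate is an assumption for illustration only.

```python
# Back-of-the-envelope latency budget implied by the quoted figures.
# The 100 ms budget and 34 ms generation time are taken from the quote above;
# the 25 fps playback rate is an illustrative assumption.
BUDGET_MS = 100.0
GENERATION_MS = 34.0

print(f"headroom for non-generation stages: {BUDGET_MS - GENERATION_MS:.0f} ms")

FRAME_PERIOD_MS = 1000.0 / 25  # 40 ms per frame at 25 fps
print(f"slack vs. 25 fps playback: {FRAME_PERIOD_MS - GENERATION_MS:.0f} ms")
```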