DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance
Published: Dec 10, 2025 10:28 • 1 min read • arXiv
Analysis
The paper introduces DMP-TTS, a text-to-speech (TTS) approach aimed at controllable and flexible speech generation. Its two named components, disentangled multi-modal prompting and chained guidance, both target controllability, which could allow more nuanced and expressive output. 'Disentangled' prompting implies that different aspects of speech generation (e.g., prosody, emotion, speaker identity) are isolated so that each can be controlled separately.
Key Takeaways
- DMP-TTS is a new TTS approach.
- It uses disentangled multi-modal prompting.
- It incorporates chained guidance for control.