ManchuTTS: High-Quality Speech Synthesis for an Endangered Language

Research Paper | Tags: Speech Synthesis, Low-Resource Language Processing, Endangered Languages | Analyzed: Jan 3, 2026 16:26
Published: Dec 27, 2025 06:21
ArXiv

Analysis

This paper addresses speech synthesis for Manchu, an endangered language whose modeling is complicated by scarce data and complex agglutinative morphology. The proposed ManchuTTS model combines a hierarchical text representation, cross-modal attention, a flow-matching Transformer, and a hierarchical contrastive loss to address these challenges. A dedicated dataset and data augmentation further contribute to the model's effectiveness. The reported results, a mean opinion score (MOS) of 4.52 from a 5.2-hour training subset together with marked improvements in agglutinative word pronunciation and prosodic naturalness, indicate a meaningful contribution to low-resource speech synthesis and language preservation.
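For readers unfamiliar with the metric, a MOS is simply the average of listener ratings on a 1 to 5 scale, usually reported with a confidence interval. The sketch below is a minimal illustration of that arithmetic, not the paper's evaluation protocol: the function name `mean_opinion_score`, the normal-approximation interval, and the ratings themselves are all assumptions made up for this example.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings.

    Hypothetical helper for illustration; the paper's own scoring
    procedure is not described in this summary.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1..5")
    mos = statistics.fmean(ratings)
    # Normal-approximation interval: 1.96 times the standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Made-up ratings, NOT data from the paper.
ratings = [5, 4, 5, 4, 5, 5, 4, 4, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of ratings the interval is wide, which is why a MOS figure is easier to interpret when the number of listeners and utterances is reported alongside it.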
Reference / Citation
"ManchuTTS attains a MOS of 4.52 using a 5.2-hour training subset...outperforming all baseline models by a notable margin."
ArXiv, Dec 27, 2025 06:21
* Cited for critical analysis under Article 32.