Search: acoustic-linguistic - ai.jp.net

Research Paper #Speech Synthesis, Low-Resource Language Processing, Endangered Languages 🔬 ResearchAnalyzed: Jan 3, 2026 16:26

ManchuTTS: High-Quality Speech Synthesis for an Endangered Language

Published:Dec 27, 2025 06:21

•

1 min read

•

ArXiv

Analysis

This paper addresses the challenge of speech synthesis for the endangered Manchu language, which faces data scarcity and complex agglutination. The proposed ManchuTTS model introduces innovative techniques like a hierarchical text representation, cross-modal attention, flow-matching Transformer, and hierarchical contrastive loss to overcome these challenges. The creation of a dedicated dataset and data augmentation further contribute to the model's effectiveness. The results, including a high MOS score and significant improvements in agglutinative word pronunciation and prosodic naturalness, demonstrate the paper's significant contribution to the field of low-resource speech synthesis and language preservation.

Key Takeaways

•Addresses the challenge of speech synthesis for a low-resource, agglutinative language (Manchu).
•Proposes a novel ManchuTTS model with a three-tier text representation and hierarchical attention.
•Employs flow-matching Transformer for efficient, non-autoregressive generation.
•Introduces a hierarchical contrastive loss for structured acoustic-linguistic correspondence.
•Achieves state-of-the-art results with a high MOS score and significant improvements in pronunciation and prosody.

Reference

“ManchuTTS attains a MOS of 4.52 using a 5.2-hour training subset...outperforming all baseline models by a notable margin.”

Permalink ArXiv

ManchuTTS: High-Quality Speech Synthesis for an Endangered Language

Analysis

Key Takeaways

📬 Get AI News Delivered

Browse by Category

Trending Topics

📬 Get AI News Delivered

Browse by Category

Trending Topics