Search:
Match:
1 results

Analysis

This paper introduces SemDAC, a novel neural audio codec that leverages semantic codebooks derived from HuBERT features to improve speech compression efficiency and recognition accuracy. The core idea is to prioritize semantic information (phonetic content) in the initial quantization stage, allowing for more efficient use of acoustic codebooks and leading to better performance at lower bitrates compared to existing methods like DAC. The paper's significance lies in its demonstration of how incorporating semantic understanding can significantly enhance speech compression, potentially benefiting applications like speech recognition and low-bandwidth communication.
Reference

SemDAC outperforms DAC across perceptual metrics and achieves lower WER when running Whisper on reconstructed speech, all while operating at substantially lower bitrates (e.g., 0.95 kbps vs. 2.5 kbps for DAC).