Research#speech recognition👥 CommunityAnalyzed: Dec 28, 2025 21:57

Can Fine-tuning ASR/STT Models Improve Performance on Severely Clipped Audio?

Published:Dec 23, 2025 04:29
1 min read
r/LanguageTechnology

Analysis

The article discusses the feasibility of fine-tuning Automatic Speech Recognition (ASR) or Speech-to-Text (STT) models to improve performance on heavily clipped audio data, a common problem in radio communications. The author is facing challenges with a company project involving metro train radio communications, where audio quality is poor due to clipping and domain-specific jargon. The core issue is the limited amount of verified data (1-2 hours) available for fine-tuning models like Whisper and Parakeet. The post raises a critical question about the practicality of the project given the data constraints and seeks advice on alternative methods. The problem highlights the challenges of applying state-of-the-art ASR models in real-world scenarios with imperfect audio.

Reference

The audios our client have are borderline unintelligible to most people due to the many domain-specific jargons/callsigns and heavily clipped voices.