Bengali Deepfake Audio Detection: Zero-Shot vs. Fine-Tuning
Paper | Audio Deepfake Detection | Research | Analyzed: Jan 4, 2026 00:15
Published: Dec 25, 2025 14:53
Source: ArXiv
Analysis
This paper addresses the growing threat of deepfake audio, focusing on Bengali, a language that has received little attention in this area. It establishes a benchmark for Bengali deepfake audio detection by comparing zero-shot inference with pre-trained models against fine-tuned models. The study is significant because it covers a low-resource language and shows that fine-tuning substantially improves detection performance.
Key Takeaways
- Zero-shot inference with pre-trained models showed limited performance in detecting Bengali deepfakes.
- Fine-tuning significantly improved detection accuracy, with ResNet18 achieving the best results.
- The study provides a benchmark for Bengali deepfake audio detection, addressing a low-resource language.
- Fine-tuning is crucial for effective deepfake detection in this context.
Reference / Citation
"Fine-tuned models show strong performance gains. ResNet18 achieves the highest accuracy of 79.17%, F1 score of 79.12%, AUC of 84.37% and EER of 24.35%."
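The four figures quoted above are standard binary-classification metrics. As a rough illustration only (this is not the paper's evaluation code), the sketch below computes them in plain Python. It assumes label 1 = fake and 0 = real, and a detector score where higher means "more likely fake"; the 0.5 decision threshold and the toy data are hypothetical. EER is approximated by sweeping thresholds over the observed scores and taking the point where the false-positive rate (real clips flagged as fake) and false-negative rate (fake clips missed) are closest.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of hard predictions that match the ground-truth labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def f1_score(labels: Sequence[int], preds: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (assumed: fake) class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == positive and p == positive)
    fp = sum(1 for y, p in zip(labels, preds) if y != positive and p == positive)
    fn = sum(1 for y, p in zip(labels, preds) if y == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random fake clip outscores a random real clip (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def equal_error_rate(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate EER: average of FPR and FNR at the threshold where they are closest."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_gap, best_eer = float("inf"), 1.0
    # Include +inf so the "reject everything" operating point is also considered.
    for t in sorted(set(scores)) + [float("inf")]:
        # A clip is flagged as fake when its score is >= t.
        fpr = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t) / n_neg
        fnr = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t) / n_pos
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, best_eer = gap, (fpr + fnr) / 2
    return best_eer


# Hypothetical toy data: two real clips (0) and two fake clips (1).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 decision threshold

print(f"accuracy={accuracy(labels, preds):.2f}")  # accuracy=0.75
print(f"f1={f1_score(labels, preds):.3f}")  # f1=0.667
print(f"auc={roc_auc(labels, scores):.2f}")  # auc=0.75
print(f"eer={equal_error_rate(labels, scores):.2f}")  # eer=0.50
```

Accuracy and F1 depend on the chosen decision threshold, while AUC and EER are computed from the raw scores. Unlike the other three metrics, a lower EER is better.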