Bengali Deepfake Audio Detection: Zero-Shot vs. Fine-Tuning
Published: Dec 25, 2025 14:53
•ArXiv
Analysis
This paper addresses deepfake audio detection for Bengali, a language that has received little attention in this area. It establishes a benchmark for Bengali deepfake detection and compares zero-shot inference using pre-trained models against fine-tuned models. Its main contributions are coverage of a low-resource language and evidence that fine-tuning substantially improves detection performance.
Key Takeaways
- Zero-shot inference with pre-trained models showed limited performance in detecting Bengali deepfakes.
- Fine-tuning substantially improved detection, with ResNet18 achieving the best results (79.17% accuracy, 24.35% EER).
- The study provides a benchmark for Bengali deepfake audio detection, addressing a low-resource language.
- Fine-tuning is crucial for effective deepfake detection in this context.
Reference
“Fine-tuned models show strong performance gains. ResNet18 achieves the highest accuracy of 79.17%, F1 score of 79.12%, AUC of 84.37% and EER of 24.35%.”
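Of the four metrics quoted, the equal error rate (EER) is the least self-explanatory: it is the error rate at the decision threshold where the rate of real clips flagged as fake equals the rate of fake clips missed, so lower is better. The sketch below illustrates one simple way to estimate it from detector scores. It is not the paper's evaluation code; the function name `equal_error_rate`, the label convention (1 = fake, 0 = real), and the toy scores are assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER and the threshold at which it occurs.

    Assumed convention (not from the paper): a higher score means
    "more likely fake"; labels use 1 = fake and 0 = real.
    """
    n_fake = sum(labels)
    n_real = len(labels) - n_fake
    if n_fake == 0 or n_real == 0:
        raise ValueError("need at least one fake and one real sample")

    best_gap, eer, best_thr = None, None, None
    # Try every observed score as a threshold: predict "fake" if score >= thr.
    for thr in sorted(set(scores)):
        false_pos = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        false_neg = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        fpr = false_pos / n_real  # real clips wrongly flagged as fake
        fnr = false_neg / n_fake  # fake clips that slipped through
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer, best_thr = gap, (fpr + fnr) / 2, thr
    return eer, best_thr


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
eer, thr = equal_error_rate(scores, labels)
print(eer, thr)  # 0.25 0.6
```

On a finite test set the two error rates rarely match exactly, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate along the ROC curve instead, which can give slightly different values.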