Bengali Deepfake Audio Detection: Zero-Shot vs. Fine-Tuning

Paper | #Audio Deepfake Detection | 🔬 Research | Analyzed: Jan 4, 2026 00:15

Published: Dec 25, 2025 14:53

Analysis

This paper addresses deepfake audio detection for Bengali, a language that remains under-explored in this area. It establishes a benchmark for Bengali deepfake detection by comparing zero-shot inference from pre-trained models against fine-tuned models. Its significance is twofold: it contributes to a low-resource language, and it shows that fine-tuning substantially improves detection performance.

Key Takeaways

• Zero-shot inference with pre-trained models showed limited performance in detecting Bengali deepfakes.
• Fine-tuning significantly improved detection, with ResNet18 performing best (79.17% accuracy, 79.12% F1, 84.37% AUC, 24.35% EER).
• The study provides a benchmark for Bengali deepfake audio detection, a low-resource language setting.
• Pre-trained models alone are therefore not enough in this context: adaptation to Bengali data is needed for effective detection.
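
The contrast between zero-shot inference and fine-tuning can be sketched with a deliberately tiny toy. This is not the paper's method or data: the "detector" is a single threshold on a one-dimensional score, the numbers are synthetic, and every name here (`fit_threshold`, `fine_tune`, the score lists) is hypothetical. It only shows the mechanism: a model fit on one distribution is applied unchanged to a shifted one (zero-shot), then nudged with a few labelled target examples (fine-tuning) and evaluated on held-out target data.

```python
# Toy illustration only: a 1-D threshold "detector" on synthetic scores.
# None of these numbers or names come from the paper.

def fit_threshold(real, fake):
    """'Pre-train': place the threshold midway between the two class means."""
    return (sum(real) / len(real) + sum(fake) / len(fake)) / 2


def accuracy(threshold, real, fake):
    """Fraction correct when a score >= threshold is predicted as fake."""
    correct = sum(x < threshold for x in real) + sum(x >= threshold for x in fake)
    return correct / (len(real) + len(fake))


def fine_tune(threshold, real, fake, step=0.05, max_epochs=100):
    """Start from the pre-trained threshold and nudge it after each error."""
    for _ in range(max_epochs):
        errors = 0
        for x in real:
            if x >= threshold:  # real clip flagged as fake: raise the threshold
                threshold += step
                errors += 1
        for x in fake:
            if x < threshold:  # fake clip missed: lower the threshold
                threshold -= step
                errors += 1
        if errors == 0:
            break
    return threshold


# Source domain: stand-in for whatever data the model was pre-trained on.
source_real, source_fake = [0.1, 0.2, 0.3, 0.4], [0.6, 0.7, 0.8, 0.9]
# Target domain with a shifted score distribution: stand-in for Bengali audio.
adapt_real, adapt_fake = [0.5, 0.7], [1.0, 1.2]    # labelled data for adaptation
test_real, test_fake = [0.55, 0.65], [1.1, 1.3]    # held-out evaluation data

pretrained = fit_threshold(source_real, source_fake)
zero_shot_acc = accuracy(pretrained, test_real, test_fake)

tuned = fine_tune(pretrained, adapt_real, adapt_fake)
fine_tuned_acc = accuracy(tuned, test_real, test_fake)

print(zero_shot_acc, fine_tuned_acc)  # prints 0.5 1.0
```

In this toy the pre-trained threshold labels every shifted real clip as fake, so zero-shot accuracy sits at chance, while a few updates on target data recover the separation. Real detectors such as ResNet18 adapt many parameters rather than one, so the size of the gain here says nothing about the paper's reported figures.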

Reference / Citation

"Fine-tuned models show strong performance gains. ResNet18 achieves the highest accuracy of 79.17%, F1 score of 79.12%, AUC of 84.37% and EER of 24.35%."

ArXiv, Dec 25, 2025 14:53

* Cited for critical analysis under Article 32.
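
The quoted results report four metrics: accuracy, F1, AUC, and EER (equal error rate, the operating point where the false-positive rate equals the false-negative rate). As a rough guide to how such figures are typically derived from detector scores, here is a minimal pure-Python sketch. The labels and scores are made up for illustration, label 1 is assumed to mean "fake", and the function names are hypothetical; the paper's own evaluation code may differ, for example in how it interpolates the EER.

```python
def roc_points(labels, scores):
    """Sweep the threshold from high to low and collect (FPR, TPR) points.

    labels: 1 = fake (positive class), 0 = real. Higher score = more fake.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one real and one fake example")
    ranked = sorted(zip(scores, labels), reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they yield a single ROC point.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def eer(points):
    """Error rate where FPR equals FNR (1 - TPR), linearly interpolated."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0 = x0 - (1 - y0)  # FPR - FNR at the start of the segment
        d1 = x1 - (1 - y1)  # FPR - FNR at the end of the segment
        if d1 >= 0:         # the sign change happens inside this segment
            t = -d0 / (d1 - d0)
            return x0 + t * (x1 - x0)
    return 1.0


def accuracy_f1(labels, scores, threshold):
    """Accuracy and F1 when score >= threshold is predicted as fake."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return acc, f1


# Made-up example: four fake clips and four real clips.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

points = roc_points(labels, scores)
print(auc(points), eer(points))          # prints 0.9375 0.25
print(accuracy_f1(labels, scores, 0.5))  # prints (0.75, 0.75)
```

Accuracy and F1 depend on the chosen decision threshold, whereas AUC and EER summarize the detector across all thresholds. Lower EER is better, which is why the reported 24.35% EER sits alongside rather than mirrors the 79.17% accuracy.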
