Research Paper · Computer Vision, Deep Learning, Image Classification · Analyzed: Jan 3, 2026 15:53
Bayesian Self-Distillation Improves Image Classification
Published: Dec 30, 2025 11:48 · 1 min read · ArXiv
Analysis
This paper introduces Bayesian Self-Distillation (BSD), a novel approach to training deep neural networks for image classification. BSD addresses limitations of traditional supervised learning and of existing self-distillation methods by using Bayesian inference to construct sample-specific target distributions. Its key advantage is that, after initialization, training no longer relies on hard one-hot targets, which leads to improved accuracy, better calibration, greater robustness, and stronger performance under label noise. The reported results show consistent gains over existing methods across a range of architectures and datasets.
Key Takeaways
- Proposes Bayesian Self-Distillation (BSD) for image classification.
- BSD uses Bayesian inference to create sample-specific target distributions.
- Avoids reliance on hard targets after initialization.
- Achieves higher accuracy, better calibration, and improved robustness.
- Outperforms existing self-distillation methods across various architectures and datasets.
Reference
“BSD consistently yields higher test accuracy (e.g. +1.4% for ResNet-50 on CIFAR-100) and significantly lower Expected Calibration Error (ECE) (-40% ResNet-50, CIFAR-100) than existing architecture-preserving self-distillation methods.”