Bayesian Self-Distillation Improves Image Classification

Published:Dec 30, 2025 11:48
1 min read
ArXiv

Analysis

This paper introduces Bayesian Self-Distillation (BSD), a novel approach to training deep neural networks for image classification. It addresses the limitations of traditional supervised learning and existing self-distillation methods by using Bayesian inference to create sample-specific target distributions. The key advantage is that BSD avoids reliance on hard targets after initialization, leading to improved accuracy, calibration, robustness, and performance under label noise. The results demonstrate significant improvements over existing methods across various architectures and datasets.

Reference

BSD consistently yields higher test accuracy (e.g. +1.4% for ResNet-50 on CIFAR-100) and significantly lower Expected Calibration Error (ECE) (-40% ResNet-50, CIFAR-100) than existing architecture-preserving self-distillation methods.