BertsWin: Accelerating 3D Medical Image Analysis with Topological Preservation

Paper · Medical Imaging, Deep Learning, Transformers · Research | Analyzed: Jan 4, 2026 00:08
Published: Dec 25, 2025 19:32
1 min read
ArXiv

Analysis

This paper tackles the difficulty of applying self-supervised learning (SSL) with Vision Transformers (ViTs) to 3D medical imaging, focusing on a limitation of Masked Autoencoders (MAEs): by discarding masked tokens, they weaken the model's ability to capture 3D spatial relationships. The authors propose BertsWin, a hybrid architecture that combines BERT-style token masking with Swin Transformer windowed attention to improve spatial context learning. The key innovations are (i) keeping the complete 3D grid of tokens during pre-training, so masked positions are replaced rather than dropped and the spatial topology of the volume is preserved, and (ii) a structural priority loss function. The paper reports large gains in convergence speed and training efficiency over standard ViT-MAE baselines, without additional computational cost, making it a notable contribution to 3D medical image analysis.
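The core architectural distinction can be illustrated with a small sketch. The snippet below contrasts BERT-style masking (every grid position is kept, masked positions are overwritten with a mask embedding) against MAE-style masking (masked tokens are dropped, so the encoder sees a shorter, topology-breaking sequence). Function names, the zero-valued mask token, and the NumPy formulation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bert_style_mask(tokens, mask_ratio=0.6, seed=0):
    """BERT-style: replace masked tokens with a mask embedding.

    The full (flattened) 3D grid of patch tokens is preserved, so
    positional/spatial topology stays intact for the encoder.
    `tokens` is (n, d) with n = D*H*W flattened patches (assumed layout).
    """
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    mask_token = np.zeros(d)  # learnable vector in practice; zeros here
    idx = rng.choice(n, size=int(n * mask_ratio), replace=False)
    out = tokens.copy()
    out[idx] = mask_token          # overwrite, never drop
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return out, mask               # out.shape == tokens.shape

def mae_style_drop(tokens, mask_ratio=0.6, seed=0):
    """MAE-style: drop masked tokens; encoder sees only visible ones,
    so the regular 3D neighborhood structure is lost."""
    rng = np.random.default_rng(seed)
    n, _ = tokens.shape
    keep = rng.choice(n, size=n - int(n * mask_ratio), replace=False)
    return tokens[np.sort(keep)]   # shorter sequence than the input
```

Because the BERT-style sequence length never changes, window-based (Swin-style) attention can be applied directly over the intact 3D grid, which is what the hybrid design exploits.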
Reference / Citation
"BertsWin achieves a 5.8x acceleration in semantic convergence and a 15-fold reduction in training epochs compared to standard ViT-MAE baselines."