Block-Recurrent Dynamics in Vision Transformers
Research · Analyzed: Dec 25, 2025 03:55
Published: Dec 24, 2025 05:00 · 1 min read · ArXiv VisionAnalysis
This paper introduces the Block-Recurrent Hypothesis (BRH) to explain the computational structure of Vision Transformers (ViTs). The core idea is that the depth of a ViT can be represented by a small number of recurrently applied blocks, suggesting a more efficient and interpretable architecture. The authors demonstrate this by training compact surrogate models that reproduce the computation of the original network while reusing far fewer distinct blocks.
Key Takeaways
- Introduces the Block-Recurrent Hypothesis (BRH) for ViTs.
- Demonstrates that ViT depth can be approximated by recurrently applying a small number of blocks.
- Presents models that act as block-recurrent surrogates of ViTs, achieving high accuracy with fewer distinct blocks.
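The core claim can be illustrated with a toy sketch: instead of applying $L$ distinct blocks once each, apply $k \ll L$ shared blocks recurrently according to a reuse schedule. This is a minimal NumPy illustration of the idea, not the paper's actual architecture or training procedure; the `block` function, dimensions, and schedule are all assumptions made for the example.

```python
import numpy as np

def block(x, W):
    # Toy stand-in for a transformer block: residual + nonlinearity.
    return x + np.tanh(x @ W)

def vit_forward(x, weights):
    # Standard ViT depth: L distinct blocks, each applied once.
    for W in weights:
        x = block(x, W)
    return x

def block_recurrent_forward(x, weights, schedule):
    # BRH-style surrogate: k << L distinct blocks, reused per schedule.
    for i in schedule:
        x = block(x, weights[i])
    return x

rng = np.random.default_rng(0)
d, L, k = 8, 12, 3  # hypothetical sizes for illustration
x = rng.standard_normal((1, d))

vit_weights = [0.1 * rng.standard_normal((d, d)) for _ in range(L)]
shared_weights = [0.1 * rng.standard_normal((d, d)) for _ in range(k)]
# Each of the k shared blocks is applied L // k times in sequence,
# so total depth matches the original L.
schedule = [i for i in range(k) for _ in range(L // k)]

y_vit = vit_forward(x, vit_weights)                 # L distinct blocks
y_brh = block_recurrent_forward(x, shared_weights, schedule)  # k reused blocks
```

Both forward passes traverse the same depth ($L$ block applications), but the surrogate stores only $k$ parameter sets; in the paper, such surrogates are trained to match the original ViT's computation.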
Reference / Citation
"trained ViTs admit a block-recurrent depth structure such that the computation of the original $L$ blocks can be accurately rewritten using only $k \ll L$ distinct blocks applied recurrently."