Block-Recurrent Dynamics in Vision Transformers
Analysis
This paper introduces the Block-Recurrent Hypothesis (BRH) to explain the computational structure of Vision Transformers (ViTs). The core idea is that the depth of a trained ViT can be represented by a small number of recurrently applied blocks, suggesting a more efficient and interpretable architecture. The authors demonstrate this by training surrogate models that reproduce the computation of the original $L$ blocks using only $k \ll L$ distinct blocks applied recurrently.
Key Takeaways
- Introduces the Block-Recurrent Hypothesis (BRH) for ViTs.
- Demonstrates that ViT depth can be approximated by recurrently applying a small number of blocks.
- Presents models that act as block-recurrent surrogates of ViTs, achieving high accuracy with fewer distinct blocks.
Reference
“trained ViTs admit a block-recurrent depth structure such that the computation of the original $L$ blocks can be accurately rewritten using only $k \ll L$ distinct blocks applied recurrently.”
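The rewrite described in the quote can be illustrated with a minimal sketch: apply $k$ distinct blocks cyclically until the original depth $L$ is reached. The residual-linear "block" below is a hypothetical stand-in for a full ViT block (attention + MLP), not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

L = 12  # depth of the original ViT (assumed for illustration)
k = 3   # number of distinct blocks, k << L
d = 8   # token embedding dimension

# One weight matrix per distinct block; a toy stand-in for real block parameters.
weights = [rng.normal(scale=0.1, size=(d, d)) for _ in range(k)]

def block(x, W):
    # Toy residual block standing in for a full attention+MLP ViT block.
    return x + np.tanh(x @ W)

def block_recurrent_forward(x, weights, depth):
    # Cycle through the k distinct blocks until the original depth L is reached.
    for layer in range(depth):
        x = block(x, weights[layer % len(weights)])
    return x

tokens = rng.normal(size=(4, d))  # 4 tokens with d-dimensional embeddings
out = block_recurrent_forward(tokens, weights, L)
```

The key point of the sketch is the parameter count: only `k` weight matrices are stored, yet the forward pass still runs `L` block applications, matching the hypothesis that the original depth can be rewritten with $k \ll L$ distinct blocks.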