Block-Recurrent Dynamics in Vision Transformers
Analysis
This paper introduces the Block-Recurrent Hypothesis (BRH) to explain the computational structure of Vision Transformers (ViTs). The core idea is that the depth of a trained ViT can be represented by a small number of distinct blocks applied recurrently, suggesting a more efficient and more interpretable architecture. The authors demonstrate this by training …
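The hypothesis can be illustrated with a minimal sketch: a depth-$L$ forward pass is replaced by $k \ll L$ distinct blocks applied according to a repetition schedule. The block definition below (a residual linear map) and all names (`make_block`, `block_recurrent_forward`, `schedule`) are hypothetical simplifications, not the paper's actual architecture.

```python
import numpy as np

def make_block(dim, rng):
    # Hypothetical stand-in for a transformer block:
    # a residual linear map x -> x + x @ W.
    W = rng.standard_normal((dim, dim)) * 0.01
    return lambda x: x + x @ W

def block_recurrent_forward(x, blocks, schedule):
    # Emulate L layers of depth using only k distinct blocks:
    # `schedule` has length L and indexes into the k blocks.
    for idx in schedule:
        x = blocks[idx](x)
    return x

rng = np.random.default_rng(0)
dim, L, k = 16, 12, 3
blocks = [make_block(dim, rng) for _ in range(k)]
# One simple schedule: each of the k blocks repeated L // k times.
schedule = [i for i in range(k) for _ in range(L // k)]

x = rng.standard_normal((4, dim))   # a batch of 4 token embeddings
y = block_recurrent_forward(x, blocks, schedule)
```

The point of the sketch is the parameter count: the schedule has length $L = 12$, but only $k = 3$ parameter sets exist, so depth is decoupled from the number of distinct blocks.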
Key Points
Quote
“trained ViTs admit a block-recurrent depth structure such that the computation of the original $L$ blocks can be accurately rewritten using only $k \ll L$ distinct blocks applied recurrently.”