Compositionality in Vision Transformers Explored with Wavelets

Research Paper | Tags: Vision Transformers, Compositionality, Wavelet Transforms | Analyzed: Jan 3, 2026 09:28
Published: Dec 30, 2025 19:43
ArXiv

Analysis

This paper investigates compositionality in Vision Transformers (ViTs), using the Discrete Wavelet Transform (DWT) to derive input-dependent primitives from each image. It adapts a framework originally developed for language tasks to analyze how ViT encoders structure information. The reported finding is that primitives from a one-level DWT decomposition yield encoder representations that approximately compose in latent space, which suggests that ViTs may exhibit compositional behavior.
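To make the idea concrete, here is a minimal sketch of what "DWT primitives that compose in latent space" could look like. It is not the paper's method. It assumes a Haar wavelet, treats each one-level subband (reconstructed back to image size with the other subbands zeroed) as one primitive, and assumes composition means summing the primitives' embeddings. The names `haar_dwt2`, `haar_idwt2`, `dwt_primitives`, `composition_error`, and `toy_linear_encode` are hypothetical, and the toy linear encoder only stands in for a real ViT encoder.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(x):
    """One-level 2D Haar DWT of an array with even height and width.

    Returns four half-size subbands: one approximation band and three
    detail bands (the labels ll/lh/hl/hh are illustrative only).
    """
    a = (x[0::2, :] + x[1::2, :]) / SQRT2   # low-pass over row pairs
    d = (x[0::2, :] - x[1::2, :]) / SQRT2   # high-pass over row pairs
    ll = (a[:, 0::2] + a[:, 1::2]) / SQRT2  # low-pass over column pairs
    lh = (a[:, 0::2] - a[:, 1::2]) / SQRT2
    hl = (d[:, 0::2] + d[:, 1::2]) / SQRT2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQRT2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-size array from four subbands."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    d[:, 0::2], d[:, 1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = (a + d) / SQRT2, (a - d) / SQRT2
    return x

def dwt_primitives(x):
    """Assumed primitive definition: one image-sized component per subband.

    Because the transform is linear, the four primitives sum to x.
    """
    bands = haar_dwt2(x)
    prims = []
    for i in range(len(bands)):
        kept = [b if j == i else np.zeros_like(b) for j, b in enumerate(bands)]
        prims.append(haar_idwt2(*kept))
    return prims

def composition_error(encode, x):
    """Relative gap between encode(x) and the sum of primitive embeddings.

    Summation as the composition rule is an assumption made for this sketch.
    """
    whole = encode(x)
    composed = sum(encode(p) for p in dwt_primitives(x))
    return np.linalg.norm(whole - composed) / np.linalg.norm(whole)

# Toy stand-in for an encoder: a fixed random linear map (hypothetical).
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
W = rng.standard_normal((16, image.size)) / 8.0

def toy_linear_encode(img):
    return W @ img.ravel()

# A linear encoder composes exactly, so this is ~0 up to rounding error.
print(composition_error(toy_linear_encode, image))
```

With a real ViT, `toy_linear_encode` would be replaced by a function that returns the encoder's representation of an image. The resulting error would then indicate how far the encoder departs from exact additive composition under these assumptions.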
Reference / Citation
"Primitives from a one-level DWT decomposition produce encoder representations that approximately compose in latent space."
ArXiv, Dec 30, 2025 19:43
* Cited for critical analysis under Article 32.