Modular Addition Representations: Geometric Equivalence

Analysis

This paper challenges the notion that different attention mechanisms lead to fundamentally different circuits for modular addition in neural networks. It argues that, despite architectural variation, the learned representations are topologically and geometrically equivalent. Rather than examining neurons in isolation, the methodology analyzes the collective behavior of neuron groups as manifolds, applying topological tools to show that circuits learned under different architectures share the same structure. This points toward a more unified account of how neural networks learn and represent mathematical operations.
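
The claim of "geometric equivalence" can be made concrete with a small experiment. Below is a minimal sketch, not code from the paper: assuming we can extract each model's embeddings of the residues mod p, it checks that each embedding set lies near a circle (its top two principal components carry nearly all the variance) and then tests whether one set maps onto the other under an orthogonal alignment. The `fake_embeddings` helper, the modulus p = 59, and the 16-dimensional ambient space are illustrative assumptions standing in for real trained models.

```python
# Illustrative sketch only: synthetic stand-ins for two trained models'
# embeddings of the residues 0..p-1, compared for circular structure and
# geometric equivalence up to an orthogonal transform.
import numpy as np
from scipy.linalg import orthogonal_procrustes

p = 59                                  # modulus (arbitrary for this sketch)
rng = np.random.default_rng(0)
angles = 2 * np.pi * np.arange(p) / p   # one angle per residue class

def fake_embeddings(noise: float) -> np.ndarray:
    """Hypothetical model embeddings: a noisy circle lying in a
    randomly oriented 2D plane inside a 16-dimensional space."""
    circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (p, 2)
    plane, _ = np.linalg.qr(rng.normal(size=(16, 2)))            # orthonormal 2D basis in R^16
    return circle @ plane.T + noise * rng.normal(size=(p, 16))

E1 = fake_embeddings(0.02)   # stand-in for the uniform-attention model
E2 = fake_embeddings(0.02)   # stand-in for the trainable-attention model

# Topological check: if the representation is a circle, the top two
# principal components should capture almost all of the variance.
for name, E in [("model 1", E1), ("model 2", E2)]:
    s = np.linalg.svd(E - E.mean(axis=0), compute_uv=False)
    print(name, "top-2 variance fraction:", (s[:2] ** 2).sum() / (s ** 2).sum())

# Geometric check: solve for the orthogonal map best aligning E1 to E2;
# a small relative residual means the shapes match up to rotation/reflection.
R, _ = orthogonal_procrustes(E1, E2)
print("relative residual:", np.linalg.norm(E1 @ R - E2) / np.linalg.norm(E2))
```

A small residual after the Procrustes alignment is one operational reading of "geometrically equivalent representations"; the paper's own topological analysis may use different tools.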
Reference / Citation
"Both uniform attention and trainable attention architectures implement the same algorithm via topologically and geometrically equivalent representations."
arXiv, Dec 31, 2025, 18:53
* Cited for critical analysis under Article 32.