Modular Addition Representations: Geometric Equivalence Analysis
Key Takeaways
- Uniform-attention and trainable-attention architectures learn equivalent representations for modular addition (see the sketch below).
- The study uses topological tools to analyze the geometry of the learned representations.
- The findings suggest a common underlying algorithm for modular addition across the two architectures.
“Both uniform attention and trainable attention architectures implement the same algorithm via topologically and geometrically equivalent representations.”
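One way to make the geometric-equivalence claim concrete is to test whether two sets of learned token embeddings differ only by a rigid rotation. Below is a minimal sketch, not the paper's actual method: it constructs the well-known circular ("clock") embedding for modular addition at a single Fourier frequency, uses synthetic copies to stand in for the two models' learned representations, and aligns them with an orthogonal Procrustes fit. The modulus `p`, frequency `k`, and rotation offset are illustrative assumptions.

```python
import numpy as np

p = 59                # assumed modulus (illustrative, not from the paper)
k = 7                 # a single assumed Fourier frequency
tokens = np.arange(p)

def circle_embedding(tokens, p, k, rotation=0.0):
    """Embed token a on a circle at angle 2*pi*k*a/p, the 'clock' representation."""
    angles = 2 * np.pi * k * tokens / p + rotation
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Synthetic stand-ins for the two architectures' learned embeddings;
# by construction they differ only by a rigid rotation of the plane.
E_uniform = circle_embedding(tokens, p, k)
E_trainable = circle_embedding(tokens, p, k, rotation=1.3)

# Orthogonal Procrustes: find the orthogonal map R minimizing
# ||E_uniform @ R - E_trainable||_F, via the SVD of the cross-covariance.
U, _, Vt = np.linalg.svd(E_uniform.T @ E_trainable)
R = U @ Vt
residual = np.linalg.norm(E_uniform @ R - E_trainable)
print(f"Procrustes residual: {residual:.2e}")  # ~0 => same geometry up to rotation
```

A near-zero residual means the two representations occupy the same circle up to a rigid transformation, which is the geometric face of the "same algorithm" claim; on real models one would run this on the actual embedding matrices rather than synthetic data.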