Geometric Structure in LLMs for Bayesian Inference
Analysis
This paper investigates the geometric properties of modern LLMs (Pythia, Phi-2, Llama-3, Mistral) and finds evidence of a geometric substrate similar to that observed in smaller, controlled models that perform exact Bayesian inference. This suggests that even complex LLMs leverage geometric structure to represent uncertainty and to carry out approximate Bayesian updates. Interventions along an entropy-aligned axis clarify the role of this geometry, revealing it as a privileged readout of uncertainty rather than the sole computational bottleneck.
Key Takeaways
- •Modern LLMs exhibit a geometric structure in their value representations, similar to that found in smaller models performing exact Bayesian inference.
- •This geometry is linked to predictive entropy and uncertainty representation.
- •Targeted interventions on the entropy-aligned axis disrupt local uncertainty geometry.
- •The geometry appears to be a privileged readout of uncertainty rather than a computational bottleneck.
“Modern language models preserve the geometric substrate that enables Bayesian inference in wind tunnels, and organize their approximate Bayesian updates along this substrate.”
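The entropy-axis intervention discussed above can be illustrated with a minimal sketch. The paper's exact method is not reproduced here; this is a common probing-style approach under assumed conditions: estimate a direction in hidden-state space that is linearly aligned with per-token predictive entropy, then ablate it by projecting hidden states onto its orthogonal complement. The hidden states and entropies below are synthetic stand-ins, not model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 500

# Synthetic stand-in for hidden states: one latent direction carries
# the entropy signal (hypothetical setup, not the paper's data).
axis_true = rng.normal(size=d)
axis_true /= np.linalg.norm(axis_true)
entropy = rng.uniform(0.5, 4.0, size=n)  # per-token predictive entropy (nats)
H = rng.normal(size=(n, d)) + np.outer(entropy, axis_true)

# 1. Estimate the entropy-aligned axis: least-squares regression of
#    centered hidden states onto centered entropy.
Hc = H - H.mean(axis=0)
ec = entropy - entropy.mean()
axis = Hc.T @ ec / (ec @ ec)
axis /= np.linalg.norm(axis)

# 2. Intervention: ablate the axis by projecting each hidden state
#    onto the orthogonal complement of the estimated direction.
H_ablated = H - (H @ axis)[:, None] * axis[None, :]

# After ablation, the hidden states carry no component along the axis,
# so any linear readout of entropy from that direction is destroyed.
residual = np.abs(H_ablated @ axis).max()
alignment = abs(axis @ axis_true)
```

With enough tokens the estimated `axis` closely matches the planted direction, and the projection removes its component exactly (up to floating point), which is what makes such targeted ablations a clean test of whether downstream behavior depends on that direction.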