LLaMA-3.2-3B fMRI-style Probing Reveals Bidirectional "Constrained ↔ Expressive" Control
Published: Dec 29, 2025 00:46 • 1 min read • r/LocalLLaMA
Analysis
This article describes an experiment that uses fMRI-style visualization of activations to probe the internals of the LLaMA-3.2-3B language model. The researcher reports finding a single hidden dimension that behaves like a global control axis for output style. Shifting that dimension by an offset ε moves the model's responses smoothly between a restrained mode and an expressive mode, in both directions. The result illustrates how interpretability tools can surface hidden control mechanisms in large language models, both to explain how these models generate text and, potentially, to steer their behavior more finely. The tooling is simple: a Gradio UI for visualization and PyTorch hooks for the intervention.
Key Takeaways
- A single hidden dimension in LLaMA-3.2-3B is reported to act as a global control axis for output style.
- Shifting this dimension gives bidirectional control: one direction yields restrained outputs, the other expressive ones.
- The findings suggest that interpretability tools can both reveal and steer LLM behavior.
Reference
“By varying epsilon on this one dim:
- Negative ε: outputs become restrained, procedural, and instruction-faithful
- Positive ε: outputs become more verbose, narrative, and speculative”
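
The intervention quoted above, adding ε to one hidden dimension through a PyTorch hook, can be sketched roughly as follows. This is a minimal illustration and not the researcher's code: the toy `nn.Linear` layer stands in for a transformer block, and `make_steering_hook`, the index `dim=3`, and the ε value are hypothetical placeholders, since the summary does not give the actual dimension or layer.

```python
import torch
from torch import nn


def make_steering_hook(dim: int, epsilon: float):
    """Build a forward hook that adds `epsilon` to one hidden dimension.

    `dim` and `epsilon` are illustrative; the real values are not given
    in the summarized post.
    """
    def hook(module, inputs, output):
        # Some transformer blocks return a tuple whose first item is the
        # hidden state; plain layers return a tensor. Handle both.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden.clone()
        steered[..., dim] += epsilon  # shift only the chosen dimension
        if isinstance(output, tuple):
            return (steered,) + tuple(output[1:])
        return steered  # a returned value replaces the module's output
    return hook


torch.manual_seed(0)
layer = nn.Linear(8, 8)       # toy stand-in for a transformer block
x = torch.randn(2, 4, 8)      # (batch, sequence, hidden)

with torch.no_grad():
    baseline = layer(x)

    # Negative epsilon: the "restrained" direction in the quoted post.
    handle = layer.register_forward_hook(make_steering_hook(dim=3, epsilon=-2.0))
    steered = layer(x)
    handle.remove()           # always detach the hook when done

    restored = layer(x)

delta = steered - baseline    # nonzero only along dimension 3
```

With a real model loaded through Hugging Face Transformers, the same hook would presumably be registered on one decoder layer (for example an entry of `model.model.layers`) before calling `generate`, and ε swept across negative and positive values to compare outputs. Which layer and dimension to use is exactly what the probing step has to determine.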