LLaMA-3.2-3B fMRI-style Probing Reveals Bidirectional "Constrained ↔ Expressive" Control
Analysis
Key Takeaways
- A single hidden dimension in LLaMA-3.2-3B acts as a global control axis for output style.
- Manipulating this dimension allows for bidirectional control between restrained and expressive outputs.
- The findings suggest that interpretability tools could be used both to reveal and to control LLM behavior.
“By varying epsilon on this one dim:
Negative ε: outputs become restrained, procedural, and instruction-faithful
Positive ε: outputs become more verbose, narrative, and speculative”
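The quoted mechanism, adding a scalar ε to one coordinate of the hidden state, can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the original experiment's code. The function names `steer_dimension` and `toy_style_score`, the 3-dimensional vectors, and the linear "style" readout are all hypothetical. In a real LLaMA-3.2-3B run the same nudge would have to be applied inside the model (for example through a PyTorch forward hook on a decoder layer's output). The layer and dimension index used in the original probing are not given here.

```python
from typing import List, Sequence


def steer_dimension(
    hidden_states: Sequence[Sequence[float]], dim: int, epsilon: float
) -> List[List[float]]:
    """Add `epsilon` to coordinate `dim` of every hidden-state vector.

    hidden_states is a [seq_len][hidden_size] nested list (one vector per token).
    Returns a new nested list; the input is left unmodified.
    """
    steered: List[List[float]] = []
    for vec in hidden_states:
        if not 0 <= dim < len(vec):
            raise IndexError(f"dim {dim} is out of range for hidden size {len(vec)}")
        new_vec = list(vec)      # copy so the original activations stay intact
        new_vec[dim] += epsilon  # single-coordinate nudge; the sign of epsilon sets the direction
        steered.append(new_vec)
    return steered


def toy_style_score(vec: Sequence[float], readout: Sequence[float]) -> float:
    """Hypothetical linear readout standing in for the model's downstream layers.

    Higher scores are meant to read as 'more expressive'; this is an illustrative
    assumption, not a measured property of LLaMA-3.2-3B.
    """
    return sum(v * w for v, w in zip(vec, readout))


if __name__ == "__main__":
    # Hypothetical 3-dimensional activations for a 2-token sequence.
    hidden = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
    # Hypothetical readout that only looks at dimension 1, the "control" dim in this toy.
    readout = [0.0, 1.0, 0.0]
    for eps in (-2.0, 0.0, 2.0):
        scores = [toy_style_score(v, readout) for v in steer_dimension(hidden, 1, eps)]
        print(f"epsilon={eps:+.1f} -> scores={scores}")
```

Sweeping ε through negative, zero, and positive values moves the toy readout in both directions. This mirrors the bidirectional behavior the quote describes. Whether the real model's output style responds this linearly is an empirical question that the sketch does not settle.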