Unsupervised Discovery of Reasoning Behaviors in LLMs
Published: Dec 30, 2025 05:09 • 1 min read • ArXiv
Analysis
This paper introduces an unsupervised method, RISE, for analyzing and controlling reasoning behaviors in large language models (LLMs). Rather than relying on human-defined concepts, it uses sparse autoencoders (SAEs) to discover interpretable reasoning vectors in the model's activation space. Identifying and manipulating these vectors makes it possible to amplify or suppress specific reasoning behaviors, such as reflection and confidence, without retraining the model. This matters because it offers a new way to understand and influence the internal reasoning processes of LLMs, potentially leading to more controllable and reliable AI systems.
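The summary doesn't spell out RISE's exact SAE architecture, so the sketch below is a minimal PyTorch version of the standard recipe: an overcomplete linear autoencoder trained with an L1 sparsity penalty on activations cached from one transformer layer. The dictionary size, penalty coefficient, and synthetic activations are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete sparse autoencoder over cached LLM activations.

    d_model: hidden size of the transformer layer being analyzed.
    d_dict:  number of dictionary features, typically several times d_model.
    """
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))   # sparse, non-negative feature codes
        x_hat = self.decoder(f)           # reconstruction of the activation
        return x_hat, f

def train_sae(acts: torch.Tensor, d_dict: int,
              l1_coeff: float = 1e-3, epochs: int = 50, lr: float = 1e-4):
    """Fit an SAE to activations of shape [n_tokens, d_model]."""
    sae = SparseAutoencoder(acts.shape[-1], d_dict)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(epochs):
        x_hat, f = sae(acts)
        recon = (x_hat - acts).pow(2).mean()      # reconstruction loss
        loss = recon + l1_coeff * f.abs().mean()  # L1 penalty encourages sparsity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae

# Synthetic stand-in for activations cached from one transformer layer.
acts = torch.randn(8_192, 768)
sae = train_sae(acts, d_dict=4 * 768)
```

Each column of `sae.decoder.weight` is then a candidate "reasoning vector": a direction in activation space whose sparse code, if the feature is interpretable, fires on a coherent set of reasoning-related tokens.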
Key Takeaways
- Proposes an unsupervised framework (RISE) for discovering reasoning vectors in LLMs.
- RISE uses sparse autoencoders to identify interpretable reasoning behaviors.
- Enables control over specific reasoning behaviors (e.g., reflection, confidence) without retraining; see the steering sketch after this list.
- Discovers novel reasoning behaviors beyond those defined by human supervision.
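As the reference below notes, targeted interventions on SAE-derived vectors can amplify or suppress specific behaviors at inference time. One common way to realize such an intervention in PyTorch is a forward hook that adds a scaled feature direction to a layer's output during generation. This is a hedged sketch: the module path, layer index, and feature index in the usage comments are placeholders, not details from the paper.

```python
import torch

def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Forward hook that shifts a layer's output along `direction`.

    alpha > 0 amplifies the behavior the feature encodes; alpha < 0 suppresses it.
    """
    direction = direction / direction.norm()  # unit-normalize the feature direction
    def hook(module, inputs, output):
        # Transformer blocks often return a tuple; hidden states come first.
        if isinstance(output, tuple):
            return (output[0] + alpha * direction,) + output[1:]
        return output + alpha * direction
    return hook

# Hypothetical usage (names below are placeholders for a GPT-2-style model):
# direction = sae.decoder.weight[:, feature_idx].detach()  # one SAE feature
# handle = model.transformer.h[20].register_forward_hook(
#     make_steering_hook(direction, alpha=4.0))
# ... generate text with the behavior amplified ...
# handle.remove()  # restore the unmodified model
```

Because the hook only adds a vector to activations at inference time, the intervention alters the model's trajectory without touching its weights, which is what "without retraining" means here.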
Reference
“Targeted interventions on SAE-derived vectors can controllably amplify or suppress specific reasoning behaviors, altering inference trajectories without retraining.”