LLM Generalization: Fine-Grained Analysis of Reasoning
Key Takeaways
- Introduces a novel benchmark for fine-grained analysis of LLM reasoning.
- Compares supervised fine-tuning (SFT) and reinforcement learning (RL) tuning methods, revealing differences in how the resulting models generalize.
- Highlights the importance of understanding core cognitive skills in LLMs.
- Provides insights into designing training strategies for robust generalization.
“RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns.”
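The notion of "behavioral drift" here can be made concrete as the change in a model's per-skill accuracy profile between checkpoints. Below is a minimal sketch of one such drift metric; the skill names, accuracy values, and the `profile_drift` function are hypothetical placeholders for illustration, not the paper's methodology or results.

```python
import numpy as np

# Hypothetical per-skill accuracies (fraction correct) for a base model
# and two tuned variants. In practice these would come from evaluating
# each checkpoint on a skill-annotated reasoning benchmark.
skills = ["deduction", "induction", "arithmetic", "planning"]
base = np.array([0.72, 0.65, 0.80, 0.55])
sft = np.array([0.88, 0.48, 0.83, 0.41])  # sharper shifts on some skills
rl = np.array([0.78, 0.63, 0.82, 0.57])   # profile stays closer to base


def profile_drift(before: np.ndarray, after: np.ndarray) -> float:
    """Mean absolute change in per-skill accuracy between checkpoints."""
    return float(np.mean(np.abs(after - before)))


for name, tuned in [("SFT", sft), ("RL", rl)]:
    print(f"{name} drift vs. base: {profile_drift(base, tuned):.3f}")
    for skill, b, t in zip(skills, base, tuned):
        print(f"  {skill:>10}: {b:.2f} -> {t:.2f} ({t - b:+.2f})")
```

Under this framing, a "stable behavioral profile" corresponds to a small drift score, while "collapse" in a reasoning skill shows up as a large negative change on an individual skill even when aggregate accuracy improves.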