Uncovering Competency Gaps in Large Language Models and Their Benchmarks
Published: Dec 25, 2025 05:00 • 1 min read • ArXiv NLP
Analysis
This paper introduces a method that uses sparse autoencoders (SAEs) to identify competency gaps in large language models (LLMs) and coverage imbalances in the benchmarks used to evaluate them. The approach extracts SAE concept activations and computes saliency-weighted performance scores per concept, grounding evaluation in the model's internal representations rather than in surface-level task labels. The study finds that LLMs consistently underperform on concepts that stand in contrast to sycophantic behavior (such as politely refusing a request or asserting boundaries) and on concepts connected to safety discussions, consistent with prior findings. It also surfaces benchmark gaps: obedience-related concepts are over-represented, while other relevant concepts are missing entirely. Because the method is automated and unsupervised, it offers a practical tool for improving both model development and benchmark design, ultimately supporting more robust and reliable AI systems.
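To make the scoring step concrete, here is a minimal NumPy sketch of one plausible reading of the pipeline: SAE concept activations serve as saliency weights over benchmark examples, and each concept's score is the activation-weighted accuracy. The array names (`acts`, `correct`) and the normalization scheme are illustrative assumptions, not the paper's implementation; real usage would take activations from a trained SAE rather than random stand-ins.

```python
import numpy as np

# Hypothetical inputs (names and shapes are assumptions for illustration):
#   acts[i, k]  -- SAE concept activation of benchmark example i on concept k
#   correct[i]  -- 1.0 if the model answered example i correctly, else 0.0
rng = np.random.default_rng(0)
n_examples, n_concepts = 1000, 50
acts = np.abs(rng.normal(size=(n_examples, n_concepts)))      # stand-in activations
correct = rng.integers(0, 2, size=n_examples).astype(float)   # stand-in outcomes

# Saliency weights: normalize each concept's activations over the benchmark,
# so examples where a concept fires strongly count more toward its score.
weights = acts / acts.sum(axis=0, keepdims=True)              # shape (N, K)

# Saliency-weighted performance score per concept: a weighted accuracy.
concept_score = weights.T @ correct                           # shape (K,)

# Benchmark coverage per concept: total saliency mass, which can reveal
# over-represented concepts (e.g., obedience-related) and missing ones.
coverage = acts.sum(axis=0) / acts.sum()

# Flag the weakest concepts and the most over-represented ones.
gaps = np.argsort(concept_score)[:5]
print("lowest-scoring concepts:", gaps, concept_score[gaps])
print("most over-represented concepts:", np.argsort(coverage)[-5:])
```

Under this reading, a low `concept_score` flags a model competency gap, while a skewed `coverage` distribution flags a benchmark imbalance.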
Key Takeaways
- Sparse autoencoders can effectively identify competency gaps in LLMs.
- LLMs often struggle with concepts tied to safety and to resisting sycophancy.
- Benchmarks may have imbalanced coverage, over-representing some concepts while omitting others.
Reference
“We found that these models consistently underperformed on concepts that stand in contrast to sycophantic behaviors (e.g., politely refusing a request or asserting boundaries) and concepts connected to safety discussions.”