Uncovering Competency Gaps in Large Language Models and Their Benchmarks
Research · #llm
Analyzed: Dec 25, 2025 09:40 | Published: Dec 25, 2025 05:00
1 min read · ArXiv NLP · Analysis
This paper introduces a novel method that uses sparse autoencoders (SAEs) to identify competency gaps in large language models (LLMs) and coverage imbalances in their benchmarks. The approach extracts SAE concept activations and computes saliency-weighted performance scores, grounding evaluation in the model's internal representations. The study finds that LLMs often underperform on concepts that contrast with sycophantic behavior (e.g., politely refusing a request) and on concepts related to safety, consistent with existing research. It also highlights benchmark gaps: obedience-related concepts are over-represented, while other relevant concepts are missing entirely. This automated, unsupervised method offers a valuable tool for improving LLM evaluation and development by identifying areas needing improvement in both models and benchmarks, ultimately leading to more robust and reliable AI systems.
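The core idea of saliency-weighted scoring can be sketched as follows. This is a hypothetical simplification, not the paper's exact formulation: the function name, array shapes, and normalization scheme are assumptions. Each benchmark example is scored as correct or incorrect, and each concept's performance is the accuracy over examples weighted by how strongly that concept's SAE activation fires on them.

```python
import numpy as np

def concept_scores(activations, correct):
    """Saliency-weighted performance per concept (illustrative sketch).

    activations: (n_examples, n_concepts) nonnegative SAE concept activations
    correct:     (n_examples,) 1.0 if the model answered correctly, else 0.0
    Returns:     (n_concepts,) weighted accuracy for each concept
    """
    # Normalize activations per concept so each column sums to 1;
    # the epsilon guards against concepts that never activate.
    weights = activations / (activations.sum(axis=0, keepdims=True) + 1e-9)
    # Weighted average of correctness, one score per concept.
    return weights.T @ correct

# Toy example: 3 examples, 2 concepts.
acts = np.array([[1.0, 0.0],
                 [1.0, 2.0],
                 [0.0, 2.0]])
corr = np.array([1.0, 0.0, 1.0])
scores = concept_scores(acts, corr)
```

A concept whose high-activation examples are mostly answered incorrectly receives a low score, flagging a potential competency gap; a concept that rarely activates across the whole benchmark points to a coverage gap instead.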
Key Takeaways
- Sparse autoencoders can effectively identify competency gaps in LLMs.
- LLMs often struggle with concepts related to safety and resisting sycophancy.
- Benchmarks may have imbalanced coverage, over-representing certain concepts.
Reference / Citation
"We found that these models consistently underperformed on concepts that stand in contrast to sycophantic behaviors (e.g., politely refusing a request or asserting boundaries) and concepts connected to safety discussions."