CoT's Faithfulness Questioned: Beyond Hint Verbalization
Published:Dec 28, 2025 18:18
•1 min read
•ArXiv
Analysis
This paper challenges the common understanding of Chain-of-Thought (CoT) faithfulness in Large Language Models (LLMs). It argues that current metrics, which focus on whether hints are explicitly verbalized in the CoT, may misinterpret incompleteness as unfaithfulness. The authors demonstrate that even when hints aren't explicitly stated, they can still influence the model's predictions. This suggests that evaluating CoT solely on hint verbalization is insufficient and advocates for a more comprehensive approach to interpretability, including causal mediation analysis and corruption-based metrics. The paper's significance lies in its re-evaluation of how we measure and understand the inner workings of CoT reasoning in LLMs, potentially leading to more accurate and nuanced assessments of model behavior.
Key Takeaways
Reference
“Many CoTs flagged as unfaithful by Biasing Features are judged faithful by other metrics, exceeding 50% in some models.”