HeartBench: Evaluating Anthropomorphic Intelligence in Chinese LLMs

Research Paper · Tags: LLMs, AI Evaluation, Anthropomorphic Intelligence, Chinese Language · Analyzed: Jan 3, 2026 23:59
Published: Dec 26, 2025 03:54
ArXiv

Analysis

This paper introduces HeartBench, a framework for evaluating the anthropomorphic intelligence of Large Language Models (LLMs) within the Chinese linguistic and cultural context. It addresses a gap in current LLM evaluation by targeting the social, emotional, and ethical dimensions where models often struggle. Grounding the benchmark in authentic psychological counseling scenarios, developed in collaboration with clinical experts, strengthens its validity. The findings, notably the performance ceiling of leading models and the score decay in complex scenarios, expose the limitations of current LLMs and motivate further research in this area. The methodology, combining rubric-based evaluation with a 'reasoning-before-scoring' protocol, offers a valuable blueprint for future work.
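The rubric-based, reasoning-before-scoring idea can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the `Criterion` structure, the criterion names, and the point values are all assumptions; the only constraint modeled is that a score counts only if a written rationale was recorded first.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    # Hypothetical rubric entry; names and point values are illustrative.
    name: str
    max_points: int

def score_with_reasoning(judgments, rubric):
    """Aggregate rubric scores under a 'reasoning-before-scoring' protocol:
    a point award is accepted only if the judge supplied a rationale first.
    Returns the fraction of the expert-defined ideal (maximum) score."""
    ideal = sum(c.max_points for c in rubric)
    by_name = {c.name: c for c in rubric}
    total = 0
    for name, reasoning, points in judgments:
        if not reasoning.strip():
            raise ValueError(f"no rationale recorded for criterion '{name}'")
        # Clamp to the rubric ceiling so a judge cannot over-award.
        total += min(points, by_name[name].max_points)
    return total / ideal

rubric = [Criterion("empathy", 5), Criterion("cultural_fit", 5)]
judgments = [
    ("empathy", "Acknowledges the client's feelings before advising.", 3),
    ("cultural_fit", "Uses register appropriate to the counseling context.", 3),
]
print(score_with_reasoning(judgments, rubric))  # 0.6 of the ideal score
```

Normalizing against the summed rubric maxima is what makes a statement like "60% of the expert-defined ideal score" directly comparable across models.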
Reference / Citation
"Even leading models achieve only 60% of the expert-defined ideal score."