HeartBench: Evaluating Anthropomorphic Intelligence in Chinese LLMs
Analysis
This paper introduces HeartBench, a novel framework for evaluating the anthropomorphic intelligence of Large Language Models (LLMs) specifically within the Chinese linguistic and cultural context. It addresses a critical gap in current LLM evaluation by focusing on the social, emotional, and ethical dimensions where LLMs often struggle. The use of authentic psychological counseling scenarios and collaboration with clinical experts strengthens the benchmark's validity. The findings, notably that even leading models reach only about 60% of the expert-defined ideal score and that performance decays further in complex scenarios, highlight the limitations of current LLMs and the need for further research in this area. The methodology, including the rubric-based evaluation and the 'reasoning-before-scoring' protocol, provides a valuable blueprint for future work.
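A reasoning-before-scoring protocol of the kind described here can be sketched as follows. Note that the rubric items, prompt wording, and score format below are illustrative assumptions for exposition, not the paper's actual implementation:

```python
import re

def build_judge_prompt(response: str, rubric: list[str]) -> str:
    """Assemble a judge prompt that requires written reasoning before a
    final score (hypothetical wording; not HeartBench's real prompt)."""
    criteria = "\n".join(f"- {c}" for c in rubric)
    return (
        "Evaluate the counseling response against each criterion.\n"
        f"Criteria:\n{criteria}\n"
        f"Response:\n{response}\n"
        "First explain your reasoning, then end with a line 'Score: X/10'."
    )

def parse_score(judge_output: str) -> int:
    """Extract the final numeric score. The reasoning text itself is
    discarded, but requiring it first encourages calibrated judgments."""
    match = re.search(r"Score:\s*(\d+)\s*/\s*10", judge_output)
    if match is None:
        raise ValueError("judge output missing a 'Score: X/10' line")
    return int(match.group(1))

# Example: a judge output that reasons first, then scores.
output = (
    "The reply validates the user's feelings but misses cultural nuance.\n"
    "Score: 6/10"
)
print(parse_score(output))  # → 6
```

Forcing the judge model to emit its reasoning before the score, rather than a bare number, is the core of the protocol; the parser only accepts outputs that follow that order.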
Key Takeaways
- HeartBench is a new framework for evaluating anthropomorphic intelligence in Chinese LLMs.
- It focuses on emotional, cultural, and ethical dimensions.
- The benchmark uses authentic psychological counseling scenarios.
- Leading LLMs reach a performance ceiling of around 60% of the expert-defined ideal score.
- The framework provides a blueprint for creating high-quality, human-aligned training data.
“Even leading models achieve only 60% of the expert-defined ideal score.”