HiSciBench: A Hierarchical Benchmark for Scientific Intelligence

Paper #llm 🔬 Research|Analyzed: Jan 3, 2026 19:27•

Published: Dec 28, 2025 12:08

•

1 min read

Analysis

This paper introduces HiSciBench, a novel benchmark designed to evaluate large language models (LLMs) and multimodal models on scientific reasoning. It addresses the limitations of existing benchmarks by providing a hierarchical and multi-disciplinary framework that mirrors the complete scientific workflow, from basic literacy to scientific discovery. The benchmark's comprehensive nature, including multimodal inputs and cross-lingual evaluation, allows for a detailed diagnosis of model capabilities across different stages of scientific reasoning. The evaluation of leading models reveals significant performance gaps, highlighting the challenges in achieving true scientific intelligence and providing actionable insights for future model development. The public release of the benchmark will facilitate further research in this area.