LeanCat: A Benchmark for Category Theory in Lean

Research Paper | Artificial Intelligence, Formal Verification, Category Theory | Analyzed: Jan 3, 2026 08:41
Published: Dec 31, 2025 11:33
arXiv

Analysis

This paper introduces LeanCat, a benchmark suite for formal category theory in Lean. It is designed to assess how well Large Language Models (LLMs) handle abstract, library-mediated reasoning, a style of reasoning central to modern mathematics. Unlike existing benchmarks, it centers on category theory, which serves as a unifying language for mathematical structure. Because its tasks emphasize structural and interface-level reasoning, the benchmark is a useful tool for measuring AI progress in formal theorem proving.
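For readers new to Lean, the sketch below illustrates what "library-mediated" reasoning about categories looks like in Mathlib: the proofs never inspect concrete objects or morphisms and instead appeal only to axioms exposed by the `Category` interface (`Category.id_comp`, `Category.assoc`). It is an illustrative example assumed to compile against a recent Mathlib, not a task taken from LeanCat.

```lean
import Mathlib.CategoryTheory.Category.Basic

open CategoryTheory

-- An arbitrary category C: nothing is assumed about its objects or morphisms.
variable {C : Type*} [Category C] {W X Y Z : C}

-- Precomposing with an identity changes nothing; this is the `id_comp` axiom.
-- (`≫` is right-associative, so the left side is `𝟙 X ≫ (f ≫ g)`.)
example (f : X ⟶ Y) (g : Y ⟶ Z) : 𝟙 X ≫ f ≫ g = f ≫ g :=
  Category.id_comp (f ≫ g)

-- Reassociating a composite uses the associativity axiom `assoc`.
example (f : W ⟶ X) (g : X ⟶ Y) (h : Y ⟶ Z) : (f ≫ g) ≫ h = f ≫ g ≫ h :=
  Category.assoc f g h
```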
Reference / Citation
"The best model solves 8.25% of tasks at pass@1 (32.50%/4.17%/0.00% by Easy/Medium/High) and 12.00% at pass@4 (50.00%/4.76%/0.00%)."
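The quoted figures are pass@k scores: the fraction of tasks solved when the model is given k attempts. The summary does not say exactly how LeanCat computes them; the sketch below assumes the common unbiased estimator introduced with the HumanEval benchmark, where n samples are drawn per task and c of them are correct. The function names and the example data are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task (assumed estimator).

    n: number of samples drawn for the task
    c: number of those samples that were correct
    k: attempt budget being scored (k <= n)
    """
    if n - c < k:
        # Every size-k subset of the n samples contains a correct one.
        return 1.0
    # 1 minus the probability that a random size-k subset is all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-task pass@k over a benchmark; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: three tasks, four samples each, with 4, 1, and 0 correct.
results = [(4, 4), (4, 1), (4, 0)]
print(benchmark_pass_at_k(results, 1))  # (1 + 0.25 + 0) / 3
print(benchmark_pass_at_k(results, 4))  # (1 + 1 + 0) / 3
```

With n samples, pass@1 reduces to c/n per task, and pass@k can only grow as k increases. This is consistent with the quoted 12.00% at pass@4 exceeding 8.25% at pass@1.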
* Cited for critical analysis under Article 32.