LLMs Struggle on Underrepresented Math Problems, Especially Geometry
Analysis
This paper addresses a gap in LLM evaluation by focusing on underrepresented mathematics competition problems. It moves beyond standard benchmarks to assess LLMs' reasoning in Calculus, Analytic Geometry, and Discrete Mathematics, with a specific focus on identifying error patterns. The findings highlight the limitations of current LLMs, particularly in Geometry, and the per-model error analysis offers insight into their reasoning processes that can inform future research and development.
Key Takeaways
- LLMs were evaluated on Missouri Collegiate Mathematics Competition problems.
- DeepSeek-V3 performed best in all three categories, but all models struggled with Geometry.
- The study identified distinct error patterns for each LLM, highlighting areas for improvement (a sketch of this kind of error-tallying evaluation follows below).
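The paper's own evaluation harness is not shown here, but the workflow these takeaways describe, prompting each model on competition problems grouped by topic, grading the solutions, and tallying error categories, can be sketched in Python. Everything below (the `query_model` and `grade` stubs, the topic labels, the problem schema) is a hypothetical illustration under assumed names, not the authors' code or taxonomy.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical topic labels for illustration; the paper's actual
# split and error taxonomy may differ.
TOPICS = ["Calculus", "Analytic Geometry", "Discrete Mathematics"]

def query_model(model: str, problem: str) -> str:
    """Stub: send `problem` to `model` and return its solution text.
    Replace with a real API client for the model under test."""
    raise NotImplementedError

def grade(solution: str, reference: str) -> tuple[bool, str | None]:
    """Stub: return (is_correct, error_label). In practice this would
    be a human or rubric-based grader assigning an error category."""
    raise NotImplementedError

def evaluate(models: list[str], problems: list[dict]) -> None:
    """Each problem: {'topic': str, 'statement': str, 'reference': str}."""
    correct = defaultdict(Counter)   # correct[model][topic] -> solved count
    errors = defaultdict(Counter)    # errors[model][error_label] -> count
    totals = Counter()               # totals[topic] -> problem count
    for p in problems:
        totals[p["topic"]] += 1
        for m in models:
            ok, err = grade(query_model(m, p["statement"]), p["reference"])
            if ok:
                correct[m][p["topic"]] += 1
            elif err is not None:
                errors[m][err] += 1
    for m in models:
        for t in TOPICS:
            print(f"{m} / {t}: {correct[m][t]}/{totals[t]} correct")
        print(f"{m} error profile: {dict(errors[m])}")
```

Per-topic accuracy and per-model error profiles of this shape are all that is needed to reproduce the kind of comparison quoted below.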
Reference
“DeepSeek-V3 has the best performance in all three categories... All three LLMs exhibited notably weak performance in Geometry.”