CricBench: A Benchmark for LLMs in Cricket Analytics

Published:Dec 26, 2025 05:59
1 min read
ArXiv

Analysis

This paper introduces CricBench, a specialized benchmark for evaluating Large Language Models (LLMs) in the domain of cricket analytics. It addresses the gap in LLM capabilities for handling domain-specific nuances, complex schema variations, and multilingual requirements in sports analytics. The benchmark's creation, including a 'Gold Standard' dataset and multilingual support (English and Hindi), is a key contribution. The evaluation of state-of-the-art models reveals that performance on general benchmarks doesn't translate to success in specialized domains, and code-mixed Hindi queries can perform as well or better than English, challenging assumptions about prompt language.

Reference

The open-weights reasoning model DeepSeek R1 achieves state-of-the-art performance (50.6%), surpassing proprietary giants like Claude 3.7 Sonnet (47.7%) and GPT-4o (33.7%), it still exhibits a significant accuracy drop when moving from general benchmarks (BIRD) to CricBench.