M2G-Eval: A Multi-Granularity Benchmark for Code Generation Evaluation
Analysis
This paper introduces M2G-Eval, a benchmark designed to evaluate the code generation capabilities of LLMs at four granularities (Class, Function, Block, and Line) across 18 programming languages. It addresses a significant gap left by existing benchmarks, which typically target a single granularity and a limited set of languages. Evaluating at multiple granularities yields a more nuanced picture of model strengths and weaknesses, while human-annotated test instances and contamination control strengthen the reliability of the evaluation. The paper's findings on performance differences across granularities, language-specific variation, and cross-language correlations offer useful guidance for future research and model development.
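To make the multi-granularity setup concrete, the sketch below shows one way such an evaluation harness could be organized: each task carries a target language, a granularity level, and human-annotated reference tests, and a model completion passes if the filled-in program runs those tests successfully. The `Granularity` and `Task` types, the `<FILL_ME>` placeholder, and the `evaluate` function are illustrative assumptions, not M2G-Eval's actual schema or code.

```python
# Minimal sketch of a multi-granularity evaluation harness.
# All names and the task format below are assumptions for illustration,
# not the benchmark's real data schema.
import subprocess
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class Granularity(Enum):
    CLASS = "class"
    FUNCTION = "function"
    BLOCK = "block"
    LINE = "line"


@dataclass
class Task:
    language: str            # e.g. "python" or "java"; one of the benchmark's 18 languages
    granularity: Granularity # scope of the span the model must generate
    prompt: str              # surrounding code with the target span replaced by <FILL_ME>
    reference_tests: str     # human-annotated tests exercising the completed code


def evaluate(task: Task, completion: str, runner_cmd: list[str]) -> bool:
    """Insert the model completion into the prompt and run the reference tests.

    `runner_cmd` is a per-language command (e.g. ["python"] or ["node"]); a real
    harness would also sandbox execution and isolate per-language toolchains.
    """
    program = task.prompt.replace("<FILL_ME>", completion) + "\n" + task.reference_tests
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "candidate_source"
        src.write_text(program)
        result = subprocess.run(runner_cmd + [str(src)], capture_output=True, timeout=30)
    return result.returncode == 0
```

Keeping the granularity as an explicit field lets the same pass/fail harness report results per scope (Line, Block, Function, Class), which is how the difficulty hierarchy described below can be surfaced.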
Key Takeaways
- M2G-Eval is a new benchmark for evaluating code generation in LLMs across multiple granularities and languages.
- The benchmark reveals performance differences across code granularities.
- The study highlights the challenges of generating complex, long-form code.
- The findings suggest that models learn transferable programming concepts.
“The paper reveals an apparent difficulty hierarchy, with Line-level tasks easiest and Class-level most challenging.”