Research Paper #Code Generation, LLMs, Benchmarking 🔬 Research
Analyzed: Jan 3, 2026 19:49

M2G-Eval: A Multi-Granularity Benchmark for Code Generation Evaluation

Published: Dec 27, 2025 16:00 • 1 min read • ArXiv

Analysis

This paper introduces M2G-Eval, a benchmark that evaluates LLM code generation at four granularities (Class, Function, Block, Line) across 18 programming languages. Existing benchmarks often focus on a single granularity and a limited set of languages; the multi-granularity design makes it possible to see where a model is strong and where it is weak. Human-annotated test instances and contamination control strengthen the reliability of the evaluation. The paper reports performance differences across granularities, language-specific variation, and cross-language correlations, which can inform future research and model development.
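To make the idea of per-granularity scores and cross-language correlation concrete, here is a minimal sketch. It is not the paper's evaluation code: the counts are invented placeholders, the names `COUNTS`, `pass_rate`, and `granularity_profile` are hypothetical, and the use of a simple pass rate with a Pearson correlation is an assumption about how such a comparison could be done.

```python
from math import sqrt

# The four granularities named in the summary, ordered from smallest to largest scope.
GRANULARITIES = ["line", "block", "function", "class"]

# Hypothetical (passed, total) counts per (language, granularity).
# These numbers are invented for illustration and are NOT results from the paper.
COUNTS = {
    ("python", "line"): (9, 10),
    ("python", "block"): (8, 10),
    ("python", "function"): (6, 10),
    ("python", "class"): (3, 10),
    ("java", "line"): (8, 10),
    ("java", "block"): (6, 10),
    ("java", "function"): (5, 10),
    ("java", "class"): (2, 10),
}


def pass_rate(counts, language, granularity):
    """Fraction of tasks passed for one language at one granularity."""
    passed, total = counts[(language, granularity)]
    return passed / total


def granularity_profile(counts, language):
    """Pass rates for one language, in the order given by GRANULARITIES."""
    return [pass_rate(counts, language, g) for g in GRANULARITIES]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


python_profile = granularity_profile(COUNTS, "python")
java_profile = granularity_profile(COUNTS, "java")

# A value near 1 would mean the two languages rank the granularities similarly.
correlation = pearson(python_profile, java_profile)
print(dict(zip(GRANULARITIES, python_profile)))
print(f"python vs. java correlation: {correlation:.3f}")
```

With real benchmark output, the same grouping could be applied to each of the 18 languages to compare their profiles pairwise.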

Key Takeaways

• M2G-Eval is a new benchmark for evaluating code generation in LLMs across multiple granularities and languages.
• The benchmark reveals performance differences across different code scopes.
• The study highlights the challenges in generating complex, long-form code.
• The findings suggest that models learn transferable programming concepts.

Reference

“The paper reveals an apparent difficulty hierarchy, with Line-level tasks easiest and Class-level most challenging.”

