NL2Repo-Bench: Evaluating Long-Horizon Code Generation Agents

Research | #Agent | Analyzed: Jan 10, 2026 11:23
Published: Dec 14, 2025 15:12
ArXiv

Analysis

This ArXiv paper introduces NL2Repo-Bench, a new benchmark for evaluating coding agents on long-horizon code generation. It assesses how well agents generate complete, complex software repositories.
Reference / Citation
"NL2Repo-Bench aims to evaluate coding agents."
ArXiv, Dec 14, 2025 15:12
* Cited for critical analysis under Article 32.