Research | #Agent | 🔬 Research | Analyzed: Jan 10, 2026 11:23

NL2Repo-Bench: Evaluating Long-Horizon Code Generation Agents

Published: Dec 14, 2025 15:12
ArXiv

Analysis

This arXiv paper introduces NL2Repo-Bench, a new benchmark for evaluating long-horizon coding agents. It assesses how well agents generate complete, complex software repositories.

Reference

NL2Repo-Bench aims to evaluate coding agents.