SWE-RM: Execution-Free Feedback for Software Engineering Agents
Published: Dec 26, 2025 08:26 • 1 min read • ArXiv
Analysis
This paper addresses the limitations of execution-based feedback (such as unit tests) for training software engineering (SWE) agents, particularly with reinforcement learning (RL). Arguing that agents need more fine-grained feedback, it introduces SWE-RM, an execution-free reward model that scores agent outputs without running tests. The paper's significance lies in its analysis of what makes reward model training robust, notably classification accuracy and calibration, and in its demonstration of improved performance in both test-time scaling (TTS) and RL. This matters because it offers a practical alternative for training agents to solve software engineering tasks more effectively.
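To make the execution-free TTS setting concrete, here is a minimal best-of-N selection sketch. The `Trajectory` fields and the `reward_model` callable are assumptions standing in for SWE-RM's actual interface, which the summary does not specify.

```python
# A minimal, illustrative sketch of reward-model-based best-of-N selection.
# The names Trajectory, reward_model, and the field layout are hypothetical
# placeholders, not SWE-RM's actual API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    issue: str   # the SWE-Bench-style issue description
    patch: str   # candidate code change produced by the agent
    log: str     # the agent's tool-call / reasoning transcript


def best_of_n(candidates: List[Trajectory],
              reward_model: Callable[[Trajectory], float]) -> Trajectory:
    """Pick the trajectory the execution-free reward model scores highest.

    Nothing is compiled or run here: the reward model reads the issue,
    the patch, and the transcript and returns a scalar score, which is
    what makes the selection execution-free.
    """
    return max(candidates, key=reward_model)
```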
Key Takeaways
- Execution-free feedback via reward models is a promising alternative to execution-based feedback for training SWE agents.
- The paper identifies classification accuracy and calibration as crucial aspects for robust reward model training in RL (see the sketch after this list).
- SWE-RM, a mixture-of-experts model, achieves state-of-the-art performance on SWE-Bench Verified.
- The research provides insights into factors like training data scale, policy mixtures, and data source composition for training effective reward models.
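The classification accuracy and calibration mentioned above are properties of the reward model as a verifier of patch correctness. Below is a minimal sketch of how these are commonly measured, assuming the reward model outputs a probability that a patch resolves the issue; the paper's exact metrics and data format are not reproduced here.

```python
# Illustrative only: one standard way to measure the two properties the
# paper highlights, assuming the reward model emits a probability that a
# candidate patch is correct. These are generic metrics, not the paper's
# exact evaluation protocol.
import numpy as np


def classification_accuracy(probs: np.ndarray, labels: np.ndarray) -> float:
    # Threshold the predicted probability of "patch is correct" at 0.5.
    return float(((probs >= 0.5).astype(int) == labels).mean())


def expected_calibration_error(probs: np.ndarray, labels: np.ndarray,
                               n_bins: int = 10) -> float:
    # Bin predictions by confidence and compare the mean predicted
    # probability in each bin to the empirical fraction of correct patches.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (probs > lo) & (probs <= hi)
        if in_bin.any():
            gap = abs(probs[in_bin].mean() - labels[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```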
Reference
“SWE-RM substantially improves SWE agents on both TTS and RL performance. For example, it increases the accuracy of Qwen3-Coder-Flash from 51.6% to 62.0%, and Qwen3-Coder-Max from 67.0% to 74.6% on SWE-Bench Verified using TTS, achieving new state-of-the-art performance among open-source models.”