SWE-RM: Execution-Free Feedback for Software Engineering Agents
Analysis
This paper addresses the limitations of execution-based feedback (such as unit tests) for training software engineering agents, particularly with reinforcement learning (RL). Arguing that agents need finer-grained feedback, it introduces SWE-RM, an execution-free reward model. The paper's significance lies in its systematic study of what makes reward model training robust, notably classification accuracy and calibration, and in demonstrating improved performance in both test-time scaling (TTS) and RL settings. This matters because it offers a practical alternative path to training agents that solve software engineering tasks more effectively.
Key Takeaways
- Execution-free feedback via reward models is a promising alternative to execution-based feedback for training SWE agents.
- The paper identifies classification accuracy and calibration as crucial properties for robust reward model training in RL.
- SWE-RM, a mixture-of-experts model, achieves state-of-the-art performance among open-source models on SWE-Bench Verified.
- The research provides insights into factors such as training data scale, policy mixtures, and data source composition for training effective reward models.
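To make the TTS setting concrete: an execution-free reward model scores each candidate patch without running tests, and best-of-n selection keeps the highest-scoring one. The sketch below is illustrative only; `toy_reward` is a hypothetical stand-in for SWE-RM, which in reality scores full agent trajectories, not strings.

```python
# Illustrative best-of-n test-time scaling (TTS) with a reward model.
# `toy_reward` is a made-up heuristic, NOT SWE-RM's actual scoring API:
# here a candidate "patch" is just a string.

from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str],
              reward_fn: Callable[[str], float]) -> str:
    """Return the candidate the reward model scores highest."""
    return max(candidates, key=reward_fn)


def toy_reward(patch: str) -> float:
    # Toy stand-in: prefer patches that mention a fix and stay short.
    return (1.0 if "fix" in patch else 0.0) - 0.01 * len(patch)


patches = [
    "refactor module layout",
    "fix off-by-one in loop bound",
    "fix off-by-one in loop bound and reformat the entire file as well",
]
print(best_of_n(patches, toy_reward))  # selects the concise fix
```

The point of the sketch is that reranking quality depends entirely on the reward model: with a well-calibrated scorer, increasing n reliably improves pass rate, which is the TTS effect the paper reports.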
Reference / Citation
"SWE-RM substantially improves SWE agents on both TTS and RL performance. For example, it increases the accuracy of Qwen3-Coder-Flash from 51.6% to 62.0%, and Qwen3-Coder-Max from 67.0% to 74.6% on SWE-Bench Verified using TTS, achieving new state-of-the-art performance among open-source models."