RollArc: Accelerating Agentic RL Training with Disaggregated Infrastructure
Analysis
This paper addresses the challenge of efficiently training agentic Reinforcement Learning (RL) models, whose workloads are computationally demanding and heterogeneous. It proposes RollArc, a distributed system designed to maximize throughput on disaggregated infrastructure. The core contribution lies in three design principles: hardware-affinity workload mapping, fine-grained asynchrony, and statefulness-aware computation. The paper's significance lies in providing a practical recipe for scaling agentic RL training, which is crucial for enabling LLMs to perform autonomous decision-making. The results demonstrate a substantial reduction in end-to-end training time and good scalability, validated by training a large Mixture-of-Experts (MoE) model on a large GPU cluster.
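To make the fine-grained asynchrony principle concrete, here is a minimal Python sketch of a decoupled rollout/training pipeline: rollout workers produce trajectories into a shared queue while the trainer consumes them as soon as a batch is ready, so generation never stalls waiting for gradient updates. All names (`rollout_loop`, `training_loop`, the queue-based design) are illustrative assumptions for exposition, not RollArc's actual implementation.

```python
# Hypothetical sketch of fine-grained asynchrony between rollout and training.
# Names and the queue-based design are illustrative, not RollArc's actual API.
import queue
import threading
import time

trajectory_queue: "queue.Queue[dict]" = queue.Queue(maxsize=64)

def rollout_loop(worker_id: int, steps: int) -> None:
    """Producer: generates trajectories continuously, never blocking on training."""
    for step in range(steps):
        trajectory = {"worker": worker_id, "step": step, "reward": 1.0}
        trajectory_queue.put(trajectory)  # back-pressure only when the queue is full
        time.sleep(0.01)  # stand-in for environment/LLM generation latency

def training_loop(batch_size: int, total_batches: int) -> None:
    """Consumer: updates the policy as soon as a batch of trajectories is ready."""
    for _ in range(total_batches):
        batch = [trajectory_queue.get() for _ in range(batch_size)]
        # stand-in for a gradient update on the collected batch
        mean_reward = sum(t["reward"] for t in batch) / len(batch)
        print(f"update on {len(batch)} trajectories, mean reward {mean_reward:.2f}")

producers = [threading.Thread(target=rollout_loop, args=(i, 40)) for i in range(4)]
consumer = threading.Thread(target=training_loop, args=(8, 20))
for t in producers:
    t.start()
consumer.start()
for t in producers:
    t.join()
consumer.join()
```

The design point this illustrates is that rollout and training run at their own pace, coupled only through a bounded buffer, which is the kind of decoupling a disaggregated system needs when the two stages sit on different hardware.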
Key Takeaways
- RollArc is a distributed system designed for efficient agentic RL training.
- It applies hardware-affinity workload mapping (see the placement sketch after this list), fine-grained asynchrony, and statefulness-aware computation.
- RollArc reduces end-to-end training time by 1.35-2.05x compared to monolithic and synchronous baselines.
- The system demonstrates scalability by training a large MoE model on a large GPU cluster.
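The hardware-affinity principle can be pictured as a placement function that routes each pipeline stage to the device pool best suited to its bottleneck. The pool names, sizes, and affinity flags below are hypothetical; the paper's actual mapping policy may differ.

```python
# Hypothetical sketch of hardware-affinity workload mapping: route each RL
# pipeline stage to the device pool whose hardware profile suits it best.
# Pool names and the stage/affinity table are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class DevicePool:
    name: str
    gpus: int
    high_memory_bandwidth: bool  # favors token-by-token generation
    high_interconnect: bool      # favors large-scale gradient sync

POOLS = [
    DevicePool("inference-pool", gpus=96, high_memory_bandwidth=True, high_interconnect=False),
    DevicePool("training-pool", gpus=32, high_memory_bandwidth=False, high_interconnect=True),
    DevicePool("cpu-pool", gpus=0, high_memory_bandwidth=False, high_interconnect=False),
]

def place(stage: str) -> DevicePool:
    """Map an RL pipeline stage to the pool with the matching hardware affinity."""
    if stage == "rollout":  # generation is memory-bandwidth bound
        return next(p for p in POOLS if p.high_memory_bandwidth)
    if stage == "train":    # gradient sync is interconnect bound
        return next(p for p in POOLS if p.high_interconnect)
    return next(p for p in POOLS if p.gpus == 0)  # reward/env logic runs on CPU

for stage in ("rollout", "train", "reward"):
    print(f"{stage:>7} -> {place(stage).name}")
```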
“RollArc effectively improves training throughput and achieves 1.35-2.05x end-to-end training time reduction compared to monolithic and synchronous baselines.”