Leveraging Suboptimal Human Interventions in Real-World RL
Analysis
This paper addresses a critical challenge in real-world reinforcement learning: how to effectively utilize potentially suboptimal human interventions to accelerate learning without being overly constrained by them. The proposed SiLRI algorithm offers a novel approach by formulating the problem as a constrained RL optimization, using a state-wise Lagrange multiplier to account for the uncertainty of human interventions. The results demonstrate significant improvements in learning speed and success rates compared to existing methods, highlighting the practical value of the approach for robotic manipulation.
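The paper's exact notation is not reproduced in this summary, but a state-wise constrained formulation of this kind can be sketched generically. In the block below, $g(\pi, \pi^{H}; s)$ standing for some per-state deviation measure from the human intervention policy $\pi^{H}$, and the budget $\delta(s)$, are illustrative symbols of ours, not the paper's:

```latex
% Generic state-wise constrained RL objective (illustrative notation):
\max_{\pi}\ \mathbb{E}_{\tau \sim \pi}\Big[\textstyle\sum_{t} \gamma^{t} r(s_t, a_t)\Big]
\quad \text{s.t.}\quad g\big(\pi, \pi^{H}; s\big) \le \delta(s)\ \ \forall s

% Its state-wise Lagrangian, with one multiplier \lambda(s) \ge 0 per state:
\mathcal{L}(\pi, \lambda) =
\mathbb{E}_{\tau \sim \pi}\Big[\textstyle\sum_{t} \gamma^{t} r(s_t, a_t)\Big]
- \sum_{s} \lambda(s)\,\big(g(\pi, \pi^{H}; s) - \delta(s)\big)
```

The appeal of a *state-wise* multiplier is that $\lambda(s)$ can grow where interventions are reliable and shrink where they appear suboptimal, rather than applying one global penalty everywhere.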
Key Takeaways
- Addresses the problem of suboptimal human interventions in real-world RL.
- Proposes SiLRI, a state-wise Lagrangian reinforcement learning algorithm.
- Formulates the problem as a constrained RL optimization (see the sketch after this list).
- Demonstrates significant improvements in learning speed and success rates.
- Achieves a 100% success rate on long-horizon manipulation tasks.
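To make the constrained-RL mechanics concrete, here is a minimal sketch of a per-state dual update via projected gradient ascent. All names (`lambda_table`, `constraint_violation`, `LR_DUAL`) are hypothetical and assume discretized state keys; this is a generic Lagrangian-RL pattern, not SiLRI's actual implementation:

```python
LR_DUAL = 1e-3     # step size for dual (multiplier) ascent
lambda_table = {}  # state (or state-bucket) key -> Lagrange multiplier

def dual_update(state_key, constraint_violation):
    """Projected gradient ascent on the per-state multiplier.

    constraint_violation = g(s) - delta(s): positive when the policy
    deviates from the human intervention more than the budget allows.
    """
    lam = lambda_table.get(state_key, 0.0)
    lam = max(0.0, lam + LR_DUAL * constraint_violation)  # keep lambda >= 0
    lambda_table[state_key] = lam
    return lam

def penalized_reward(reward, state_key, constraint_violation):
    """Reward the policy actually optimizes: environment reward minus
    the state-wise Lagrangian penalty."""
    lam = lambda_table.get(state_key, 0.0)
    return reward - lam * constraint_violation
```

In this pattern, the policy is trained on `penalized_reward` while `dual_update` runs in the outer loop, so states where the constraint is persistently violated accumulate larger multipliers and pull the policy back toward the human's behavior only where that matters.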
“SiLRI effectively exploits human suboptimal interventions, reducing the time required to reach a 90% success rate by at least 50% compared with the state-of-the-art RL method HIL-SERL, and achieving a 100% success rate on long-horizon manipulation tasks where other RL methods struggle to succeed.”