OBEYED-VLA: Robust Robotic Manipulation with Object-Centric Grounding
Analysis
This paper addresses the limitations of existing Vision-Language-Action (VLA) models in robotic manipulation, particularly their susceptibility to clutter and background changes. The authors propose OBEYED-VLA, a framework that explicitly separates perception and action reasoning using object-centric and geometry-aware grounding. This approach aims to improve robustness and generalization in real-world scenarios.
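To make the idea of separating perception from action reasoning concrete, here is a minimal toy sketch. It is not the authors' implementation: the names `Detection`, `ground_target`, `object_centric_view`, and `act`, the substring-based label matching, and the list-of-lists image format are all assumptions for illustration, and geometry-aware grounding is not modeled at all. The sketch only shows the general pattern of a perception stage that selects the instructed object, masks out everything else, and rejects the command when the target is absent, before any action policy runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


@dataclass
class Detection:
    """A detected object (hypothetical output of some upstream detector)."""
    label: str
    box: Box


def ground_target(detections: Sequence[Detection], instruction: str) -> Optional[Detection]:
    """Toy perception stage: return the first detection whose label appears in the instruction."""
    text = instruction.lower()
    for det in detections:
        if det.label.lower() in text:
            return det
    return None


def object_centric_view(image: Image, box: Box, fill: int = 0) -> Image:
    """Keep pixels inside the target box and overwrite everything else with `fill`."""
    r0, c0, r1, c1 = box
    return [
        [px if (r0 <= r < r1 and c0 <= c < c1) else fill for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]


def act(image: Image, detections: Sequence[Detection], instruction: str,
        policy: Callable[[Image, str], object]) -> object:
    """Run perception first; call the action policy only on the masked, object-centric view."""
    target = ground_target(detections, instruction)
    if target is None:
        return "reject"  # assumed convention for an absent target
    return policy(object_centric_view(image, target.box), instruction)
```

In this sketch the policy never sees distractor or background pixels, which is the intuition behind grounding the input before action reasoning; a real system would replace the label matching and box masking with learned components.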
Key Takeaways
- OBEYED-VLA disentangles perception and action reasoning for improved robustness.
- The framework uses object-centric and geometry-aware grounding.
- The approach demonstrates significant improvements in real-world robotic manipulation tasks.
- Ablation studies confirm the importance of both semantic and geometry grounding.
“OBEYED-VLA substantially improves robustness over strong VLA baselines across four challenging regimes and multiple difficulty levels: distractor objects, absent-target rejection, background appearance changes, and cluttered manipulation of unseen objects.”