OBEYED-VLA: Robust Robotic Manipulation with Object-Centric Grounding

Published: Dec 27, 2025 08:31
ArXiv

Analysis

This paper addresses the limitations of existing Vision-Language-Action (VLA) models in robotic manipulation, particularly their susceptibility to clutter and background changes. The authors propose OBEYED-VLA, a framework that explicitly separates perception and action reasoning using object-centric and geometry-aware grounding. This approach aims to improve robustness and generalization in real-world scenarios.
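To make the idea of separating perception from action reasoning concrete, here is a toy Python sketch. It is not the authors' implementation: the `Detection` type, the functions `ground_target`, `mask_to_object`, and `act`, the label-matching rule, and the list-of-lists image are all hypothetical stand-ins chosen for illustration. The summary does not describe how the paper performs its object-centric, geometry-aware grounding, so that part is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values


@dataclass
class Detection:
    """Hypothetical detector output: a label and an end-exclusive box."""
    label: str
    box: Tuple[int, int, int, int]  # (row0, col0, row1, col1)


def ground_target(instruction: str, detections: List[Detection]) -> Optional[Detection]:
    """Perception stage (assumed rule): pick the detection named in the instruction."""
    words = instruction.lower().split()
    for det in detections:
        if det.label.lower() in words:
            return det
    return None  # the requested object is not in the scene


def mask_to_object(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Zero out every pixel outside the target box, hiding clutter and background."""
    r0, c0, r1, c1 = box
    return [
        [px if r0 <= r < r1 and c0 <= c < c1 else 0 for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]


def act(instruction: str, image: Image, detections: List[Detection]) -> dict:
    """Action stage: sees only the object-centric view, or rejects the command."""
    target = ground_target(instruction, detections)
    if target is None:
        return {"action": "reject", "view": None}
    view = mask_to_object(image, target.box)
    # A real policy would map `view` to motor commands; this placeholder
    # only reports which object it would reach for.
    return {"action": "reach", "target": target.label, "view": view}


if __name__ == "__main__":
    scene = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
    found = [Detection("mug", (0, 0, 2, 2)), Detection("can", (1, 1, 3, 3))]
    print(act("pick up the mug", scene, found))
    print(act("pick up the bottle", scene, found))
```

In this sketch the action stage never receives the raw scene, so distractors and background changes outside the target box cannot influence it, and a missing target leads to an explicit rejection rather than an action on the wrong object.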

Reference

OBEYED-VLA substantially improves robustness over strong VLA baselines across four challenging regimes and multiple difficulty levels: distractor objects, absent-target rejection, background appearance changes, and cluttered manipulation of unseen objects.