Learning Surgical Robot Policies from Videos via World Modeling
Analysis
This paper addresses the data scarcity problem in surgical robotics by leveraging unlabeled surgical videos through world modeling. It introduces SurgWorld, a world model for surgical physical AI, and uses it to generate synthetic paired video-action data from videos that lack action labels. This approach enables training surgical vision-language-action (VLA) policies that outperform models trained on real demonstrations alone, offering a scalable path toward autonomous surgical skill acquisition.
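The core idea is a data-augmentation pipeline: a world model assigns pseudo-actions to unlabeled videos, and the resulting synthetic pairs are mixed with scarce real demonstrations to train the policy. A minimal sketch of that pipeline is below; all names (`Trajectory`, `label_videos`, `build_training_set`) and the world-model interface are hypothetical, since the summary does not specify SurgWorld's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical data container; SurgWorld's real formats are not given in the summary.
@dataclass
class Trajectory:
    frames: List[str]        # placeholder for video frames
    actions: List[List[float]]  # per-frame robot actions
    synthetic: bool          # True if actions came from the world model

def label_videos(unlabeled_videos: List[List[str]],
                 world_model: Callable[[List[str]], List[List[float]]]) -> List[Trajectory]:
    """Infer pseudo-actions for unlabeled videos with the world model,
    producing synthetic paired video-action data."""
    synthetic = []
    for video in unlabeled_videos:
        actions = world_model(video)  # inferred action sequence for each frame
        synthetic.append(Trajectory(video, actions, synthetic=True))
    return synthetic

def build_training_set(real_demos: List[Trajectory],
                       unlabeled_videos: List[List[str]],
                       world_model) -> List[Trajectory]:
    """Augment scarce real demonstrations with world-model-labeled data."""
    return real_demos + label_videos(unlabeled_videos, world_model)
```

The policy is then trained on the combined set, with the `synthetic` flag available if one wants to weight real and generated samples differently.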
Key Takeaways
- Unlabeled surgical videos can be turned into training data by using a world model (SurgWorld) to generate synthetic paired video-action data.
- A surgical VLA policy trained on real demonstrations augmented with this synthetic data outperforms one trained on real demonstrations alone.
- The approach offers a scalable path toward autonomous surgical skill acquisition despite the scarcity of labeled demonstrations.
Reference
“We demonstrate that a surgical VLA policy trained with these augmented data significantly outperforms models trained only on real demonstrations on a real surgical robot platform.”