Learning Skills from Action-Free Videos
Published:Dec 24, 2025 05:00
•1 min read
•ArXiv AI
Analysis
This paper introduces Skill Abstraction from Optical Flow (SOF), a novel framework for learning latent skills from action-free videos. The core innovation lies in using optical flow as an intermediate representation to bridge the gap between video dynamics and robot actions. By learning skills in this flow-based latent space, SOF facilitates high-level planning and simplifies the translation of skills into actionable commands for robots. The experimental results demonstrate improved performance in multitask and long-horizon settings, highlighting the potential of SOF to acquire and compose skills directly from raw visual data. This approach offers a promising avenue for developing generalist robots capable of learning complex behaviors from readily available video data, bypassing the need for extensive robot-specific datasets.
Key Takeaways
Reference
“Our key idea is to learn a latent skill space through an intermediate representation based on optical flow that captures motion information aligned with both video dynamics and robot actions.”