Act2Goal: Long-Horizon Robotic Manipulation with Visual Goals
Analysis
This paper addresses long-horizon robotic manipulation by introducing Act2Goal, a goal-conditioned policy. Given a visual goal, a goal-conditioned visual world model generates a sequence of intermediate visual states that serves as a structured plan for the robot. Multi-Scale Temporal Hashing (MSTH) then supports both fine-grained control and global task consistency during execution. The reported results are strong zero-shot generalization to novel objects, spatial layouts, and environments, together with rapid online adaptation: in real-robot experiments, success rates on challenging out-of-distribution tasks rise from 30% to 90% within minutes of autonomous interaction.
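This summary does not spell out how MSTH selects frames from the imagined trajectory. As an illustration only, the sketch below shows one plausible way to get both fine-grained and global coverage from a single plan: keep every near-term frame and space the far-term frames geometrically. The function name `multi_scale_indices` and the parameters `dense_steps` and `growth` are hypothetical and are not taken from the paper.

```python
def multi_scale_indices(horizon, dense_steps=4, growth=2):
    """Return frame offsets (1-based) into an imagined trajectory.

    Illustrative assumption, not the paper's algorithm: the first
    `dense_steps` frames are kept at every step, later frames are kept
    at geometrically growing gaps, and the final (goal) frame is
    always included.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if dense_steps < 1 or growth < 1:
        raise ValueError("dense_steps and growth must be at least 1")

    # Dense, near-term frames: one per step.
    indices = list(range(1, min(dense_steps, horizon) + 1))

    # Sparse, far-term frames: the gap grows by `growth` each time.
    gap = growth
    nxt = dense_steps + gap
    while nxt < horizon:
        indices.append(nxt)
        gap *= growth
        nxt += gap

    # Always anchor the plan on the goal frame.
    if indices[-1] != horizon:
        indices.append(horizon)
    return indices


if __name__ == "__main__":
    print(multi_scale_indices(32))  # [1, 2, 3, 4, 6, 10, 18, 32]
```

With a 32-frame plan, the selection keeps the next four frames for closed-loop control and only four more frames to cover the remaining 28 steps, so the policy input stays small while the goal frame remains in view.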
Key Takeaways
- Proposes Act2Goal, a goal-conditioned manipulation policy.
- Integrates a goal-conditioned visual world model with multi-scale temporal control.
- Utilizes Multi-Scale Temporal Hashing (MSTH) for robust execution.
- Achieves strong zero-shot generalization and rapid online adaptation.
- Demonstrates significant success rate improvements in real-robot experiments.
“Act2Goal achieves strong zero-shot generalization to novel objects, spatial layouts, and environments. Real-robot experiments demonstrate that Act2Goal improves success rates from 30% to 90% on challenging out-of-distribution tasks within minutes of autonomous interaction.”