Act2Goal: Long-Horizon Robotic Manipulation with Visual Goals

Published:Dec 29, 2025 15:28
1 min read
ArXiv

Analysis

This paper addresses the challenge of long-horizon robotic manipulation by introducing Act2Goal, a novel goal-conditioned policy. It leverages a visual world model to generate a sequence of intermediate visual states, providing a structured plan for the robot. The integration of Multi-Scale Temporal Hashing (MSTH) allows for both fine-grained control and global task consistency. The paper's significance lies in its ability to achieve strong zero-shot generalization and rapid online adaptation, demonstrated by significant improvements in real-robot experiments. This approach offers a promising solution for complex robotic tasks.

Reference

Act2Goal achieves strong zero-shot generalization to novel objects, spatial layouts, and environments. Real-robot experiments demonstrate that Act2Goal improves success rates from 30% to 90% on challenging out-of-distribution tasks within minutes of autonomous interaction.