AstraNav-World: Unified World Model for Embodied Navigation
Published: Dec 25, 2025 15:31 · 1 min read · ArXiv
Analysis
This paper introduces AstraNav-World, a novel end-to-end world model for embodied navigation. Its key innovation is a unified probabilistic framework that jointly reasons about future visual states and action sequences: a diffusion-based video generator is integrated with a vision-language policy, with the aim of improving trajectory accuracy and success rates in dynamic environments. By addressing the limitations of decoupled 'envision-then-plan' pipelines and demonstrating strong zero-shot capabilities, the approach points toward more reliable and general-purpose embodied agents.
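To make the unified "foresight + control" idea concrete, here is a minimal sketch, not the authors' architecture, of a single model that predicts future visual latents and an action sequence from one shared representation. All module names, dimensions, and the simple MLP heads are illustrative assumptions; the point is only that the policy reads the imagined futures rather than being a separate planner.

```python
# Minimal sketch (illustrative, not the paper's code): one model that "envisions"
# future visual latents and predicts actions from the same shared trunk.
import torch
import torch.nn as nn


class UnifiedNavWorldModel(nn.Module):
    """Toy joint world model: shared trunk, two coupled heads."""

    def __init__(self, obs_dim=256, instr_dim=256, latent_dim=128,
                 action_dim=4, horizon=8, hidden=512):
        super().__init__()
        self.horizon = horizon
        self.latent_dim = latent_dim
        self.action_dim = action_dim
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim + instr_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
        )
        # "Foresight" head: predicts a latent for each future frame.
        self.video_head = nn.Linear(hidden, horizon * latent_dim)
        # Policy head: conditioned on the trunk AND the predicted futures,
        # so actions stay grounded in the imagined visual rollout.
        self.action_head = nn.Sequential(
            nn.Linear(hidden + horizon * latent_dim, hidden), nn.GELU(),
            nn.Linear(hidden, horizon * action_dim),
        )

    def forward(self, obs_feat, instr_feat):
        h = self.trunk(torch.cat([obs_feat, instr_feat], dim=-1))
        future_latents = self.video_head(h)                          # (B, T*latent)
        actions = self.action_head(torch.cat([h, future_latents], dim=-1))
        B = obs_feat.shape[0]
        return (future_latents.view(B, self.horizon, self.latent_dim),
                actions.view(B, self.horizon, self.action_dim))


if __name__ == "__main__":
    model = UnifiedNavWorldModel()
    obs = torch.randn(2, 256)      # stand-in for an encoded camera observation
    instr = torch.randn(2, 256)    # stand-in for an encoded language instruction
    futures, acts = model(obs, instr)
    print(futures.shape, acts.shape)  # (2, 8, 128) and (2, 8, 4)
```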
Key Takeaways
- Proposes AstraNav-World, an end-to-end world model for embodied navigation.
- Integrates a diffusion-based video generator with a vision-language policy.
- Achieves improved trajectory accuracy and higher success rates in experiments.
- Demonstrates exceptional zero-shot capabilities in real-world testing.
- Unifies foresight vision and control within a single generative model.
Reference
“The bidirectional constraint makes visual predictions executable and keeps decisions grounded in physically consistent, task-relevant futures, mitigating cumulative errors common in decoupled 'envision-then-plan' pipelines.”
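The quoted bidirectional constraint suggests a single objective in which visual foresight and action prediction supervise each other. The sketch below, reusing the toy `UnifiedNavWorldModel` from the earlier snippet, shows one hedged way such coupling could look in training code; the MSE terms, targets, and equal loss weighting are assumptions for illustration, not the paper's actual objective.

```python
# Illustrative joint training step (assumes UnifiedNavWorldModel from the sketch above).
import torch
import torch.nn.functional as F

model = UnifiedNavWorldModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

obs = torch.randn(2, 256)
instr = torch.randn(2, 256)
gt_future_latents = torch.randn(2, 8, 128)   # stand-in encoded future frames
gt_actions = torch.randn(2, 8, 4)            # stand-in demonstrated actions

pred_futures, pred_actions = model(obs, instr)

# One joint objective: the vision loss shapes the features the policy reads,
# and the action loss back-propagates through the predicted futures, so each
# stream constrains the other instead of being trained in isolation.
vision_loss = F.mse_loss(pred_futures, gt_future_latents)
action_loss = F.mse_loss(pred_actions, gt_actions)
loss = vision_loss + action_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```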