Research Paper · Vision-Language Models, Robotics, Diffusion Models
Dream-VL & Dream-VLA: Diffusion-Based Vision-Language Models for Robotics
Published: Dec 27, 2025 14:46 • 1 min read • ArXiv
Analysis
This paper introduces Dream-VL and Dream-VLA, novel Vision-Language and Vision-Language-Action models built on diffusion-based large language models (dLLMs). The key innovation is leveraging the bidirectional nature of diffusion models, which naturally supports action chunking and parallel generation, to improve performance in visual planning and robotic control. The authors report state-of-the-art results on several benchmarks, highlighting the advantages of dLLMs over autoregressive models in these domains, and release the models to facilitate further research.
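To make the "action chunking with parallel generation" idea concrete, below is a minimal, hypothetical sketch of how a diffusion-style policy can refine an entire chunk of future actions jointly over a few denoising steps, rather than decoding actions one at a time as an autoregressive model would. This is not the paper's implementation: the chunk length, action dimension, schedule, and the `toy_denoiser` stand-in for the dLLM backbone are all assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the paper's code) of parallel action-chunk
# generation: all H future actions are refined jointly from noise over a few
# denoising iterations. In Dream-VLA the denoiser would be the dLLM backbone
# conditioned on vision and language; here it is a toy stand-in.

import numpy as np

CHUNK_LEN = 8      # H: number of future actions generated as one chunk (assumed)
ACTION_DIM = 7     # e.g. 6-DoF end-effector delta + gripper (assumed)
NUM_STEPS = 10     # number of denoising iterations (assumed)

rng = np.random.default_rng(0)


def toy_denoiser(noisy_chunk: np.ndarray, t: float, target: np.ndarray) -> np.ndarray:
    """Stand-in for a learned, bidirectional denoising network.

    A real model would condition on the visual observation and language
    instruction; here we simply pull the chunk toward a fixed target so the
    loop structure is visible.
    """
    return noisy_chunk + 0.5 * (target - noisy_chunk)


def generate_action_chunk(target: np.ndarray) -> np.ndarray:
    """Refine a whole H-step action chunk in parallel, starting from noise."""
    chunk = rng.normal(size=(CHUNK_LEN, ACTION_DIM))  # start from pure noise
    for step in range(NUM_STEPS):
        t = 1.0 - step / NUM_STEPS  # simple linear "noise level" schedule
        # Every timestep of the chunk is updated at once (parallel generation),
        # which bidirectional attention in a dLLM makes natural.
        chunk = toy_denoiser(chunk, t, target)
    return chunk


if __name__ == "__main__":
    # Hypothetical "ground-truth" trajectory the toy denoiser pulls toward.
    demo_target = np.linspace(0.0, 1.0, CHUNK_LEN)[:, None] * np.ones(ACTION_DIM)
    actions = generate_action_chunk(demo_target)
    print("generated chunk shape:", actions.shape)               # (8, 7)
    print("max error vs target:", np.abs(actions - demo_target).max())
```

The contrast with an autoregressive policy is that nothing in the loop above depends on earlier actions having been finalized: each denoising pass updates every timestep of the chunk simultaneously, which is what enables low-latency chunked control.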
Key Takeaways
- Introduces Dream-VL and Dream-VLA, novel Vision-Language and Vision-Language-Action models.
- Employs diffusion-based large language models (dLLMs) for improved performance in visual planning and robotic control.
- Demonstrates state-of-the-art results on several benchmarks, surpassing existing models.
- Highlights the benefits of dLLMs for action chunking and parallel generation.
- Models are released to facilitate further research.
Reference
“Dream-VLA achieves top-tier performance of 97.2% average success rate on LIBERO, 71.4% overall average on SimplerEnv-Bridge, and 60.5% overall average on SimplerEnv-Fractal, surpassing leading models such as π₀ and GR00T-N1.”