Dream-VL & Dream-VLA: Diffusion-Based Vision-Language Models for Robotics

Research Paper · Tags: Vision-Language Models, Robotics, Diffusion Models · Analyzed: Jan 3, 2026 19:51
Published: Dec 27, 2025 14:46
1 min read
ArXiv

Analysis

This paper introduces Dream-VL and Dream-VLA, novel Vision-Language and Vision-Language-Action models built on diffusion-based large language models (dLLMs). The key innovation lies in exploiting the bidirectional nature of diffusion models to improve visual planning and robotic control, particularly through action chunking and parallel generation: instead of emitting actions one token at a time, the model can denoise an entire chunk of future actions jointly, with each position conditioned on both past and future context. The authors report state-of-the-art results on several benchmarks, arguing that dLLMs can surpass autoregressive models in these domains. The public release of the models should encourage further research.
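To make the parallel-generation idea concrete, here is a minimal toy sketch of confidence-based masked decoding of an action chunk, the general mechanism dLLMs use to fill many positions per step. This is an illustrative assumption, not the paper's actual algorithm or code: `decode_action_chunk` and `toy_model` are hypothetical names, and the denoiser is a stand-in for a real trained model.

```python
import numpy as np

def decode_action_chunk(predict_fn, horizon, steps=4):
    """Toy confidence-based parallel decoding of a masked action chunk.

    predict_fn maps the current token array (-1 = masked slot) to a
    (horizon, vocab) array of per-position logits, scored in parallel.
    Each round, the most confident still-masked slots are committed.
    """
    tokens = np.full(horizon, -1, dtype=int)   # start fully masked
    per_step = int(np.ceil(horizon / steps))
    while (tokens == -1).any():
        logits = predict_fn(tokens)            # bidirectional: sees all slots
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        conf = probs.max(axis=-1)
        conf[tokens != -1] = -np.inf           # never re-decode fixed slots
        k = min(per_step, int((tokens == -1).sum()))
        idx = np.argsort(conf)[-k:]            # top-k most confident slots
        tokens[idx] = probs[idx].argmax(axis=-1)
    return tokens

def toy_model(tokens):
    """Stand-in denoiser: strongly prefers token (position mod vocab)."""
    horizon, vocab = len(tokens), 8
    logits = np.zeros((horizon, vocab))
    logits[np.arange(horizon), np.arange(horizon) % vocab] = 5.0
    return logits

chunk = decode_action_chunk(toy_model, horizon=8, steps=4)
print(chunk)  # every slot filled; with this toy model → [0 1 2 3 4 5 6 7]
```

The point of the sketch is the contrast with autoregressive decoding: all positions are scored in one forward pass per round, so an 8-step chunk resolves in 4 rounds here rather than 8 sequential token emissions.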
Reference / Citation
"Dream-VLA achieves top-tier performance of 97.2% average success rate on LIBERO, 71.4% overall average on SimplerEnv-Bridge, and 60.5% overall average on SimplerEnv-Fractal, surpassing leading models such as $π_0$ and GR00T-N1."
ArXiv · Dec 27, 2025 14:46
* Cited for critical analysis under Article 32.