Diffusion Models for Transparent Object Perception

Published: Dec 29, 2025 18:59
1 min read
ArXiv

Analysis

This paper introduces a novel approach to depth and normal estimation for transparent objects, a notoriously difficult problem in computer vision. The authors leverage the generative capabilities of video diffusion models, whose learned priors implicitly capture how light refracts and reflects through transparent materials. They build a synthetic dataset, TransPhy3D, and use it to train a video-to-video translator that maps RGB sequences to depth and normal maps, achieving state-of-the-art results on several benchmarks. The work is significant because it demonstrates the potential of repurposing generative models for challenging perception tasks and offers a practical solution for real-world applications like robotic grasping.
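
The summary does not specify how the translator is conditioned, so the following is a minimal, hypothetical sketch of the general recipe it describes: repurposing a video diffusion denoiser as a video-to-video translator by conditioning it on RGB latents and supervising it with paired synthetic depth. The `ToyVideoDenoiser` and `training_step` names, the channel-concatenation conditioning, and the toy 3D-conv backbone are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of video-to-video diffusion for depth estimation.
# Assumptions (not from the paper): latent-space conditioning by channel
# concatenation, a toy 3D-conv denoiser standing in for a pretrained
# video diffusion UNet, and a standard DDPM noise-prediction objective.
import torch
import torch.nn as nn

class ToyVideoDenoiser(nn.Module):
    """Stand-in for a pretrained video diffusion backbone (hypothetical).

    Timestep embedding is omitted for brevity; a real denoiser would
    condition on the diffusion step as well.
    """
    def __init__(self, latent_ch=4, hidden=64):
        super().__init__()
        # Input: noisy depth latents concatenated with clean RGB latents.
        self.net = nn.Sequential(
            nn.Conv3d(2 * latent_ch, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, latent_ch, 3, padding=1),
        )

    def forward(self, noisy_depth, rgb_cond):
        # Channel-wise concatenation injects the RGB video as conditioning.
        return self.net(torch.cat([noisy_depth, rgb_cond], dim=1))

def training_step(model, rgb_latents, depth_latents, num_steps=1000):
    """One DDPM-style noise-prediction step on paired synthetic video."""
    b = depth_latents.shape[0]
    t = torch.randint(0, num_steps, (b,))
    # Simple linear alpha-bar schedule, for illustration only.
    alpha_bar = (1.0 - t.float() / num_steps).view(b, 1, 1, 1, 1)
    noise = torch.randn_like(depth_latents)
    noisy = alpha_bar.sqrt() * depth_latents + (1 - alpha_bar).sqrt() * noise
    pred = model(noisy, rgb_latents)
    return nn.functional.mse_loss(pred, noise)

# Toy usage: batch of 2 clips, 8 frames, 4-channel 32x32 latents.
model = ToyVideoDenoiser()
rgb = torch.randn(2, 4, 8, 32, 32)    # encoded RGB video (e.g. via a VAE)
depth = torch.randn(2, 4, 8, 32, 32)  # encoded depth video (paired labels)
loss = training_step(model, rgb, depth)
loss.backward()
```

In practice one would start from pretrained video diffusion weights and encode both modalities with the model's own VAE; the sketch only shows the shape of the conditioning and the loss.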

Reference

"Diffusion knows transparency." Generative video priors can be repurposed, efficiently and label-free, into robust, temporally coherent perception for challenging real-world manipulation.