Learning 3D Representations from Videos Without 3D Scans
Published: Dec 28, 2025 18:59 · 1 min read · ArXiv
Analysis
This paper addresses the challenge of acquiring large-scale 3D data for self-supervised learning. It proposes LAM3C, a novel approach that leverages point clouds generated from unlabeled videos, circumventing the need for expensive 3D scans. Its key contributions are the RoomTours dataset and a noise-regularized training loss. The results, which surpass previous self-supervised methods, highlight videos as a rich data source for 3D representation learning.
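The summary does not describe how point clouds are produced from video; a common recipe is monocular depth estimation followed by pinhole unprojection of each frame. The sketch below shows only the unprojection step, as one plausible ingredient of such a pipeline; the intrinsics values and the synthetic depth map are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of one plausible video-to-point-cloud step: unprojecting a
# per-frame depth map into 3D points with pinhole camera intrinsics. The
# depth source, intrinsics, and any camera-pose handling are assumptions
# made for illustration, not the paper's stated pipeline.
import numpy as np

def unproject_depth(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Convert an (H, W) depth map to an (H*W, 3) point cloud in camera space."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.ravel()
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Synthetic depth map standing in for a monocular-depth estimate of a frame.
depth = np.full((480, 640), 2.0, dtype=np.float32)
points = unproject_depth(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(points.shape)  # (307200, 3)
```

In a full pipeline, per-frame clouds like this would be transformed by estimated camera poses and merged across the video into a scene-level point cloud.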
Key Takeaways
- Proposes LAM3C, a self-supervised framework for 3D learning from video-generated point clouds.
- Introduces RoomTours, a video-generated point cloud dataset.
- Employs a noise-regularized loss to improve representation learning (see the sketch after this list).
- Achieves state-of-the-art performance on indoor segmentation tasks without using real 3D scans.
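The summary does not give the form of the noise-regularized loss. One plausible reading, since video-generated point clouds carry reconstruction noise, is a consistency term that keeps features stable under point jitter. The sketch below follows that assumption; `encoder`, `sigma`, and `lambda_noise` are hypothetical names introduced for illustration.

```python
# A minimal sketch, assuming the "noise-regularized loss" combines a pretext
# loss with a consistency regularizer under injected point noise. This is a
# guess at the general shape, not the paper's actual formulation.
import torch
import torch.nn.functional as F

def noise_regularized_loss(encoder: torch.nn.Module,
                           points: torch.Tensor,   # (B, N, 3) point clouds
                           targets: torch.Tensor,  # (B, D) pretext targets
                           sigma: float = 0.01,
                           lambda_noise: float = 0.5) -> torch.Tensor:
    feats_clean = encoder(points)  # (B, D) features
    # Pretext loss on clean features (placeholder: regression to targets).
    pretext = F.mse_loss(feats_clean, targets)
    # Consistency regularizer: features should be stable under point jitter,
    # mimicking the depth/reconstruction noise in video-derived point clouds.
    noisy = points + sigma * torch.randn_like(points)
    feats_noisy = encoder(noisy)
    consistency = F.mse_loss(feats_noisy, feats_clean.detach())
    return pretext + lambda_noise * consistency

# Toy usage with a trivial encoder that mean-pools point coordinates.
class MeanPoolEncoder(torch.nn.Module):
    def forward(self, pts):
        return pts.mean(dim=1)

enc = MeanPoolEncoder()
pts = torch.randn(4, 1024, 3)
tgt = torch.randn(4, 3)
print(noise_regularized_loss(enc, pts, tgt).item())
```

The `detach()` on the clean features makes the regularizer pull noisy features toward clean ones rather than collapsing both, a common design choice in consistency-based self-supervision.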
Reference
“LAM3C achieves higher performance than the previous self-supervised methods on indoor semantic and instance segmentation.”