Video Gaussian Masked Autoencoders for Video Tracking
Published: Dec 27, 2025 06:16 • 1 min read • ArXiv
Analysis
This paper introduces Video-GMAE, a self-supervised approach to video representation learning. The core idea is to represent a video as a set of 3D Gaussian splats that move over time. This inductive bias helps the model learn meaningful spatiotemporal representations and achieve strong zero-shot tracking, and the performance gains reported on the Kinetics and Kubric datasets support the effectiveness of the proposed method.
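To make the tracking readout concrete, below is a minimal sketch (not the authors' implementation) of how per-frame 3D Gaussian centers could be projected onto the image plane with an assumed pinhole camera to form 2D point tracks. The array layout, the intrinsics matrix `K`, and the function `project_gaussian_tracks` are all hypothetical stand-ins for whatever Video-GMAE actually produces.

```python
# Minimal sketch: project per-frame 3D Gaussian centers (as a model like
# Video-GMAE might learn) through an assumed pinhole camera to get 2D tracks.
import numpy as np

def project_gaussian_tracks(means_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """
    means_3d: (T, N, 3) Gaussian centers in camera coordinates for T frames
              and N Gaussians (hypothetical layout).
    K:        (3, 3) camera intrinsics matrix (assumed known).
    Returns:  (T, N, 2) pixel coordinates, i.e. one 2D track per Gaussian.
    """
    homog = means_3d @ K.T                      # apply intrinsics per point
    return homog[..., :2] / homog[..., 2:3]     # perspective divide -> pixels

# Usage with random stand-in data.
K = np.array([[500.0, 0.0, 128.0],
              [0.0, 500.0, 128.0],
              [0.0, 0.0, 1.0]])
means = np.random.rand(8, 64, 3) + np.array([0.0, 0.0, 2.0])  # keep z > 0
tracks = project_gaussian_tracks(means, K)
print(tracks.shape)  # (8, 64, 2)
```

The point of the sketch is only that, once Gaussian trajectories exist in 3D, tracking falls out of a simple projection rather than a dedicated tracking head.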
Key Takeaways
• Video-GMAE represents a video as a set of 3D Gaussian splats that move over time.
• This Gaussian inductive bias supports self-supervised learning of video representations.
• Projecting the learned Gaussian trajectories onto the image plane yields zero-shot tracking comparable to state-of-the-art methods.
• The approach shows significant gains on the Kinetics and Kubric datasets.
Reference
“Mapping the trajectory of the learnt Gaussians onto the image plane gives zero-shot tracking performance comparable to state-of-the-art.”