Video Gaussian Masked Autoencoders for Video Tracking
Paper · Computer Vision · Research | Analyzed: Jan 3, 2026 16:27
Published: Dec 27, 2025 06:16
1 min read · ArXiv Analysis
This paper introduces Video-GMAE, a novel self-supervised approach to video representation learning. The core idea is to represent a video as a set of 3D Gaussian splats that move over time. This inductive bias encourages the model to learn motion-aware representations, and it yields strong zero-shot point-tracking performance. Performance gains on the Kinetics and Kubric datasets support the effectiveness of the proposed method.
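To make the tracking claim concrete: if each Gaussian has a 3D center per frame, projecting those centers through a pinhole camera traces a 2D trajectory per Gaussian, which can be read off as a point track. The sketch below is a hypothetical illustration of that projection step (the function name, camera intrinsics, and toy data are assumptions, not the paper's code):

```python
import numpy as np

def project_tracks(means_3d, fx=500.0, fy=500.0, cx=112.0, cy=112.0):
    """Project per-frame 3D Gaussian centers onto the image plane.

    means_3d: array of shape (T, G, 3) -- T frames, G Gaussians,
              centers in camera coordinates with z > 0.
    Returns an array of shape (T, G, 2): one 2D pixel track per Gaussian.
    Intrinsics (fx, fy, cx, cy) are placeholder pinhole-camera values.
    """
    x, y, z = means_3d[..., 0], means_3d[..., 1], means_3d[..., 2]
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return np.stack([u, v], axis=-1)

# Toy example: 4 frames, 2 Gaussians drifting along the x-axis.
T, G = 4, 2
means = np.zeros((T, G, 3))
means[..., 2] = 2.0                               # constant depth
means[..., 0] = np.linspace(0.0, 0.3, T)[:, None]  # linear motion in x
tracks = project_tracks(means)                     # shape (T, G, 2)
```

Each row of `tracks` is the projected trajectory of one Gaussian across frames, which is the quantity the paper compares against ground-truth point tracks in the zero-shot setting.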
Key Takeaways
Reference / Citation
"Mapping the trajectory of the learnt Gaussians onto the image plane gives zero-shot tracking performance comparable to state-of-the-art."