Video Gaussian Masked Autoencoders for Video Tracking

Paper · Computer Vision · Research | Analyzed: Jan 3, 2026 16:27
Published: Dec 27, 2025 06:16
1 min read
ArXiv

Analysis

This paper introduces Video-GMAE, a novel self-supervised approach to video representation learning. The core idea is to represent a video as a set of 3D Gaussian splats that move over time. This inductive bias lets the model learn meaningful representations and achieve strong zero-shot tracking performance, and the significant gains on the Kinetics and Kubric datasets highlight the effectiveness of the proposed method.
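The tracking claim rests on one simple step: projecting each learnt Gaussian's per-frame 3D center onto the image plane yields a 2D track per Gaussian. A minimal sketch of that projection is below; the function name, the pinhole-camera intrinsics, and the toy trajectory are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def project_gaussian_tracks(means_3d, K):
    """Project per-frame 3D Gaussian centers onto the image plane.

    means_3d: (T, N, 3) array of N Gaussian centers over T frames,
              in camera coordinates with z > 0 (assumed setup).
    K:        (3, 3) pinhole intrinsics matrix (assumed camera model).
    Returns a (T, N, 2) array: one 2D pixel track per Gaussian.
    """
    T, N, _ = means_3d.shape
    pts = means_3d.reshape(-1, 3) @ K.T        # apply intrinsics
    tracks = pts[:, :2] / pts[:, 2:3]          # perspective divide
    return tracks.reshape(T, N, 2)

# Toy example: one Gaussian translating along x at constant depth.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
means = np.array([[[0.0, 0.0, 2.0]],
                  [[0.1, 0.0, 2.0]],
                  [[0.2, 0.0, 2.0]]])           # (T=3, N=1, 3)
tracks = project_gaussian_tracks(means, K)
# The Gaussian traces a horizontal track: (320, 240) -> (345, 240) -> (370, 240)
```

In the paper's setting the Gaussian trajectories are learnt end-to-end by the masked autoencoder, so no explicit correspondence supervision is needed; the projection above is what turns those trajectories into evaluable point tracks.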
Reference / Citation
"Mapping the trajectory of the learnt Gaussians onto the image plane gives zero-shot tracking performance comparable to state-of-the-art."