KV-Tracker: Real-Time Pose Tracking with Transformers
Published: Dec 27, 2025 13:02
•ArXiv
Analysis
This paper addresses the computational bottleneck that keeps multi-view 3D geometry networks from running in real time. It introduces KV-Tracker, a method that uses key-value (KV) caching within a Transformer architecture to speed up 6-DoF pose tracking and online reconstruction from monocular RGB video. The caching strategy is model-agnostic, so it can be applied to existing multi-view networks without retraining. The method also handles challenging tasks such as object tracking and reconstruction without depth measurements or object priors.
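The general idea behind KV caching in attention can be illustrated with a minimal sketch. This is not the paper's implementation: the class `KVCache`, the functions `track_frame` and `track_frame_uncached`, the toy 2-D weights, and the simplification that a live frame only cross-attends to cached keyframe tokens are all assumptions made for illustration. The sketch shows that keyframe tokens are projected to keys and values once, after which each new frame only pays for its own query projection.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]

class KVCache:
    """Hypothetical cache: stores projected keys/values of keyframe tokens."""
    def __init__(self, W_k, W_v):
        self.W_k, self.W_v = W_k, W_v
        self.keys, self.values = [], []
        self.projections = 0  # counts key/value projections actually computed

    def add_keyframe_tokens(self, tokens):
        for t in tokens:
            self.keys.append(matvec(self.W_k, t))
            self.values.append(matvec(self.W_v, t))
            self.projections += 1

def track_frame(frame_tokens, W_q, cache):
    """Cached path: only the live frame's queries are projected."""
    return [attend(matvec(W_q, t), cache.keys, cache.values) for t in frame_tokens]

def track_frame_uncached(frame_tokens, keyframe_tokens, W_q, W_k, W_v):
    """Baseline: re-projects every keyframe token for every new frame."""
    keys = [matvec(W_k, t) for t in keyframe_tokens]
    values = [matvec(W_v, t) for t in keyframe_tokens]
    return [attend(matvec(W_q, t), keys, values) for t in frame_tokens]

# Toy weights and tokens (illustrative values only).
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[0.5, 0.1], [0.2, 0.3]]
W_v = [[1.0, -1.0], [0.5, 0.5]]
keyframe_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = [[[0.2, 0.4]], [[0.9, 0.1]]]

cache = KVCache(W_k, W_v)
cache.add_keyframe_tokens(keyframe_tokens)  # projected once, reused afterwards
cached_outputs = [track_frame(f, W_q, cache) for f in frames]
baseline_outputs = [
    track_frame_uncached(f, keyframe_tokens, W_q, W_k, W_v) for f in frames
]
```

In this sketch both paths produce the same attention outputs, but the cached path projects the keyframe tokens once in total rather than once per incoming frame. Because the cache sits at the attention inputs and leaves the weights untouched, the same pattern can in principle wrap a pretrained network without retraining.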
Key Takeaways
- Proposes KV-Tracker, a method for real-time 6-DoF pose tracking and online reconstruction.
- Utilizes key-value (KV) caching within a Transformer architecture for speedup.
- Achieves up to 15x speedup during inference.
- Model-agnostic caching allows application to existing multi-view networks.
- Demonstrates strong performance on various datasets, including object tracking without depth or priors.
Reference
“The caching strategy is model-agnostic and can be applied to other off-the-shelf multi-view networks without retraining.”