HAT: Adaptive Spatio-Temporal Alignment for 3D Perception
Analysis
This paper introduces HAT, a novel spatio-temporal alignment module for end-to-end 3D perception in autonomous driving. It addresses the limitations of existing methods that rely on attention mechanisms and simplified motion models. HAT's key innovation lies in its ability to adaptively decode the optimal alignment proposal from multiple hypotheses, considering both semantic and motion cues. The results demonstrate significant improvements in 3D temporal detectors, trackers, and object-centric end-to-end autonomous driving systems, especially under corrupted semantic conditions. This work is important because it offers a more robust and accurate approach to spatio-temporal alignment, a critical component for reliable autonomous driving perception.
Key Takeaways
- Proposes HAT, a novel spatio-temporal alignment module for 3D perception.
- HAT uses multiple motion models and multi-hypothesis decoding for optimal alignment.
- Achieves state-of-the-art tracking results and improves perception accuracy in E2E AD.
- Demonstrates robustness under corrupted semantic conditions.
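The multi-hypothesis idea above can be illustrated with a minimal sketch: propagate a tracked object's state under several candidate motion models, then decode the best-aligned hypothesis with a combined score. All names, the two toy motion models, and the scoring function here are illustrative assumptions, not the paper's actual architecture (which scores hypotheses with learned semantic and motion cues).

```python
import numpy as np

# Illustrative sketch only; motion models and score function are assumptions.

def constant_velocity(state, dt):
    # state: [x, y, vx, vy]; propagate position by velocity
    x, y, vx, vy = state
    return np.array([x + vx * dt, y + vy * dt, vx, vy])

def stationary(state, dt):
    # Hypothesis that the object has not moved
    return state.copy()

MOTION_MODELS = [constant_velocity, stationary]

def align_query(state, dt, score_fn):
    """Generate one alignment hypothesis per motion model and
    decode the highest-scoring one."""
    hypotheses = [model(state, dt) for model in MOTION_MODELS]
    scores = [score_fn(h) for h in hypotheses]
    return hypotheses[int(np.argmax(scores))]

# Toy usage: score hypotheses by proximity to an observed position
# (a stand-in for the learned semantic/motion scoring in HAT).
observed = np.array([2.0, 0.0])
score = lambda h: -np.linalg.norm(h[:2] - observed)
state = np.array([0.0, 0.0, 1.0, 0.0])
best = align_query(state, dt=2.0, score_fn=score)
```

Here the constant-velocity hypothesis lands on the observed position and wins the decoding step; in HAT this selection is learned rather than hand-scored.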
“HAT consistently improves 3D temporal detectors and trackers across diverse baselines. It achieves state-of-the-art tracking results with 46.0% AMOTA on the test set when paired with the DETR3D detector.”