MoniRefer: A New Dataset for 3D Visual Grounding in Roadside Infrastructure

Research Paper #Computer Vision, 3D Visual Grounding, Roadside Infrastructure, Multi-modal Learning 🔬 Research|Analyzed: Jan 3, 2026 08:53•

Published: Dec 31, 2025 03:56

•

1 min read

•ArXiv

Analysis

This paper introduces a novel dataset, MoniRefer, for 3D visual grounding specifically tailored for roadside infrastructure. This is significant because existing datasets primarily focus on indoor or ego-vehicle perspectives, leaving a gap in understanding traffic scenes from a broader, infrastructure-level viewpoint. The dataset's large scale and real-world nature, coupled with manual verification, are key strengths. The proposed method, Moni3DVG, further contributes to the field by leveraging multi-modal data for improved object localization.