MoniRefer: A New Dataset for 3D Visual Grounding in Roadside Infrastructure

Analysis

This paper introduces MoniRefer, a novel dataset for 3D visual grounding tailored to roadside infrastructure. The contribution is significant because existing datasets primarily cover indoor scenes or ego-vehicle perspectives, leaving a gap in understanding traffic scenes from a broader, infrastructure-level viewpoint. The dataset's key strengths are its large scale, its real-world data, and its manual verification. The proposed method, Moni3DVG, further contributes to the field by leveraging multi-modal data for improved object localization.
Reference / Citation
“...the first real-world large-scale multi-modal dataset for roadside-level 3D visual grounding.”
ArXiv, Dec 31, 2025 03:56
* Cited for critical analysis under Article 32.