Search: 介绍了一种控制场景内摄像机的方法。 - ai.jp.net

Paper #Computer Vision, Natural Language Processing, 3D Scene Understanding 🔬 ResearchAnalyzed: Jan 3, 2026 08:39

2D-Trained Systems Adapt to 3D Scenes

Published:Dec 31, 2025 12:39

•

1 min read

•

ArXiv

Analysis

This paper addresses the challenge of applying 2D vision-language models to 3D scenes. The core contribution is a novel method for controlling an in-scene camera to bridge the dimensionality gap, enabling adaptation to object occlusions and feature differentiation without requiring pretraining or finetuning. The use of derivative-free optimization for regret minimization in mutual information estimation is a key innovation.

Key Takeaways

•Addresses the problem of applying 2D vision-language models to 3D scenes.
•Introduces a method for controlling an in-scene camera.
•Employs derivative-free optimization for improved mutual information estimation.
•Enables adaptation to object occlusions and feature differentiation.
•Avoids the need for pretraining or finetuning.

Reference

“Our algorithm enables off-the-shelf cross-modal systems trained on 2D visual inputs to adapt online to object occlusions and differentiate features.”

Permalink ArXiv

2D-Trained Systems Adapt to 3D Scenes

Analysis

Key Takeaways

📬 Get AI News Delivered

Browse by Category

Trending Topics

📬 Get AI News Delivered

Browse by Category

Trending Topics