CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception

Research #llm | Analysis: January 4, 2026 10:38

Published: November 25, 2025 01:21

Analysis

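This article introduces CropVLM, a model focused on improving fine-grained vision-language understanding. Its core idea is to let the model "zoom in" on the relevant parts of an image, which strengthens its ability to connect visual details with language descriptions. The source is arXiv, which indicates that this is a research paper.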
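To make the "zoom in" idea concrete, here is a minimal sketch of cropping a region of an image and enlarging it. It is only an illustration of crop-and-magnify in general, not CropVLM's actual method: the function name `crop_and_zoom`, the list-of-lists grayscale image, the hand-supplied bounding box, and the nearest-neighbor upscaling are all assumptions made for this example. How the model chooses which region to crop is not described in the summary above and is not modeled here.

```python
from typing import List, Tuple

# Hypothetical image type for this sketch: rows of grayscale pixel values.
Image = List[List[int]]


def crop_and_zoom(image: Image, box: Tuple[int, int, int, int], scale: int) -> Image:
    """Crop box = (left, top, right, bottom) and enlarge it by an integer factor.

    right and bottom are exclusive. The enlargement is plain nearest-neighbor
    pixel repetition, chosen only to keep the example dependency-free.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")

    height, width = len(image), len(image[0])
    left, top, right, bottom = box

    # Clamp the box to the image bounds so out-of-range boxes do not fail silently.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("crop region is empty")

    # Step 1: crop the region of interest.
    crop = [row[left:right] for row in image[top:bottom]]

    # Step 2: zoom by repeating every pixel `scale` times in both directions.
    zoomed: Image = []
    for row in crop:
        wide_row = [pixel for pixel in row for _ in range(scale)]
        zoomed.extend(list(wide_row) for _ in range(scale))
    return zoomed


if __name__ == "__main__":
    image = [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15],
    ]
    # Zoom into the central 2x2 region at 2x magnification.
    for row in crop_and_zoom(image, (1, 1, 3, 3), 2):
        print(row)
```

In a vision-language pipeline, the enlarged crop would typically be passed to the model alongside (or instead of) the full image so that small details occupy more of the input; that wiring is left out of this sketch.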

Citation / Source

"CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception"

arXiv, November 25, 2025 01:21

* Quoted lawfully under Article 32 of the Copyright Act.
