VideoZoomer: Dynamic Temporal Focusing for Long Video Understanding

Paper #LLM 🔬 Research | Analyzed: January 3, 2026 20:19

Published: December 26, 2025 11:43

Analysis

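This article introduces VideoZoomer, a new framework that addresses the limitations of multimodal large language models (MLLMs) in long video understanding. By using a reinforcement learning agent to perform dynamic temporal focusing, VideoZoomer overcomes the constraints of limited context windows and static frame selection. A key aspect of the method is its two-stage training strategy, which combines supervised fine-tuning with reinforcement learning. The reported results indicate that VideoZoomer outperforms existing models, which supports the effectiveness of the proposed approach.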
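
The multi-turn idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: the summary does not give the tool interface, the frame rates, or the policy, so `ZoomSession`, `sample_timestamps`, `run`, the frame rates, and the `max_frames` budget are all hypothetical names and values. The learned agent is replaced by a plain callable that either requests a time window or returns `None` to stop.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Optional


def sample_timestamps(start: float, end: float, fps: float) -> list[float]:
    """Return timestamps in [start, end) spaced 1/fps seconds apart."""
    if fps <= 0 or end <= start:
        return []
    count = math.floor((end - start) * fps + 1e-9)
    return [round(start + i / fps, 3) for i in range(count)]


@dataclass
class ZoomSession:
    """Hypothetical stand-in for a video plus a 'temporal zoom' tool."""
    duration: float              # video length in seconds
    overview_fps: float = 0.5    # assumed sparse rate for the first pass
    zoom_fps: float = 4.0        # assumed dense rate inside a zoomed clip
    max_frames: int = 64         # assumed proxy for a limited context window
    evidence: list[float] = field(default_factory=list)

    def overview(self) -> list[float]:
        """Sample the whole video sparsely to get a coarse first look."""
        self.evidence = sample_timestamps(0.0, self.duration, self.overview_fps)
        return self.evidence

    def zoom(self, start: float, end: float) -> list[float]:
        """Sample a chosen window densely, clipped to the frame budget."""
        start, end = max(0.0, start), min(self.duration, end)
        clip = sample_timestamps(start, end, self.zoom_fps)
        remaining = max(0, self.max_frames - len(self.evidence))
        clip = clip[:remaining]
        self.evidence.extend(clip)
        return clip


# A policy sees the evidence so far and the turn index, and returns either
# a (start, end) window to zoom into or None when it is ready to answer.
Policy = Callable[[list[float], int], Optional[tuple[float, float]]]


def run(session: ZoomSession, policy: Policy, max_turns: int = 3) -> list[float]:
    """Coarse overview first, then up to max_turns zoom requests."""
    session.overview()
    for turn in range(max_turns):
        request = policy(session.evidence, turn)
        if request is None:
            break
        session.zoom(*request)
    return sorted(set(session.evidence))


if __name__ == "__main__":
    # Fixed stub policy: zoom once into seconds 10-12, then stop.
    session = ZoomSession(duration=60.0)
    frames = run(session, lambda ev, turn: (10.0, 12.0) if turn == 0 else None)
    print(len(frames), "unique timestamps gathered")
```

In the framework the article describes, the window choice would come from the trained agent rather than a fixed stub. The sketch only shows how a sparse first pass and on-demand dense clips can share one limited frame budget.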

Citation / Source

"VideoZoomer invokes a temporal zoom tool to obtain high-frame-rate clips at autonomously chosen moments, thereby progressively gathering fine-grained evidence in a multi-turn interactive manner."

arXiv, December 26, 2025 11:43

* Quoted lawfully under Article 32 of the Copyright Act.
