Frozen LVLMs for Micro-Video Recommendation: A Systematic Study

Published: Dec 26, 2025 04:56
ArXiv

Analysis

This paper examines how frozen Large Video Language Models (LVLMs) can be used for micro-video recommendation, an area that has lacked systematic evaluation. It empirically compares different feature extraction and fusion strategies, giving practitioners concrete guidance on integrating LVLMs into recommender systems rather than treating them as black boxes. The paper also proposes a Dual Feature Fusion (DFF) Framework, which it reports as achieving state-of-the-art performance.

Reference

Intermediate hidden states consistently outperform caption-based representations.
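
To make the idea of using intermediate hidden states as item features more concrete, the following is a minimal sketch in plain Python. It is not the paper's DFF Framework: the function names (`mean_pool`, `fuse_layers`, `fuse_with_feature`), the equal layer weights, the fixed gate value, and the random numbers standing in for real LVLM hidden states are all assumptions made for illustration.

```python
import random


def mean_pool(token_states):
    """Average a list of token vectors into a single vector."""
    n_tokens = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[d] for tok in token_states) / n_tokens for d in range(dim)]


def fuse_layers(layer_states, weights):
    """Weighted average of the mean-pooled vectors from several layers.

    The weights are normalised to sum to 1. In a trained system they
    would typically be learned; here they are fixed (an assumption).
    """
    total = sum(weights)
    pooled = [mean_pool(states) for states in layer_states]
    dim = len(pooled[0])
    return [
        sum((w / total) * vec[d] for w, vec in zip(weights, pooled))
        for d in range(dim)
    ]


def fuse_with_feature(content_vec, other_vec, gate):
    """Convex combination of a content vector and a second item feature.

    `other_vec` is hypothetical (for example an ID embedding) and must
    have the same dimension as `content_vec`.
    """
    return [gate * c + (1.0 - gate) * o for c, o in zip(content_vec, other_vec)]


# Random placeholders standing in for hidden states of a frozen model:
# 3 layers, 4 tokens per layer, 8 dimensions per token.
rng = random.Random(0)
num_layers, num_tokens, dim = 3, 4, 8
layer_states = [
    [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]
    for _ in range(num_layers)
]
other_feature = [rng.gauss(0.0, 1.0) for _ in range(dim)]

content_vec = fuse_layers(layer_states, weights=[1.0, 1.0, 1.0])
item_vec = fuse_with_feature(content_vec, other_feature, gate=0.5)
```

A caption-based pipeline would instead generate text from the video and embed that text. The sketch only covers the hidden-state route, where the frozen model's internal representations are pooled directly and no gradient flows into the LVLM.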