Frozen LVLMs for Micro-Video Recommendation: A Systematic Study

Paper | #LVLM, Recommendation Systems, Micro-Video | Research | Analyzed: Jan 3, 2026 23:58

Published: Dec 26, 2025 04:56

Analysis

This paper examines how frozen Large Video Language Models (LVLMs) should be used for micro-video recommendation, a question that is often skipped when the model is treated as a black box. It systematically evaluates different feature extraction and fusion strategies, which gives practitioners concrete guidance on integrating LVLMs into recommender systems. The authors also propose a Dual Feature Fusion (DFF) Framework, which they report achieves state-of-the-art performance.

Key Takeaways

• Intermediate hidden states from LVLMs yield better item features than caption-based representations for micro-video recommendation.
• Fusing LVLM features with ID embeddings works better than replacing ID embeddings with LVLM features.
• Layers of an LVLM differ in how useful their features are, which motivates multi-layer feature fusion.
• The proposed Dual Feature Fusion (DFF) Framework is reported to achieve state-of-the-art results when integrating LVLMs into micro-video recommender systems.
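
To make the takeaways above concrete, here is a minimal Python sketch of the two ideas they describe: weighting hidden-state features from several LVLM layers, and fusing the result with an ID embedding instead of replacing it. This is not the paper's DFF Framework, whose details are not given in this summary. The function names, the softmax layer weighting, the fixed projection matrix, and the scalar gate are all assumptions chosen for illustration, and the numbers are toy values, not real model outputs.

```python
import math

def softmax(scores):
    """Turn arbitrary layer scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_layers(layer_feats, layer_scores):
    """Weighted sum of per-layer feature vectors (hypothetical multi-layer fusion)."""
    weights = softmax(layer_scores)
    dim = len(layer_feats[0])
    return [sum(w * feat[i] for w, feat in zip(weights, layer_feats))
            for i in range(dim)]

def project(vec, matrix):
    """Linear map into the ID-embedding space; one matrix row per output dim."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def fuse_with_id(id_emb, content_emb, gate):
    """Convex combination: gate=1.0 keeps only the ID embedding,
    gate=0.0 replaces it entirely with the content embedding."""
    return [gate * a + (1.0 - gate) * b for a, b in zip(id_emb, content_emb)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy stand-ins for pooled hidden states of one video at three layers (dim 4).
layer_feats = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
]
layer_scores = [0.0, 0.0, 0.0]  # equal scores give uniform weights; learned in practice

# Assumed 2x4 projection from LVLM feature space to a 2-dim ID-embedding space.
proj = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

id_emb = [0.2, -0.4]    # toy item ID embedding
user_emb = [1.0, 1.0]   # toy user embedding

mixed = mix_layers(layer_feats, layer_scores)
content_emb = project(mixed, proj)
item_vec = fuse_with_id(id_emb, content_emb, gate=0.5)
score = dot(user_emb, item_vec)  # user-item preference score
print(item_vec, score)
```

In a trained system the layer scores, the projection, and the gate would be learned parameters. Setting `gate=0.0` corresponds to the "replace" strategy that the second takeaway says underperforms fusion.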

Reference / Citation

"Intermediate hidden states consistently outperform caption-based representations."

arXiv, Dec 26, 2025 04:56

* Cited for critical analysis under Article 32.
