
Analysis

This paper addresses the limitations of Large Video Language Models (LVLMs) in handling long videos. It proposes a training-free architecture, TV-RAG, that improves long-video reasoning by incorporating temporal alignment and entropy-guided semantics. The key contributions are a time-decay retrieval module and an entropy-weighted key-frame sampler, which together provide a lightweight, low-cost upgrade path for existing LVLMs. The paper's significance lies in improving performance on long-video benchmarks without any retraining, offering a practical route to stronger video understanding.
Reference

TV-RAG realizes a dual-level reasoning routine that can be grafted onto any LVLM without re-training or fine-tuning.
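
A minimal sketch of how the two named modules might combine, assuming a per-frame similarity score, an exponential time decay, and a per-frame probability distribution from which to compute entropy; none of these specifics are given in the summary, and the mixing weight alpha, decay rate lam, and frame budget k are illustrative placeholders.

```python
import numpy as np

def frame_entropy(frame_probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of a per-frame distribution (e.g. tag or caption-token probabilities)."""
    p = np.clip(frame_probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def time_decayed_similarity(sims: np.ndarray, frame_times: np.ndarray,
                            query_time: float, lam: float = 0.01) -> np.ndarray:
    """Down-weight query-frame similarity by temporal distance to the queried moment."""
    return sims * np.exp(-lam * np.abs(frame_times - query_time))

def select_key_frames(sims, frame_probs, frame_times, query_time,
                      k: int = 8, alpha: float = 0.5) -> np.ndarray:
    """Blend time-decayed similarity with entropy weights and keep the top-k frame indices."""
    score = (alpha * time_decayed_similarity(sims, frame_times, query_time)
             + (1.0 - alpha) * frame_entropy(frame_probs))
    return np.argsort(score)[::-1][:k]
```

Only the k selected frames would then be handed to the underlying LVLM, which is what keeps the routine training-free and low-cost.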

Paper · #AI World Generation · 🔬 Research · Analyzed: Jan 3, 2026 20:11

Yume-1.5: Text-Controlled Interactive World Generation

Published: Dec 26, 2025 17:52
1 min read
ArXiv

Analysis

This paper addresses limitations in existing diffusion model-based interactive world generation, specifically large parameter counts, slow inference, and the lack of text control. The proposed framework, Yume-1.5, aims to improve real-time performance and enable text-based control over world generation. The core contributions are a long-video generation framework, a real-time streaming acceleration strategy, and a text-controlled event generation method. The publicly released codebase is a practical plus.
Reference

The framework comprises three core components: (1) a long-video generation framework integrating unified context compression with linear attention; (2) a real-time streaming acceleration strategy powered by bidirectional attention distillation and an enhanced text embedding scheme; (3) a text-controlled method for generating world events.
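
The summary attributes the long-video framework to unified context compression with linear attention but gives no equations. The following is a generic, non-causal linear-attention kernel using the standard elu(x)+1 feature map, included only to illustrate why replacing softmax attention with a kernelized form makes cost grow linearly with sequence length; Yume-1.5's actual attention and compression scheme may differ.

```python
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    """Positive feature map phi(x) = elu(x) + 1, a standard choice for linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Non-causal linear attention: phi(Q) @ (phi(K)^T V), normalized per query.

    The (d x d_v) summary phi(K)^T V is independent of sequence length, so memory
    and compute grow linearly in the number of tokens instead of quadratically.
    """
    Qf, Kf = phi(Q), phi(K)                      # (n, d)
    kv_summary = Kf.T @ V                        # (d, d_v) fixed-size context summary
    normalizer = Qf @ Kf.sum(axis=0)             # (n,) per-query normalization
    return (Qf @ kv_summary) / (normalizer[:, None] + 1e-6)
```

Because the context summary has a fixed size regardless of how many tokens it covers, this style of attention pairs naturally with streaming generation, where a compressed state is carried across chunks rather than re-attending to the full history.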