Pretraining for Long Video Compression

Published:Dec 29, 2025 20:29
1 min read
ArXiv

Analysis

This paper introduces a novel pretraining method (PFP) for compressing long videos into shorter contexts, focusing on preserving high-frequency details of individual frames. This is significant because it addresses the challenge of handling long video sequences in autoregressive models, which is crucial for applications like video generation and understanding. The ability to compress a 20-second video into a context of ~5k length with preserved perceptual quality is a notable achievement. The paper's focus on pretraining and its potential for fine-tuning in autoregressive video models suggests a practical approach to improving video processing capabilities.

Reference

The baseline model can compress a 20-second video into a context at about 5k length, where random frames can be retrieved with perceptually preserved appearances.