Adaptive Visual Token Pruning for Long Context LMMs

Published: Dec 28, 2025 02:40
arXiv

Analysis

This paper addresses the computational cost of Large Multimodal Models (LMMs) on long-context inputs that contain multiple images. It proposes TrimTokenator-LC, an adaptive pruning method that exploits both intra-image redundancy (among tokens of a single image) and inter-image redundancy (across different images) to reduce the number of visual tokens while maintaining performance. The work is significant because visual tokens are a practical bottleneck when applying LMMs to inputs with extensive visual information.
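To illustrate what exploiting the two kinds of redundancy can look like, here is a minimal Python sketch. It is not TrimTokenator-LC: the greedy cosine-similarity rule, the function names (`cosine`, `is_redundant`, `prune_visual_tokens`) and the 0.95 thresholds are hypothetical choices made for illustration. Tokens are plain lists of floats standing in for visual-token embeddings, and "adaptive" here only means that the number of kept tokens depends on how redundant the input is rather than on a fixed ratio.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def is_redundant(tok: Vector, kept: List[Vector], threshold: float) -> bool:
    """True if tok is at least `threshold`-similar to any already-kept token."""
    return any(cosine(tok, k) >= threshold for k in kept)


def prune_visual_tokens(
    images: List[List[Vector]],
    intra_threshold: float = 0.95,
    inter_threshold: float = 0.95,
) -> List[List[int]]:
    """Return, per image, the indices of the visual tokens to keep.

    Hypothetical two-stage greedy pruning (NOT the paper's method):
    1. intra-image: drop tokens near-duplicate to tokens already kept in the same image;
    2. inter-image: drop tokens near-duplicate to tokens kept from earlier images.
    """
    kept_indices: List[List[int]] = []
    kept_so_far: List[Vector] = []  # tokens kept from all previous images
    for tokens in images:
        kept_here: List[Vector] = []
        idx_here: List[int] = []
        for i, tok in enumerate(tokens):
            if is_redundant(tok, kept_here, intra_threshold):
                continue  # intra-image redundancy
            if is_redundant(tok, kept_so_far, inter_threshold):
                continue  # inter-image redundancy
            kept_here.append(tok)
            idx_here.append(i)
        kept_indices.append(idx_here)
        kept_so_far.extend(kept_here)
    return kept_indices


if __name__ == "__main__":
    img_a = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]  # tokens 0 and 1 nearly identical
    img_b = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0]]   # token 0 repeats img_a's token 0
    print(prune_visual_tokens([img_a, img_b]))  # → [[0, 2], [1, 2]]
```

In the usage example, token 1 of the first image is dropped as an intra-image duplicate and token 0 of the second image as an inter-image duplicate, so 4 of 6 tokens survive. Comparing each token against every kept token is quadratic in the worst case; a practical system would more likely operate on batched tensors, but the plain-Python version keeps the two redundancy checks easy to read.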

Reference

The approach can prune up to 80% of visual tokens while maintaining performance in long-context settings.
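To make the "up to 80%" figure concrete, here is a back-of-the-envelope calculation under hypothetical numbers that do not come from the paper: 8 images, each encoded into 576 visual tokens. Pruning 80% keeps one fifth of the tokens, and because self-attention scores every pair of tokens, the visual-to-visual block of the attention matrix shrinks to 4% of its original size (ignoring text tokens and the cost of the pruning step itself).

```latex
\begin{aligned}
N_{\text{vis}}  &= 8 \times 576 = 4608 \\
N_{\text{kept}} &= (1 - 0.8)\, N_{\text{vis}} = 0.2 \times 4608 \approx 922 \\
\frac{N_{\text{kept}}^{2}}{N_{\text{vis}}^{2}} &= 0.2^{2} = 0.04
\end{aligned}
```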