DPAR: Dynamic Patchification for Efficient Image Generation

Published:Dec 26, 2025 05:03
1 min read
ArXiv

Analysis

This paper introduces DPAR, a novel approach to improve the efficiency of autoregressive image generation. It addresses the computational and memory limitations of fixed-length tokenization by dynamically aggregating image tokens into variable-sized patches. The core innovation lies in using next-token prediction entropy to guide the merging of tokens, leading to reduced token counts, lower FLOPs, faster convergence, and improved FID scores compared to baseline models. This is significant because it offers a way to scale autoregressive models to higher resolutions and potentially improve the quality of generated images.

Reference

DPAR reduces token count by 1.81x and 2.06x on Imagenet 256 and 384 generation resolution respectively, leading to a reduction of up to 40% FLOPs in training costs. Further, our method exhibits faster convergence and improves FID by up to 27.1% relative to baseline models.