Accelerating Diffusion Transformers with Fidelity Optimization
Analysis
This paper addresses the slow inference speed of Diffusion Transformers (DiT) in image and video generation. It introduces CEM (Cumulative Error Minimization), a fidelity-optimization plugin that improves existing acceleration methods by minimizing the error that accumulates across denoising steps, thereby recovering the generation fidelity lost to acceleration. CEM is model-agnostic, easy to integrate, and generalizes well across models and tasks; the reported results show substantial gains in generation quality, in some cases even surpassing the original, unaccelerated models.
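To make the plugin idea concrete, below is a minimal conceptual sketch of an accelerated denoising loop with a pluggable per-step correction that tries to cancel accumulated error. This is an illustration only, not the paper's actual algorithm: the names `accelerated_step`, `denoise_with_plugin`, and `corrections`, and the toy update rule, are all assumptions introduced here for clarity.

```python
# Conceptual sketch only: an accelerated denoising loop with a pluggable
# per-step correction term. All names and the toy update rule are
# hypothetical and not taken from the paper.
import numpy as np

def accelerated_step(x, t, cached_noise_pred):
    """Stand-in for an accelerated DiT step (e.g. reusing a cached
    noise prediction instead of running a full forward pass)."""
    # Toy update: move x a small amount along the cached prediction.
    return x - 0.05 * cached_noise_pred

def denoise_with_plugin(x, timesteps, cached_noise_pred, corrections=None):
    """Run the accelerated sampler; optionally apply a per-step correction
    intended to keep the approximate trajectory close to the full-precision
    one -- the role a fidelity-optimization plugin would play."""
    for i, t in enumerate(timesteps):
        x = accelerated_step(x, t, cached_noise_pred)
        if corrections is not None:
            # In practice such corrections would be estimated or learned
            # offline to minimize the deviation accumulated up to step i.
            x = x + corrections[i]
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 4))
    cached = rng.standard_normal((4, 4))
    steps = list(range(10))
    # Zero corrections here, purely to exercise the plugin hook.
    zero_corr = [np.zeros_like(x) for _ in steps]
    out = denoise_with_plugin(x, steps, cached, corrections=zero_corr)
    print(out.shape)
```

The point of the sketch is the structure: the plugin sits outside the base sampler and acceleration scheme, which is what makes such an approach model-agnostic and easy to bolt onto existing methods.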
Key Takeaways
- Proposes CEM, a novel fidelity-optimization plugin for accelerating Diffusion Transformers.
- CEM minimizes cumulative errors during denoising to improve generation fidelity.
- Model-agnostic and easily integrated into existing acceleration methods.
- Demonstrates significant improvements in generation quality across various models and tasks.
- Outperforms the original models in some cases.
“CEM significantly improves generation fidelity of existing acceleration models, and outperforms the original generation performance on FLUX.1-dev, PixArt-α, StableDiffusion1.5 and Hunyuan.”