CritiFusion: Improving Text-to-Image Generation Fidelity

Published: Dec 27, 2025 19:08
ArXiv

Analysis

This paper introduces CritiFusion, a method for improving the semantic alignment and visual quality of text-to-image generation. It targets a common weakness of diffusion models: difficulty following complex prompts. The key idea is a two-part approach: a semantic critique mechanism that uses vision-language models and large language models to guide the generation process, and a spectral alignment step that refines the generated images. The method is plug-and-play and requires no additional training. It reports results on par with state-of-the-art reward optimization approaches on human preference and aesthetic benchmarks.
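To give a rough sense of what a frequency-domain refinement step can look like, here is a minimal sketch of a generic spectral blend: it keeps the low-frequency content (overall layout) of one image and takes the high-frequency content (fine detail) from another. This is an illustration only and is not the paper's algorithm; the summary does not describe how CritiFusion's spectral alignment is implemented. The function name `spectral_blend`, the `cutoff` parameter, and the hard circular mask are all assumptions made for this example.

```python
import numpy as np


def spectral_blend(base, refined, cutoff=0.25):
    """Hypothetical helper: combine two same-shaped 2D float images in the
    Fourier domain, keeping frequencies at or below `cutoff` (in cycles per
    pixel) from `base` and the remaining frequencies from `refined`."""
    if base.shape != refined.shape or base.ndim != 2:
        raise ValueError("base and refined must be 2D arrays of equal shape")

    h, w = base.shape
    # Per-axis frequency coordinates in cycles per pixel, range [-0.5, 0.5).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Hard low-pass mask (an assumption; a smooth mask would reduce ringing).
    low_pass = (radius <= cutoff).astype(float)

    base_f = np.fft.fft2(base)
    refined_f = np.fft.fft2(refined)

    # Low frequencies from `base`, high frequencies from `refined`.
    mixed = low_pass * base_f + (1.0 - low_pass) * refined_f

    # The mask is symmetric, so the result is real up to rounding error.
    return np.real(np.fft.ifft2(mixed))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layout = rng.random((64, 64))   # stand-in for a coarse generation
    detail = rng.random((64, 64))   # stand-in for a refined generation
    blended = spectral_blend(layout, detail, cutoff=0.1)
    print(blended.shape)
```

In this sketch, a larger `cutoff` preserves more of `base`, and a smaller one lets `refined` dominate. For color images the same function could be applied per channel.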

Reference

CritiFusion consistently boosts performance on human preference scores and aesthetic evaluations, achieving results on par with state-of-the-art reward optimization approaches.