ASemConsist: Training-Free Identity Consistency in Text-to-Image Generation
Published: Dec 29, 2025 07:06
• 1 min read
• ArXiv
Analysis
This paper tackles a persistent challenge in diffusion-based text-to-image generation: keeping a character's identity consistent across multiple images generated from different prompts. It proposes ASemConsist, a framework that achieves this without any training, a significant practical advantage. The core contributions are selective text embedding modification, repurposing padding embeddings for semantic control, and an adaptive feature-sharing strategy that handles textual ambiguity. The paper also introduces the Consistency Quality Score (CQS), a unified metric that captures the trade-off between identity preservation and prompt alignment. The training-free design and the new evaluation metric are the paper's most noteworthy aspects.
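To make the padding-embedding idea concrete: text encoders such as CLIP pad every prompt to a fixed sequence length, so the positions after the end-of-text token normally carry no prompt-specific content. A minimal sketch of repurposing those slots is to overwrite them with a shared "identity" embedding so every prompt steers generation toward the same character. The function name, shapes, and injection rule below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

SEQ_LEN, DIM = 77, 768  # CLIP-style sequence length and embedding width


def inject_identity(prompt_emb: np.ndarray, n_real_tokens: int,
                    identity_emb: np.ndarray) -> np.ndarray:
    """Keep the real prompt-token embeddings, replace the padding slots
    with a shared identity embedding (hypothetical injection rule)."""
    out = prompt_emb.copy()
    out[n_real_tokens:] = identity_emb  # broadcast over padding positions
    return out


rng = np.random.default_rng(0)
prompt_emb = rng.standard_normal((SEQ_LEN, DIM))   # per-prompt token embeddings
identity_emb = rng.standard_normal(DIM)            # shared character embedding

modified = inject_identity(prompt_emb, n_real_tokens=12,
                           identity_emb=identity_emb)
assert np.allclose(modified[:12], prompt_emb[:12])  # prompt content intact
assert np.allclose(modified[12:], identity_emb)     # padding repurposed
```

The appeal of this scheme is that the real prompt tokens are untouched, which is one plausible way a method could preserve prompt alignment while still injecting a consistent identity signal.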
Key Takeaways
- Proposes ASemConsist, a training-free framework for identity-consistent image generation.
- Introduces a novel semantic control strategy using padding embeddings.
- Employs an adaptive feature-sharing strategy to handle textual ambiguity.
- Develops the Consistency Quality Score (CQS) for unified evaluation.
- Achieves state-of-the-art performance, overcoming trade-offs between identity and prompt alignment.
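The summary does not give the exact form of the CQS, but a metric unifying identity preservation and prompt alignment could plausibly combine the two sub-scores so that neither can be sacrificed for the other, e.g. via a harmonic mean (an F1-style combination). The formula below is an assumed illustration of that design goal, not the paper's actual definition.

```python
def unified_score(identity_sim: float, prompt_align: float) -> float:
    """Hypothetical CQS-style score: harmonic mean of an identity-similarity
    score and a prompt-alignment score, both assumed to lie in (0, 1].
    The harmonic mean drops sharply when either component is weak, so a
    method cannot rank highly by trading one axis entirely for the other."""
    return 2.0 * identity_sim * prompt_align / (identity_sim + prompt_align)


# A method with balanced scores beats one that maximizes only identity:
balanced = unified_score(0.80, 0.80)    # 0.80
lopsided = unified_score(0.99, 0.40)    # ~0.57
assert balanced > lopsided
```

This illustrates why a unified metric matters: separately reporting identity and alignment lets lopsided methods look strong on one leaderboard while failing the trade-off the paper highlights.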
Reference
“ASemConsist achieves state-of-the-art performance, effectively overcoming prior trade-offs.”