DDSPO: Enhancing Diffusion Models with Self-Supervised Preference Learning
Analysis
This paper introduces Direct Diffusion Score Preference Optimization (DDSPO), a novel method for improving diffusion models by aligning outputs with user intent and enhancing visual quality. The key innovation is per-timestep supervision derived by contrasting the outputs of a pretrained reference model conditioned on original prompts versus semantically degraded variants. This removes the need for costly human-labeled preference datasets and explicit reward modeling, making the approach more efficient and scalable than existing preference-based methods. The paper's significance lies in showing that diffusion models can be improved with far less supervision, which benefits text-to-image generation and potentially other generative tasks.
Key Takeaways
- DDSPO is a novel method for preference-based training of diffusion models.
- It uses per-timestep supervision derived from contrasting outputs of a pretrained reference model.
- It eliminates the need for human-labeled data and explicit reward modeling.
- DDSPO improves text-image alignment and visual quality.
- It requires significantly less supervision compared to existing methods.
“DDSPO directly derives per-timestep supervision from winning and losing policies when such policies are available. In practice, we avoid reliance on labeled data by automatically generating preference signals using a pretrained reference model: we contrast its outputs when conditioned on original prompts versus semantically degraded variants.”
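To make the mechanism concrete, below is a minimal sketch of what such per-timestep supervision could look like for a noise-prediction diffusion model. The function name `ddspo_timestep_loss`, its call signature, and the DPO-style logistic form of the objective are illustrative assumptions, not the paper's implementation; only the core idea of contrasting a frozen reference model's predictions under the original prompt (the "winning" policy) versus a semantically degraded prompt (the "losing" policy) comes from the quoted description.

```python
import torch
import torch.nn.functional as F


def ddspo_timestep_loss(policy_unet, ref_unet, noisy_latents, timesteps,
                        orig_prompt_emb, degraded_prompt_emb, beta=0.1):
    """Illustrative per-timestep preference loss in the spirit of DDSPO.

    Assumptions (not from the paper): UNets are called as
    unet(latents, timesteps, prompt_embeddings) and return noise
    predictions; the loss takes a DPO-style logistic form.
    """
    # Trainable policy prediction under the original prompt.
    pred = policy_unet(noisy_latents, timesteps, orig_prompt_emb)

    # Frozen reference predictions: "winning" (original prompt) vs.
    # "losing" (semantically degraded prompt) conditioning.
    with torch.no_grad():
        ref_win = ref_unet(noisy_latents, timesteps, orig_prompt_emb)
        ref_lose = ref_unet(noisy_latents, timesteps, degraded_prompt_emb)

    # Per-sample squared distances of the policy prediction to each
    # reference prediction (latents assumed shaped [B, C, H, W]).
    d_win = ((pred - ref_win) ** 2).mean(dim=(1, 2, 3))
    d_lose = ((pred - ref_lose) ** 2).mean(dim=(1, 2, 3))

    # Prefer being close to the winning reference over the losing one,
    # at this denoising timestep; beta scales the preference margin.
    return -F.logsigmoid(beta * (d_lose - d_win)).mean()
```

Because the two reference passes are computed from prompts alone, this kind of signal can be generated for any training prompt without human labels; the paper does not detail how the degraded variants are produced, but the quoted passage indicates only that they are semantically weakened versions of the original prompts.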