SIID: Scale Invariant Pixel-Space Diffusion Model for High-Resolution Digit Generation
Analysis
Key Takeaways
- SIID is a novel diffusion model architecture designed for scale-invariant image generation.
- It addresses limitations of UNet and DiT architectures in handling varying image resolutions.
- The model is trained on 64x64 MNIST and generates readable 1024x1024 digits.
“UNet heavily relies on convolution kernels, and convolution kernels are trained to a certain pixel density. Change the pixel density (by increasing the resolution of the image via upscaling) and your feature detector can no longer detect those same features.”
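The quoted intuition can be checked with a toy example. The sketch below is not SIID or UNet code. It assumes a hand-made 3x3 "vertical line" kernel as a stand-in for a filter learned at one pixel density, and it models "increasing the resolution via upscaling" as nearest-neighbour upscaling by 4x. All function and variable names are hypothetical, chosen for illustration only.

```python
# Toy illustration (hypothetical, not SIID code): a fixed 3x3 kernel tuned to
# 1-pixel-wide vertical strokes stands in for a learned convolution filter.

LINE_KERNEL = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]


def conv2d_valid(image, kernel):
    """2D cross-correlation (what deep-learning libraries call convolution), no padding."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


def upscale_nearest(image, factor):
    """Nearest-neighbour upscaling: every pixel becomes a factor x factor block."""
    result = []
    for row in image:
        wide_row = [px for px in row for _col in range(factor)]
        for _rep in range(factor):
            result.append(list(wide_row))
    return result


def vertical_line_image(size, col):
    """Square black image with a single 1-pixel-wide white vertical stroke."""
    return [[1 if c == col else 0 for c in range(size)] for _ in range(size)]


small = vertical_line_image(7, col=3)   # stroke at column 3
big = upscale_nearest(small, 4)         # 28x28, stroke now spans columns 12..15

resp_small = conv2d_valid(small, LINE_KERNEL)
resp_big = conv2d_valid(big, LINE_KERNEL)

# In "valid" mode, output column j is centred on input column j + 1.
print("original stroke, centre response:", resp_small[0][3 - 1])
print("upscaled stroke, centre response:", resp_big[0][14 - 1])
print("upscaled stroke, peak response:  ", max(max(r) for r in resp_big))
```

At the original density the kernel fires most strongly on the stroke itself (a response of 6). After 4x upscaling the same stroke is four pixels wide: the response at its centre drops to 0, and the strongest remaining response (3) appears only at the stroke's edges. The filter has effectively turned from a line detector into an edge detector, which is the kind of breakdown the quote attributes to convolution kernels when pixel density changes.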