Research#llm📝 BlogAnalyzed: Dec 26, 2025 19:53

[P] S2ID: Scale Invariant Image Diffuser - trained on standard MNIST, generates 1024x1024 digits and at arbitrary aspect ratios with almost no artifacts at 6.1M parameters

Published:Dec 26, 2025 19:51
1 min read
r/MachineLearning

Analysis

This post introduces S2ID, a novel diffusion architecture designed to address limitations in existing models like UNet and DiT. The core issue tackled is the sensitivity of convolution kernels in UNet to pixel density changes during upscaling, leading to artifacts. S2ID also aims to improve upon DiT models, which may not effectively compress context when handling upscaled images. The author argues that pixels, unlike tokens in LLMs, are not atomic, necessitating a different approach. The model achieves impressive results, generating high-resolution images with minimal artifacts using a relatively small parameter count. The author acknowledges the code's current state, focusing instead on the architectural innovations.

Reference

Tokens in LLMs are atomic, pixels are not.