Analysis

This paper addresses the limitations of using text-to-image diffusion models for single-image super-resolution (SISR) in real-world scenarios, particularly smartphone photography. It highlights the problem of hallucinations and the need for more precise conditioning features. The core contribution is F2IDiff, a model conditioned on lower-level DINOv2 features, which aims to improve SISR performance while minimizing undesirable artifacts.

Reference

The paper introduces an SISR network built on a foundation model (FM) with lower-level feature conditioning, specifically DINOv2 features, which the authors call a Feature-to-Image Diffusion (F2IDiff) Foundation Model.