Search:
Match:
1 results

Training-Free Conditional Image Embedding with LVLMs

Published:Dec 26, 2025 04:51
1 min read
ArXiv

Analysis

This paper introduces DIOR, a novel, training-free method for generating conditional image embeddings using Large Vision-Language Models (LVLMs). The significance lies in its ability to focus image representations on specific textual conditions without requiring any additional training, making it a versatile and efficient solution. The paper's contribution is particularly noteworthy because it leverages the power of pre-trained LVLMs in a novel way, achieving superior performance compared to existing training-free baselines and even some methods that require training.
Reference

DIOR outperforms existing training-free baselines, including CLIP.