Search:
Match:
1 results

Analysis

This paper introduces a novel approach to monocular depth estimation using visual autoregressive (VAR) priors, offering an alternative to diffusion-based methods. It leverages a text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism. The method's efficiency, requiring only 74K synthetic samples for fine-tuning, and its strong performance, particularly in indoor benchmarks, are noteworthy. The work positions autoregressive priors as a viable generative model family for depth estimation, emphasizing data scalability and adaptability to 3D vision tasks.
Reference

The method achieves state-of-the-art performance in indoor benchmarks under constrained training conditions.