Search:
Match:
1 results

Analysis

This paper introduces RAIR, a new benchmark dataset for evaluating the relevance of search results in e-commerce. It addresses the limitations of existing benchmarks by providing a more complex and comprehensive evaluation framework, including a long-tail subset and a visual salience subset. The paper's significance lies in its potential to standardize relevance assessment and provide a more challenging testbed for LLMs and VLMs in the e-commerce domain. The creation of a standardized framework and the inclusion of visual elements are particularly noteworthy.
Reference

RAIR presents sufficient challenges even for GPT-5, which achieved the best performance.