RAIR: A New Benchmark for E-commerce Relevance Assessment
Analysis
Key Takeaways
- RAIR is a new Chinese dataset for e-commerce relevance assessment.
- It includes a general subset, a long-tail subset, and a visual salience subset.
- RAIR aims to standardize relevance evaluation and provide a more challenging benchmark.
- Experiments show that RAIR remains challenging even for GPT-5, the best-performing model tested.
“RAIR presents sufficient challenges even for GPT-5, which achieved the best performance.”