DICE: A New Framework for Evaluating Retrieval-Augmented Generation Systems
Published: Dec 27, 2025 16:02
Source: arXiv
Analysis
This paper introduces DICE, a new framework for evaluating Retrieval-Augmented Generation (RAG) systems. It targets the limitations of existing evaluation metrics by aiming for assessments that are explainable, robust, and efficient. DICE uses a two-stage approach that combines probabilistic scoring, which improves interpretability and quantifies uncertainty, with a Swiss-system tournament for computational efficiency. The paper's main significance is its potential to make RAG technologies more trustworthy and to support their responsible deployment, because transparent evaluations point more directly to what a system needs to improve.
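The summary does not say how a probabilistic verdict turns into a score. The following is a minimal sketch that assumes the judge emits probabilities for three outcomes (A, B, Tie) and uses a chess-style convention (win = 1, tie = 0.5, loss = 0). Entropy serves as a simple uncertainty signal. The `PairwiseJudgment` class and its methods are hypothetical illustrations, not DICE's published implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseJudgment:
    """Hypothetical container for a judge's probabilities over outcomes A, B, Tie."""

    p_a: float
    p_b: float
    p_tie: float

    def __post_init__(self) -> None:
        # Reject distributions that do not sum to 1 (within floating-point tolerance).
        total = self.p_a + self.p_b + self.p_tie
        if not math.isclose(total, 1.0, abs_tol=1e-6):
            raise ValueError(f"probabilities must sum to 1, got {total}")

    def expected_score_a(self) -> float:
        # Assumed chess-style scoring: A's win counts 1, a tie counts 0.5.
        return self.p_a + 0.5 * self.p_tie

    def entropy(self) -> float:
        # Shannon entropy in bits; 0 means fully decisive, log2(3) means maximally unsure.
        probs = (self.p_a, self.p_b, self.p_tie)
        return -sum(p * math.log2(p) for p in probs if p > 0)
```

For example, `PairwiseJudgment(0.7, 0.2, 0.1)` gives A an expected score of 0.75 with an entropy of about 1.16 bits. A judgment of (1, 0, 0) has zero entropy. Keeping the full distribution, rather than only a hard label, is what lets a score carry its own uncertainty.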
Key Takeaways
- DICE is a two-stage framework for RAG evaluation.
- It uses probabilistic scoring over pairwise outcomes (A, B, Tie) to make judgments transparent.
- It employs a Swiss-system tournament for computational efficiency.
- It achieves high agreement with human experts (85.7%).
- It aims to improve the trustworthiness and responsible deployment of RAG systems.
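A Swiss-system tournament runs a fixed number of rounds and pairs competitors that have similar running scores, so it avoids comparing every possible pair. The sketch below shows the general idea under several assumptions. `compare` is a hypothetical judge that returns the first system's expected score in [0, 1]. The default of ceil(log2 N) rounds is a common heuristic, not DICE's documented setting. An odd system out receives a bye worth no points.

```python
import math
from typing import Callable, Optional


def swiss_tournament(
    systems: list[str],
    compare: Callable[[str, str], float],
    rounds: Optional[int] = None,
) -> dict[str, float]:
    """Score systems with a simplified Swiss-system tournament (illustrative sketch)."""
    if rounds is None:
        # Common Swiss heuristic (assumption): about log2(N) rounds separates N entrants.
        rounds = max(1, math.ceil(math.log2(len(systems))))
    scores = {s: 0.0 for s in systems}
    played: set[frozenset[str]] = set()
    for _ in range(rounds):
        # Rank by current score; break ties by name so pairings are deterministic.
        pool = sorted(systems, key=lambda s: (-scores[s], s))
        while len(pool) >= 2:
            a = pool.pop(0)
            # Prefer the highest-ranked opponent not yet faced; otherwise allow a rematch.
            idx = next(
                (i for i, b in enumerate(pool) if frozenset((a, b)) not in played), 0
            )
            b = pool.pop(idx)
            played.add(frozenset((a, b)))
            s_a = compare(a, b)  # expected score of `a` against `b`
            scores[a] += s_a
            scores[b] += 1.0 - s_a
        # Any system left in `pool` sits out this round (a bye with no points).
    return scores
```

With N systems, this uses `rounds * (N // 2)` judge calls instead of the N(N-1)/2 required by a full round robin. For 16 systems and 4 rounds, that is 32 comparisons instead of 120. A probabilistic judge, such as one returning an expected score like the one in the earlier sketch, can be passed directly as `compare`.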
Reference
“DICE achieves 85.7% agreement with human experts, substantially outperforming existing LLM-based metrics such as RAGAS.”
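The simplest reading of "agreement with human experts" is percent agreement, meaning the share of evaluation items on which a metric's verdict matches the human verdict. The summary does not describe DICE's exact protocol, so the sketch below only illustrates one assumed way to compute such a figure. `verdict` and `agreement_rate` are hypothetical helpers. Collapsing a probabilistic judgment to its most likely outcome is one choice among several.

```python
def verdict(p_a: float, p_b: float, p_tie: float) -> str:
    """Collapse a probabilistic judgment to its most likely outcome label."""
    options = {"A": p_a, "B": p_b, "Tie": p_tie}
    # max() returns the first maximal key, so exact probability ties favor "A", then "B".
    return max(options, key=options.get)


def agreement_rate(metric_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of items where the metric's verdict equals the human verdict."""
    if len(metric_labels) != len(human_labels):
        raise ValueError("label lists must have equal length")
    if not metric_labels:
        raise ValueError("need at least one item")
    matches = sum(m == h for m, h in zip(metric_labels, human_labels))
    return matches / len(metric_labels)
```

For example, metric verdicts A, B, Tie, A compared against human labels A, B, A, A agree on three of four items, which gives a rate of 0.75.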