DICE: A New Framework for Evaluating Retrieval-Augmented Generation Systems
Analysis
Key Takeaways
- DICE is a two-stage framework for evaluating RAG systems.
- It scores comparisons probabilistically over three outcomes (A, B, Tie), which makes its judgments transparent.
- It uses a Swiss-system tournament for computational efficiency.
- It achieves high agreement with human experts (85.7%).
- It aims to improve the trustworthiness and responsible deployment of RAG systems.
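
To make the tournament idea concrete, here is a minimal sketch of a generic Swiss-system loop driven by a judge that returns probabilities over (A, B, Tie). It is not DICE's actual implementation, which the summary does not describe: the `Judge` signature, the `toy_judge` function, the `QUALITY` table, the system names, and the rule of counting a tie as half a point are all assumptions made for illustration. The pairing is also simplified, with no rematch avoidance and no bye for an odd system out.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical judge signature: returns probabilities for (A wins, B wins, tie).
Judge = Callable[[str, str], Tuple[float, float, float]]


def swiss_tournament(systems: List[str], judge: Judge, rounds: int) -> Dict[str, float]:
    """Run a simplified Swiss-system tournament and return accumulated scores."""
    scores = {name: 0.0 for name in systems}
    for _ in range(rounds):
        # Rank by current score; the stable sort keeps input order among equals.
        ranked = sorted(systems, key=lambda s: -scores[s])
        # Pair neighbours in the ranking: 1st vs 2nd, 3rd vs 4th, and so on.
        for i in range(0, len(ranked) - 1, 2):
            a, b = ranked[i], ranked[i + 1]
            p_a, p_b, p_tie = judge(a, b)
            # Expected points (assumed rule): a win counts 1, a tie 0.5 per side.
            scores[a] += p_a + 0.5 * p_tie
            scores[b] += p_b + 0.5 * p_tie
    return scores


# Toy stand-in for an LLM judge: made-up quality values, not real results.
QUALITY = {"rag_a": 0.9, "rag_b": 0.7, "rag_c": 0.5, "rag_d": 0.3}


def toy_judge(a: str, b: str) -> Tuple[float, float, float]:
    diff = QUALITY[a] - QUALITY[b]
    p_tie = 0.2                            # fixed tie probability, for simplicity
    p_a = (1 - p_tie) * (0.5 + diff / 2)   # the better system is favoured
    p_b = 1 - p_tie - p_a
    return p_a, p_b, p_tie


scores = swiss_tournament(list(QUALITY), toy_judge, rounds=2)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With four systems and two rounds, this sketch makes four pairwise comparisons, where a full round-robin would need six. The gap widens as the number of systems grows, since each Swiss round needs about n/2 comparisons and a round-robin needs n(n-1)/2 in total.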
“DICE achieves 85.7% agreement with human experts, substantially outperforming existing LLM-based metrics such as RAGAS.”