Analysis

This paper addresses the factual consistency of the explanations generated by explainable recommendation systems. It highlights a gap between the fluency of these explanations, which LLMs make easy to achieve, and their factual accuracy. The authors introduce a framework for evaluating factuality that combines a prompting-based pipeline for creating ground truth with statement-level alignment metrics. The findings show that current models reach high semantic similarity yet struggle with factual consistency, which underlines the need for factuality-aware evaluation and for more trustworthy systems.
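
To make the idea of a statement-level metric concrete, the following is a minimal sketch of statement-level precision: the fraction of statements in a generated explanation that are supported by the ground-truth statements. It is an illustration, not the authors' implementation. The function names, the `judge` parameter, and the normalized exact-match judge are all assumptions introduced here; the paper reports an LLM-based judgment, which could be supplied through the same `judge` hook.

```python
from typing import Callable, Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def exact_match_judge(statement: str, references: List[str]) -> bool:
    """Hypothetical stand-in judge: a statement counts as supported only if it
    matches a reference statement after normalization. An LLM or entailment
    model would replace this in a realistic setup."""
    return normalize(statement) in {normalize(r) for r in references}


def statement_precision(
    generated: Iterable[str],
    references: Iterable[str],
    judge: Callable[[str, List[str]], bool] = exact_match_judge,
) -> float:
    """Share of generated statements that the judge marks as supported."""
    generated_list = list(generated)
    reference_list = list(references)
    if not generated_list:
        # Assumption: an empty explanation scores 0.0 rather than raising.
        return 0.0
    supported = sum(1 for s in generated_list if judge(s, reference_list))
    return supported / len(generated_list)


if __name__ == "__main__":
    generated = [
        "The battery lasts long",
        "The screen is cracked",
        "Shipping was fast",
    ]
    references = ["the battery lasts long", "Shipping was  fast"]
    print(statement_precision(generated, references))
```

In this toy input, two of the three generated statements match a reference statement, so the score is two thirds. A fluent explanation can therefore score well on an embedding-based similarity measure while scoring poorly here, because each unsupported statement lowers precision regardless of how natural it sounds.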
Reference

While models achieve high semantic similarity scores (BERTScore F1: 0.81-0.90), all our factuality metrics reveal alarmingly low performance (LLM-based statement-level precision: 4.38%-32.88%).