LLMs Excel at Paraphrasing: A New Benchmark for Temporal Accuracy!

Research | LLM | Analyzed: Feb 13, 2026 05:01
Published: Feb 13, 2026 05:00
1 min read
ArXiv NLP

Analysis

This research introduces RECOM, a novel benchmark for evaluating how well large language models (LLMs) understand and respond to recent information. The findings reveal a striking semantic-lexical paradox: LLMs preserve meaning through extensive paraphrasing rather than word-for-word reproduction. This approach pushes the boundaries of how we assess the temporal accuracy of AI.
Reference / Citation
View Original
"Our central finding is a striking semantic-lexical paradox: all models achieve over 99% cosine similarity with references despite less than 8% BLEU-1 overlap, a 90+ percentage point gap indicating that models preserve meaning through extensive paraphrasing rather than lexical reproduction."
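The paradox in the quote rests on two metrics measuring different things: BLEU-1 counts surface word overlap, while cosine similarity compares embedding vectors that capture meaning. A minimal sketch of both metrics, using a toy example (the example sentences and embedding vectors below are illustrative stand-ins, not data from the paper, which presumably uses a real sentence-embedding model):

```python
import math
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Unigram precision (BLEU-1 without brevity penalty): the fraction
    of candidate tokens that also appear in the reference, with counts
    clipped by the reference's counts."""
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    matches = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    return matches / len(cand) if cand else 0.0

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# A paraphrase can share almost no words with the reference,
# driving BLEU-1 toward zero...
reference = "the central bank raised interest rates on tuesday"
paraphrase = "borrowing costs were increased by monetary authorities this week"
print(f"BLEU-1: {bleu1(paraphrase, reference):.2f}")

# ...while nearly parallel embedding vectors (toy stand-ins for
# real sentence embeddings) still yield cosine similarity near 1.
emb_ref = [0.71, 0.70, 0.05]
emb_par = [0.70, 0.71, 0.06]
print(f"cosine: {cosine(emb_par, emb_ref):.3f}")
```

Run together, the two numbers reproduce the shape of the reported gap: near-zero lexical overlap alongside near-perfect semantic similarity.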
ArXiv NLP, Feb 13, 2026 05:00
* Cited for critical analysis under Article 32.