GPT-5 Performance Regresses in Healthcare Evaluation

Published: August 21, 2025, 22:52 • 1 min read

Analysis

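This article reports a surprising finding: on the MedHELM healthcare evaluation, GPT-5 shows a slight regression relative to GPT-4-era models. The result suggests that newer models are not always better and underscores the importance of rigorous evaluation across domains. The linked PDF allows a closer look at the specific results and methodology.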

Quote / Source

"The author found a slight regression in GPT-5 performance compared to GPT-4 era models."

Hacker News, August 21, 2025, 22:52

* Quoted lawfully under Article 32 of the Copyright Act.
