Comparative analysis of GPT-4-based ChatGPT's diagnostic performance with radiologists using real-world radiology reports of brain tumors.
Published: 2024 Aug 28
Authors:
Yasuhito Mitsuyama, Hiroyuki Tatekawa, Hirotaka Takita, Fumi Sasaki, Akane Tashiro, Satoshi Oue, Shannon L Walston, Yuta Nonomiya, Ayumi Shintani, Yukio Miki, Daiju Ueda
Source:
EUROPEAN RADIOLOGY
Abstract:
Objectives: Large language models like GPT-4 have demonstrated potential for diagnosis in radiology. Previous studies investigating this potential primarily utilized quizzes from academic journals. This study aimed to assess the diagnostic capabilities of GPT-4-based Chat Generative Pre-trained Transformer (ChatGPT) using actual clinical radiology reports of brain tumors and to compare its performance with that of neuroradiologists and general radiologists.

Materials and methods: We collected brain MRI reports written in Japanese from preoperative brain tumor patients at two institutions from January 2017 to December 2021. The MRI reports were translated into English by radiologists. GPT-4 and five radiologists were presented with the same textual findings from the reports and asked to suggest differential and final diagnoses. The pathological diagnosis of the excised tumor served as the ground truth. McNemar's test and Fisher's exact test were used for statistical analysis.

Results: In a study analyzing 150 radiology reports, GPT-4 achieved a final diagnostic accuracy of 73%, while radiologists' accuracy ranged from 65% to 79%. GPT-4's final diagnostic accuracy was higher using reports from neuroradiologists (80%) than using those from general radiologists (60%). For differential diagnoses, GPT-4's accuracy was 94%, while radiologists' accuracy fell between 73% and 89%. Notably, for these differential diagnoses, GPT-4's accuracy remained consistent whether the reports were written by neuroradiologists or general radiologists.

Conclusion: GPT-4 exhibited good diagnostic capability, comparable to that of neuroradiologists, in differentiating brain tumors from MRI reports. GPT-4 can serve as a second opinion for neuroradiologists on final diagnoses and as a guidance tool for general radiologists and residents.

Clinical relevance statement: This study evaluated GPT-4-based ChatGPT's diagnostic capabilities using real-world clinical MRI reports from brain tumor cases, revealing that its accuracy in interpreting brain tumors from MRI findings is competitive with that of radiologists.

Key points: We investigated the diagnostic accuracy of GPT-4 using real-world clinical MRI reports of brain tumors. GPT-4 achieved final and differential diagnostic accuracy comparable with neuroradiologists. GPT-4 has the potential to improve the diagnostic process in clinical radiology.

© 2024. The Author(s).
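The abstract states that McNemar's test (paired comparison of GPT-4 against each radiologist on the same reports) and Fisher's exact test (comparison of GPT-4's accuracy across report-writer groups) were used. The sketch below is a minimal illustration, not the authors' code, of how such comparisons could be run in Python with statsmodels and scipy; all table counts are hypothetical placeholders, not data from the study.

```python
# Minimal sketch of the two statistical comparisons named in the abstract.
# Counts are hypothetical placeholders, not values from the study.
import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import mcnemar

# Paired comparison (e.g., GPT-4 vs. one radiologist on the same 150 reports):
# rows = GPT-4 correct/incorrect, columns = radiologist correct/incorrect.
paired = np.array([[90, 20],
                   [15, 25]])
mcnemar_result = mcnemar(paired, exact=True)  # exact binomial McNemar test
print(f"McNemar p-value: {mcnemar_result.pvalue:.3f}")

# Unpaired comparison (e.g., GPT-4 accuracy on neuroradiologist-written vs.
# general-radiologist-written reports): rows = report group,
# columns = correct/incorrect final diagnoses.
groups = np.array([[80, 20],
                   [30, 20]])
odds_ratio, p_value = stats.fisher_exact(groups)
print(f"Fisher's exact p-value: {p_value:.3f}")
```

The paired (McNemar) layout fits the head-to-head GPT-4-versus-radiologist comparison on identical cases, while the unpaired (Fisher) layout fits the comparison between the two independent groups of report writers.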