AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models.

Published: 2023 Nov 11

Authors: Surabhi Datta, Kyeryoung Lee, Hunki Paek, Frank J Manion, Nneka Ofoegbu, Jingcheng Du, Ying Li, Liang-Chin Huang, Jingqi Wang, Bin Lin, Hua Xu, Xiaoyan Wang

Source: Journal of the American Medical Informatics Association

Abstract:

We aim to build a generalizable information extraction system leveraging large language models to extract granular eligibility criteria information for diverse diseases from free text clinical trial protocol documents. We investigate the model's capability to extract criteria entities along with contextual attributes including values, temporality, and modifiers and present the strengths and limitations of this system.

The clinical trial data were acquired from https://ClinicalTrials.gov/. We developed a system, AutoCriteria, which comprises the following modules: preprocessing, knowledge ingestion, prompt modeling based on GPT, postprocessing, and interim evaluation. The final system evaluation was performed, both quantitatively and qualitatively, on 180 manually annotated trials encompassing 9 diseases.

AutoCriteria achieves an overall F1 score of 89.42 across all 9 diseases in extracting the criteria entities, with the highest being 95.44 for nonalcoholic steatohepatitis and the lowest of 84.10 for breast cancer. Its overall accuracy is 78.95% in identifying all contextual information across all diseases. Our thematic analysis indicated accurate logic interpretation of criteria as one of the strengths and overlooking/neglecting the main criteria as one of the weaknesses of AutoCriteria.

AutoCriteria demonstrates strong potential to extract granular eligibility criteria information from trial documents without requiring manual annotations. The prompts developed for AutoCriteria generalize well across different disease areas. Our evaluation suggests that the system handles complex scenarios including multiple arm conditions and logics.

AutoCriteria currently encompasses a diverse range of diseases and holds potential to extend to more in the future. This signifies a generalizable and scalable solution, poised to address the complexities of clinical trial application in real-world settings.

© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.
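
As an illustration of the entity-level F1 score reported in the abstract, the following is a minimal Python sketch of how extracted criteria might be scored against manual annotations. It is not the authors' evaluation code. The `Criterion` schema, the `entity_f1` helper, the exact-match rule after lowercasing, and the sample criteria are all assumptions made for this example; the abstract does not specify how matches were counted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Criterion:
    """One extracted eligibility criterion (hypothetical schema, not from the paper)."""
    entity: str
    value: Optional[str] = None        # e.g. a lab threshold
    temporality: Optional[str] = None  # e.g. a time window
    modifier: Optional[str] = None     # e.g. "prior", "history of"


def entity_f1(predicted, gold):
    """Return (precision, recall, F1) over entity strings.

    Assumption: two entities match when they are equal after
    stripping whitespace and lowercasing.
    """
    pred = {c.entity.strip().lower() for c in predicted}
    ref = {c.entity.strip().lower() for c in gold}
    tp = len(pred & ref)  # entities found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Made-up annotations and system output for a single trial.
gold = [
    Criterion("HbA1c", value=">= 7%"),
    Criterion("chemotherapy", temporality="within 4 weeks", modifier="prior"),
    Criterion("pregnancy"),
]
predicted = [
    Criterion("HbA1c", value=">= 7%"),
    Criterion("Chemotherapy", temporality="within 4 weeks"),
    Criterion("age", value=">= 18 years"),
]

precision, recall, f1 = entity_f1(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case two of the three predicted entities match the annotations, so precision, recall, and F1 are all 2/3. Scoring the contextual attributes (value, temporality, modifier) would need a separate comparison per matched entity, which this sketch leaves out.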