Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Empirical evaluation of language modeling to ascertain cancer outcomes from clinical text reports.

Published: 2023 Sep 02
Authors: Haitham A Elmarakeby, Pavel S Trukhanov, Vidal M Arroyo, Irbaz Bin Riaz, Deborah Schrag, Eliezer M Van Allen, Kenneth L Kehl
Journal: BMC Bioinformatics

Abstract:

Longitudinal data on key cancer outcomes for clinical research, such as response to treatment and disease progression, are not captured in standard cancer registry reporting. Manual extraction of such outcomes from unstructured electronic health records is a slow, resource-intensive process. Natural language processing (NLP) methods can accelerate outcome annotation, but they require substantial labeled data. Transfer learning based on language modeling, particularly using the Transformer architecture, has yielded improvements in NLP performance. However, there has been no systematic evaluation of NLP model training strategies for extracting cancer outcomes from unstructured text.

We evaluated the performance of nine NLP models at two tasks, identifying cancer response and cancer progression in imaging reports, among patients with non-small cell lung cancer at a single academic center. We trained the classification models under different conditions, including training sample size, classification architecture, and language model pre-training. Training involved a labeled dataset of 14,218 imaging reports for 1,112 patients with lung cancer. A subset of models was based on a pre-trained language model, DFCI-ImagingBERT, created by further pre-training a BERT-based model on an unlabeled dataset of 662,579 reports from 27,483 patients with cancer from our center. A classifier based on our DFCI-ImagingBERT, trained on reports from more than 200 patients, achieved the best results in most experiments; however, these results were only marginally better than those of simpler "bag of words" or convolutional neural network models.

When developing AI models to extract outcomes from imaging reports for clinical cancer research, if computational resources are plentiful but labeled training data are limited, large language models can be used for zero- or few-shot learning to achieve reasonable performance. When computational resources are more limited but labeled training data are readily available, even simple machine learning architectures can achieve good performance for such tasks.

© 2023. BioMed Central Ltd., part of Springer Nature.
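To make the "simple machine learning architectures" conclusion concrete, here is a minimal sketch of a bag-of-words text classifier. It is purely illustrative and not the authors' pipeline: the report snippets and labels are invented, and the classifier is a naive nearest-centroid scheme over word-count vectors rather than the models evaluated in the study.

```python
# Illustrative bag-of-words baseline (hypothetical data, not the study's pipeline).
# Each report becomes a word-count vector; a new report is assigned the label
# whose training centroid (mean word profile) it is most similar to by cosine.
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def centroid(reports):
    """Mean bag-of-words vector for a list of report strings."""
    total = Counter()
    for r in reports:
        total.update(tokenize(r))
    n = len(reports)
    return {w: c / n for w, c in total.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical labeled snippets standing in for annotated imaging reports.
progression = [
    "interval increase in size of the right upper lobe mass",
    "new liver lesions concerning for disease progression",
]
no_progression = [
    "stable appearance of the known pulmonary nodule",
    "decrease in size of the primary tumor without new lesions",
]

centroids = {
    "progression": centroid(progression),
    "no progression": centroid(no_progression),
}

def classify(report):
    bag = Counter(tokenize(report))
    return max(centroids, key=lambda label: cosine(bag, centroids[label]))

print(classify("further interval increase in size of the dominant mass"))
# prints "progression"
```

With enough labeled reports, a bag-of-words representation feeding any off-the-shelf linear classifier captures much of the signal in radiology text, which is consistent with the study's finding that such baselines came close to the DFCI-ImagingBERT classifier.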