Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

An end-to-end natural language processing system for automatically extracting radiotherapy events from clinical texts: NLP to extract radiotherapy events from text.

Published: 2023 Mar 27
Authors: Danielle S Bitterman, Eli Goldner, Sean Finan, David Harris, Eric B Durbin, Harry Hochheiser, Jeremy L Warner, Raymond H Mak, Timothy Miller, Guergana K Savova
Source: Int J Radiat Oncol

Abstract:

Real-world evidence for radiotherapy (RT) is limited because it is often documented only in the clinical narrative. We developed a natural language processing system for automated extraction of detailed RT events from text to support clinical phenotyping.

A multi-institutional dataset of 96 clinician notes, 129 North American Association of Central Cancer Registries (NAACCR) cancer abstracts, and 270 RT prescriptions from HemOnc.org was used, and divided into train, development, and test sets. Documents were annotated for RT events and associated properties: Dose, Fraction Frequency, Fraction Number, Date, Treatment Site, and Boost. Named entity recognition (NER) models for properties were developed by fine-tuning BioClinicalBERT and RoBERTa transformer models. A multi-class RoBERTa-based relation extraction model was developed to link each dose mention with each property in the same event. Models were combined with symbolic rules to create a hybrid end-to-end pipeline for comprehensive RT event extraction.

NER models were evaluated on the held-out test set with F1 results of 0.96, 0.88, 0.94, 0.88, 0.67, and 0.94 for Dose, Fraction Frequency, Fraction Number, Date, Treatment Site, and Boost, respectively. The relation model achieved an average F1 of 0.86 when the input was gold-labeled entities. The end-to-end system F1 result was 0.81. The end-to-end system performed best on NAACCR abstracts (average F1 0.90), which are mostly copy-paste content from clinician notes.

We developed methods and a hybrid end-to-end system for RT event extraction, which is the first natural language processing system for this task. This system provides proof-of-concept for real-world RT data collection for research and is promising for the future potential of NLP methods to support clinical care.

Copyright © 2023. Published by Elsevier Inc.
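
The linking step described in the abstract, where each dose mention is paired with each property mention and linked pairs are grouped into RT events, can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Mention`, `RTEvent`, `candidate_pairs`, `nearest_dose_classifier`, and `build_events` are hypothetical, the example note is invented, and a simple nearest-dose heuristic stands in for the RoBERTa-based relation model, whose details the abstract does not give.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    """One entity span, as a NER model might emit it (hypothetical type)."""
    label: str  # e.g. "Dose", "Fraction Number", "Treatment Site"
    text: str
    start: int  # character offset in the note


@dataclass
class RTEvent:
    """A dose mention plus the properties linked to it (hypothetical type)."""
    dose: Mention
    properties: list = field(default_factory=list)


def candidate_pairs(mentions):
    """Pair every Dose mention with every non-Dose mention."""
    doses = [m for m in mentions if m.label == "Dose"]
    others = [m for m in mentions if m.label != "Dose"]
    return [(d, o) for d in doses for o in others]


def nearest_dose_classifier(dose, prop, mentions):
    """Stand-in for a learned relation model (assumption, not the paper's
    method): link a property to a dose only if that dose is the closest
    Dose mention by character offset."""
    doses = [m for m in mentions if m.label == "Dose"]
    closest = min(doses, key=lambda d: abs(d.start - prop.start))
    return closest == dose


def build_events(mentions):
    """Group linked properties under their dose mention, one event per dose."""
    events = {m: RTEvent(dose=m) for m in mentions if m.label == "Dose"}
    for dose, prop in candidate_pairs(mentions):
        if nearest_dose_classifier(dose, prop, mentions):
            events[dose].properties.append(prop)
    return list(events.values())


# Invented example note with hand-labeled ("gold") entities.
note = "Prostate 60 Gy in 30 fractions; pelvis 45 Gy in 25 fractions."
labeled = [
    ("Treatment Site", "Prostate"),
    ("Dose", "60 Gy"),
    ("Fraction Number", "30"),
    ("Treatment Site", "pelvis"),
    ("Dose", "45 Gy"),
    ("Fraction Number", "25"),
]
mentions = [Mention(label, text, note.index(text)) for label, text in labeled]
events = build_events(mentions)

for event in events:
    linked = [(p.label, p.text) for p in event.properties]
    print(event.dose.text, "->", linked)
```

In a hybrid pipeline of the kind the abstract describes, the heuristic would be replaced by a trained classifier over each candidate pair, with symbolic rules applied around the model outputs.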