Research Updates
Framework for Radiation Oncology Department-wide Evaluation and Implementation of Commercial AI Auto-contouring.

Published: 2023 Nov 05
Authors: Dominic Maes, Evan D H Gates, Juergen Meyer, John Kang, Bao-Ngoc Thi Nguyen, Myra Lavilla, Dustin Melancon, Emily S Weg, Yolanda D Tseng, Andrew Lim, Stephen R Bowen
Source: Brain Structure & Function

Abstract:

Artificial intelligence (AI)-based auto-contouring in radiation oncology has potential benefits such as standardization and time savings. However, commercial AI solutions require careful evaluation prior to clinical integration. We developed a multidimensional evaluation method to test pre-trained AI auto-contouring solutions across a network of clinics.

Curated data included 121 patient planning CT (computed tomography) scans with a total of 859 clinically approved contours used for treatment from four clinics. Regions of interest (ROIs) were generated with three commercial AI-based auto-contouring software solutions (AI1, AI2, AI3) spanning the following disease sites: brain, head-and-neck (H&N), thorax, abdomen, and pelvis. Quantitative agreement between AI-generated and clinical contours was measured by Dice similarity coefficient (DSC) and Hausdorff distance (HD). Qualitative assessment was performed by multiple experts scoring blinded AI contours using a Likert scale. A workflow and usability survey was also conducted.

AI1/AI2/AI3 contours had high quantitative agreement (DSC>0.9) in 27.8/32.8/34.1% of cases, performing well in pelvis (median DSC=0.86/0.88/0.91) and thorax (median DSC=0.91/0.89/0.91). All three solutions had low quantitative agreement (DSC<0.5) in 7.4/8.8/6.1% of cases, performing worse in brain (median DSC=0.65/0.78/0.75) and H&N (median DSC=0.76/0.80/0.81). Qualitatively, AI1/AI2 contours were acceptable (rated 1-2) with at most minor edits in 70.7/74.6% of ROIs (2,906 ratings), higher for abdomen (AI1: 79.2%) and thorax (AI2: 90.2%), and lower for H&N (29.0/35.6%). An end-user survey showed strong user preference for full automation and mixed preferences for accuracy versus total number of structures generated.

Our evaluation method provided a comprehensive analysis of both quantitative and qualitative measures of commercially available pre-trained AI auto-contouring algorithms. The evaluation framework served as a roadmap for clinical integration that aligned with user workflow preference.

Copyright © 2023. Published by Elsevier Inc.
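
For readers unfamiliar with the two quantitative metrics named in the abstract, the following is a minimal sketch of how DSC and HD could be computed for a pair of binary contour masks. It is not the authors' implementation. The function names, the `spacing` parameter, and the toy masks are illustrative assumptions. The Hausdorff distance here is taken over all foreground voxels by brute force, whereas clinical tools often work on contour surfaces and may report a percentile variant.

```python
import numpy as np


def dice_coefficient(a, b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def hausdorff_distance(a, b, spacing=None):
    """Symmetric Hausdorff distance between the foreground voxels of two masks.

    `spacing` is the voxel size per axis (assumed isotropic 1.0 if omitted).
    Brute force: memory grows with (#voxels in a) * (#voxels in b).
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if spacing is None:
        spacing = np.ones(a.ndim)
    pts_a = np.argwhere(a) * np.asarray(spacing, dtype=float)
    pts_b = np.argwhere(b) * np.asarray(spacing, dtype=float)
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # Pairwise Euclidean distances between every point in A and every point in B.
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


# Toy example: two 4x4 squares offset by one row on a 10x10 grid.
clinical = np.zeros((10, 10), dtype=bool)
clinical[2:6, 2:6] = True
ai = np.zeros((10, 10), dtype=bool)
ai[3:7, 2:6] = True

print(dice_coefficient(clinical, ai))    # 0.75
print(hausdorff_distance(clinical, ai))  # 1.0
```

In this toy case the masks share 12 of their 16 pixels each, giving DSC = 24/32 = 0.75, which would fall between the abstract's low (DSC<0.5) and high (DSC>0.9) agreement thresholds.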