Application of artificial intelligence for overall survival risk stratification in oropharyngeal carcinoma: A validation of ProgTOOL.
Publication date: 2023 Apr 06
Authors:
Rasheed Omobolaji Alabi, Anni Sjöblom, Timo Carpén, Mohammed Elmusrati, Ilmo Leivo, Alhadi Almangush, Antti A Mäkitie
Source:
BIOMEDICINE & PHARMACOTHERAPY
Abstract:
In recent years, there has been a surge in machine learning-based models for diagnosis and prognostication of outcomes in oncology. However, there are concerns about the reproducibility of these models and their generalizability to a separate patient cohort (i.e., external validation).
This study primarily provides a validation study for a recently introduced, publicly available, web-based machine learning (ML) prognostic tool (ProgTOOL) for overall survival risk stratification of oropharyngeal squamous cell carcinoma (OPSCC). Additionally, we reviewed published studies that have used ML for outcome prognostication in OPSCC to examine how many of these models were externally validated. The type of external validation, the characteristics of the external dataset, and the diagnostic performance on the internal validation (IV) and external validation (EV) datasets were extracted and compared.
We used a total of 163 OPSCC patients from the Helsinki University Hospital to externally validate ProgTOOL for generalizability. In addition, the PubMed, OvidMedline, Scopus, and Web of Science databases were systematically searched according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
ProgTOOL achieved a balanced accuracy of 86.5%, a Matthews correlation coefficient of 0.78, a net benefit of 0.7, and a Brier score of 0.06 when stratifying OPSCC patients into low-chance or high-chance overall survival groups. Of the 31 studies found to have used ML for outcome prognostication in OPSCC, only seven (22.6%) reported a form of EV. Three studies (42.9%) used temporal EV and three (42.9%) used geographical EV, while only one study (14.2%) used experts as a form of EV. Most of the studies reported a reduction in performance when externally validated.
The performance of the model in this validation study indicates that it may generalize, which brings a recommendation of the model for clinical evaluation closer to reality. However, the number of externally validated ML-based models for OPSCC is still relatively small. This significantly limits the transfer of these models to clinical evaluation and subsequently reduces the likelihood that they will be used in daily clinical practice. As a gold standard, we recommend the use of geographical EV and validation studies to reveal biases and overfitting of these models. These recommendations are poised to facilitate the implementation of these models in clinical practice.
Copyright © 2023 The Author(s). Published by Elsevier B.V. All rights reserved.
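The four performance measures reported in the abstract (balanced accuracy, Matthews correlation coefficient, net benefit, and Brier score) can be illustrated with a minimal Python sketch. This is not the authors' code and does not use their data: the ten labels and probabilities below are invented toy values, the 0.5 decision threshold is an assumption, and net benefit is computed with the usual decision-curve formula, which the abstract does not spell out. Treating class 1 as the positive class is likewise an assumption.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at a given probability threshold."""
    return tp / n - (fp / n) * (threshold / (1 - threshold))

# Hypothetical toy cohort (NOT the Helsinki data): 1 = positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6, 0.1, 0.2]
threshold = 0.5  # assumed cut-off, not taken from the study
y_pred = [1 if p >= threshold else 0 for p in y_prob]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
ba = balanced_accuracy(tp, fp, tn, fn)
mcc = matthews_corrcoef(tp, fp, tn, fn)
brier = brier_score(y_true, y_prob)
nb = net_benefit(tp, fp, len(y_true), threshold)

print(f"balanced accuracy = {ba:.3f}")
print(f"MCC               = {mcc:.3f}")
print(f"Brier score       = {brier:.3f}")
print(f"net benefit       = {nb:.3f}")
```

Balanced accuracy and the Matthews correlation coefficient depend only on the thresholded predictions, whereas the Brier score uses the predicted probabilities directly, so lower values indicate better-calibrated, more accurate probabilities. Net benefit additionally depends on the chosen threshold, which is why a value such as the reported 0.7 is only interpretable alongside the threshold used.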