Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients.
Publication date: 2023
Authors:
Yunru Huang, Shannon M Pfeiffer, Qing Zhang
Source:
Computational and Structural Biotechnology Journal
Abstract:
Timely and accurate primary tumor diagnosis is critical, as misdiagnoses and delays may cause undue health and economic burden. The XGBoost-based Clinico-Genomic Machine Learning Model (XC-GeM) was developed to predict 13 primary tumor types from genomic data in a de-identified US nationwide clinico-genomic database (CGDB). It used data from 12,060 CGDB patients, derived from routine clinical comprehensive genomic profiling (CGP) testing and chart-confirmed electronic health records (EHRs). The SHapley Additive exPlanations (SHAP) method was used to interpret model predictions. XC-GeM reached an outstanding area under the curve (AUC) of 0.965 and a Matthews correlation coefficient (MCC) of 0.742 in the holdout validation dataset. In the independent validation cohort of 955 patients, XC-GeM reached an AUC of 0.954 and an MCC of 0.733. It made correct predictions in 77% of non-small cell lung cancer (NSCLC), 86% of colorectal cancer, and 84% of breast cancer patients. Top predictors for the overall model (e.g., tumor mutational burden (TMB), gender, and KRAS alteration) and for specific tumor types (e.g., TMB and EGFR alteration for NSCLC) were supported by published studies. XC-GeM also achieved an excellent AUC of 0.880 and a positive MCC of 0.540 in 507 patients with a missing primary diagnosis. XC-GeM is the first algorithm to predict primary tumor type using US nationwide data from routine CGP testing and chart-confirmed EHRs, and it shows promising performance. It may enhance the accuracy and efficiency of cancer diagnoses, enabling more timely treatment choices and potentially leading to better outcomes.
© 2023 The Authors.
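The abstract reports multiclass AUC and MCC but does not state how they were aggregated across the 13 tumor types. The following is a minimal, dependency-free sketch under assumed common choices, not the authors' stated method. It uses Gorodkin's multiclass generalization of MCC (the form used by scikit-learn's `matthews_corrcoef`) and an unweighted one-vs-rest average for AUC. Statements like "correct predictions in 77% of NSCLC patients" are read here as per-class recall, which is also an assumption. All function names are hypothetical and chosen for illustration.

```python
import math


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i][j] counts samples whose true class is i and predicted class is j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def multiclass_mcc(y_true, y_pred, n_classes):
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K form, assumed)."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    s = sum(sum(row) for row in cm)                       # total samples
    c = sum(cm[k][k] for k in range(n_classes))           # correctly classified samples
    t = [sum(cm[k]) for k in range(n_classes)]            # true count per class
    p = [sum(cm[i][k] for i in range(n_classes)) for k in range(n_classes)]  # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0                      # 0.0 when MCC is undefined


def per_class_recall(y_true, y_pred, n_classes):
    """Fraction of each true class that was predicted correctly (NaN if the class is absent)."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    return [cm[k][k] / sum(cm[k]) if sum(cm[k]) else float("nan") for k in range(n_classes)]


def binary_auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score), ties counted as 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def macro_ovr_auc(y_true, proba, n_classes):
    """Unweighted mean of one-vs-rest AUCs over classes that have both positives and negatives."""
    aucs = []
    for k in range(n_classes):
        pos = [row[k] for y, row in zip(y_true, proba) if y == k]
        neg = [row[k] for y, row in zip(y_true, proba) if y != k]
        if pos and neg:
            aucs.append(binary_auc(pos, neg))
    return sum(aucs) / len(aucs)
```

For example, take `y_true = [0, 0, 1, 1, 2, 2]` and `y_pred = [0, 1, 1, 1, 2, 0]`. Here `multiclass_mcc` returns 12/√528 ≈ 0.522 and `per_class_recall` returns `[0.5, 1.0, 0.5]`. The pairwise AUC loop is quadratic per class. It is kept for clarity; a rank-based computation would scale better to cohorts of thousands of patients.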