Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Survival prediction in second primary breast cancer patients with machine learning: An analysis of SEER database.

Published: 2024 Jun 25
Authors: Yafei Wu, Yaheng Zhang, Siyu Duan, Chenming Gu, Chongtao Wei, Ya Fang
Source: Comput Meth Prog Bio

Abstract:

Studies have found that first primary cancer (FPC) survivors are at high risk of developing second primary breast cancer (SPBC). However, prognostic studies focusing specifically on patients with SPBC are lacking.

This retrospective study used data from the Surveillance, Epidemiology, and End Results (SEER) Program. We selected female FPC survivors diagnosed with SPBC from 12 registries (January 1998 to December 2018) to construct prognostic models. SPBC patients from another five registries (January 2010 to December 2018) were used as a validation set to test the models' generalization ability. Four machine learning models and a Cox proportional hazards regression model (CoxPH) were constructed to predict the overall survival of SPBC patients. Univariate and multivariable Cox regression analyses were used for feature selection. Model performance was assessed using the time-dependent area under the ROC curve (t-AUC) and the integrated Brier score (iBrier).

A total of 10,321 female FPC survivors with SPBC (mean age [SD]: 66.03 [11.17] years) were included for model construction. These patients were randomly split 7:3 into a training set (mean age [SD]: 65.98 [11.15] years) and a test set (mean age [SD]: 66.15 [11.23] years). The validation set ultimately enrolled 3,638 SPBC patients (mean age [SD]: 66.28 [10.68] years). Sixteen features were selected for model construction through univariate and multivariable Cox regression analyses. Among the five models, the random survival forest model performed best, with a t-AUC of 0.805 (95% CI: 0.803–0.807) and an iBrier of 0.123 (95% CI: 0.122–0.124) on the test set, and a t-AUC of 0.803 (95% CI: 0.801–0.807) and an iBrier of 0.098 (95% CI: 0.096–0.103) on the validation set.
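The abstract reports t-AUC and iBrier without defining how they were computed. The pure-Python sketch below shows one standard way to compute both under right censoring: inverse-probability-of-censoring weights (IPCW) derived from a Kaplan-Meier estimate of the censoring distribution. It is an illustration under assumptions, not the authors' code. All function names are hypothetical, predicted survival probabilities and risk scores are treated as given inputs, and the abstract does not say how the time-dependent AUC was summarized into a single value (averaging over several horizons is one common choice).

```python
from typing import List, Sequence, Tuple

# Step function of the censoring survival curve: sorted (time, G(time)) pairs.
Steps = List[Tuple[float, float]]


def km_censoring_survival(times: Sequence[float], events: Sequence[int]) -> Steps:
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t).

    Censored subjects (event == 0) play the role of "events" here.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    g = 1.0
    steps: Steps = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        censored = tied = 0
        # Group all subjects that share the same observed time.
        while i < len(pairs) and pairs[i][0] == t:
            censored += 1 - pairs[i][1]
            tied += 1
            i += 1
        g *= 1.0 - censored / at_risk
        steps.append((t, g))
        at_risk -= tied
    return steps


def g_at(steps: Steps, t: float, left_limit: bool = False) -> float:
    """Evaluate G(t), or its left limit G(t-) when left_limit is True."""
    value = 1.0
    for s, g in steps:
        if s < t or (s == t and not left_limit):
            value = g
        else:
            break
    return value


def brier_score(times, events, surv_at_t, t, steps) -> float:
    """IPCW Brier score at horizon t; surv_at_t[i] is the predicted S(t | x_i)."""
    total = 0.0
    for y, d, s in zip(times, events, surv_at_t):
        if y <= t and d == 1:
            # Event by t: ideal prediction is S = 0; weight 1 / G(y-).
            total += s ** 2 / g_at(steps, y, left_limit=True)
        elif y > t:
            # Event-free at t: ideal prediction is S = 1; weight 1 / G(t).
            total += (1.0 - s) ** 2 / g_at(steps, t)
        # Subjects censored before t contribute 0; the weights compensate.
    return total / len(times)


def integrated_brier_score(times, events, surv_matrix, eval_times) -> float:
    """Trapezoidal integral of BS(t) over eval_times, divided by the time span.

    surv_matrix[i][k] is the predicted S(eval_times[k] | x_i).
    """
    steps = km_censoring_survival(times, events)
    scores = [
        brier_score(times, events, [row[k] for row in surv_matrix], t, steps)
        for k, t in enumerate(eval_times)
    ]
    area = sum(
        (eval_times[k] - eval_times[k - 1]) * (scores[k] + scores[k - 1]) / 2.0
        for k in range(1, len(eval_times))
    )
    return area / (eval_times[-1] - eval_times[0])


def cumulative_dynamic_auc(times, events, risk, t, steps) -> float:
    """IPCW cumulative/dynamic AUC at horizon t (higher risk = earlier event).

    Cases: event observed at or before t. Controls: still event-free after t.
    """
    controls = [r for y, r in zip(times, risk) if y > t]
    num = den = 0.0
    for y, d, r in zip(times, events, risk):
        if y <= t and d == 1:
            w = 1.0 / g_at(steps, y, left_limit=True)
            den += w * len(controls)
            for rc in controls:
                num += w * (1.0 if r > rc else 0.5 if r == rc else 0.0)
    if den == 0.0:
        raise ValueError("need at least one case and one control at horizon t")
    return num / den
```

For instance, `cumulative_dynamic_auc` could be evaluated at several horizons and the values averaged to obtain a single t-AUC. For real analyses, the `sksurv.metrics` module of scikit-survival provides tested implementations of both metrics.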
Through feature importance ranking of the random survival forest model, age ranked first, and the other top five key predictive features were stage, regional nodes positive, latency, radiotherapy, and surgery.

The random survival forest model outperformed CoxPH and the other machine learning models in predicting the overall survival of patients with SPBC, which is helpful for monitoring high-risk populations.

Copyright © 2024. Published by Elsevier B.V.
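The abstract ranks the random survival forest's features by importance but does not state how the ranking was computed. One common, model-agnostic option is permutation importance: shuffle one feature column and measure how much a concordance score drops. The sketch below assumes Harrell's C-index as that score. `toy_risk` is a hypothetical stand-in for a fitted random survival forest, and the feature meanings in the comments are illustrative only.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs where the earlier event has higher risk."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_importance(
    predict_risk: Callable[[Row], float],
    X: List[Row],
    times: Sequence[float],
    events: Sequence[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean drop in C-index when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = harrell_c_index(times, events, [predict_risk(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, column)]
            permuted = harrell_c_index(times, events, [predict_risk(r) for r in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_risk(row: Row) -> float:
    """Hypothetical fitted model: risk depends on column 0 (e.g. age) only."""
    return 0.05 * row[0]


if __name__ == "__main__":
    X = [[80, 1], [75, 0], [70, 1], [65, 0], [60, 1], [55, 0], [50, 1], [45, 0]]
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [1] * 8
    imp = permutation_importance(toy_risk, X, times, events)
    ranking = sorted(range(len(imp)), key=lambda c: imp[c], reverse=True)
    print(imp, ranking)
```

Sorting columns by their mean drop in C-index gives a ranking in the same spirit as the one reported above. A column the model ignores (column 1 here) scores exactly zero. With real data, importance is usually computed on held-out data rather than on the training set.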