Statistical models versus machine learning for competing risks: development and validation of prognostic models.

Published: 2023 Feb 24

Authors: Georgios Kantidakis, Hein Putter, Saskia Litière, Marta Fiocco

Source: BMC Medical Research Methodology

Abstract:

In health research, several chronic diseases are susceptible to competing risks (CRs). Initially, statistical models (SM) were developed to estimate the cumulative incidence of an event in the presence of CRs. With the recent growth of interest in applying machine learning (ML) to clinical prediction, these techniques have also been extended to model CRs, but the literature is limited. Here, our aim is to investigate the potential role of ML versus SM for CRs within non-complex data (small/medium sample size, low-dimensional setting).

A dataset of 3826 retrospectively collected patients with extremity soft-tissue sarcoma (eSTS) and nine predictors is used to evaluate predictive performance in terms of discrimination and calibration. Two SM (cause-specific Cox, Fine-Gray) and three ML techniques are compared for CRs in a simple clinical setting. The ML models include an original partial logistic artificial neural network for CRs (PLANNCR original), a PLANNCR with novel specifications in terms of architecture (PLANNCR extended), and a random survival forest for CRs (RSFCR). The clinical endpoint is the time in years between surgery and disease progression (event of interest) or death (competing event). Time points of interest are 2, 5, and 10 years.

Based on the original eSTS data, 100 bootstrapped training datasets are drawn. Performance of the final models is assessed on validation data (left-out samples) using the Brier score and the Area Under the Curve (AUC) with CRs. Miscalibration (absolute accuracy error) is also estimated. Results show that the ML models reach performance comparable to the SM at 2, 5, and 10 years for both Brier score and AUC (95% confidence intervals overlapped). However, the SM are frequently better calibrated.

Overall, ML techniques are less practical as they require substantial implementation time (data preprocessing, hyperparameter tuning, computational intensity), whereas regression methods can perform well without the additional workload of model training. As such, for non-complex real-life survival data, these techniques should only be applied as a complement to SM, as exploratory tools of model performance. More attention to model calibration is urgently needed.

© 2023. The Author(s).
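
To illustrate the quantity that all of the compared models target, the cumulative incidence of one event type in the presence of competing events, the following is a minimal sketch of the nonparametric Aalen-Johansen estimator. It is not the authors' code: the function name `cumulative_incidence`, the status coding (0 = censored, 1 = event of interest, 2 = competing event) and the toy data are assumptions made purely for illustration.

```python
import numpy as np


def cumulative_incidence(time, status, cause=1):
    """Aalen-Johansen estimate of the cumulative incidence for one cause.

    time   : follow-up times (e.g. years since surgery)
    status : 0 = censored, 1, 2, ... = event type (hypothetical coding)
    Returns the distinct event times and the estimate at each of them.
    """
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=int)
    event_times = np.unique(time[status != 0])

    surv = 1.0   # probability of being free of ANY event just before t
    cif = 0.0    # running cumulative incidence for the chosen cause
    estimates = []
    for t in event_times:
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (status == cause))
        d_any = np.sum((time == t) & (status != 0))
        # Increment: still event-free before t, then fail from `cause` at t.
        cif += surv * d_cause / at_risk
        # Any event type removes a subject from the event-free state.
        surv *= 1.0 - d_any / at_risk
        estimates.append(cif)
    return event_times, np.array(estimates)


# Toy data (hypothetical): five patients, times in years.
toy_time = [1, 2, 3, 4, 5]
toy_status = [1, 2, 0, 1, 0]  # 1 = progression, 2 = death, 0 = censored

times, cif_progression = cumulative_incidence(toy_time, toy_status, cause=1)
_, cif_death = cumulative_incidence(toy_time, toy_status, cause=2)
```

Because a competing event also removes a patient from the event-free state, the two cumulative incidences sum to one minus the overall event-free survival. Applying one minus Kaplan-Meier to each cause separately would not satisfy this and would overstate the risk of the event of interest.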