Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Assessing eligibility for lung cancer screening using parsimonious ensemble machine learning models: A development and validation study.

Published: Oct 2023
Authors: Thomas Callender, Fergus Imrie, Bogdan Cebere, Nora Pashayan, Neal Navani, Mihaela van der Schaar, Sam M Janes
Source: PLOS Medicine

Abstract:

Risk-based screening for lung cancer is currently being considered in several countries; however, the optimal approach to determine eligibility remains unclear. Ensemble machine learning could support the development of highly parsimonious prediction models that maintain the performance of more complex models while maximising simplicity and generalisability, supporting the widespread adoption of personalised screening. In this work, we aimed to develop and validate ensemble machine learning models to determine eligibility for risk-based lung cancer screening.
For model development, we used data from 216,714 ever-smokers recruited between 2006 and 2010 to the UK Biobank prospective cohort and 26,616 high-risk ever-smokers recruited between 2002 and 2004 to the control arm of the US National Lung Screening Trial (NLST), a randomised controlled trial. The NLST randomised high-risk smokers from 33 US centres with at least a 30 pack-year smoking history and fewer than 15 quit-years to annual CT or chest radiography screening for lung cancer. We externally validated our models among 49,593 participants in the chest radiography arm and all 80,659 ever-smoking participants in the US Prostate, Lung, Colorectal and Ovarian (PLCO) Screening Trial. The PLCO trial, which recruited from 1993 to 2001, analysed the impact of chest radiography or no chest radiography for lung cancer screening. We primarily validated in the PLCO chest radiography arm so that we could benchmark against comparator models developed within the PLCO control arm. Models were developed to predict the risk of 2 outcomes within 5 years from baseline: diagnosis of lung cancer and death from lung cancer. We assessed model discrimination (area under the receiver operating characteristic curve, AUC), calibration (calibration curves and expected/observed ratio), overall performance (Brier scores), and net benefit with decision curve analysis.
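The four evaluation measures named in the abstract (AUC, expected/observed ratio, Brier score, and net benefit) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names are hypothetical, the toy outcomes and risks are invented purely for illustration, and the net benefit formula is the standard decision-curve definition (true positives per person minus false positives per person weighted by the odds of the risk threshold).

```python
from typing import Sequence


def auc(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """Chance that a random case has a higher predicted risk than a random non-case (ties count 0.5)."""
    cases = [r for y, r in zip(y_true, risk) if y == 1]
    controls = [r for y, r in zip(y_true, risk) if y == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def brier_score(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((r - y) ** 2 for y, r in zip(y_true, risk)) / len(y_true)


def expected_observed_ratio(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """Sum of predicted risks (expected events) divided by the number of observed events."""
    return sum(risk) / sum(y_true)


def net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of screening everyone with risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Invented toy data (NOT study data): 1 = event within follow-up, 0 = no event.
toy_outcome = [0, 0, 0, 0, 1, 0, 1, 1]
toy_risk = [0.1, 0.2, 0.1, 0.3, 0.4, 0.6, 0.7, 0.8]

if __name__ == "__main__":
    print("AUC:", auc(toy_outcome, toy_risk))
    print("Brier score:", brier_score(toy_outcome, toy_risk))
    print("E/O ratio:", expected_observed_ratio(toy_outcome, toy_risk))
    print("Net benefit at 0.5:", net_benefit(toy_outcome, toy_risk, 0.5))
```

An E/O ratio near 1 indicates that the total predicted number of events matches the number observed; values above 1 suggest overprediction and values below 1 underprediction.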
Models predicting lung cancer death (UCL-D) and incidence (UCL-I) using 3 variables (age, smoking duration, and pack-years) achieved or exceeded parity in discrimination, overall performance, and net benefit with comparators currently in use, despite requiring only one-quarter of the predictors. In external validation in the PLCO trial, UCL-D had an AUC of 0.803 (95% CI: 0.783, 0.824) and was well calibrated with an expected/observed (E/O) ratio of 1.05 (95% CI: 0.95, 1.19). UCL-I had an AUC of 0.787 (95% CI: 0.771, 0.802) and an E/O ratio of 1.0 (95% CI: 0.92, 1.07). The sensitivity of UCL-D was 85.5% and that of UCL-I was 83.9%, at 5-year risk thresholds of 0.68% and 1.17%, respectively, 7.9% and 6.2% higher than the USPSTF-2021 criteria at the same specificity. The main limitation of this study is that the models have not been validated outside of UK and US cohorts.
We present parsimonious ensemble machine learning models to predict the risk of lung cancer in ever-smokers, demonstrating a novel approach that could simplify the implementation of risk-based lung cancer screening in multiple settings.
Copyright: © 2023 Callender et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
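The comparison of sensitivity "at the same specificity" reported in the abstract can be sketched as follows: compute the specificity of a fixed eligibility rule, find the risk threshold at which a model reaches that specificity, and then compare sensitivities. This is an illustrative sketch only, not the study's procedure. The helper names are hypothetical, the eight records are invented, and the rule encodes the USPSTF 2021 recommendation as commonly stated (age 50 to 80, at least 20 pack-years, currently smoking or quit within the past 15 years).

```python
from typing import List, Optional, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a binary eligibility decision."""
    tp = sum(1 for y, f in zip(y_true, flagged) if y == 1 and f)
    fn = sum(1 for y, f in zip(y_true, flagged) if y == 1 and not f)
    tn = sum(1 for y, f in zip(y_true, flagged) if y == 0 and not f)
    fp = sum(1 for y, f in zip(y_true, flagged) if y == 0 and f)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_matching_specificity(
    y_true: Sequence[int], risk: Sequence[float], target_specificity: float
) -> Optional[float]:
    """Lowest observed risk cut-off whose specificity is at least the target, or None if none qualifies."""
    for t in sorted(set(risk)):
        _, spec = sensitivity_specificity(y_true, [r >= t for r in risk])
        if spec >= target_specificity:
            return t
    return None


def uspstf_2021_eligible(age: float, pack_years: float, quit_years: float) -> bool:
    """Assumed encoding of the USPSTF 2021 criteria; quit_years is 0 for people who currently smoke."""
    return 50 <= age <= 80 and pack_years >= 20 and quit_years <= 15


# Invented records (NOT study data): age, pack-years, quit-years, model 5-year risk, outcome.
people: List[Tuple[int, int, int, float, int]] = [
    (55, 25, 0, 0.012, 0),
    (62, 40, 5, 0.030, 1),
    (48, 30, 0, 0.008, 0),
    (70, 15, 2, 0.020, 1),
    (66, 22, 20, 0.006, 0),
    (74, 50, 1, 0.045, 1),
    (58, 10, 0, 0.004, 0),
    (60, 35, 10, 0.015, 0),
]

outcomes = [p[4] for p in people]
risks = [p[3] for p in people]

# Step 1: sensitivity and specificity of the fixed criteria.
criteria_flags = [uspstf_2021_eligible(p[0], p[1], p[2]) for p in people]
crit_sens, crit_spec = sensitivity_specificity(outcomes, criteria_flags)

# Step 2: model threshold that reaches the same specificity, then its sensitivity.
matched_threshold = threshold_matching_specificity(outcomes, risks, crit_spec)
model_sens, model_spec = sensitivity_specificity(
    outcomes, [r >= matched_threshold for r in risks]
)

if __name__ == "__main__":
    print("criteria sensitivity/specificity:", crit_sens, crit_spec)
    print("matched threshold:", matched_threshold)
    print("model sensitivity/specificity:", model_sens, model_spec)
```

Matching on specificity holds the proportion of people without cancer who are left unscreened fixed, so any difference in sensitivity reflects how well each approach picks out future cases among those it does select.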