Research Updates

Generalizability of a Machine Learning Model for Improving Utilization of Parathyroid Hormone-Related Peptide Testing across Multiple Clinical Centers.

Published: 2023 Sep 21
Authors: He S Yang, Weishen Pan, Yingheng Wang, Mark A Zaydman, Nicholas C Spies, Zhen Zhao, Theresa A Guise, Qing H Meng, Fei Wang
Source: CLINICAL CHEMISTRY

Abstract:

Measuring parathyroid hormone-related peptide (PTHrP) helps diagnose the humoral hypercalcemia of malignancy, but is often ordered for patients with low pretest probability, resulting in poor test utilization. Manual review of results to identify inappropriate PTHrP orders is a cumbersome process.

Using a dataset of 1330 patients from a single institute, we developed a machine learning (ML) model to predict abnormal PTHrP results. We then evaluated the performance of the model on two external datasets. Different strategies (model transporting, retraining, rebuilding, and fine-tuning) were investigated to improve model generalizability. Maximum mean discrepancy (MMD) was adopted to quantify the shift of data distributions across different datasets.

The model achieved an area under the receiver operating characteristic curve (AUROC) of 0.936, and a specificity of 0.842 at 0.900 sensitivity in the development cohort. Directly transporting this model to two external datasets resulted in a deterioration of AUROC to 0.838 and 0.737, with the latter having a larger MMD corresponding to a greater data shift compared to the original dataset. Model rebuilding using site-specific data improved AUROC to 0.891 and 0.837 on the two sites, respectively. When external data is insufficient for retraining, a fine-tuning strategy also improved model utility.

ML offers promise to improve PTHrP test utilization while relieving the burden of manual review. Transporting a ready-made model to external datasets may lead to performance deterioration due to data distribution shift. Model retraining or rebuilding could improve generalizability when there are enough data, and model fine-tuning may be favorable when site-specific data is limited.

© Association for Diagnostics & Laboratory Medicine 2023. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
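To illustrate how MMD can quantify the distribution shift mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. The abstract does not state which kernel or estimator was used, so the RBF kernel, the biased (V-statistic) estimator, the bandwidth `gamma=0.1`, and the synthetic arrays `dev`, `site_a`, and `site_b` are all assumptions made for illustration only.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distance between every row of a and every row of b.
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * (a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2_biased(x, y, gamma=0.1):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Synthetic stand-ins (assumption): these are NOT the study's data.
rng = np.random.default_rng(0)
dev = rng.normal(0.0, 1.0, size=(200, 5))     # hypothetical development cohort
site_a = rng.normal(0.2, 1.0, size=(200, 5))  # hypothetical site with a small shift
site_b = rng.normal(1.0, 1.0, size=(200, 5))  # hypothetical site with a large shift

print("MMD^2 dev vs site_a:", mmd2_biased(dev, site_a))
print("MMD^2 dev vs site_b:", mmd2_biased(dev, site_b))
```

In this toy setup the more strongly shifted sample yields the larger MMD estimate, mirroring the abstract's observation that the external dataset with the larger MMD also showed the larger AUROC drop. In practice the bandwidth choice matters, and a common heuristic is to set it from the median pairwise distance.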