Differential Diagnosis of Pleural Effusion Using Machine Learning.
Published: 2023 Oct 03
Authors:
Na Young Kim, Boa Jang, Kang-Mo Gu, Young Sik Park, Young-Gon Kim, Jaeyoung Cho
Source:
Annals of the American Thoracic Society
Abstract:
Differential diagnosis of pleural effusion is challenging in clinical practice. We aimed to develop a machine learning model to classify the five common causes of pleural effusions. This retrospective study collected 49 features from clinical information, blood, and pleural fluid of adult patients who underwent diagnostic thoracentesis between October 2013 and December 2018. Pleural effusions were classified into the following five categories: transudative, malignant, parapneumonic, tuberculous, and other. The performance of five different classifiers, including multinomial logistic regression, support vector machine, random forest, extreme gradient boosting, and light gradient boosting machine (LGB), was evaluated in terms of accuracy and area under the receiver operating characteristic curve (AUC) through five-fold cross-validation. Hybrid feature selection was applied to determine the most relevant features for classifying pleural effusion. We analyzed 2,253 patients (training set, n=1,459; validation set, n=365; extra-validation set, n=429) and found that the LGB model achieved the best performance in both the validation and extra-validation sets. After feature selection, the accuracy of the LGB model with the selected 18 features was equivalent to that with all 49 features: 0.818 ± 0.012 and 0.777 ± 0.007 in the validation and extra-validation sets, respectively. The model's mean AUC was 0.930 ± 0.042 and 0.916 ± 0.044 in the validation and extra-validation sets, respectively. In our model, pleural lactate dehydrogenase, protein, and adenosine deaminase levels were the most important factors for classifying pleural effusions. Our LGB model showed satisfactory performance for differential diagnosis of the common causes of pleural effusions. This model could provide clinicians with valuable information regarding the major differential diagnoses of pleural diseases.