Research Updates

Diabetes mellitus early warning and factor analysis using ensemble Bayesian networks with SMOTE-ENN and Boruta.

Published: 2023 Aug 05
Authors: Xuchun Wang, Jiahui Ren, Hao Ren, Wenzhu Song, Yuchao Qiao, Ying Zhao, Liqin Linghu, Yu Cui, Zhiyang Zhao, Limin Chen, Lixia Qiu
Source: DIABETES & METABOLISM

Abstract:

Diabetes mellitus (DM) ranks third among the chronic non-communicable diseases affecting patients, after tumors and cardiovascular and cerebrovascular diseases, and is one of the major public health issues worldwide. Detecting early warning risk factors is key to preventing DM and has been the focus of some previous studies. From the perspective of residents' self-management and prevention, this study constructed Bayesian networks (BNs), combined with feature screening and multiple resampling techniques, on class-imbalanced DM monitoring data from chronic disease monitoring programs in Shanxi Province, China, to detect risk factors and predict the risk of DM. First, univariate analysis and the Boruta feature selection algorithm were used for preliminary screening of all included risk factors. Then, three resampling techniques, SMOTE, Borderline-SMOTE (BL-SMOTE), and SMOTE-ENN, were applied to address the class imbalance. Finally, BNs were learned from the resampled data with three algorithms (Tabu, Hill-climbing, and MMHC) to identify the warning factors strongly associated with DM. The results showed that BNs built on the resampled data significantly improved the accuracy of DM classification. In particular, BNs combined with SMOTE-ENN resampling improved the most, and BNs built with the Tabu algorithm achieved better classification performance than those built with the Hill-climbing and MMHC algorithms. The best-performing combined Boruta-SMOTE-ENN-Tabu model indicated that the risk factors for DM included family history, age, central obesity, hyperlipidemia, salt reduction, occupation, heart rate, and BMI. © 2023. Springer Nature Limited.
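
The abstract does not say how the resampling step was implemented, so the following is only a minimal pure-Python sketch of the general SMOTE-ENN idea, not the authors' code. It assumes a binary label and numeric features. The function names, the toy data, and the choice of a majority-vote ENN rule are all assumptions made for illustration. SMOTE interpolates new minority samples between existing minority neighbours, and ENN then drops any sample whose label disagrees with most of its nearest neighbours. Boruta screening and BN structure learning are not shown.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(idx, X, k, candidates):
    """Indices of the k points in `candidates` nearest to X[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    return sorted(others, key=lambda j: sq_dist(X[idx], X[j]))[:k]


def smote(X, y, minority, k=5, seed=0):
    """Add synthetic minority points until both classes have equal size.

    Assumes a binary problem with at least two minority samples.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)
    X_out, y_out = [list(row) for row in X], list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)
        j = rng.choice(k_nearest(i, X, k, min_idx))  # a minority neighbour
        gap = rng.random()  # position along the segment from X[i] to X[j]
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Edited nearest neighbours: keep a point only if the majority label
    among its k nearest neighbours matches its own label."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        votes = Counter(y[j] for j in k_nearest(i, X, k, everyone))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k_smote=5, k_enn=3, seed=0):
    """Oversample with SMOTE, then clean both classes with ENN."""
    X_bal, y_bal = smote(X, y, minority, k=k_smote, seed=seed)
    return enn(X_bal, y_bal, k=k_enn)


# Toy data, made up for illustration: class 0 plays the "non-DM" majority,
# class 1 the "DM" minority. The last class-0 row sits inside the class-1
# cluster and acts as a noisy sample.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2],
     [0.3, 0.3], [5.3, 5.3],
     [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

X_res, y_res = smote_enn(X, y, minority=1)
print("before:", Counter(y), "after:", Counter(y_res))
```

On this toy data the cleaning step removes the class-0 point placed inside the class-1 cluster, which is the behaviour that distinguishes SMOTE-ENN from plain SMOTE. In practice a maintained implementation such as `SMOTEENN` from the imbalanced-learn package would normally be preferred. Its default ENN rule is stricter than the majority vote used here.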