Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

A hybrid machine learning feature selection model-HMLFSM to enhance gene classification applied to multiple colon cancers dataset.

Publication date: 2023
Authors: Murad Al-Rajab, Joan Lu, Qiang Xu, Mohamed Kentour, Ahlam Sawsa, Emad Shuweikeh, Mike Joy, Ramesh Arasaradnam
Source: GENES & DEVELOPMENT

Abstract:

Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopy, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach to non-invasive colon cancer classification, using genetic data or patient demographics and medical history to predict the likelihood of colon cancer. However, variable gene expression and the high dimensionality of cancer-related datasets limit the accuracy of traditional transductive ML applications and expose them to overfitting. In this paper, we propose a new hybrid feature selection model, HMLFSM (Hybrid Machine Learning Feature Selection Model), to improve colon cancer gene classification. We developed a multi-filter hybrid model with a two-phase feature selection approach that combines Information Gain (IG) with Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) with Particle Swarm Optimization (PSO). We rigorously tested the model on three colon cancer genetic datasets and found that the new framework outperformed other models, achieving accuracies of 95%, ~97%, and ~94% on datasets 1, 2, and 3, respectively. The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that directly influence the classification process. Based on our experiments and literature review, we found that, for colon cancer gene analysis, selective input feature extraction prior to feature selection is essential for improving predictive performance.

Copyright: © 2023 Al-Rajab et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
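The abstract describes HMLFSM's two-phase design (a filter ranking followed by a metaheuristic wrapper search) but gives no implementation details. The Python sketch below illustrates that general filter-then-wrapper pattern for the IG + GA pairing only, under assumptions that are not taken from the paper. Each gene-expression feature is discretised with a single median split before Information Gain is computed, and the GA fitness is the leave-one-out accuracy of a simple nearest-centroid classifier minus a small per-feature penalty. All function names (`ig_filter`, `ga_wrapper`, `loo_centroid_accuracy`) and parameter values (`top_k`, `pop_size`, `generations`, `p_mut`, the 0.001 penalty) are hypothetical. The mRMR + PSO phase would follow the same shape, with an mRMR ranking in place of IG and a particle swarm search in place of the GA.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG of one continuous feature.

    Assumption (not from the paper): the feature is discretised by one split at its median.
    """
    s = sorted(column)
    threshold = (s[(len(s) - 1) // 2] + s[len(s) // 2]) / 2
    left = [lab for x, lab in zip(column, labels) if x <= threshold]
    right = [lab for x, lab in zip(column, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - conditional


def ig_filter(X, y, top_k):
    """Phase 1 (filter): indices of the top_k features ranked by information gain."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]


def loo_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Assumption: a hypothetical stand-in for whichever classifier the wrapper uses.
    """
    if not features:
        return 0.0
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for t, j in enumerate(features):
            acc[t] += row[j]
    correct = 0
    for row, label in zip(X, y):
        best, best_d = None, math.inf
        for c in sums:
            own = c == label  # remove the held-out sample from its own class centroid
            n_c = counts[c] - own
            if n_c == 0:
                continue
            d = sum((row[j] - (sums[c][t] - (row[j] if own else 0.0)) / n_c) ** 2
                    for t, j in enumerate(features))
            if d < best_d:
                best, best_d = c, d
        correct += best == label
    return correct / len(X)


def ga_wrapper(X, y, candidates, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Phase 2 (wrapper): GA over bit-masks of the IG-filtered candidate features."""
    rng = random.Random(seed)
    n = len(candidates)
    cache = {}

    def subset(mask):
        return [c for c, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            chosen = subset(mask)
            # Hypothetical fitness: accuracy first, tiny penalty favours smaller subsets.
            cache[key] = loo_centroid_accuracy(X, y, chosen) - 0.001 * len(chosen)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:2]  # elitism: keep the two best masks unchanged
        while len(next_pop) < pop_size:
            # Tournament selection (size 3) picks each parent.
            a, b = (max(rng.sample(ranked, 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [1 - bit if rng.random() < p_mut else bit for bit in a[:cut] + b[cut:]]
            next_pop.append(child)
        pop = next_pop
    return subset(max(pop, key=fitness))


if __name__ == "__main__":
    # Synthetic data (an assumption): only features 0 and 1 differ between classes.
    rng = random.Random(42)
    y = [i % 2 for i in range(40)]
    X = [[rng.gauss(2.0 * label if j < 2 else 0.0, 1.0) for j in range(50)] for label in y]
    top = ig_filter(X, y, top_k=10)
    selected = ga_wrapper(X, y, top)
    print("IG top-10:", top)
    print("GA subset:", selected, "LOO accuracy:", loo_centroid_accuracy(X, y, selected))
```

Run as a script, it builds a synthetic two-class dataset in which only features 0 and 1 carry signal, keeps the 10 highest-IG features, and lets the GA search subsets of them. Caching fitness by bit-mask avoids re-scoring identical subsets. This matters because a wrapper's cost is dominated by repeated classifier evaluations, which is also why the cheap filter phase runs first to shrink the search space.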