Gene selection and cancer classification using interaction-based feature clustering and improved-binary Bat algorithm.
Published: 2024 Aug 27
Authors:
Ahmad Esfandiari, Niki Nasiri
Source:
COMPUTERS IN BIOLOGY AND MEDICINE
Abstract:
In high-dimensional gene expression data, selecting an optimal subset of genes is crucial for achieving high classification accuracy and reliable disease diagnosis. This paper proposes a two-stage hybrid gene selection model, based on clustering and a swarm intelligence algorithm, that identifies the most informative genes with high accuracy. First, a clustering-based multivariate filter explores the interactions between features and eliminates redundant or irrelevant ones. Then, a binary Bat algorithm, modified to mitigate premature convergence, determines the optimal gene subset using different classifiers and Monte Carlo cross-validation for data partitioning. The effectiveness of the proposed framework is evaluated on eight gene expression datasets and compared with other recently published algorithms. Experiments confirm that on seven of the eight datasets, the proposed method achieves superior results in terms of both classification accuracy and gene subset size. In particular, it achieves 100% classification accuracy on the Lymphoma and Ovarian datasets and above 97.4% on the remaining datasets, using a minimal number of genes. The results demonstrate that the proposed algorithm has the potential to solve feature selection problems in other applications involving high-dimensional datasets.
Copyright © 2024 Elsevier Ltd. All rights reserved.
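To make the wrapper stage more concrete, here is a minimal sketch of a *generic* binary Bat algorithm used for gene selection, scored with Monte Carlo cross-validation (repeated random train/test splits). It is not the authors' method. It omits their clustering-based filter stage and their specific fix for premature convergence. It also assumes several things that the abstract does not specify: a 1-nearest-neighbour classifier, a fitness of `alpha * accuracy + (1 - alpha) * (1 - fraction_of_genes_selected)`, the common V-shaped transfer function `|(2/π)·arctan((π/2)·v)|` for turning velocities into bit-flip probabilities, constant loudness and pulse rate, and all parameter values. Every function name here is hypothetical.

```python
import numpy as np

def one_nn_accuracy(X_tr, y_tr, X_te, y_te):
    # Assumed classifier: 1-NN with squared Euclidean distance.
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    pred = y_tr[d.argmin(axis=1)]
    return float((pred == y_te).mean())

def mccv_fitness(mask, X, y, rng, n_splits=5, test_frac=0.3, alpha=0.99):
    """Monte Carlo CV accuracy on the selected genes, lightly rewarded for
    small subsets (the alpha weighting is an assumption, not the paper's)."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    accs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)          # a fresh random split each time
        te, tr = perm[:n_test], perm[n_test:]
        accs.append(one_nn_accuracy(Xs[tr], y[tr], Xs[te], y[te]))
    acc = float(np.mean(accs))
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)

def binary_bat(X, y, n_bats=10, n_iter=30, f_min=0.0, f_max=2.0,
               loudness=0.9, pulse_rate=0.5, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.random((n_bats, d)) < 0.5    # each bat is a boolean gene mask
    vel = np.zeros((n_bats, d))
    fit = np.array([mccv_fitness(p, X, y, rng) for p in pos])
    best_i = int(fit.argmax())
    best, best_fit = pos[best_i].copy(), fit[best_i]
    for _ in range(n_iter):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] += (pos[i].astype(float) - best.astype(float)) * freq
            # V-shaped transfer: large |velocity| means a high flip probability.
            flip_prob = np.abs((2 / np.pi) * np.arctan((np.pi / 2) * vel[i]))
            cand = np.where(rng.random(d) < flip_prob, ~pos[i], pos[i])
            if rng.random() > pulse_rate:
                # Local search: perturb the global best (5% flip rate is hypothetical).
                cand = best.copy()
                flip = rng.random(d) < 0.05
                cand[flip] = ~cand[flip]
            cand_fit = mccv_fitness(cand, X, y, rng)
            # Accept an improvement with probability given by the loudness.
            if cand_fit >= fit[i] and rng.random() < loudness:
                pos[i], fit[i] = cand, cand_fit
            if cand_fit >= best_fit:
                best, best_fit = cand.copy(), cand_fit
    return best, best_fit
```

A usage example on synthetic data in which only the first two genes carry class information:

```python
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mask, score = binary_bat(X, y)
print(np.flatnonzero(mask), round(score, 3))
```

Because every fitness call draws new random splits, scores are noisy. Averaging over more splits (`n_splits`) trades runtime for a more stable ranking of candidate subsets.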