Subspace learning using structure learning and non-convex regularization: Hybrid technique with mushroom reproduction optimization in gene selection.
Publication date: 2023 Jul 31
Authors:
Amir Moslemi, Mahdi Bidar, Arash Ahmadian
Source:
COMPUTERS IN BIOLOGY AND MEDICINE
Abstract:
Gene selection, a high-dimensional problem, has drawn considerable attention in machine learning and computational biology over the past decade. For gene selection in cancer datasets, many feature selection techniques have been developed, differing in strategy (filter, wrapper, and embedded) and in their use of label information (supervised, unsupervised, and semi-supervised). However, hybrid feature selection can still improve performance. In this paper, we propose a hybrid feature selection method based on filter and wrapper strategies. In the filter phase, we develop an unsupervised feature selection method based on non-convex regularized non-negative matrix factorization and structure learning, which we call NCNMFSL. In the wrapper phase, mushroom reproduction optimization (MRO) is used for the first time to obtain the most informative feature subset. In this hybrid method, irrelevant features are filtered out by NCNMFSL, and the most discriminative features are then selected by MRO. To show the effectiveness of the proposed method, numerical experiments are conducted on the Breast, Heart, Colon, Leukemia, Prostate, Tox-171, and GLI-85 benchmark datasets. SVM and decision tree classifiers are used to evaluate the proposed technique, and the top accuracies are 0.97, 0.84, 0.98, 0.95, 0.98, 0.87, and 0.85 for Breast, Heart, Colon, Leukemia, Prostate, Tox-171, and GLI-85, respectively. The computational results show the effectiveness of the proposed method in comparison with state-of-the-art feature selection techniques. Copyright © 2023. Published by Elsevier Ltd.
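The two-phase structure described in the abstract (an unsupervised filter that discards irrelevant features, followed by a wrapper that searches the survivors for a discriminative subset) can be illustrated with a minimal sketch. This is not the authors' method: the abstract gives no algorithmic details, so the filter here is a plain variance ranking standing in for NCNMFSL, the wrapper is a simple random local search standing in for MRO, and a leave-one-out nearest-centroid classifier stands in for the SVM and decision tree evaluators. All function names and parameters (`filter_stage`, `wrapper_stage`, `hybrid_select`, `keep`, `iterations`) are hypothetical.

```python
import random
from statistics import pvariance


def filter_stage(X, keep):
    """Unsupervised filter (stand-in for NCNMFSL): keep the `keep`
    highest-variance feature indices, returned in ascending order."""
    n_features = len(X[0])
    scores = [pvariance([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier that
    only looks at the given feature indices."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        # Accumulate per-class sums over every sample except sample i.
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for p, j in enumerate(features):
                acc[p] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared Euclidean distance from sample i to a class centroid.
            return sum(
                (X[i][j] - sums[label][p] / counts[label]) ** 2
                for p, j in enumerate(features)
            )

        pred = min(sums, key=dist)
        correct += pred == y[i]
    return correct / len(X)


def wrapper_stage(X, y, candidates, iterations=200, seed=0):
    """Wrapper (stand-in for MRO): random local search that toggles one
    candidate feature per step. A move is accepted if it raises accuracy,
    or keeps accuracy equal while using fewer features."""
    rng = random.Random(seed)
    best = list(candidates)
    best_score = loo_accuracy(X, y, best)
    for _ in range(iterations):
        j = rng.choice(candidates)
        if j in best:
            trial = [f for f in best if f != j]
        else:
            trial = sorted(best + [j])
        score = loo_accuracy(X, y, trial)
        if score > best_score or (score == best_score and len(trial) < len(best)):
            best, best_score = trial, score
    return best, best_score


def hybrid_select(X, y, keep=5, iterations=200, seed=0):
    """Filter first, then run the wrapper only on the surviving features."""
    candidates = filter_stage(X, keep)
    return wrapper_stage(X, y, candidates, iterations, seed)
```

Calling `hybrid_select(X, y, keep=k)` on a list-of-rows matrix `X` and a label list `y` returns the selected feature indices together with their leave-one-out accuracy. Running the filter first keeps the wrapper's search space small, which is the usual motivation for hybrid schemes on gene expression data with thousands of features.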