Classification of cancer cells and gene selection based on microarray data using the MOPSO algorithm.
Published: 2023 Aug 27
Authors:
Mohammad Reza Rahimi, Dorna Makarem, Sliva Sarspy, Sobhan Akhavan Mahdavi, Mustafa Fahem Albaghdadi, Seyed Mostafa Armaghan
Source:
Journal of Cancer Research and Clinical Oncology
Abstract:
Microarray data are crucial for identifying and classifying malignant tissues. The very small number of samples in microarray datasets has always been a challenge for classifier design in cancer research. For this reason, gene selection is applied as a pre-processing step, and uninformative genes are removed from the microarray data before classification. An appropriate gene selection technique can significantly increase the accuracy of disease (cancer) classification. This study proposes a novel approach to classifying high-dimensional microarray data, based on a hybrid model of multi-objective particle swarm optimisation (MOPSO). First, a binary vector representing each particle's position is generated at random. Each bit represents one gene: bit 0 means the corresponding feature (gene) is not selected, and bit 1 means it is selected. The position of each particle therefore represents a set of genes, and a linear Bayesian discriminant analysis classifier computes each particle's fitness to assess the quality of the gene set that particle has selected. The proposed algorithm was applied to four cancer datasets, and its results were compared with those of other existing methods. The results show that, compared with the other methods, the proposed algorithm improves classification accuracy by 25.84% on average across the four datasets: by 18.63% on the blood cancer dataset, 24.25% on the lung cancer dataset, 27.73% on the breast cancer dataset, and 32.80% on the prostate cancer dataset. The proposed algorithm is therefore able to identify a small set of informative genes that increases classification accuracy. This is possible because the MOPSO model removes redundancy and reduces the number of redundant genes by taking into account how genes are correlated with one another.
© 2023. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
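The particle encoding and fitness evaluation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract evaluates fitness with linear Bayesian discriminant analysis; to stay dependency-free, this sketch swaps in a leave-one-out nearest-centroid classifier as a stand-in. The two objectives (maximise accuracy, minimise the number of selected genes) are an assumption inferred from the abstract's emphasis on small gene sets, and every function name and the toy data are hypothetical.

```python
import random

def random_position(n_genes, rng):
    """One particle position: a random bit per gene (1 = selected, 0 = not)."""
    return [rng.randint(0, 1) for _ in range(n_genes)]

def selected_genes(position):
    """Indices of the genes whose bit is 1."""
    return [i for i, bit in enumerate(position) if bit == 1]

def nearest_centroid_accuracy(samples, labels, genes):
    """Stand-in fitness (NOT the paper's classifier): leave-one-out accuracy
    of a nearest-centroid classifier restricted to the selected genes."""
    if not genes:
        return 0.0  # an empty gene set cannot classify anything
    correct = 0
    for held_out in range(len(samples)):
        # Build per-class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for i, (row, label) in enumerate(zip(samples, labels)):
            if i == held_out:
                continue
            acc = sums.setdefault(label, [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += row[g]
            counts[label] = counts.get(label, 0) + 1
        # Predict the class whose centroid is closest (squared Euclidean).
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            dist = sum((samples[held_out][g] - acc[k] / counts[label]) ** 2
                       for k, g in enumerate(genes))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == labels[held_out]
    return correct / len(samples)

def objectives(position, samples, labels):
    """Assumed objective pair: (accuracy to maximise, gene count to minimise)."""
    genes = selected_genes(position)
    return nearest_centroid_accuracy(samples, labels, genes), len(genes)

def dominates(a, b):
    """Pareto dominance: a is no worse than b on both objectives and
    strictly better on at least one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

# Toy expression matrix (rows = samples, columns = genes); only gene 0
# separates the two classes. Values are made up for illustration.
samples = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.9], [3.0, 5.1, 0.1], [3.3, 4.9, 0.8]]
labels = ["A", "A", "B", "B"]

swarm = [random_position(3, random.Random(seed)) for seed in range(5)]
scores = [objectives(p, samples, labels) for p in swarm]
```

In a full MOPSO loop, positions that no other particle dominates would be kept in an external archive and used to guide the velocity updates. That part is omitted here because the abstract does not describe it.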