Uncovering key molecular mechanisms in the early and late-stage of papillary thyroid carcinoma using association rule mining algorithm.
Publication date: 2023
Authors:
Seyed Mahdi Hosseiniyan Khatibi, Sepideh Zununi Vahed, Hamed Homaei Rad, Manijeh Emdadi, Zahra Akbarpour, Mohammad Teshnehlab, Saeed Pirmoradi, Effat Alizadeh
Source:
MOLECULAR & CELLULAR PROTEOMICS
Abstract:
Thyroid cancer (TC) is the most frequent endocrine malignancy and the sixth most common cancer in women worldwide. Identifying the molecular mechanisms that govern its early and late stages could expedite treatment and improve patient survival outcomes. In this work, we use machine learning algorithms to study the significant mRNAs in both the early and late stages of papillary thyroid cancer (PTC).
We investigated various methods and techniques to obtain suitable results. The initial stage comprised organizing the data, nested cross-validation, data cleaning, and normalization. For feature selection, we applied a t-test and a binary Non-Dominated Sorting Genetic Algorithm II (NSGA-II). In the analysis stage, the discriminative power of the selected features was evaluated using machine learning and deep learning algorithms. Finally, we applied an association rule mining algorithm to the selected features to identify those most important for decoding the dominant molecular mechanisms of PTC in its early and late stages.
Based on the identified mRNAs, the SVM classifier distinguished the early-stage and late-stage categories with an accuracy of 83.5% and an AUC of 0.78. The most significant genes associated with the early stage included ZNF518B, DTD2, and CCAR1; those associated with the late stage included lnc-DNAJB6-7:7, RP11-484D2.3, and MSL3P1.
The current study gives a clear picture of potential candidate genes that could play a major role in both the early and the late stage of PTC. These findings could help identify therapeutic targets for more effective PTC drug development.
Copyright: © 2023 Hosseiniyan Khatibi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
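The association rule mining step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which algorithm, thresholds, or discretization they used. The toy samples, the item names (`geneA_high`, `stage_early`, and so on), the function names, and the threshold values below are all hypothetical, and the rules are enumerated by brute force rather than by an optimized miner such as Apriori. The sketch only shows how support, confidence, and lift rank rules of the form "expression pattern -> stage".

```python
from itertools import combinations

# Hypothetical toy data, NOT from the paper: each sample is a set of
# discretized expression flags plus one stage label.
SAMPLES = [
    {"geneA_high", "geneB_high", "stage_early"},
    {"geneA_high", "geneB_high", "stage_early"},
    {"geneA_high", "stage_early"},
    {"geneB_high", "geneC_high", "stage_late"},
    {"geneC_high", "stage_late"},
    {"geneB_high", "geneC_high", "stage_late"},
]


def support(itemset, samples):
    """Fraction of samples that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= s for s in samples) / len(samples)


def mine_rules(samples, targets, min_support=0.3, min_confidence=0.8, max_len=2):
    """Enumerate rules `antecedent -> target` by brute force.

    Returns tuples (antecedent, target, support, confidence, lift),
    sorted by confidence and then support, both descending.
    The thresholds are illustrative assumptions.
    """
    # Antecedents are built only from non-label items.
    items = sorted(set().union(*samples) - set(targets))
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            sup_a = support(antecedent, samples)
            if sup_a == 0:
                continue  # pattern never occurs, no rule possible
            for target in targets:
                sup_rule = support(set(antecedent) | {target}, samples)
                if sup_rule < min_support:
                    continue
                confidence = sup_rule / sup_a
                if confidence < min_confidence:
                    continue
                # Lift > 1 means the pattern is enriched in this stage.
                lift = confidence / support({target}, samples)
                rules.append((antecedent, target, sup_rule, confidence, lift))
    return sorted(rules, key=lambda r: (-r[3], -r[2]))


if __name__ == "__main__":
    for ant, tgt, sup, conf, lift in mine_rules(SAMPLES, ("stage_early", "stage_late")):
        print(f"{' & '.join(ant)} -> {tgt}: support={sup:.2f}, "
              f"confidence={conf:.2f}, lift={lift:.2f}")
```

On real expression data, continuous values would first have to be discretized (for example into high/low flags), and that choice strongly affects which rules appear. The abstract does not describe how this was done.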