Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM.

Publication date: 2023
Authors: Zhenguo Su, Huihui Lu, Yan Wu, Zejun Li, Lian Duan
Source: Frontiers in Genetics

Abstract:

Introduction: Lung cancer is one of the most common malignancies worldwide, with approximately 2.2 million new cases and 1.8 million deaths each year. The expression level of programmed death ligand-1 (PDL1) has a complex association with lung cancer. Neuroblastoma is a high-risk malignant tumor that occurs mainly in children. Identifying new biomarkers for these two diseases can significantly advance their diagnosis and treatment. However, in vivo experiments to discover potential biomarkers are costly and laborious. Consequently, artificial intelligence technologies, especially machine learning methods, offer a powerful way to find new biomarkers for various diseases.

Methods: We developed a machine learning-based method named LDAenDL to detect potential long noncoding RNA (lncRNA) biomarkers for lung cancer and neuroblastoma using an ensemble of a deep neural network and LightGBM. LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs, and the Gaussian kernel similarity and semantic similarity of diseases, to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network, a graph attention network, and a convolutional neural network to learn the biological features of the lncRNAs and diseases from these similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network and LightGBM to find new lncRNA-disease associations (LDAs). Finally, the proposed LDAenDL method is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma.

Results: The experimental results show that LDAenDL achieved the best AUCs of 0.8701, 0.8953, and 0.9110 under cross-validation on lncRNAs, diseases, and lncRNA-disease pairs, respectively, on Dataset 1, and 0.9490, 0.9157, and 0.9708 on Dataset 2. Furthermore, under the same three cross-validation settings, it obtained AUPRs of 0.8903, 0.9061, and 0.9166 on Dataset 1, and 0.9582, 0.9122, and 0.9743 on Dataset 2. The results demonstrate that LDAenDL significantly outperformed four other classical LDA prediction methods (i.e., SDLDA, LDNFSGB, IPCAF, and LDASR). Case studies suggest that CCDC26 and IFNG-AS1 may be new biomarkers of lung cancer, that SNHG3 may be associated with PDL1 in lung cancer, and that HOTAIR and BDNF-AS may be potential biomarkers of neuroblastoma.

Conclusion: We hope that the proposed LDAenDL method can help the development of targeted therapies for these two diseases.

Copyright © 2023 Su, Lu, Wu, Li and Duan.
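
The abstract names Gaussian kernel similarity as the first step of the method but does not give its formula. As a rough illustration only, the sketch below uses the Gaussian interaction profile kernel that is common in association prediction, where each entity is described by its row (or column) of a binary lncRNA-disease association matrix. The function name `gaussian_kernel_similarity`, the bandwidth parameter `gamma_prime`, and the toy matrix are all assumptions made for this example; they are not taken from the LDAenDL paper and may differ from what the authors implemented.

```python
import math


def gaussian_kernel_similarity(profiles, gamma_prime=1.0):
    """Pairwise Gaussian kernel similarity between association profiles.

    profiles: list of equal-length 0/1 lists; row i is the profile of entity i.
    gamma_prime: hypothetical bandwidth parameter (assumed, not from the paper).
    """
    n = len(profiles)
    # Normalise the bandwidth by the mean squared norm of all profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all profiles are empty; bandwidth is undefined")
    gamma = gamma_prime / mean_sq_norm

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Squared Euclidean distance between the two profiles.
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * sq_dist)
    return sim


# Toy association matrix (made-up data): 3 lncRNAs x 4 diseases.
toy_associations = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]

# lncRNA similarity uses the rows; disease similarity uses the columns.
lnc_sim = gaussian_kernel_similarity(toy_associations)
disease_profiles = [list(col) for col in zip(*toy_associations)]
disease_sim = gaussian_kernel_similarity(disease_profiles)
```

In this toy example the first two lncRNAs share one associated disease, so they come out more similar to each other than either is to the third, which shares none. In the pipeline the abstract describes, matrices of this kind would be combined with functional and semantic similarity to form the similarity networks; how they are combined is not stated in the abstract.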