SMG: self-supervised masked graph learning for cancer gene identification.
Published: 2023 Sep 22
Authors:
Yan Cui, Zhikang Wang, Xiaoyu Wang, Yiwen Zhang, Ying Zhang, Tong Pan, Zhe Zhang, Shanshan Li, Yuming Guo, Tatsuya Akutsu, Jiangning Song
Source:
BRIEFINGS IN BIOINFORMATICS
Abstract:
Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterizing molecular-level mechanisms in cancer research. In recent years, the growing availability of high-throughput molecular data and advances in deep learning have enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because labelled data are limited, pinpointing CGs from a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, nodes of protein-protein interaction (PPI) networks carrying multi-omic features are randomly substituted with a defined mask token. The PPI networks are then reconstructed using a graph neural network (GNN)-based autoencoder, which explores node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, and a task-specific layer makes the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetworks) across eight PPI networks. The benchmarking experiments and performance comparisons with existing state-of-the-art methods demonstrate the superiority of SMG on multi-omic feature engineering. © The Author(s) 2023. Published by Oxford University Press.
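The pretext task described in the abstract (mask node features, then reconstruct them with a GNN-based autoencoder) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy ring graph, the single GCN-style propagation layer, the all-zero mask token, the 50% mask rate, the mean-squared-error loss on masked nodes and every function name below are assumptions chosen for illustration. Only a forward pass is shown; actual pre-training would minimise the loss by gradient descent.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalise A + I, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def mask_nodes(features, mask_rate, mask_token, rng):
    """Replace the feature rows of randomly chosen nodes with the mask token."""
    n_nodes = features.shape[0]
    n_masked = max(1, int(round(mask_rate * n_nodes)))
    masked_idx = rng.choice(n_nodes, size=n_masked, replace=False)
    masked = features.copy()
    masked[masked_idx] = mask_token
    return masked, masked_idx

def encode(a_norm, x, w_enc):
    """One GCN-style layer with ReLU (hypothetical stand-in for the encoder)."""
    return np.maximum(a_norm @ x @ w_enc, 0.0)

def decode(a_norm, h, w_dec):
    """One linear GCN-style layer mapping embeddings back to feature space."""
    return a_norm @ h @ w_dec

def reconstruction_loss(x_true, x_pred, masked_idx):
    """Mean squared error computed on the masked nodes only (assumed loss)."""
    diff = x_true[masked_idx] - x_pred[masked_idx]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)

# Toy stand-in for a PPI network: a ring of 6 genes with 4 features each.
n_nodes, n_features, n_hidden = 6, 4, 8
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    j = (i + 1) % n_nodes
    adj[i, j] = adj[j, i] = 1.0
features = rng.normal(size=(n_nodes, n_features))

mask_token = np.zeros(n_features)  # assumed mask token
w_enc = rng.normal(scale=0.1, size=(n_features, n_hidden))
w_dec = rng.normal(scale=0.1, size=(n_hidden, n_features))

a_norm = normalized_adjacency(adj)
masked_features, masked_idx = mask_nodes(features, 0.5, mask_token, rng)
hidden = encode(a_norm, masked_features, w_enc)
reconstructed = decode(a_norm, hidden, w_dec)
loss = reconstruction_loss(features, reconstructed, masked_idx)
print("masked nodes:", sorted(masked_idx.tolist()))
print("reconstruction loss on masked nodes:", loss)
```

Because the masked nodes carry no features of their own, the encoder can only reconstruct them from their neighbours, which is what forces the model to learn node correlations. For a downstream task, one would keep `encode` with its pre-trained weights and attach a task-specific layer on top of `hidden`.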