An omics data analysis method based on feature linear relationship and graph convolutional network.
Publication date: 2023 Aug 25
Authors:
Yanhui Zhang, Xiaohui Lin, Zhenbo Gao, Tianxiang Wang, Kunjie Dong, Jianjun Zhang
Source:
JOURNAL OF BIOMEDICAL INFORMATICS
Abstract:
Biological networks are known to be highly modular, and the dysfunction of network modules may cause disease. Identifying key modules from omics data and building classification models on them can advance research on disease diagnosis and prognosis. However, when modules are applied in downstream analyses such as disease-state discrimination, most methods use only node information and ignore node interactions and topological information, which may lead to false positives and limit model performance. In this study, we propose an omics data analysis method based on feature linear relationships and a graph convolutional network (LCNet). LCNet uses the differences in feature linear relationships that arise during disease development to characterize physiological and pathological changes and to construct a differential linear relation network; the approach is simple and interpretable from the perspective of feature linear relationships. A greedy strategy is developed to search for highly interactive modules with strong discriminative ability. To make full use of the information in the detected modules, a personalized sub-graph is defined for each sample based on the modules, and graph convolutional network (GCN) classifiers are trained to predict sample labels. Experimental results on public datasets show that LCNet achieves superior classification performance. For breast cancer metabolic data, the metabolites identified by LCNet are involved in important pathways. Thus, LCNet identifies module biomarkers using feature linear relationships and a greedy strategy, and labels samples using personalized sub-graphs and a GCN. It provides a new way of using both node (molecule) information and topological information in the defined modules for better disease classification. Copyright © 2023. Published by Elsevier Inc.
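The abstract does not give the formulas behind the differential linear relation network or the GCN classifier, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes that a "feature linear relationship" can be approximated by the least-squares slope between two features, that an edge is drawn when this slope differs between control and disease samples by more than a hypothetical `threshold`, and that the GCN uses the widely used symmetric normalization ReLU(D^-1/2 (A + I) D^-1/2 H W). The greedy module search is not shown, and all function names are illustrative.

```python
import numpy as np


def pairwise_slopes(X):
    """Least-squares slopes for all feature pairs.

    X has shape (n_samples, n_features).
    slopes[i, j] is the slope of feature j regressed on feature i,
    i.e. cov(x_i, x_j) / var(x_i).
    """
    cov = np.cov(X, rowvar=False)
    var = np.diag(cov)
    return cov / var[:, None]


def differential_network(X_control, X_disease, threshold):
    """Adjacency matrix of an assumed differential linear relation network.

    An undirected edge joins two features when the slope between them
    changes by more than `threshold` (a hypothetical cut-off) between groups.
    """
    diff = np.abs(pairwise_slopes(X_disease) - pairwise_slopes(X_control))
    adj = (diff > threshold) | (diff.T > threshold)  # symmetrise
    np.fill_diagonal(adj, False)                     # no self-loops
    return adj.astype(float)


def gcn_layer(adj, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ H @ W, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    # Synthetic data: features 0 and 1 are positively related in controls
    # and negatively related in disease; feature 2 is unrelated noise.
    x0_c, x0_d = rng.normal(size=n), rng.normal(size=n)
    X_control = np.column_stack([x0_c, 2 * x0_c + 0.1 * rng.normal(size=n), rng.normal(size=n)])
    X_disease = np.column_stack([x0_d, -2 * x0_d + 0.1 * rng.normal(size=n), rng.normal(size=n)])

    adj = differential_network(X_control, X_disease, threshold=1.0)
    # Personalized sub-graph for one sample: shared topology, node features
    # taken from that sample's own measurements (one value per node).
    H = X_disease[0].reshape(-1, 1)
    W = rng.normal(size=(1, 4))                      # untrained weights, for shape only
    print(adj)
    print(gcn_layer(adj, H, W).shape)
```

In this sketch every sample shares the module topology and differs only in its node features, which is one simple reading of "personalized sub-graphs"; the paper may define them differently.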