Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE.
Published: 2023 Mar 31
Authors:
K Syama, J Angel Arul Jothi, Namita Khanna
Source:
Disease Models & Mechanisms
Abstract:
The human microbiome plays a critical role in maintaining human health. Owing to recent advances in high-throughput sequencing technologies, microbiome profiles of the human body have become publicly available, and many studies have analyzed them. These studies have found that, for a range of diseases, healthy and sick individuals carry different microbiome profiles. Recently, several computational methods have used microbiome profiles to automatically diagnose and classify the host phenotype. In this work, a novel deep learning framework based on boosting GraphSAGE is proposed for automatic prediction of diseases from metagenomic data. The proposed framework has two main components: (a) a Metagenomic Disease graph (MD-graph) construction module and (b) a Disease Prediction Network (DP-Net) module. The graph construction module builds a graph in which each metagenomic sample is a node, and the graph captures the relationships between samples using a proximity measure. The DP-Net consists of a boosting GraphSAGE model that predicts the status of a sample as sick or healthy. The effectiveness of the proposed method is verified using real and synthetic datasets corresponding to diseases such as inflammatory bowel disease and colorectal cancer. The proposed model achieved its highest AUC of 93%, accuracy of 95%, F1-score of 95%, and AUPRC of 95% on the real inflammatory bowel disease dataset, and its best AUC of 90%, accuracy of 91%, F1-score of 87%, and AUPRC of 93% on the real colorectal cancer dataset. The proposed framework outperforms other machine learning and deep learning models in terms of classification accuracy, AUC, F1-score, and AUPRC on both synthetic and real metagenomic data. © 2023. The Author(s).
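
To make the two components more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not name the proximity measure, the number of neighbors, or the GraphSAGE aggregator, so the cosine similarity, the value `k=3`, the mean aggregator, the random toy data, and the function names `build_knn_graph` and `sage_mean_layer` are all assumptions made for illustration. The boosting step of DP-Net is not shown.

```python
import numpy as np


def build_knn_graph(X, k=3):
    """Link each sample (row of X) to its k most similar samples.

    Assumption: cosine similarity is the proximity measure; the
    abstract does not specify which measure the MD-graph uses.
    Returns a symmetric 0/1 adjacency matrix without self-loops.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)      # unit-length rows
    sim = Xn @ Xn.T                           # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)            # never pick a node as its own neighbor
    n = X.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # indices of the k nearest samples
    adj = np.zeros((n, n), dtype=int)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1
    return np.maximum(adj, adj.T)             # make the graph undirected


def sage_mean_layer(X, adj, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator (assumed variant).

    Each node combines its own features with the average of its
    neighbors' features, followed by a ReLU.
    """
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ X) / np.clip(deg, 1, None)
    return np.maximum(0, X @ W_self + neigh_mean @ W_neigh)


# Hypothetical toy data: 8 samples with 5 taxon abundances each.
rng = np.random.default_rng(0)
X = rng.random((8, 5))
adj = build_knn_graph(X, k=3)

# Untrained random weights, used only to show the shapes involved.
W_self = rng.normal(size=(5, 4))
W_neigh = rng.normal(size=(5, 4))
H = sage_mean_layer(X, adj, W_self, W_neigh)  # node embeddings, shape (8, 4)
```

In a full pipeline the embeddings `H` would feed a classifier that labels each sample as sick or healthy, and several such models would be combined by boosting; those parts depend on details the abstract does not give.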