Genomics data analysis via spectral shape and topology.
Publication date: 2023
Authors:
Erik J Amézquita, Farzana Nasrin, Kathleen M Storey, Masato Yoshizawa
Source:
GENES & DEVELOPMENT
Abstract:
Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help reveal the intrinsic shape of high-dimensional genomic data and retain information that may be lost by standard dimension-reduction algorithms. We propose a novel workflow for processing and analyzing RNA-seq data from tumor and healthy subjects that integrates Mapper, differential gene expression, and spectral shape analysis. Specifically, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects and that split the tumor subjects into two subgroups. A further analysis using DESeq2, a popular tool for detecting differentially expressed genes, shows that these two tumor subgroups exhibit two distinct patterns of gene regulation, suggesting two discrete paths to lung cancer that other popular clustering methods, including t-distributed stochastic neighbor embedding (t-SNE), could not highlight. Although Mapper shows promise in analyzing high-dimensional data, tools for statistically analyzing Mapper graphical structures are limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inference, including hypothesis testing, sensitivity analysis, and correlation analysis.

Copyright: © 2023 Amézquita et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
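
To make the heat kernel signature mentioned in the abstract concrete, the following is a minimal sketch of its standard textbook definition on a graph, HKS(v, t) = Σ_i exp(−λ_i t) φ_i(v)², where (λ_i, φ_i) are the eigenpairs of the graph Laplacian. It is not the authors' scoring method: the function name `heat_kernel_signature`, the choice of the unnormalized Laplacian, the toy path graph, and the diffusion times are all assumptions made for illustration.

```python
import numpy as np


def heat_kernel_signature(adjacency, times):
    """Return an (n_nodes, n_times) array of heat kernel signatures.

    Illustrative assumption: the graph is undirected and the unnormalized
    (combinatorial) Laplacian L = D - A is used.
    """
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    # eigh is appropriate because the Laplacian is symmetric;
    # the columns of eigvecs are orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    times = np.asarray(times, dtype=float)
    # decay[i, k] = exp(-lambda_i * t_k)
    decay = np.exp(-np.outer(eigvals, times))
    # HKS(v, t_k) = sum_i phi_i(v)^2 * exp(-lambda_i * t_k)
    return (eigvecs ** 2) @ decay


# Toy stand-in for a Mapper graph: a path with three nodes, 0 - 1 - 2.
adjacency = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])
hks = heat_kernel_signature(adjacency, times=[0.0, 0.5, 2.0])
```

Each row of `hks` describes how much heat remains at a node after diffusing for each time `t`. Nodes in structurally equivalent positions (here, the two endpoints) receive identical signatures, which is what makes the signature a candidate building block for comparing graph shapes.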