Geographically weighted linear combination test for gene-set analysis of a continuous spatial phenotype as applied to intratumor heterogeneity.
Publication date: 2023
Authors:
Payam Amini, Morteza Hajihosseini, Saumyadipta Pyne, Irina Dinu
Source:
Frontiers in Cell and Developmental Biology
Abstract:
Background: The impact of gene-sets on a spatial phenotype is not necessarily uniform across different locations of cancer tissue. This study introduces a computational platform, GWLCT, that combines gene-set analysis with spatial data modeling to provide a new statistical test for location-specific association of phenotypes and molecular pathways in spatial single-cell RNA-seq data collected from an input tumor sample.
Methods: The main advantage of GWLCT is that it goes beyond global significance, allowing the association between the gene-set and the phenotype to vary across the tumor space. At each location, the most significant linear combination is found using a geographically weighted shrunken covariance matrix and a kernel function. Whether a fixed or an adaptive bandwidth is used is determined by a cross-validation procedure. The proposed method is compared with the global version of the linear combination test (LCT) and with bulk and random-forest based gene-set enrichment analyses, using data created by the Visium Spatial Gene Expression technique on an invasive breast cancer tissue sample as well as 144 different simulation scenarios.
Results: In an illustrative example, the new geographically weighted linear combination test, GWLCT, identifies the cancer hallmark gene-sets that are significantly associated at each location with five spatially continuous phenotypic contexts in the tumor, defined by different well-known markers of cancer-associated fibroblasts. Scan statistics revealed clustering in the number of significant gene-sets. A spatial heatmap of combined significance over all selected gene-sets is also produced. Extensive simulation studies demonstrate that the proposed approach outperforms the other methods in the considered scenarios, especially when the spatial association increases.
Conclusion: The proposed approach considers the spatial covariance of gene expression to detect the most significant gene-sets affecting a continuous phenotype. It reveals spatially detailed information in tissue space and can thus play a key role in understanding the contextual heterogeneity of cancer cells.
Copyright © 2023 Amini, Hajihosseini, Pyne and Dinu.
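
The per-location step described in the Methods (kernel weights around a location, a geographically weighted shrunken covariance matrix, and the best linear combination of the gene-set) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the shrinkage toward the diagonal with a fixed intensity `lam`, the fixed bandwidth, and the function names `gaussian_kernel_weights` and `local_statistic` are all assumptions made for illustration, and the data are simulated. The statistic used here is the maximal squared weighted correlation between a linear combination of the genes and the phenotype, which is one plausible reading of "most significant linear combination".

```python
import numpy as np


def gaussian_kernel_weights(coords, center, bandwidth):
    """Gaussian kernel weights of all spots relative to one location.

    Assumption: a Gaussian kernel with a fixed bandwidth.
    """
    d = np.linalg.norm(coords - center, axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def local_statistic(X, y, w, lam=0.2):
    """Maximal squared weighted correlation between X @ beta and y.

    X   : (n_spots, n_genes) expression of one gene-set
    y   : (n_spots,) continuous phenotype
    w   : (n_spots,) non-negative kernel weights
    lam : shrinkage intensity toward the diagonal (hypothetical fixed value)
    """
    w = w / w.sum()                      # normalise weights to sum to 1
    Xc = X - w @ X                       # centre by weighted means
    yc = y - w @ y
    S_xx = (Xc * w[:, None]).T @ Xc      # weighted gene covariance
    S_xx = (1 - lam) * S_xx + lam * np.diag(np.diag(S_xx))  # shrink
    s_xy = Xc.T @ (w * yc)               # weighted gene-phenotype covariance
    s_yy = w @ (yc ** 2)                 # weighted phenotype variance
    # The optimal beta is proportional to S_xx^{-1} s_xy.
    return float(s_xy @ np.linalg.solve(S_xx, s_xy) / s_yy)


# Simulated example: 50 spots on a plane, a gene-set of 5 genes.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))
X = rng.normal(size=(50, 5))
y = X[:, 0] * (coords[:, 0] / 10) + rng.normal(scale=0.5, size=50)

# One statistic per location, using a fixed bandwidth of 2.0 (assumed).
stats = np.array([
    local_statistic(X, y, gaussian_kernel_weights(coords, c, 2.0))
    for c in coords
])
```

With uniform weights and `lam=0` the statistic reduces to the ordinary R² of regressing the phenotype on the gene-set, which corresponds to the global (non-spatial) case. Turning the local statistics into p-values (for example by permutation), choosing between a fixed and an adaptive bandwidth by cross-validation, and estimating the shrinkage intensity from the data are left out of this sketch.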