sccomp: Robust differential composition and variability analysis for single-cell data.
Published: 2023 Aug 15
Authors:
Stefano Mangiola, Alexandra J Roth-Schulze, Marie Trussart, Enrique Zozaya-Valdés, Mengyao Ma, Zijie Gao, Alan F Rubin, Terence P Speed, Heejung Shim, Antony T Papenfuss
Source:
Proceedings of the National Academy of Sciences of the United States of America
Abstract:
Cellular omics such as single-cell genomics, proteomics, and microbiomics allow the characterization of tissue and microbial community composition, which can be compared between conditions to identify biological drivers. This strategy has been critical to revealing markers of disease progression, such as cancer and pathogen infection. A dedicated statistical method for differential variability analysis is lacking for cellular omics data, and existing methods for differential composition analysis do not model some compositional data properties, suggesting there is room to improve model performance. Here, we introduce sccomp, a method for differential composition and variability analyses that jointly models data count distribution, compositionality, group-specific variability, and proportion mean-variability association, being aware of outliers. sccomp provides a comprehensive analysis framework that offers realistic data simulation and cross-study knowledge transfer. Here, we demonstrate that mean-variability association is ubiquitous across technologies, highlighting the inadequacy of the very popular Dirichlet-multinomial distribution. We show that sccomp accurately fits experimental data, significantly improving performance over state-of-the-art algorithms. Using sccomp, we identified differential constraints and composition in the microenvironment of primary breast cancer.
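Note: the abstract's remark about the Dirichlet-multinomial distribution can be illustrated with a small simulation. In a Dirichlet distribution with concentrations alpha_i summing to alpha_0, each proportion has mean mu_i = alpha_i / alpha_0 and variance mu_i (1 - mu_i) / (alpha_0 + 1), so a single shared quantity alpha_0 fixes the variability of every component once its mean is known. The sketch below checks this closed form empirically. It is not sccomp's model or API; the concentration values and the three "cell types" are hypothetical, chosen only for illustration.

```python
import random
from statistics import pvariance

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def dirichlet_variance(alpha):
    """Closed-form variance of each proportion: mu_i * (1 - mu_i) / (alpha_0 + 1)."""
    alpha0 = sum(alpha)
    return [(a / alpha0) * (1 - a / alpha0) / (alpha0 + 1) for a in alpha]

rng = random.Random(0)            # fixed seed so the run is repeatable
alpha = [20.0, 5.0, 1.0]          # hypothetical concentrations for three cell types
samples = [sample_dirichlet(alpha, rng) for _ in range(20000)]

# Variance of each component across the simulated samples.
empirical_var = [pvariance(column) for column in zip(*samples)]
theoretical_var = dirichlet_variance(alpha)

for i, (emp, theo) in enumerate(zip(empirical_var, theoretical_var)):
    mean_i = alpha[i] / sum(alpha)
    print(f"cell type {i}: mean={mean_i:.3f} "
          f"empirical var={emp:.5f} closed-form var={theo:.5f}")
```

Because the proportions in a Dirichlet-multinomial come from a Dirichlet, the same constraint carries over: no component can be made more or less variable than the others at a given mean. This is one way to read why the abstract treats a separately modeled mean-variability association as an improvement.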