Research Updates

Statistical and Computational Methods for Proteogenomic Data Analysis

Published: 2023
Author: Xiaoyu Song
Source: GENOMICS PROTEOMICS & BIOINFORMATICS

Abstract:

Proteins are the functional molecules for almost all cellular and biological processes, and they are the targets of most drugs. Proteins are subject to complex, multilevel regulation, so their abundance levels do not correlate well with their mRNA expression levels. The structure, activity, and functional roles of proteins are affected by posttranslational modifications (PTMs), which are even less correlated with mRNA expression levels than protein abundances are. Comprehensive characterization of proteomic data is therefore critical for understanding the molecular and cellular mechanisms of biological systems and for developing new therapeutics. Current large-scale proteomic profiling technologies, such as mass spectrometry, provide relative identification of peptides and proteins, but the resulting data are vulnerable to outliers, batch effects, and nonrandom missingness. To support high-quality proteomic data analysis, we first introduce a data preprocessing and quality control pipeline that includes normalization, outlier detection and removal, batch effect identification and handling, and missing data imputation. We then describe several statistical methods that use well-processed proteomic data to generate scientific discoveries, especially through integration with genomics and transcriptomics. These methods cover association analysis, network construction, clustering, and cell-type deconvolution. To demonstrate these methods, we use the proteogenomic data from the lung squamous cell carcinoma study of the Clinical Proteomic Tumor Analysis Consortium and provide sample code for data access and analysis. © 2023. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.
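To make the preprocessing steps named in the abstract more concrete, the following is a minimal sketch of three of them (normalization, outlier detection, and missing data imputation) on a simulated samples-by-proteins matrix of log abundances. It is not the author's pipeline: the function names, the median-centering and MAD-based choices, the threshold of three scaled MADs, and the simulated data are all assumptions made for illustration, and batch effect handling is omitted.

```python
import numpy as np


def median_center(x):
    """Normalize by subtracting each sample's (row's) median, ignoring NaNs."""
    return x - np.nanmedian(x, axis=1, keepdims=True)


def flag_outlier_samples(x, n_mad=3.0):
    """Flag samples whose average distance from the protein-wise medians is
    more than n_mad scaled MADs above the typical sample (assumed rule)."""
    protein_median = np.nanmedian(x, axis=0, keepdims=True)
    score = np.nanmean(np.abs(x - protein_median), axis=1)
    center = np.median(score)
    mad = np.median(np.abs(score - center))
    return score > center + n_mad * 1.4826 * mad


def impute_min(x):
    """Fill missing values with each protein's observed minimum, a crude
    stand-in for methods that model abundance-dependent missingness.
    Proteins with no observed values at all would stay NaN."""
    col_min = np.nanmin(x, axis=0)
    return np.where(np.isnan(x), col_min, x)


# Simulated data (hypothetical): 20 samples x 50 proteins of log abundances.
rng = np.random.default_rng(0)
true_abundance = rng.normal(size=(20, 50))
true_abundance[0] = rng.normal(scale=5.0, size=50)  # one aberrant sample
sample_shift = rng.normal(size=(20, 1))             # per-sample loading differences
raw = true_abundance + sample_shift
raw[true_abundance < -1.5] = np.nan                 # nonrandom missingness at low abundance

centered = median_center(raw)
flags = flag_outlier_samples(centered)
kept = centered[~flags]
imputed = impute_min(kept)

print("flagged samples:", np.flatnonzero(flags))
print("missing before imputation:", int(np.isnan(kept).sum()))
print("missing after imputation:", int(np.isnan(imputed).sum()))
```

Imputing with the observed minimum reflects the assumption that values are missing mainly because they fall below the detection limit. Under a different missingness mechanism, another imputation strategy would be more appropriate.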