PEACOCK: a machine learning approach to assess the validity of cell type-specific enhancer-gene regulatory relationships.
Published: 2023 Apr 03
Authors:
Caitlin Mills, Crystal N Marconett, Juan Pablo Lewinger, Huaiyu Mi
Source:
npj Systems Biology and Applications
Abstract:
The vast majority of disease-associated variants identified in genome-wide association studies map to enhancers, powerful regulatory elements that orchestrate the recruitment of transcriptional complexes to their target genes' promoters to upregulate transcription in a cell type- and timing-dependent manner. These variants have implicated thousands of enhancers in many common genetic diseases, including nearly all cancers. However, the etiology of most of these diseases remains unknown because the regulatory target genes of the vast majority of enhancers are unknown. Thus, identifying the target genes of as many enhancers as possible is crucial for learning how enhancer regulatory activities function and contribute to disease. Based on experimental results curated from scientific publications coupled with machine learning methods, we developed a cell type-specific score predictive of an enhancer targeting a gene. We computed the score genome-wide for every possible cis enhancer-gene pair and validated its predictive ability in four widely used cell lines. Using a pooled final model trained across multiple cell types, all possible gene-enhancer regulatory links in cis (~17 million) were scored and added to the publicly available PEREGRINE database (www.peregrineproj.org). These scores provide a quantitative framework for enhancer-gene regulatory prediction that can be incorporated into downstream statistical analyses.
© 2023. The Author(s).
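To make the idea of scoring "every possible cis enhancer-gene pair" concrete, here is a minimal Python sketch of how such candidate pairs might be enumerated and given a score. It is an illustration only, not the PEACOCK method: the abstract does not say how cis pairs are delimited or which features the model uses. The `Enhancer` and `Gene` records, the 1 Mb `max_distance` window, and the distance-decay `toy_score` are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from math import exp


@dataclass(frozen=True)
class Enhancer:
    enhancer_id: str
    chrom: str
    midpoint: int  # genomic coordinate of the enhancer centre


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    tss: int  # transcription start site coordinate


def cis_pairs(enhancers, genes, max_distance=1_000_000):
    """Yield (enhancer, gene, distance) for same-chromosome pairs.

    ASSUMPTION: "cis" is approximated here by a fixed distance window;
    the source does not specify how candidate pairs are delimited.
    """
    genes_by_chrom = {}
    for gene in genes:
        genes_by_chrom.setdefault(gene.chrom, []).append(gene)
    for enhancer in enhancers:
        for gene in genes_by_chrom.get(enhancer.chrom, []):
            distance = abs(enhancer.midpoint - gene.tss)
            if distance <= max_distance:
                yield enhancer, gene, distance


def toy_score(distance, scale=100_000.0):
    """Placeholder score in (0, 1] that decays with distance.

    ASSUMPTION: stands in for a trained model's output; it is NOT the
    PEACOCK score and uses none of its features.
    """
    return exp(-distance / scale)


if __name__ == "__main__":
    enhancers = [Enhancer("e1", "chr1", 150_000), Enhancer("e2", "chr2", 500_000)]
    genes = [
        Gene("gA", "chr1", 100_000),
        Gene("gB", "chr1", 5_000_000),
        Gene("gC", "chr2", 600_000),
    ]
    for enhancer, gene, distance in cis_pairs(enhancers, genes):
        print(enhancer.enhancer_id, gene.gene_id, distance, round(toy_score(distance), 3))
```

In a real pipeline the placeholder would be replaced by a classifier trained on curated experimental evidence, with the resulting per-pair scores stored for downstream statistical analysis.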