Fast and sensitive validation of fusion transcripts in whole-genome sequencing data.
Published: 2023 Sep 23
Authors:
Völundur Hafstað, Jari Häkkinen, Helena Persson
Source:
GENES & DEVELOPMENT
Abstract:
In cancer, genomic rearrangements can create fusion genes that either combine protein-coding sequences from two different partner genes or place one gene under the control of the promoter of another gene. These fusion genes can act as oncogenic drivers in tumor development and several fusions involving kinases have been successfully exploited as drug targets. Expressed fusions can be identified in RNA sequencing (RNA-Seq) data, but fusion prediction software often has a high fraction of false positive fusion transcript predictions. This is problematic for both research and clinical applications.
We describe a method for validation of fusion transcripts detected by RNA-Seq in matched whole-genome sequencing (WGS) data. Our pipeline uses discordant read pairs to identify supported fusion events and analyzes soft-clipped read alignments to determine genomic breakpoints. We have tested it on matched RNA-Seq and WGS data for both tumors and cancer cell lines and show that it can be used to validate both new predicted gene fusions and experimentally validated fusion events. It was considerably faster and more sensitive than using BreakDancer and Manta, software that is instead designed to detect many different types of structural variants on a genome-wide scale.
We have developed a fast and very sensitive pipeline for validation of gene fusions detected by RNA-Seq in matched WGS data. It can be used to identify high-quality gene fusions for further bioinformatic and experimental studies, including validation of genomic breakpoints and studies of the mechanisms that generate fusions. In a clinical setting, it could help find expressed gene fusions for personalized therapy.
© 2023. BioMed Central Ltd., part of Springer Nature.
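The two signals named in the abstract, discordant read pairs and soft-clipped alignments, can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the gene regions, and the toy reads are all hypothetical, and a real implementation would read alignments from a BAM file rather than from tuples. It assumes SAM conventions (1-based leftmost `POS`, CIGAR operation `S` for soft clipping).

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def softclip_breakpoints(pos, cigar):
    """Return 1-based reference positions that border a soft clip.

    pos is the SAM POS field (leftmost aligned base). A leading soft clip
    points at pos; a trailing soft clip points at the last aligned base.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    found = []
    if ops and ops[0][1] == "S":
        found.append(pos)
    if len(ops) > 1 and ops[-1][1] == "S":
        found.append(pos + ref_len - 1)
    return found


def in_region(chrom, pos, region):
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end


def supports_fusion(mate1, mate2, region_a, region_b):
    """True if one mate maps inside each partner gene region (either order)."""
    return (in_region(*mate1, region_a) and in_region(*mate2, region_b)) or (
        in_region(*mate1, region_b) and in_region(*mate2, region_a)
    )


def consensus_breakpoint(reads):
    """Most frequent soft-clip position among (pos, cigar) reads, with its count."""
    counts = Counter(
        bp for pos, cigar in reads for bp in softclip_breakpoints(pos, cigar)
    )
    return counts.most_common(1)[0] if counts else None


# Hypothetical partner gene regions: (chromosome, start, end).
GENE_A = ("chr1", 1_000, 5_000)
GENE_B = ("chr2", 20_000, 30_000)

# Toy read pairs as ((chrom, pos), (chrom, pos)); only the first spans both genes.
pairs = [
    (("chr1", 1_200), ("chr2", 21_000)),
    (("chr1", 1_300), ("chr1", 1_600)),
]
n_support = sum(supports_fusion(m1, m2, GENE_A, GENE_B) for m1, m2 in pairs)

# Toy soft-clipped reads in gene A as (pos, cigar).
reads = [(100, "60M40S"), (120, "40M60S"), (130, "30M70S"), (90, "50M50S"), (100, "100M")]
best = consensus_breakpoint(reads)
```

In this sketch a predicted fusion would count as supported when `n_support` is above some chosen threshold, and `best` gives the candidate genomic breakpoint together with the number of soft-clipped reads that agree on it. Restricting the search to the two partner gene regions, rather than scanning the whole genome, is one plausible reason a targeted check could run faster than a general structural variant caller.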