A cross-cohort computational framework to trace tumor tissue-of-origin based on RNA sequencing.
Published: 2023 Sep 16
Authors:
Binsheng He, Hongmei Sun, Meihua Bao, Haigang Li, Jianjun He, Geng Tian, Bo Wang
Source:
GENES & DEVELOPMENT
Abstract:
Carcinoma of unknown primary (CUP) is a type of metastatic cancer whose tissue-of-origin (TOO) cannot be identified by traditional methods. CUP patients typically have a poor prognosis, but therapy targeting the original cancer tissue can significantly improve it. It is therefore critical to develop accurate computational methods to infer cancer TOO. While qPCR or microarray-based methods are effective in inferring TOO for most cancer types, their overall prediction accuracy still leaves room for improvement. In this study, we propose a cross-cohort computational framework to trace the TOO of 32 cancer types based on RNA sequencing (RNA-seq). Specifically, we employed logistic regression models to select 80 genes for each cancer type and combined them into a 1356-gene set, based on transcriptomic data from 9911 tissue samples covering the 32 cancer types with known TOO from the Cancer Genome Atlas (TCGA). The selected genes are enriched in both tissue-specific and tissue-general functions. The cross-validation accuracy of our framework reaches 97.50% across all cancer types. Furthermore, we tested the performance of our model on the TCGA metastatic dataset and the International Cancer Genome Consortium (ICGC) dataset, achieving accuracies of 91.09% and 82.67%, respectively, despite differences in experimental procedures and pipelines. In conclusion, we developed an accurate and robust computational framework for identifying TOO, which holds promise for clinical applications. Our code is available at http://github.com/wangbo00129/classifybysklearn. © 2023. Springer Nature Limited.
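The gene-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that genes are ranked per cancer type by the absolute coefficient of a one-vs-rest logistic regression and that the combined set is the union of the per-type selections. Under that assumption, 32 types × 80 genes gives 2560 selections, so a 1356-gene set would imply that many genes are picked for more than one type. The synthetic data and the names `train_one_vs_rest`, `top_k_genes`, and `select_gene_panel` are hypothetical, and the model is fitted with plain gradient descent from the standard library so the example runs on its own; in practice a library such as scikit-learn's `LogisticRegression` would be the natural choice.

```python
import math
import random


def train_one_vs_rest(X, y, target, epochs=200, lr=0.1, l2=0.01):
    """Fit a binary logistic regression (target class vs. rest) by gradient descent.

    Returns one weight per gene (feature). Illustrative only.
    """
    n_samples, n_genes = len(X), len(X[0])
    w = [0.0] * n_genes
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_genes
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - (1.0 if yi == target else 0.0)
            for j in range(n_genes):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient step with L2 shrinkage on the weights
        w = [wj - lr * (gw / n_samples + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w


def top_k_genes(weights, k):
    """Indices of the k genes with the largest absolute coefficient."""
    ranked = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return set(ranked[:k])


def select_gene_panel(X, y, k):
    """Select k genes per class, then merge them into one combined panel (set union)."""
    per_class = {c: top_k_genes(train_one_vs_rest(X, y, c), k) for c in sorted(set(y))}
    panel = set().union(*per_class.values())
    return per_class, panel


# Synthetic toy data (assumption): 3 "cancer types", 12 "genes", 20 samples each.
# Each type has two elevated marker genes; the remaining genes are noise.
random.seed(0)
X, y = [], []
for cancer_type in range(3):
    for _ in range(20):
        sample = [random.gauss(0.0, 1.0) for _ in range(12)]
        sample[2 * cancer_type] += 3.0
        sample[2 * cancer_type + 1] += 3.0
        X.append(sample)
        y.append(cancer_type)

per_class, panel = select_gene_panel(X, y, k=3)
print("genes per type:", {c: sorted(g) for c, g in per_class.items()})
print("combined panel size:", len(panel), "of at most", 3 * 3)
```

Because the panel is a set union, its size never exceeds the number of types times `k` and shrinks whenever the same gene is informative for several types.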