Applying a GAN-based classifier to improve transcriptome-based prognostication in breast cancer.
Published: 2023 Apr 03
Authors:
Cristiano Guttà, Christoph Morhard, Markus Rehm
Source:
PLoS Computational Biology
Abstract:
Established prognostic tests based on limited numbers of transcripts can identify high-risk breast cancer patients, yet are approved only for individuals presenting with specific clinical features or disease characteristics. Deep learning algorithms could hold potential for stratifying patient cohorts based on full transcriptome data, yet the development of robust classifiers is hampered by the number of variables in omics datasets typically far exceeding the number of patients. To overcome this hurdle, we propose a classifier based on a data augmentation pipeline consisting of a Wasserstein generative adversarial network (GAN) with gradient penalty and an embedded auxiliary classifier to obtain a trained GAN discriminator (T-GAN-D). Applied to 1244 patients of the METABRIC breast cancer cohort, this classifier outperformed established breast cancer biomarkers in separating low- from high-risk patients (disease-specific death, progression or relapse within 10 years from initial diagnosis). Importantly, the T-GAN-D also performed across independent, merged transcriptome datasets (METABRIC and TCGA-BRCA cohorts), and merging data improved overall patient stratification. In conclusion, the reiterative GAN-based training process allowed generating a robust classifier capable of stratifying low- vs high-risk patients based on full transcriptome data and across independent and heterogeneous breast cancer cohorts.
Copyright: © 2023 Guttà et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
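The abstract names a Wasserstein GAN with gradient penalty but does not spell out the loss. As a rough illustration only, the sketch below computes a critic (discriminator) loss with gradient penalty for a toy critic f(x) = tanh(w·x + b) in plain Python. The toy critic, the function names, and the penalty weight `lam=10.0` are assumptions made for this example and are not taken from the paper. The embedded auxiliary classifier and the generator are omitted.

```python
import math
import random

def critic(x, w, b):
    """Toy critic f(x) = tanh(w . x + b); a stand-in for a neural network."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)

def critic_input_grad(x, w, b):
    """Analytic gradient of the toy critic with respect to its input x."""
    s = critic(x, w, b)
    return [(1.0 - s * s) * wi for wi in w]

def gradient_penalty(real, fake, w, b, rng, lam=10.0):
    """Mean of lam * (||grad f(x_hat)|| - 1)^2 over interpolated samples."""
    total = 0.0
    for xr, xf in zip(real, fake):
        eps = rng.random()  # interpolation coefficient in [0, 1)
        # x_hat lies on the straight line between a real and a generated sample
        x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(xr, xf)]
        grad = critic_input_grad(x_hat, w, b)
        norm = math.sqrt(sum(g * g for g in grad))
        total += lam * (norm - 1.0) ** 2
    return total / len(real)

def critic_loss(real, fake, w, b, rng, lam=10.0):
    """Critic loss: E[f(fake)] - E[f(real)] + gradient penalty."""
    mean_real = sum(critic(x, w, b) for x in real) / len(real)
    mean_fake = sum(critic(x, w, b) for x in fake) / len(fake)
    return mean_fake - mean_real + gradient_penalty(real, fake, w, b, rng, lam)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 4-feature "expression profiles"; real data would have thousands of genes
    real = [[rng.gauss(1.0, 1.0) for _ in range(4)] for _ in range(8)]
    fake = [[rng.gauss(-1.0, 1.0) for _ in range(4)] for _ in range(8)]
    print(critic_loss(real, fake, [0.5, -0.25, 0.1, 0.3], 0.0, rng))
```

In a real model the input gradient would come from automatic differentiation rather than a hand-written formula. The penalty term pushes the critic's gradient norm toward 1 along lines between real and generated samples.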