Attention-based deep clustering method for scRNA-seq cell type identification.
Published: 2023 Nov 10
Authors:
Shenghao Li, Hui Guo, Simai Zhang, Yizhou Li, Menglong Li
Source:
GENES & DEVELOPMENT
Abstract:
Single-cell RNA sequencing (scRNA-seq) technology resolves cellular differences at a higher resolution than bulk RNA sequencing and reveals heterogeneity in biological research. The analysis of scRNA-seq datasets is premised on subpopulation assignment. When an appropriate reference, such as specific marker genes or a single-cell reference atlas, is not available, unsupervised clustering approaches become the predominant option. However, the inherent sparsity and high dimensionality of scRNA-seq datasets pose specific analytical challenges to traditional clustering methods. Various deep learning-based methods have therefore been proposed to address these challenges. Because each of these methods offers only a partial improvement, a comprehensive method is needed. In this article, we propose a novel scRNA-seq data clustering method named AttentionAE-sc (Attention fusion AutoEncoder for single-cell). It combines two different scRNA-seq clustering strategies through an attention mechanism: zero-inflated negative binomial (ZINB)-based methods, which deal with the impact of dropout events, and graph autoencoder (GAE)-based methods, which rely on information from neighbors to guide the dimension reduction. Based on an iterative fusion between denoising and topological embeddings, AttentionAE-sc can easily acquire clustering-friendly cell representations, in which similar cells lie closer together in the hidden embedding. Compared with several state-of-the-art baseline methods, AttentionAE-sc demonstrated excellent clustering performance on 16 real scRNA-seq datasets without the need to specify the number of groups. Additionally, AttentionAE-sc learned improved cell representations and exhibited enhanced stability and robustness. Furthermore, AttentionAE-sc achieved remarkable identification in a breast cancer single-cell atlas dataset and provided valuable insights into the heterogeneity among different cell subtypes.
Copyright: © 2023 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
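
As background for the ZINB-based strategy mentioned in the abstract, the following is a minimal sketch of the zero-inflated negative binomial negative log-likelihood for a single count. The abstract does not state the loss used by AttentionAE-sc, so this assumes the standard mean/dispersion parameterization; the function names `nb_log_pmf` and `zinb_nll` and the parameter names `mu`, `theta`, and `pi` are illustrative and not taken from the paper.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x under a ZINB model.

    pi is the probability of an extra ("dropout") zero, 0 <= pi < 1.
    A zero can come from either the dropout component or the NB component;
    a non-zero count can only come from the NB component.
    """
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        p = pi + (1.0 - pi) * nb
    else:
        p = (1.0 - pi) * nb
    return -math.log(p)


if __name__ == "__main__":
    # Hypothetical parameters for one gene in one cell.
    for count in (0, 1, 5):
        print(count, zinb_nll(count, mu=2.0, theta=1.5, pi=0.3))
```

In a ZINB-based autoencoder, a decoder would typically output `mu`, `theta`, and `pi` for every gene in every cell, and the training loss would be this quantity summed over all entries of the count matrix. Raising `pi` makes observed zeros cheaper to explain, which is how the model accommodates dropout events.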