STGNNks: Identifying cell types in spatial transcriptomics data based on graph neural network, denoising auto-encoder, and k-sums clustering.
Publication date: 2023 Sep 09
Authors:
Lihong Peng, Xianzhi He, Xinhuai Peng, Zejun Li, Li Zhang
Source:
COMPUTERS IN BIOLOGY AND MEDICINE
Abstract:
Spatial transcriptomics technologies fully utilize spatial location information, tissue morphological features, and transcriptional profiles. Integrating these data can greatly advance our understanding of cell biology in its morphological context.
We developed an innovative spatial clustering method called STGNNks that combines a graph neural network, a denoising auto-encoder, and k-sums clustering. First, spatially resolved transcriptomics data are preprocessed and a hybrid adjacency matrix is constructed. Next, gene expression and spatial context are integrated by a deep graph infomax-based graph convolutional network to learn embedding features for the spots. Third, the learned features are mapped to a low-dimensional space through a zero-inflated negative binomial (ZINB)-based denoising auto-encoder. Fourth, a k-sums clustering algorithm, which combines the k-means and ratio-cut clustering algorithms, is developed to identify spatial domains. Finally, STGNNks performs spatial trajectory inference, spatially variable gene identification, and differentially expressed gene detection based on the pseudo-space-time method on six 10x Genomics Visium datasets.
We compared STGNNks with five other spatial clustering methods: CCST, Seurat, stLearn, Scanpy, and SEDR. For the first time, four internal indicators from machine learning (the silhouette coefficient, the Davies-Bouldin index, the Calinski-Harabasz index, and the S_Dbw index) were used to compare the clustering performance of STGNNks with that of CCST, Seurat, stLearn, Scanpy, and SEDR on five unlabeled spatial transcriptomics datasets: Adult Mouse Brain (FFPE), Adult Mouse Kidney (FFPE), Human Breast Cancer (Block A Section 2), Human Breast Cancer (FFPE), and Human Lymph Node. Two external indicators, the adjusted Rand index (ARI) and normalized mutual information (NMI), were applied to evaluate the six methods on Human Breast Cancer (Block A Section 1), which has real labels. The comparison experiments showed that STGNNks obtained the smallest Davies-Bouldin and S_Dbw values and the largest silhouette coefficient, Calinski-Harabasz, ARI, and NMI values, significantly outperforming the other five spatial transcriptomics analysis algorithms. Furthermore, we detected the top six spatially variable genes and the top five differentially expressed genes in each cluster on the five unlabeled datasets. In addition, the pseudo-space-time tree plot with hierarchical layout showed the progression of Human Breast Cancer (Block A Section 1) as three clades branching from three invasive ductal carcinoma regions to multiple ductal carcinoma in situ sub-clusters.
We anticipate that STGNNks can efficiently improve spatial transcriptomics data analysis and further boost the diagnosis and therapy of related diseases. The code is publicly available at https://github.com/plhhnu/STGNNks.
Copyright © 2023. Published by Elsevier Ltd.
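The two external indicators named in the abstract, ARI and NMI, both compare a predicted clustering against reference labels. The following is a minimal pure-Python sketch of their standard definitions, not the authors' implementation. The function names are hypothetical, and normalizing the mutual information by the arithmetic mean of the two entropies is an assumption, since the abstract does not state which NMI variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same spots."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    # Count pairs of spots that fall in the same cell, row, or column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate partitions (e.g. one cluster each)
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_true, labels_pred):
    """NMI, normalized by the arithmetic mean of the two entropies (assumed)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    mutual_info = sum(
        (c / n) * log(c * n / (rows[a] * cols[b]))
        for (a, b), c in cells.items()
    )
    mean_entropy = (entropy(labels_true) + entropy(labels_pred)) / 2
    if mean_entropy == 0:  # both labelings consist of a single cluster
        return 1.0
    return mutual_info / mean_entropy


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    reference = [0, 0, 1, 1]
    predicted = [1, 1, 0, 0]  # same partition with cluster ids swapped
    print(adjusted_rand_index(reference, predicted))
    print(normalized_mutual_info(reference, predicted))
```

Both scores depend only on how the spots are grouped, not on the cluster ids, so a clustering that swaps the ids of the reference labels still scores as a perfect match.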