Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification.
Published: 19 Sep 2023
Authors:
Saisai Ding, Juncheng Li, Jun Wang, Shihui Ying, Jun Shi
Source:
IEEE Journal of Biomedical and Health Informatics
Abstract:
The multi-scale information in whole slide images (WSIs) is essential for cancer diagnosis. Although existing multi-scale vision Transformers have shown their effectiveness for learning multi-scale image representations, they still cannot work well on gigapixel WSIs due to their extremely large image sizes. To this end, we propose a novel Multi-scale Efficient Graph-Transformer (MEGT) framework for WSI classification. The key idea of MEGT is to adopt two independent Efficient Graph-based Transformer (EGT) branches to process the low-resolution and high-resolution patch embeddings (i.e., tokens in a Transformer) of WSIs, respectively, and then fuse these tokens via a Multi-scale Feature Fusion Module (MFFM). Specifically, we design the EGT to efficiently learn the local-global information of patch tokens; it integrates a graph representation into the Transformer to capture the spatially related information of WSIs. Meanwhile, we propose a novel MFFM to alleviate the semantic gap among patches of different resolutions during feature fusion: it creates a non-patch token for each branch as an agent that exchanges information with the other branch via a cross-attention mechanism. In addition, to expedite network training, a new token pruning module is developed in the EGT to reduce redundant tokens. Extensive experiments on both the TCGA-RCC and CAMELYON16 datasets demonstrate the effectiveness of the proposed MEGT.
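The cross-attention exchange described above can be illustrated with a minimal sketch: an agent token from one branch queries the patch tokens of the other branch and receives an attention-weighted mixture of them. All names and the toy 2-D embeddings below are illustrative assumptions; the abstract does not specify MEGT's actual implementation.

```python
# Hypothetical sketch of MFFM-style cross-attention token exchange.
# The agent token and embeddings are made up for illustration only.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(agent, other_tokens):
    """Agent token (from one branch) attends over the other branch's
    patch tokens and returns an attention-weighted mixture of them."""
    d = len(agent)
    scores = [dot(agent, t) / math.sqrt(d) for t in other_tokens]
    weights = softmax(scores)
    return [sum(w * t[i] for w, t in zip(weights, other_tokens))
            for i in range(d)]

# Low-resolution branch agent gathers information from the
# high-resolution branch's patch tokens (toy 2-D embeddings).
low_agent = [1.0, 0.0]
high_tokens = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
fused = cross_attend(low_agent, high_tokens)
```

The fused vector would then be passed back to the low-resolution branch, letting each branch absorb the other's context through a single token rather than full pairwise attention over all patches.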