Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification.
Published: 19 Sep 2023
Authors:
Saisai Ding, Juncheng Li, Jun Wang, Shihui Ying, Jun Shi
Source:
IEEE Journal of Biomedical and Health Informatics
Abstract:
The multi-scale information in whole slide images (WSIs) is essential for cancer diagnosis. Although existing multi-scale vision Transformers have shown their effectiveness for learning multi-scale image representations, they still cannot work well on gigapixel WSIs due to their extremely large image sizes. To this end, we propose a novel Multi-scale Efficient Graph-Transformer (MEGT) framework for WSI classification. The key idea of MEGT is to adopt two independent Efficient Graph-based Transformer (EGT) branches to process the low-resolution and high-resolution patch embeddings (i.e., tokens in a Transformer) of WSIs, respectively, and then fuse these tokens via a Multi-scale Feature Fusion Module (MFFM). Specifically, we design the EGT to efficiently learn the local-global information of patch tokens; it integrates a graph representation into the Transformer to capture the spatially related information of WSIs. Meanwhile, we propose a novel MFFM to alleviate the semantic gap among patches of different resolutions during feature fusion: it creates a non-patch token for each branch as an agent that exchanges information with the other branch via a cross-attention mechanism. In addition, to expedite network training, a new token pruning module is developed in the EGT to reduce redundant tokens. Extensive experiments on both the TCGA-RCC and CAMELYON16 datasets demonstrate the effectiveness of the proposed MEGT.
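The cross-attention exchange described above can be illustrated with a minimal sketch: an agent token from one branch queries the patch tokens of the other branch and receives an attention-weighted mixture of them. All names and the toy 2-D embeddings below are illustrative assumptions; the abstract does not specify MEGT's actual implementation.

```python
# Hypothetical sketch of MFFM-style cross-attention token exchange.
# The agent token and embeddings are made up for illustration only.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(agent, other_tokens):
    """Agent token (from one branch) attends over the other branch's
    patch tokens and returns an attention-weighted mixture of them."""
    d = len(agent)
    scores = [dot(agent, t) / math.sqrt(d) for t in other_tokens]
    weights = softmax(scores)
    return [sum(w * t[i] for w, t in zip(weights, other_tokens))
            for i in range(d)]

# Low-resolution branch agent gathers information from the
# high-resolution branch's patch tokens (toy 2-D embeddings).
low_agent = [1.0, 0.0]
high_tokens = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
fused = cross_attend(low_agent, high_tokens)
```

The fused vector would then be passed back to the low-resolution branch, letting each branch absorb the other's context through a single token rather than full pairwise attention over all patches.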