Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

TGMIL: A hybrid multi-instance learning model based on the Transformer and the Graph Attention Network for whole-slide images classification of renal cell carcinoma.

Published: 2023 Sep 03
Authors: Xinhuan Sun, Wuchao Li, Bangkang Fu, Yunsong Peng, Junjie He, Lihui Wang, Tongyin Yang, Xue Meng, Jin Li, Jinjing Wang, Ping Huang, Rongpin Wang
Journal: Comput Methods Programs Biomed

Abstract:

The pathological diagnosis of renal cell carcinoma is crucial for treatment. Currently, multi-instance learning methods are commonly used for whole-slide image classification of renal cell carcinoma, mostly under the assumption that instances are independent and identically distributed. This is inconsistent with the diagnostic process, which must consider the correlations between different instances. Furthermore, the high resource consumption of pathology images remains an urgent problem. We therefore propose a new multi-instance learning method to address these issues. In this study, we propose a hybrid multi-instance learning model based on the Transformer and the Graph Attention Network, called TGMIL, to classify whole-slide images of renal cell carcinoma without pixel-level annotation or region-of-interest extraction. Our approach consists of three steps. First, we design a feature pyramid built from multiple low magnifications of the whole-slide image, named MMFP. It lets the model incorporate richer information while reducing memory consumption and training time compared to using the highest magnification alone. Second, TGMIL combines the capabilities of the Transformer and Graph Attention, addressing the loss of instance context and spatial information. Within the Graph Attention network stream, a simple and efficient approach using max pooling and mean pooling yields the graph adjacency matrix without extra memory consumption. Finally, the outputs of the two TGMIL streams are aggregated to classify renal cell carcinoma. On the TCGA-RCC validation set, a public dataset for renal cell carcinoma, the area under the receiver operating characteristic (ROC) curve (AUC) and accuracy of TGMIL were 0.98±0.0015 and 0.9191±0.0062, respectively. On a private validation set of renal cell carcinoma pathology images, it attained an AUC of 0.9386±0.0162 and an accuracy of 0.9197±0.0124, showing remarkable performance.
Furthermore, on the public breast cancer whole-slide image test dataset CAMELYON16, our model showed good classification performance with an accuracy of 0.8792. TGMIL models the diagnostic process of pathologists and shows good classification performance on multiple datasets. Concurrently, the MMFP module efficiently reduces resource requirements, offering a novel angle for exploring computational pathology images. Copyright © 2023. Published by Elsevier B.V.
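The two-stream idea described in the abstract — a Transformer stream attending over patch instances, plus a graph-attention stream whose adjacency matrix comes from pooled feature similarity, with both bag embeddings aggregated for classification — can be sketched minimally in NumPy. This is only an illustrative reading of the abstract, not the authors' implementation: the similarity-thresholding rule for the adjacency matrix, the single-head attention, and the classifier head `Wc` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_stream(X, Wq, Wk, Wv):
    # Single-head self-attention over instances, then mean-pool to a bag vector.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return (A @ V).mean(axis=0)

def graph_stream(X):
    # Hypothetical adjacency: connect instances whose feature similarity
    # exceeds the mean similarity (no extra learned parameters needed).
    S = X @ X.T
    adj = (S > S.mean()).astype(float)
    np.fill_diagonal(adj, 1.0)
    # Attention coefficients restricted to graph neighbours, one propagation step.
    att = softmax(np.where(adj > 0, S, -1e9), axis=-1)
    H = att @ X
    # Combine max pooling and mean pooling into the graph-stream bag embedding.
    return np.concatenate([H.max(axis=0), H.mean(axis=0)])

d = 16
X = rng.normal(size=(32, d))                 # 32 patch embeddings for one slide
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Aggregate the two streams, then classify (3 RCC subtypes, hypothetical head).
bag = np.concatenate([transformer_stream(X, Wq, Wk, Wv), graph_stream(X)])
Wc = rng.normal(size=(bag.shape[0], 3))
probs = softmax(bag @ Wc)
```

The sketch keeps the key property highlighted in the abstract: the adjacency matrix is derived directly from pooled feature statistics rather than from a learned, memory-hungry module.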