Distilling Knowledge From an Ensemble of Vision Transformers for Improved Classification of Breast Ultrasound.
Published: 2023 Sep 02
Authors:
George Zhou, Bobak Mosadegh
Source:
ACADEMIC RADIOLOGY
Abstract:
To develop a deep learning model for the automated classification of breast ultrasound images as benign or malignant. More specifically, the application of vision transformers, ensemble learning, and knowledge distillation is explored for breast ultrasound classification.

Single-view, B-mode ultrasound images were curated from the publicly available Breast Ultrasound Image (BUSI) dataset, which has categorical ground-truth labels (benign vs malignant) assigned by radiologists, with malignant cases confirmed by biopsy. The performance of vision transformers (ViT) is compared to that of convolutional neural networks (CNN), followed by a comparison between supervised, self-supervised, and randomly initialized ViTs. Subsequently, an ensemble of 10 independently trained ViTs, where the ensemble model is the unweighted average of each individual model's output, is compared to the performance of each ViT alone. Finally, a single ViT is trained to emulate the ensembled ViTs using knowledge distillation.

With five-fold cross-validation on this dataset, ViTs outperform CNNs, while self-supervised ViTs outperform supervised and randomly initialized ViTs. The ensemble model achieves an area under the receiver operating characteristic curve (AuROC) and area under the precision-recall curve (AuPRC) of 0.977 and 0.965 on the test set, outperforming the average AuROC and AuPRC of the independently trained ViTs (0.958 ± 0.05 and 0.931 ± 0.016). The distilled ViT achieves an AuROC and AuPRC of 0.972 and 0.960.

Transfer learning and ensemble learning each offer increased performance independently and can be sequentially combined to collectively improve the performance of the final model. Furthermore, a single vision transformer can be trained to match the performance of an ensemble of vision transformers using knowledge distillation.

Copyright © 2023 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
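The two core operations described in the abstract, unweighted averaging of ensemble member outputs and training a student to match the ensemble's soft targets, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the temperature value are assumptions, and a real pipeline would use a deep learning framework rather than plain Python.

```python
import math

def softmax(logits, temperature=1.0):
    # Convert raw logits to probabilities, optionally softened by a temperature
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_probs(member_logits):
    # Unweighted average of each member model's softmax output,
    # as the abstract describes for the 10-ViT ensemble
    member_probs = [softmax(l) for l in member_logits]
    n, k = len(member_probs), len(member_probs[0])
    return [sum(p[i] for p in member_probs) / n for i in range(k)]

def distillation_loss(student_logits, teacher_probs, temperature=2.0):
    # Cross-entropy between the ensemble teacher's soft targets and the
    # student's temperature-softened distribution (temperature is hypothetical)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
```

For example, `ensemble_probs([[2.0, -1.0], [1.5, -0.5], [2.5, -1.5]])` yields the benign/malignant soft targets, and `distillation_loss(student_logits, teacher)` gives the scalar the distilled ViT would minimize in place of (or alongside) the hard-label loss.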