Artificial intelligence-based analysis of whole-body bone scintigraphy: The quest for the optimal deep learning algorithm and comparison with human observer performance.

Published: 2023 Mar 15

Authors: Ghasem Hajianfar, Maziar Sabouri, Yazdan Salimi, Mehdi Amini, Soroush Bagheri, Elnaz Jenabi, Sepideh Hekmat, Mehdi Maghsudi, Zahra Mansouri, Maziar Khateri, Mohammad Hosein Jamshidi, Esmail Jafari, Ahmad Bitarafan Rajabi, Majid Assadi, Mehrdad Oveisi, Isaac Shiri, Habib Zaidi

Source: Zeitschrift für Medizinische Physik

Abstract:

Whole-body bone scintigraphy (WBS) is one of the most widely used modalities for diagnosing malignant bone diseases in their early stages. However, the procedure is time-consuming and requires effort and experience. Moreover, interpreting WBS scans in the early stages of disease can be challenging because the patterns often resemble a normal appearance and are prone to subjective interpretation. To simplify the laborious, subjective, and error-prone task of interpreting WBS scans, we developed deep learning (DL) models to automate two major analyses, namely (i) classification of scans into normal and abnormal and (ii) discrimination between malignant and non-neoplastic bone diseases, and compared their performance with that of human observers.

After applying our exclusion criteria to 7188 patients from three different centers, 3772 and 2248 patients were enrolled for the first and second analyses, respectively. Data were split into training and testing sets, and a fraction of the training data was used for validation. Ten different convolutional neural network (CNN) models were applied in single- and dual-view input (posterior and anterior views) modes to find the optimal model for each analysis. In addition, three different methods, namely squeeze-and-excitation (SE), spatial pyramid pooling (SPP), and attention-augmented (AA), were used to aggregate the features for dual-view input models. Model performance was reported as the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity, and models were compared using the DeLong test applied to ROC curves. The test dataset was evaluated by three nuclear medicine physicians (NMPs) with different levels of experience to compare the performance of AI and human observers.

DenseNet121_AA (DenseNet121 with dual-view input aggregated by AA) and InceptionResNetV2_SPP achieved the highest performance (AUC = 0.72) for the first and second analyses, respectively. Moreover, on average, in the first analysis, the Inception V3 and InceptionResNetV2 CNN models and dual-view input with the AA aggregating method had superior performance. In the second analysis, DenseNet121 and InceptionResNetV2 as CNN models and dual-view input with the AA aggregating method achieved the best results. The performance of the AI models was significantly higher than that of human observers in the first analysis, whereas it was comparable in the second analysis, although the AI models assessed the scans in drastically less time.

Using the models designed in this study, a positive step can be taken toward improving and optimizing WBS interpretation. By training DL models with larger and more diverse cohorts, AI could potentially be used to assist physicians in the assessment of WBS images.

Copyright © 2023 The Author(s). Published by Elsevier GmbH. All rights reserved.
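
The evaluation metrics named in the abstract (AUC, accuracy, sensitivity, and specificity) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and chosen only for illustration. The AUC is computed here as the fraction of positive/negative pairs that the scores rank correctly, which is equivalent to the area under the ROC curve. The DeLong test used in the study to compare ROC curves is not implemented.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> dict[str, float]:
    """Accuracy, sensitivity, and specificity at a fixed (assumed) threshold.

    Assumes both classes are present in y_true, so no denominator is zero.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a correct ranking. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = abnormal scan, 0 = normal scan.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

toy_metrics = classification_metrics(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a small illustration. For a test set of realistic size, a rank-based implementation from an established library would be the more practical choice.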