Annotation protocol and crowdsourcing multiple instance learning classification of skin histological images: The CR-AI4SkIN dataset.
Publication date: 2023 Nov
Authors:
Rocío Del Amor, Jose Pérez-Cano, Miguel López-Pérez, Liria Terradez, Jose Aneiros-Fernandez, Sandra Morales, Javier Mateos, Rafael Molina, Valery Naranjo
Source:
ARTIFICIAL INTELLIGENCE IN MEDICINE
Abstract:
Digital Pathology (DP) has grown significantly in recent years and has become an essential tool for the diagnosis and prognosis of tumors. The availability of Whole Slide Images (WSIs) and the implementation of Deep Learning (DL) algorithms have paved the way for Artificial Intelligence (AI) systems that support the diagnostic process. These systems require extensive and varied data to be trained successfully. However, creating labeled datasets in histopathology is laborious and time-consuming. We have developed a crowdsourcing-multiple instance labeling/learning protocol and applied it to the creation and use of the CR-AI4SkIN dataset. CR-AI4SkIN contains 271 WSIs of 7 Cutaneous Spindle Cell (CSC) neoplasms, with expert and non-expert labels at the region and WSI levels. It is the first publicly available dataset of these types of neoplasms. The regions selected by the experts are used to learn an automatic extractor of Regions of Interest (ROIs) from WSIs. To produce the embedding of each WSI, representations of the patches within the ROIs are obtained with a contrastive learning method and then combined. Finally, these embeddings are fed to a Gaussian process-based crowdsourcing classifier that exploits the noisy non-expert WSI labels. We validate our crowdsourcing-multiple instance learning method on the CR-AI4SkIN dataset, addressing a binary classification problem (malignant vs. benign). The proposed method obtains an F1 score of 0.7911 on the test set, outperforming three widely used aggregation methods for crowdsourcing tasks. Furthermore, our crowdsourcing method also outperforms the supervised model trained with expert labels, which obtains an F1 score of 0.6035 on the test set. These promising results support the proposed crowdsourcing multiple instance learning annotation protocol. They also validate the automatic extraction of regions of interest and the use of contrastive embeddings and Gaussian process classification for crowdsourcing classification tasks.
Copyright © 2023 Elsevier B.V. All rights reserved.
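The bag-level pipeline and evaluation described in the abstract can be illustrated with a minimal Python sketch under stated assumptions. It assumes mean pooling as the way patch embeddings are "combined" into a WSI embedding (the abstract does not say which operator is used), uses majority voting as one example of a common crowdsourcing aggregation baseline (the abstract does not name its three baselines), and computes the binary F1 score with malignant (1) as the positive class. The function names `mean_pool`, `majority_vote`, and `f1_score` and the toy labels are hypothetical. This is not the authors' Gaussian process crowdsourcing classifier, which learns from each annotator's labels rather than collapsing them into one vote.

```python
from collections import Counter
from typing import Dict, List, Sequence


def mean_pool(patch_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-patch embeddings of one WSI into a single vector by averaging.

    Mean pooling is an assumption; the abstract only says patch
    representations are "combined".
    """
    if not patch_embeddings:
        raise ValueError("need at least one patch embedding")
    n = len(patch_embeddings)
    dim = len(patch_embeddings[0])
    return [sum(p[d] for p in patch_embeddings) / n for d in range(dim)]


def majority_vote(labels_per_wsi: Dict[str, List[int]], tie_label: int = 1) -> Dict[str, int]:
    """Aggregate noisy non-expert labels (1 = malignant, 0 = benign) per WSI.

    Every annotator counts equally; ties (including no votes) fall back to
    `tie_label`, a hypothetical convention chosen for this sketch.
    """
    result = {}
    for wsi_id, votes in labels_per_wsi.items():
        counts = Counter(votes)  # missing keys count as 0
        if counts[1] > counts[0]:
            result[wsi_id] = 1
        elif counts[0] > counts[1]:
            result[wsi_id] = 0
        else:
            result[wsi_id] = tie_label
    return result


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data only, not taken from CR-AI4SkIN.
    print(mean_pool([[0.0, 2.0], [1.0, 4.0]]))  # [0.5, 3.0]
    crowd = {"wsi_a": [1, 1, 0], "wsi_b": [0, 0, 1], "wsi_c": [1, 0]}
    aggregated = majority_vote(crowd)  # {'wsi_a': 1, 'wsi_b': 0, 'wsi_c': 1}
    expert = {"wsi_a": 1, "wsi_b": 1, "wsi_c": 0}
    ids = sorted(crowd)
    print(f1_score([expert[i] for i in ids], [aggregated[i] for i in ids]))  # 0.5
```

Majority voting treats every non-expert as equally reliable and discards who gave which label; a crowdsourcing classifier such as the one described keeps the individual noisy labels, which is the information simple vote counting throws away.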