Effective and Efficient Active Learning for Deep Learning-Based Tissue Image Analysis.
Publication date: 2023 Mar 21
Authors:
André L S Meirelles, Tahsin Kurc, Jun Kong, Renato Ferreira, Joel Saltz, George Teodoro
Source:
BIOINFORMATICS
Abstract:
Deep learning has recently attained excellent results in digital pathology. A challenge with its use is that high-quality, representative training data sets are required to build robust models. Data annotation in this domain is labor-intensive and demands a substantial time commitment from expert pathologists. Active Learning (AL) is a strategy to minimize annotation: the goal is to select, from the pool of unlabeled data, samples whose annotation improves model accuracy. However, AL is very compute-demanding. The benefits for model learning may vary according to the strategy used, and it may be hard for a domain specialist to fine-tune the solution without an integrated interface.

We developed a framework that includes a friendly user interface along with run-time optimizations to reduce annotation and execution time for AL in digital pathology. Our solution implements several AL strategies along with our Diversity-Aware Data Acquisition (DADA) acquisition function, which enforces data diversity to improve the prediction performance of a model. In this work, we employed a model simplification strategy, Network Auto-Reduction (NAR), that significantly improves AL execution time when coupled with DADA. NAR produces less compute-demanding models, which replace the target models during the AL process to reduce processing demands. An evaluation with a Tumor-Infiltrating Lymphocytes (TILs) classification application shows that: (i) DADA attains superior performance compared to state-of-the-art AL strategies for different Convolutional Neural Networks (CNNs); (ii) NAR improves the AL execution time by up to 4.3×; and (iii) target models trained with patches/data selected by the NAR-reduced versions achieve similar or superior classification quality compared to using the target CNNs for data selection.

Source code: https://github.com/alsmeirelles/DADA.

Supplementary data are available at Bioinformatics online.

© The Author(s) 2023. Published by Oxford University Press.
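The abstract describes an acquisition function that combines sample selection with data diversity. As a rough illustration of that general idea only, the sketch below shortlists the most uncertain pool samples by predictive entropy and then spreads the picks out with greedy farthest-point selection over feature vectors. This is not the DADA algorithm, whose actual procedure is given in the paper and the linked repository. The function names, the `candidate_factor` parameter, and the choice of entropy and Euclidean distance are all assumptions made for this example.

```python
import numpy as np


def predictive_entropy(probs):
    """Entropy of each row of class probabilities (higher = more uncertain)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_diverse_uncertain(probs, features, budget, candidate_factor=4):
    """Hypothetical acquisition step: return `budget` indices into the pool.

    probs:    (n, classes) predicted class probabilities for unlabeled samples
    features: (n, d) feature vectors for the same samples
    """
    if budget <= 0 or len(probs) == 0:
        return np.array([], dtype=int)

    scores = predictive_entropy(probs)
    n_candidates = min(len(scores), budget * candidate_factor)
    # Shortlist the most uncertain samples, most uncertain first.
    shortlist = np.argsort(-scores, kind="stable")[:n_candidates]
    feats = features[shortlist].astype(float)

    # Greedy farthest-point selection inside the shortlist.
    chosen = [0]  # start from the most uncertain candidate
    min_dist = np.linalg.norm(feats - feats[0], axis=1)
    min_dist[0] = -np.inf  # never pick the same candidate twice
    while len(chosen) < min(budget, n_candidates):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        # Distance from each candidate to its nearest chosen sample.
        dist_to_nxt = np.linalg.norm(feats - feats[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_nxt)
        min_dist[nxt] = -np.inf
    return shortlist[chosen]
```

In an AL loop, the returned indices would be sent for annotation, moved from the unlabeled pool to the training set, and the model retrained before the next round. The shortlist step keeps the distance computations small, at the cost of ignoring lower-uncertainty samples entirely.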