Can physician judgment enhance model trustworthiness? A case study on predicting pathological lymph nodes in rectal cancer.
Publication date: 2024 Jul 04
Authors:
Kazuma Kobayashi, Yasuyuki Takamizawa, Mototaka Miyake, Sono Ito, Lin Gu, Tatsuya Nakatsuka, Yu Akagi, Tatsuya Harada, Yukihide Kanemitsu, Ryuji Hamamoto
Source:
ARTIFICIAL INTELLIGENCE IN MEDICINE
Abstract:
Explainability is key to enhancing the trustworthiness of artificial intelligence in medicine. However, there exists a significant gap between physicians' expectations for model explainability and the actual behavior of these models. This gap arises from the absence of a consensus on a physician-centered evaluation framework, which is needed to quantitatively assess the practical benefits that effective explainability should offer practitioners. Here, we hypothesize that superior attention maps, as a mechanism of model explanation, should align with the information that physicians focus on, potentially reducing prediction uncertainty and increasing model reliability. We employed a multimodal transformer to predict lymph node metastasis of rectal cancer using clinical data and magnetic resonance imaging. We explored how well attention maps, visualized through a state-of-the-art technique, can achieve agreement with physician understanding. Subsequently, we compared two distinct approaches for estimating uncertainty: a standalone estimation using only the variance of prediction probability, and a human-in-the-loop estimation that considers both the variance of prediction probability and the quantified agreement. Our findings revealed no significant advantage of the human-in-the-loop approach over the standalone one. In conclusion, this case study did not confirm the anticipated benefit of the explanation in enhancing model reliability. Superficial explanations could do more harm than good by misleading physicians into relying on uncertain predictions, suggesting that the current state of attention mechanisms should not be overestimated in the context of model explainability. Copyright © 2024 The Author(s). Published by Elsevier B.V. All rights reserved.
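The abstract does not give implementation details for the two uncertainty-estimation schemes it contrasts. The sketch below is a minimal, hypothetical illustration only: it assumes prediction probabilities are sampled via repeated stochastic forward passes (e.g., Monte Carlo dropout), and that "quantified agreement" is approximated as the fraction of attention mass falling inside a physician-annotated region. All function names, the agreement metric, and the weighting are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def standalone_uncertainty(probs: np.ndarray) -> float:
    """Uncertainty from the variance of predicted probabilities alone.

    probs: per-sample positive-class probabilities, e.g. from repeated
    stochastic forward passes (assumed Monte Carlo dropout).
    """
    return float(np.var(probs))

def attention_agreement(attention_map: np.ndarray, physician_mask: np.ndarray) -> float:
    """Illustrative agreement metric: fraction of attention mass that
    falls inside the physician-annotated region."""
    total = attention_map.sum()
    if total == 0:
        return 0.0
    return float(attention_map[physician_mask.astype(bool)].sum() / total)

def human_in_the_loop_uncertainty(probs: np.ndarray,
                                  attention_map: np.ndarray,
                                  physician_mask: np.ndarray,
                                  weight: float = 0.5) -> float:
    """Combine probability variance with (1 - agreement): low agreement
    between the attention map and physician focus raises the uncertainty.
    The linear weighting is an assumption, not taken from the paper."""
    variance = standalone_uncertainty(probs)
    disagreement = 1.0 - attention_agreement(attention_map, physician_mask)
    return weight * variance + (1.0 - weight) * disagreement

# Toy usage with synthetic inputs (no real clinical or MRI data).
rng = np.random.default_rng(0)
probs = rng.uniform(0.4, 0.6, size=20)        # 20 stochastic predictions
attention_map = rng.random((64, 64))          # attention over an MRI slice
physician_mask = np.zeros((64, 64))
physician_mask[20:40, 20:40] = 1              # physician-marked region

print(standalone_uncertainty(probs))
print(human_in_the_loop_uncertainty(probs, attention_map, physician_mask))
```

Under this reading, the study's negative result corresponds to the combined score failing to separate correct from incorrect predictions any better than the variance term alone.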