Privacy-enhancing and generalizable deep learning with synthetic data for mediastinal neoplasm diagnosis.
Published: 2024 Oct 20
Authors:
Zhanping Zhou, Yuchen Guo, Ruijie Tang, Hengrui Liang, Jianxing He, Feng Xu
Source:
Disease Models & Mechanisms
Abstract:
The success of deep learning (DL) relies heavily on training data, from which DL models encapsulate information. Consequently, the development and deployment of DL models expose that data to potential privacy breaches, which are particularly critical in data-sensitive contexts like medicine. We propose a new technique named DiffGuard that generates realistic and diverse synthetic medical images with annotations, indistinguishable from real images even to experts, to replace real data for DL model training, which severs the models' direct connection to the real data and enhances privacy safety. We demonstrate that DiffGuard enhances privacy safety, with much less data leakage and better resistance against privacy attacks on both data and models. It also improves the accuracy and generalizability of DL models for segmentation and classification of mediastinal neoplasms in multi-center evaluation. We expect that our solution will pave the way toward privacy-preserving DL for precision medicine, promote data and model sharing, and inspire further innovation in artificial-intelligence-generated-content technologies for medicine. © 2024. The Author(s).
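The core idea, training the downstream model only on samples drawn from a generative model fit to the real data so the trained model never touches real records directly, can be illustrated with a minimal toy sketch. This is not the authors' DiffGuard implementation (which uses a diffusion model on medical images); here the "generative model" is per-class Gaussians over two synthetic feature dimensions and the "DL model" is a nearest-centroid classifier, all hypothetical stand-ins:

```python
# Minimal sketch (NOT the authors' DiffGuard pipeline): fit a simple
# generative model to real data, sample a fully synthetic training set,
# and train the downstream model on synthetic data only.
import random
import statistics

random.seed(0)

def make_real_data(n_per_class):
    """Toy stand-in for real patient features: two well-separated Gaussian classes."""
    data = []
    for label, mean in [(0, 0.0), (1, 5.0)]:
        for _ in range(n_per_class):
            x = (random.gauss(mean, 1.0), random.gauss(mean, 1.0))
            data.append((x, label))
    return data

def fit_generator(real_data):
    """'Generative model': per-class, per-feature means and stdevs."""
    params = {}
    for label in (0, 1):
        xs = [x for x, l in real_data if l == label]
        params[label] = [(statistics.mean(col), statistics.stdev(col))
                         for col in zip(*xs)]
    return params

def sample_synthetic(params, n_per_class):
    """Draw a fresh synthetic training set; real records are never reused."""
    data = []
    for label, dims in params.items():
        for _ in range(n_per_class):
            x = tuple(random.gauss(m, s) for m, s in dims)
            data.append((x, label))
    return data

def train_centroid_classifier(data):
    """Downstream 'DL model' stand-in: nearest-centroid classifier."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, l in data if l == label]
        centroids[label] = tuple(statistics.mean(col) for col in zip(*xs))
    return centroids

def accuracy(centroids, data):
    def predict(x):
        return min(centroids,
                   key=lambda l: sum((a - b) ** 2
                                     for a, b in zip(x, centroids[l])))
    return sum(predict(x) == l for x, l in data) / len(data)

real = make_real_data(200)
generator = fit_generator(real)
synthetic = sample_synthetic(generator, 200)  # training set is fully synthetic
model = train_centroid_classifier(synthetic)
print(round(accuracy(model, real), 3))  # high accuracy despite no real training data
```

The design point mirrors the abstract's claim: the deployed model encapsulates only the generator's distribution, not individual real records, which is what weakens membership-style privacy attacks while preserving downstream task accuracy.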