Artificial intelligence for the classification of pigmented skin lesions in populations with skin of colour: A systematic review.
Published: 2023 Mar 21
Authors:
Yuyang Liu, Clare A Primiero, Vishnutheertha Kulkarni, H Peter Soyer, Brigid Betz-Stablein
Source:
DERMATOLOGY
Abstract:
Background: While skin cancers are less prevalent in people with skin of color, they are more often diagnosed at later stages and have a poorer prognosis. The use of artificial intelligence (AI) models can potentially improve early detection of skin cancers; however, the lack of skin color diversity in training datasets may only widen the pre-existing racial discrepancies in dermatology.
Objective: To systematically review the technique, quality, accuracy, and implications of studies using AI models trained or tested in populations with skin of color for classification of pigmented skin lesions.
Methods: PubMed was searched to identify studies describing AI models for classification of pigmented skin lesions. Only studies that used training datasets with at least 10% of images from people with skin of color were eligible. Outcomes on study population, design of AI model, accuracy, and quality of the studies were reviewed.
Results: Twenty-two eligible articles were identified. The majority of studies trained models on datasets obtained from Chinese (7/22), Korean (5/22), and Japanese (3/22) populations. Seven studies used diverse datasets containing Fitzpatrick skin types I-III in combination with at least 10% from Black American, Native American, or Pacific Islander populations or from Fitzpatrick skin types IV-VI. AI models producing binary outcomes (e.g., benign vs. malignant) reported accuracy ranging from 70% to 99.7%. Accuracy of AI models reporting multiclass outcomes (e.g., specific lesion diagnosis) was lower, ranging from 43% to 93%. Reader studies, in which dermatologists' classifications are compared with AI model outcomes, reported similar accuracy in one study, higher AI accuracy in three studies, and higher clinician accuracy in two studies. A quality review revealed that dataset description and variety, benchmarking, public evaluation, and healthcare application were frequently not addressed.
Conclusions: While this review provides promising evidence of accurate AI models in populations with skin of color, large discrepancies remain between the number of AI models developed in populations with skin of color (particularly Fitzpatrick types IV-VI) and the number developed in populations of largely European ancestry. A lack of publicly available datasets from diverse populations is likely a contributing factor, as is the inadequate reporting of patient-level metadata relating to skin color in training datasets.
The Author(s). Published by S. Karger AG, Basel.
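Illustrative note: the 10% eligibility criterion and the accuracy figures quoted in the abstract can be made concrete with a small sketch. This is not the authors' screening procedure or code. The `Study` record, its field names, the `is_eligible` and `accuracy` helpers, and all numbers and labels below are hypothetical and exist only to show how such a threshold and such a metric are typically computed.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical record for one candidate study (field names are illustrative)."""
    name: str
    n_training_images: int
    n_skin_of_color_images: int


def is_eligible(study: Study, threshold: float = 0.10) -> bool:
    """True when at least `threshold` of the training images come from people with skin of color."""
    if study.n_training_images <= 0:
        return False
    return study.n_skin_of_color_images / study.n_training_images >= threshold


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of lesions whose predicted label equals the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up studies, used only to exercise the eligibility screen.
candidates = [
    Study("study_a", n_training_images=1000, n_skin_of_color_images=150),
    Study("study_b", n_training_images=1000, n_skin_of_color_images=50),
]
eligible = [s.name for s in candidates if is_eligible(s)]

# Made-up labels: the same accuracy definition applies to binary and multiclass outputs.
binary_acc = accuracy(
    ["benign", "malignant", "benign", "malignant"],
    ["benign", "malignant", "malignant", "malignant"],
)
multiclass_acc = accuracy(
    ["nevus", "melanoma", "bcc", "nevus"],
    ["nevus", "nevus", "bcc", "melanoma"],
)
```

With more candidate classes there are more ways to be wrong, which is one plausible reason multiclass accuracy tends to be lower than binary accuracy on the same images. The sketch only illustrates the metric and does not reproduce any result from the review.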