Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Assessment of Diagnostic Performance of Dermatologists Cooperating With a Convolutional Neural Network in a Prospective Clinical Study: Human With Machine.

Publication date: 2023 May 03
Authors: Julia K Winkler, Andreas Blum, Katharina Kommoss, Alexander Enk, Ferdinand Toberer, Albert Rosenberger, Holger A Haenssle
Source: JAMA Dermatology

Abstract:

Importance: Studies suggest that convolutional neural networks (CNNs) perform equally to trained dermatologists in skin lesion classification tasks. Despite the approval of the first neural networks for clinical use, prospective studies demonstrating benefits of human with machine cooperation are lacking.

Objective: To assess whether dermatologists benefit from cooperation with a market-approved CNN in classifying melanocytic lesions.

Design, Setting, and Participants: In this prospective diagnostic 2-center study, dermatologists performed skin cancer screenings using naked-eye examination and dermoscopy. Dermatologists graded suspect melanocytic lesions by the probability of malignancy (range 0-1, threshold for malignancy ≥0.5) and indicated management decisions (no action, follow-up, excision). Next, dermoscopic images of suspect lesions were assessed by a market-approved CNN, Moleanalyzer Pro (FotoFinder Systems). The CNN malignancy scores (range 0-1, threshold for malignancy ≥0.5) were transferred to dermatologists with the request to re-evaluate lesions and revise initial decisions in consideration of CNN results. Reference diagnoses were based on histopathologic examination in 125 (54.8%) lesions or, in the case of nonexcised lesions, on clinical follow-up data and expert consensus. Data were collected from October 2020 to October 2021.

Main Outcomes and Measures: Primary outcome measures were diagnostic sensitivity and specificity of dermatologists alone and dermatologists cooperating with the CNN. Accuracy and receiver operator characteristic area under the curve (ROC AUC) were considered as additional measures.

Results: A total of 22 dermatologists detected 228 suspect melanocytic lesions (190 nevi, 38 melanomas) in 188 patients (mean [range] age, 53.4 [19-91] years; 97 [51.6%] male patients). Diagnostic sensitivity and specificity significantly improved when dermatologists additionally integrated CNN results into decision-making (mean sensitivity from 84.2% [95% CI, 69.6%-92.6%] to 100.0% [95% CI, 90.8%-100.0%]; P = .03; mean specificity from 72.1% [95% CI, 65.3%-78.0%] to 83.7% [95% CI, 77.8%-88.3%]; P < .001; mean accuracy from 74.1% [95% CI, 68.1%-79.4%] to 86.4% [95% CI, 81.3%-90.3%]; P < .001; and mean ROC AUC from 0.895 [95% CI, 0.836-0.954] to 0.968 [95% CI, 0.948-0.988]; P = .005). In addition, the CNN alone achieved a comparable sensitivity, higher specificity, and higher diagnostic accuracy compared with dermatologists alone in classifying melanocytic lesions. Moreover, unnecessary excisions of benign nevi were reduced by 19.2%, from 104 (54.7%) of 190 benign nevi to 84 nevi when dermatologists cooperated with the CNN (P < .001). Most lesions were examined by dermatologists with 2 to 5 years (96, 42.1%) or less than 2 years of experience (78, 34.2%); others (54, 23.7%) were evaluated by dermatologists with more than 5 years of experience. Dermatologists with less dermoscopy experience cooperating with the CNN had the most diagnostic improvement compared with more experienced dermatologists.

Conclusions and Relevance: In this prospective diagnostic study, these findings suggest that dermatologists may improve their performance when they cooperate with the market-approved CNN and that a broader application of this human with machine approach could be beneficial for dermatologists and patients.
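
The headline percentages in the abstract can be checked with simple arithmetic. The Python sketch below is illustrative only and rests on stated assumptions: the abstract reports percentages, not confusion-matrix counts, so the true-positive and true-negative counts used here (32 and 137 for dermatologists alone, 38 and 159 with the CNN) are back-calculated from the reported rates and the 38 melanomas and 190 nevi. The abstract also does not name the confidence interval method; a Wilson score interval is assumed here because it appears to reproduce the reported sensitivity intervals. The function names are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the abstract does not state which CI method was used.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts back-calculated from the reported percentages (not given in the source).
alone = diagnostic_metrics(tp=32, fn=6, tn=137, fp=53)
with_cnn = diagnostic_metrics(tp=38, fn=0, tn=159, fp=31)

# Relative reduction in excisions of benign nevi: 104 before, 84 after.
excision_reduction = (104 - 84) / 104

# Assumed-Wilson 95% intervals for sensitivity alone and with the CNN.
sens_alone_ci = wilson_interval(32, 38)
sens_cnn_ci = wilson_interval(38, 38)

for label, m in (("alone", alone), ("with CNN", with_cnn)):
    print(label, {k: round(100 * v, 1) for k, v in m.items()})
print("excision reduction (%):", round(100 * excision_reduction, 1))
```

Note that the 19.2% figure is a relative reduction (20 fewer excisions out of 104), not a drop of 19.2 percentage points in the excision rate among the 190 benign nevi.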