Performance of artificial intelligence in 7533 consecutive prevalent screening mammograms from the BreastScreen Australia program.
Published: 2023 Nov 13
Authors:
John Waugh, Jill Evans, Miranda Miocevic, Darren Lockie, Parisa Aminzadeh, Anne Lynch, Robin J Bell
Source:
EUROPEAN RADIOLOGY
Abstract:
To assess the performance of an artificial intelligence (AI) algorithm in the Australian mammography screening program, which routinely uses two independent readers with arbitration of discordant results.
A total of 7533 prevalent round mammograms from 2017 were available for analysis. The AI program classified mammograms into deciles on the basis of breast cancer (BC) risk. BC diagnoses, including invasive BC (IBC) and ductal carcinoma in situ (DCIS), included those from the prevalent round, interval cancers, and cancers identified in the subsequent screening round two years later. Performance was assessed by sensitivity, specificity, positive and negative predictive values, and the proportion of women recalled by the radiologists and identified as higher risk by AI.
Radiologists identified 54 women with IBC and 13 with DCIS, with a recall rate of 9.7%. In comparison, 51 of the 54 IBCs and 12 of the 13 cases of DCIS were within the higher AI score group (score 10), a recall equivalent of 10.6% (a difference of 0.9%; CI -0.03 to 1.89%, p = 0.06). When IBCs identified in the 2017 round, interval cancers classified as false negatives or with minimal signs in 2017, and cancers from the 2019 round were combined, the radiologists had identified 54/67, while 59/67 were in the highest risk AI category (sensitivity 80.6% and 88.06% respectively, a difference that was not statistically significant).
As the performance of AI was comparable to that of expert radiologists, future AI roles in screening could include replacing one reader and supporting arbitration, reducing workload and false positive results.
AI analysis of consecutive prevalent screening mammograms from the Australian BreastScreen program demonstrated the algorithm's ability to match the cancer detection of experienced radiologists, additionally identifying five interval cancers (false negatives) and the majority of the false positive recalls.
• The AI program was almost as sensitive as the radiologists in terms of identifying prevalent lesions (51/54 for invasive breast cancer, 63/67 when including ductal carcinoma in situ).
• If selected interval cancers and cancers identified in the subsequent screening round were included, the AI program identified more cancers than the radiologists (59/67 compared with 54/67, sensitivity 88.06% and 80.6% respectively, p = 0.24).
• The high negative predictive value of a score of 1-9 would indicate a role for AI as a triage tool to reduce the recall rate (specifically false positives).
© 2023. The Author(s).
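The sensitivity figures quoted in the abstract follow directly from the reported counts. The short Python sketch below reproduces that arithmetic as an illustration only; the function and variable names are hypothetical, and nothing here reflects the authors' actual analysis code or the statistical test behind the reported p-values.

```python
def sensitivity(detected: int, total: int) -> float:
    """Sensitivity as a percentage: cancers detected / all cancers in the set."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * detected / total


# Counts taken from the abstract: 67 cancers in the combined set
# (2017 round, selected interval cancers, and 2019 round cancers).
TOTAL_CANCERS = 67
radiologist_sensitivity = sensitivity(54, TOTAL_CANCERS)
ai_sensitivity = sensitivity(59, TOTAL_CANCERS)

# Recall rates (percent) as reported; the difference is simple subtraction.
radiologist_recall = 9.7
ai_recall_equivalent = 10.6
recall_difference = round(ai_recall_equivalent - radiologist_recall, 1)

print(f"Radiologists: {radiologist_sensitivity:.1f}%")
print(f"AI (score 10): {ai_sensitivity:.2f}%")
print(f"Recall difference: {recall_difference:.1f} percentage points")
```

The confidence interval and p-values in the abstract cannot be recovered from these counts alone, since they depend on the paired comparison the authors used, so the sketch deliberately stops at the point estimates.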