Does deep learning software improve the consistency and performance of radiologists with various levels of experience in assessing bi-parametric prostate MRI?
Published: 2023 Mar 20
Authors:
Aydan Arslan, Deniz Alis, Servet Erdemli, Mustafa Ege Seker, Gokberk Zeybel, Sabri Sirolu, Serpil Kurtcan, Ercan Karaarslan
Source:
Insights into Imaging
Abstract:
To investigate whether commercially available deep learning (DL) software improves Prostate Imaging-Reporting and Data System (PI-RADS) scoring consistency on bi-parametric MRI among radiologists with various levels of experience, and to assess whether the DL software improves the radiologists' performance in identifying clinically significant prostate cancer (csPCa). We retrospectively enrolled consecutive men who underwent bi-parametric prostate MRI on a 3 T scanner due to suspicion of PCa. Four radiologists with 2, 3, 5, and > 20 years of experience evaluated the bi-parametric prostate MRI scans with and without the DL software. Whole-mount pathology or MRI/ultrasound fusion-guided biopsy served as the reference standard. The area under the receiver operating characteristic curve (AUROC) was calculated for each radiologist with and without the DL software and compared using DeLong's test. In addition, inter-rater agreement was investigated using kappa statistics. In all, 153 men with a mean age of 63.59 ± 7.56 years (range 53-80) were enrolled in the study. In the study sample, 45 men (29.80%) had clinically significant PCa. During the reading with the DL software, the radiologists changed their initial scores in 1/153 (0.65%), 2/153 (1.3%), 0/153 (0%), and 3/153 (1.9%) of the patients, yielding no significant increase in the AUROC (p > 0.05). Fleiss' kappa scores among the radiologists were 0.39 and 0.40 with and without the DL software, respectively (p = 0.56). The commercially available DL software does not increase the consistency of bi-parametric PI-RADS scoring or the csPCa detection performance of radiologists with varying levels of experience. © 2023. The Author(s).
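For readers unfamiliar with the two statistics named in the abstract, the following is a minimal sketch of how Fleiss' kappa and the AUROC can be computed from per-patient scores. It is not the authors' code: the function names `fleiss_kappa` and `auroc` and the toy scores are hypothetical, chosen only for illustration, and the DeLong comparison of two AUROCs is not implemented here.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x raters table of categorical labels.

    Each inner sequence holds the labels that all raters gave one subject.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Fraction of ordered rater pairs that agree on this subject.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_subjects
    total = n_subjects * n_raters
    # Agreement expected by chance from the pooled category proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the Mann-Whitney statistic; tied pairs count as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical PI-RADS scores (1-5): five patients, four readers each.
    toy_scores = [
        [2, 2, 3, 2],
        [4, 4, 4, 5],
        [3, 2, 3, 3],
        [5, 5, 5, 5],
        [1, 2, 2, 1],
    ]
    # Hypothetical reference standard: 1 = csPCa, 0 = no csPCa.
    toy_labels = [0, 1, 0, 1, 0]
    print("Fleiss' kappa:", fleiss_kappa(toy_scores))
    first_reader = [row[0] for row in toy_scores]
    print("AUROC, first reader:", auroc(first_reader, toy_labels))
```

In practice, an established statistics package would normally be preferred over hand-rolled functions, particularly for the significance tests reported above.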