AI-derived comparative assessment of the performance of pathogenicity prediction tools on missense variants of breast cancer genes.
Published: 2024 Sep 11
Authors:
Rahaf M Ahmad, Bassam R Ali, Fatma Al-Jasmi, Noura Al Dhaheri, Saeed Al Turki, Praseetha Kizhakkedath, Mohd Saberi Mohamad
Source:
Human Genomics
Abstract:
Single nucleotide variants (SNVs) can exert substantial and highly variable effects on cellular functions, which makes accurate prediction of their consequences challenging, yet crucial, especially in clinical settings such as oncology. Laboratory-based experimental methods for assessing these effects are time-consuming and often impractical, highlighting the importance of in-silico tools for variant impact prediction. However, the performance of currently available tools on breast cancer missense variants from benchmarking databases has not been thoroughly investigated, leaving a knowledge gap in the accurate prediction of pathogenicity. In this study, the benchmarking datasets ClinVar and HGMD were used to evaluate 21 artificial intelligence (AI)-derived in-silico tools. Missense variants in breast cancer genes were extracted from ClinVar and HGMD Professional v2023.1. The HGMD dataset contained pathogenic variants only; to ensure balance, benign variants for the same genes were added from the ClinVar database. Interestingly, our analysis of both datasets revealed variants across genes with low and moderate penetrance in addition to high penetrance, reinforcing the value of disease-specific tools. The top-performing tools on the ClinVar dataset were MutPred (Accuracy = 0.73), MetaRNN (Accuracy = 0.72), ClinPred (Accuracy = 0.71), and Meta-SVM, REVEL, and Fathmm-XF (Accuracy = 0.70). On the HGMD dataset, they were ClinPred (Accuracy = 0.72), MetaRNN (Accuracy = 0.71), CADD (Accuracy = 0.69), Fathmm-MKL (Accuracy = 0.68), and Fathmm-XF (Accuracy = 0.67). These findings offer clinicians and researchers valuable insights for selecting, improving, and developing effective in-silico tools for breast cancer pathogenicity prediction. Bridging this knowledge gap contributes to advancing precision medicine and enhancing diagnostic and therapeutic approaches for breast cancer patients, with potential implications for other conditions.
© 2024. The Author(s).
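The accuracy figures quoted in the abstract can be read against the following minimal sketch. The abstract does not say how scores were turned into calls or how accuracy was computed, so this assumes the standard definition (correct calls divided by all labeled variants) and a single score threshold per tool. The function names, the threshold of 0.5, and the toy scores and labels are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def call_pathogenic(scores: Sequence[float], threshold: float) -> list[bool]:
    """Turn raw tool scores into binary calls (True = predicted pathogenic).

    Assumption: higher scores mean "more likely pathogenic" and one fixed
    threshold is applied; real tools publish their own recommended cutoffs.
    """
    return [score >= threshold for score in scores]


def accuracy(predicted: Sequence[bool], truth: Sequence[bool]) -> float:
    """Fraction of variants whose predicted class matches the database label."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


# Hypothetical toy data: six variants, not real ClinVar or HGMD entries.
toy_scores = [0.91, 0.15, 0.62, 0.40, 0.08, 0.77]
toy_truth = [True, False, True, True, False, False]  # True = labeled pathogenic

toy_calls = call_pathogenic(toy_scores, threshold=0.5)
toy_accuracy = accuracy(toy_calls, toy_truth)
print(f"accuracy = {toy_accuracy:.2f}")
```

Accuracy is only easy to interpret when the two classes are roughly balanced, which is presumably why the abstract mentions adding benign ClinVar variants to the pathogenic-only HGMD set.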