Genes selection using deep learning and explainable artificial intelligence for chronic lymphocytic leukemia predicting the need and time to therapy.

Published: 2023
Authors: Fortunato Morabito, Carlo Adornetto, Paola Monti, Adriana Amaro, Francesco Reggiani, Monica Colombo, Yissel Rodriguez-Aldana, Giovanni Tripepi, Graziella D'Arrigo, Claudia Vener, Federica Torricelli, Teresa Rossi, Antonino Neri, Manlio Ferrarini, Giovanna Cutrona, Massimo Gentile, Gianluigi Greco
Source: Epigenetics & Chromatin

Abstract:

Analyzing gene expression profiles (GEP) through artificial intelligence provides meaningful insight into cancer. This study introduces the DeepSHAP Autoencoder Filter for Genes Selection (DSAF-GS), a novel deep learning and explainable artificial intelligence-based approach for feature selection in genomic-scale data. DSAF-GS exploits the autoencoder's reconstruction capabilities without changing the original feature space, which makes the results easier to interpret. Explainable artificial intelligence is then used to select the genes informative for chronic lymphocytic leukemia (CLL) prognosis in 217 cases from a GEP database comprising roughly 20,000 genes. The model for prognosis prediction achieved an accuracy of 86.4%, a sensitivity of 85.0%, and a specificity of 87.5%. According to the proposed approach, predictions were strongly influenced by CEACAM19 and PIGP, moderately influenced by MKL1 and GNE, and only weakly influenced by other genes. The 10 most influential genes were selected for further analysis. Among them, FADD, FIBP, GNE, IGF1R, MKL1, PIGP, and SLC39A6 were identified in the Reactome pathway database as involved in signal transduction, transcription, protein metabolism, immune system, cell cycle, and apoptosis. Moreover, in the 3D protein-protein interaction (PPI) network model explored with the NetworkAnalyst tool, FADD, FIBP, IGF1R, QTRT1, GNE, SLC39A6, and MKL1 appear coupled into a complex network. Finally, all 10 selected genes showed predictive power for time to first treatment (TTFT) in univariate analyses on a basic prognostic model including IGHV mutational status, del(11q) and del(17p), NOTCH1 mutations, β2-microglobulin, Rai stage, and B-lymphocytosis, factors known to predict TTFT in CLL.
However, only IGF1R [hazard ratio (HR) 1.41, 95% CI 1.08-1.84, P=0.013], COL28A1 (HR 0.32, 95% CI 0.10-0.97, P=0.045), and QTRT1 (HR 7.73, 95% CI 2.48-24.04, P<0.001) were significantly associated with TTFT in multivariable analyses when combined with the prognostic factors of the basic model, ultimately increasing Harrell's c-index and the explained variation to 78.6% (versus 76.5% for the basic prognostic model) and 52.6% (versus 42.2% for the basic prognostic model), respectively. The goodness of model fit also improved (χ2 = 20.1, P=0.002), indicating better performance than the basic prognostic model. In conclusion, DSAF-GS identified a group of significant genes for CLL prognosis, suggesting future directions for bio-molecular research.

Copyright © 2023 Morabito, Adornetto, Monti, Amaro, Reggiani, Colombo, Rodriguez-Aldana, Tripepi, D’Arrigo, Vener, Torricelli, Rossi, Neri, Ferrarini, Cutrona, Gentile and Greco.
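
The evaluation figures quoted in the abstract (accuracy, sensitivity, specificity, and Harrell's c-index) can be made concrete with a small Python sketch. It is not the authors' code. The confusion-matrix counts below are hypothetical: the abstract reports no counts or test-set size, and these were chosen only because they happen to reproduce the three reported percentages. The survival data in the second part are invented toy values, and the c-index function is a simplified textbook version that skips pairs with equal follow-up times.

```python
from itertools import combinations


def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's concordance index for right-censored data.

    A pair is comparable when the subject with the shorter follow-up time
    had an event. It is concordant when that subject also has the higher
    risk score; tied risk scores count as half. Pairs with equal follow-up
    times are skipped (a simplification made for this sketch).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the ordering is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical counts (assumption): 20 positive and 24 negative test cases.
    acc, sens, spec = classification_metrics(tp=17, fn=3, tn=21, fp=3)
    print(f"accuracy={acc:.1%} sensitivity={sens:.1%} specificity={spec:.1%}")

    # Toy survival data (assumption): time to treatment, event flag, model risk.
    toy_times = [5, 10, 12, 20, 30]
    toy_events = [1, 1, 0, 1, 0]
    toy_risk = [0.9, 0.4, 0.7, 0.5, 0.1]
    print(f"c-index={harrell_c_index(toy_times, toy_events, toy_risk):.2f}")
```

With the hypothetical counts, 38 of 44 cases are classified correctly (86.4%), 17 of 20 positives are detected (85.0%), and 21 of 24 negatives are correctly rejected (87.5%). A c-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported rise from 76.5% to 78.6% means that a slightly larger share of comparable patient pairs is ranked correctly once the three genes are added.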