Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Machine learning in the identification of prognostic DNA methylation biomarkers among patients with cancer: A systematic review of epigenome-wide studies.

Published: Sep 2023
Authors: Tanwei Yuan, Dominic Edelmann, Ziwen Fan, Elizabeth Alwers, Jakob Nikolas Kather, Hermann Brenner, Michael Hoffmeister
Source: ARTIFICIAL INTELLIGENCE IN MEDICINE

Abstract:

DNA methylation biomarkers have great potential to improve prognostic classification systems for patients with cancer. Machine learning (ML)-based analytic techniques might help overcome the challenges of analyzing high-dimensional data from relatively small samples. This systematic review summarizes the current use of ML-based methods in epigenome-wide studies for the identification of DNA methylation signatures associated with cancer prognosis.

We searched three electronic databases (PubMed, EMBASE, and Web of Science) for articles published until 2 January 2023. ML-based methods and workflows used to identify DNA methylation signatures associated with cancer prognosis were extracted and summarized. Two authors independently assessed the methodological quality of the included studies using a seven-item checklist adapted from 'A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies (PROBAST)' and the 'Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK)'. The ML methods and workflows used in the included studies were summarized and visualized with a sunburst chart, a bubble chart, and Sankey diagrams.

Eighty-three studies were included in this review. Three major types of ML-based workflows were identified: 1) unsupervised clustering, 2) supervised feature selection, and 3) deep learning-based feature transformation. For these three workflows, the most frequently used ML techniques were consensus clustering, least absolute shrinkage and selection operator (LASSO), and autoencoders, respectively. The review revealed that the performance of these approaches has not yet been adequately evaluated and that methodological and reporting flaws were common in the identified studies using ML techniques.

There is great heterogeneity in the ML-based methodological strategies used by epigenome-wide studies to identify DNA methylation markers associated with cancer prognosis. In theory, most existing workflows cannot handle the high multicollinearity and potentially non-linear interactions in epigenome-wide DNA methylation data. Benchmarking studies are needed to compare the relative performance of various approaches for specific cancer types. Adherence to relevant methodological and reporting guidelines is urgently needed.

Copyright © 2023 The Authors. Published by Elsevier B.V. All rights reserved.
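
To make the "supervised feature selection" workflow named in the abstract more concrete, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent on simulated data with more features than samples. It is not taken from the review or from any included study. The data, the function names (`soft_threshold`, `lasso_coordinate_descent`), the penalty value, and the use of a continuous outcome with squared-error loss are all illustrative assumptions; prognostic studies would more plausibly apply a penalized survival model to real methylation values.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n) * ||y - Xb||^2 + penalty * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - Xb while all coefficients are zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero feature carries no information
            # Covariance of feature j with the residual that excludes its own fit.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * coef[j]
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                coef[j] = new_coef
    return coef


# Simulated data (assumption): 40 samples, 80 standardised "CpG" features,
# of which only features 0 and 1 influence the outcome.
rng = random.Random(0)
n_samples, n_features = 40, 80
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] - 3.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]

coef = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected feature indices:", selected)
```

The L1 penalty sets most coefficients exactly to zero, which is why LASSO doubles as a feature selector when features outnumber samples. The sketch does not address the multicollinearity concern raised in the abstract: among strongly correlated features, LASSO tends to keep one somewhat arbitrarily.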