Triple and quadruple optimization for feature selection in cancer biomarker discovery.
Publication date: 2024 Oct 10
Authors:
L Cattelani, V Fortino
Source:
JOURNAL OF BIOMEDICAL INFORMATICS
Abstract:
The proliferation of omics data has advanced cancer biomarker discovery, but candidate biomarkers often fall short in external validation, mainly because a narrow focus on prediction accuracy neglects clinical utility and validation feasibility. We introduce three- and four-objective optimization strategies based on genetic algorithms to identify clinically actionable biomarkers in omics studies, addressing classification tasks that distinguish cancer subtypes which are hard to differentiate by histological analysis alone. Our hypothesis is that by optimizing more than one characteristic of cancer biomarkers, we may identify biomarkers that are more likely to succeed in external validation. Our objectives are to: (i) assess the biomarker panel's accuracy using a machine learning (ML) framework; (ii) ensure the biomarkers exhibit significant fold-changes across subtypes, thereby boosting the success rate of PCR or immunohistochemistry validations; (iii) select a concise set of biomarkers to simplify the validation process and reduce clinical costs; and (iv) identify biomarkers crucial for predicting overall survival, which plays a significant role in determining the prognostic value of cancer subtypes. We implemented triple and quadruple optimization algorithms and applied them to renal carcinoma gene expression data from TCGA. The study targets kidney cancer subtypes that are difficult to distinguish with histopathological methods. Selected RNA-seq biomarkers were assessed against the gold standard method, which relies solely on clinical information, and in external microarray-based validation datasets. Notably, these biomarkers achieved an accuracy above 0.8 in external validations and added significant value to survival predictions, yielding a higher c-index than clinical data alone. The provided tool also helps explore the trade-offs between objectives, offering multiple solutions for clinical evaluation before proceeding to costly validation or clinical trials.
Copyright © 2024. Published by Elsevier Inc.
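The trade-off between the four objectives named in the abstract can be illustrated with a small Pareto-dominance filter. The following is a minimal sketch, not the authors' implementation: the `Panel` type, the gene names, and all scores are hypothetical values made up for illustration, and the minimum absolute log2 fold-change is assumed here as one possible way to score objective (ii). A genetic algorithm would generate and evolve the candidate panels; this sketch only shows how non-dominated panels are kept once their objective values are known.

```python
from typing import List, NamedTuple, Tuple


class Panel(NamedTuple):
    """A candidate biomarker panel with hypothetical objective scores."""
    genes: Tuple[str, ...]   # panel size is len(genes); smaller is better
    accuracy: float          # classification accuracy; higher is better
    min_abs_log2_fc: float   # weakest fold-change in the panel; higher is better
    c_index: float           # survival concordance; higher is better


def objectives(p: Panel) -> Tuple[float, ...]:
    # Express every objective as "higher is better" by negating panel size.
    return (p.accuracy, p.min_abs_log2_fc, -float(len(p.genes)), p.c_index)


def dominates(a: Panel, b: Panel) -> bool:
    # a dominates b if it is at least as good on every objective
    # and strictly better on at least one.
    oa, ob = objectives(a), objectives(b)
    return all(x >= y for x, y in zip(oa, ob)) and any(x > y for x, y in zip(oa, ob))


def pareto_front(panels: List[Panel]) -> List[Panel]:
    # Keep only panels that no other panel dominates (input order preserved).
    return [p for p in panels if not any(dominates(q, p) for q in panels)]


# Made-up candidates; these are NOT results from the study.
candidates = [
    Panel(("G1", "G2"), 0.85, 1.5, 0.70),
    Panel(("G1", "G2", "G3"), 0.85, 1.2, 0.68),
    Panel(("G4",), 0.78, 2.0, 0.65),
    Panel(("G1", "G2", "G5", "G6"), 0.90, 1.0, 0.74),
    Panel(("G4", "G7"), 0.77, 1.4, 0.64),
]

front = pareto_front(candidates)
for p in front:
    print(p.genes, p.accuracy, p.min_abs_log2_fc, p.c_index)
```

In this toy set, the three-gene and the `G4`/`G7` panels are dropped because the `G1`/`G2` panel is at least as good on every objective. The remaining panels each win on a different objective, which is the kind of choice a multi-objective tool leaves to clinical evaluation.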