Research Updates

OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features.

Published: 2023
Authors: Maha A Thafar, Somayah Albaradei, Mahmut Uludag, Mona Alshahrani, Takashi Gojobori, Magbubah Essack, Xin Gao
Source: Frontiers in Genetics

Abstract:

Late-stage failures in drug development are usually a consequence of ineffective targets. Proper target identification is therefore needed, and computational approaches may make it possible. Effective targets have disease-relevant biological functions, and omics data reveal the proteins involved in these functions. In addition, properties that favor binding between a drug and its target can be deduced from the protein's amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets from the features of known effective targets. First, we created the "OncologyTT" datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, BERT embeddings of the proteins' amino acid sequences, and the integrated features, which were used to train and test the DL classifiers separately. The models achieved high prediction performance in terms of area under the curve (AUC): greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. OncoRTT also outperformed the state-of-the-art method, evaluated on that method's data, in five of the seven cancer types assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with additional validation evidence from the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.

Copyright © 2023 Thafar, Albaradei, Uludag, Alshahrani, Gojobori, Essack and Gao.
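
To make the AUC figures quoted in the abstract concrete, here is a minimal Python sketch of how the metric can be computed from classifier scores. It is not the authors' code. The function names, the toy labels and scores, and the use of simple concatenation to form an "integrated" feature vector are all assumptions made for illustration; the abstract does not say how the two feature sets are combined.

```python
from typing import List, Sequence


def integrate_features(omics: Sequence[float], embedding: Sequence[float]) -> List[float]:
    """Join two per-gene feature vectors end to end.

    Assumption: concatenation stands in for the paper's integration step,
    which the abstract does not describe.
    """
    return list(omics) + list(embedding)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = known therapeutic target, 0 = non-target.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

On this reading, an AUC of 0.95 means a randomly chosen known target receives a higher score than a randomly chosen non-target about 95% of the time. The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation would scale better on real datasets.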