Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Sites Prediction.

Published: 2023 Oct 18
Authors: Farzaneh Esmaili, Mahdi Pourmirzaei, Shahin Ramazi, Seyedehsamaneh Shojaeilangari, Elham Yavari
Source: GENOMICS PROTEOMICS & BIOINFORMATICS

Abstract:

Post-translational modifications (PTMs) play key roles in extending the functional diversity of proteins and, as a result, in regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize the body of knowledge associated with phosphorylation site (p-site) prediction to facilitate future research in this field. First, we comprehensively reviewed all related databases and introduced all steps of dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigated p-site prediction methods, which were divided into two computational groups: algorithmic and machine learning (ML). ML-based p-site prediction in turn follows two main approaches, conventional methods and end-to-end deep learning methods, and we gave an overview of both. Moreover, this study introduced the most important feature extraction techniques used in p-site prediction. Finally, we created three test sets, based on general and human species, from proteins newly added in the 2022 release of the dbPTM database. Evaluating online p-site prediction tools on these newly added proteins, which are distinct from those in the dbPTM 2019 release, revealed their limitations: the actual performance of these tools on unseen proteins is notably lower than the results reported in their respective research papers. Copyright © 2023. Published by Elsevier B.V.
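
The held-out evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the protein IDs, sequences, annotations, and predictions are toy values, the helper names (`held_out_ids`, `candidate_sites`, `mcc`) are hypothetical, and the Matthews correlation coefficient is an assumed choice of metric, since the abstract does not name the metrics used. Candidate residues are restricted to serine, threonine, and tyrosine as a simplifying assumption.

```python
from math import sqrt

def held_out_ids(old_ids, new_ids):
    """IDs present in the newer release but absent from the older one."""
    return sorted(set(new_ids) - set(old_ids))

def candidate_sites(sequences, residues="STY"):
    """All (protein_id, 1-based position) pairs whose residue is S, T, or Y."""
    return {
        (pid, i + 1)
        for pid, seq in sequences.items()
        for i, aa in enumerate(seq)
        if aa in residues
    }

def mcc(candidates, annotated, predicted):
    """Matthews correlation coefficient over the candidate sites."""
    tp = len(candidates & annotated & predicted)
    fp = len((candidates & predicted) - annotated)
    fn = len((candidates & annotated) - predicted)
    tn = len(candidates) - tp - fp - fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy stand-ins for two database releases (hypothetical IDs and sequences).
release_old = {"P1": "MSTAY"}
release_new = {"P1": "MSTAY", "P2": "ASKTYS"}

# Keep only proteins a tool could not have seen in the older release.
held_out = held_out_ids(release_old, release_new)
test_sequences = {pid: release_new[pid] for pid in held_out}
candidates = candidate_sites(test_sequences)

# Hypothetical annotations and tool predictions for the held-out protein.
annotated = {("P2", 2), ("P2", 5)}
predicted = {("P2", 2), ("P2", 4), ("P2", 5)}

score = mcc(candidates, annotated, predicted)
print(held_out, len(candidates), round(score, 3))
```

Restricting the test set to proteins absent from the older release is what makes the comparison a test on unseen data: a tool trained on the earlier release cannot have memorized these sequences.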