Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors.

Published: 2023 Aug 28
Authors: Marina Gorostiola González, Remco L van den Broek, Thomas G M Braun, Magdalini Chatzopoulou, Willem Jespers, Adriaan P IJzerman, Laura H Heitman, Gerard J P van Westen
Source: MOLECULAR & CELLULAR PROTEOMICS

Abstract:

Proteochemometric (PCM) modelling is a powerful computational drug discovery tool used to predict the bioactivity of potential drug candidates from both chemical and protein information. In PCM, computed features describe small molecules and proteins, and these features directly affect the quality of the predictive models. State-of-the-art protein descriptors, however, are calculated from the protein sequence and neglect the dynamic nature of proteins. This dynamic nature can be computationally simulated with molecular dynamics (MD). Here, novel 3D dynamic protein descriptors (3DDPDs) were designed to be applied in bioactivity prediction tasks with PCM models. As a test case, publicly available G protein-coupled receptor (GPCR) MD data from GPCRmd were used. GPCRs are membrane-bound proteins that are activated by hormones and neurotransmitters and constitute an important target family for drug discovery. GPCRs exist in different conformational states that allow the transmission of diverse signals and that can be modified by ligand interactions, among other factors. To translate the MD-encoded protein dynamics, two types of 3DDPDs were considered: one-hot encoded residue-specific (rs) and embedding-like protein-specific (ps) 3DDPDs. The descriptors were developed by calculating distributions of trajectory coordinates and partial charges, applying dimensionality reduction, and subsequently condensing them into vectors per residue or protein, respectively. 3DDPDs were benchmarked on several PCM tasks against state-of-the-art non-dynamic protein descriptors. Our rs- and ps3DDPDs outperformed non-dynamic descriptors in regression tasks using a temporal split and showed comparable performance with a random split and in all classification tasks. Combinations of non-dynamic descriptors with 3DDPDs did not result in increased performance. Finally, the power of 3DDPDs to capture dynamic fluctuations in mutant GPCRs was explored. The results presented here show the potential of including protein dynamic information in machine learning tasks, specifically bioactivity prediction, and open opportunities for applications in drug discovery, including oncology. © 2023. Springer Nature Switzerland AG.
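The descriptor recipe named in the abstract (distributions of trajectory coordinates and partial charges, dimensionality reduction, condensation into per-residue vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `distribution_stats` and `pca_reduce`, the choice of mean, standard deviation and median as distribution summaries, the use of PCA via SVD, and the random stand-in trajectory are all assumptions made for illustration only.

```python
import numpy as np


def distribution_stats(coords, charges):
    """Summarize per-residue distributions over trajectory frames.

    coords:  array of shape (n_frames, n_residues, 3), hypothetical residue positions
    charges: array of shape (n_frames, n_residues), hypothetical partial charges
    Returns an array of shape (n_residues, 12): mean, std and median of x, y, z, charge.
    """
    feats = np.concatenate([coords, charges[..., None]], axis=-1)  # (frames, residues, 4)
    return np.concatenate(
        [feats.mean(axis=0), feats.std(axis=0), np.median(feats, axis=0)],
        axis=-1,
    )


def pca_reduce(rows, n_components):
    """Project standardized rows onto their first principal components (PCA via SVD)."""
    centered = rows - rows.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant columns
    z = centered / scale
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T


# Random data standing in for an MD trajectory (assumption: 50 frames, 20 residues).
rng = np.random.default_rng(0)
n_frames, n_residues = 50, 20
coords = rng.normal(size=(n_frames, n_residues, 3))
charges = rng.normal(scale=0.1, size=(n_frames, n_residues))

stats = distribution_stats(coords, charges)      # one row of statistics per residue
rs_vectors = pca_reduce(stats, n_components=3)   # one short vector per residue
ps_vector = stats.mean(axis=0)                   # crude per-protein summary (assumption)

print(rs_vectors.shape, ps_vector.shape)
```

In a real workflow the coordinates and charges would come from MD trajectories such as those on GPCRmd, and the resulting vectors would be combined with compound descriptors as input to a PCM model.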