DeepICSH: a complex deep learning framework for identifying cell-specific silencers and their strength from the human genome.
Published: 2023 Aug 29
Authors:
Tianjiao Zhang, Liangyu Li, Hailong Sun, Dali Xu, Guohua Wang
Source:
BRIEFINGS IN BIOINFORMATICS
Abstract:
Silencers are noncoding DNA sequence fragments in the genome that suppress gene expression. Variation in silencers in specific cells is closely related to gene expression and cancer development. Computational approaches that rely exclusively on DNA sequence information for silencer identification fail to account for the cell specificity of silencers, which reduces their accuracy. Although several transcription factors and epigenetic modifications associated with silencers have been discovered, no definitive biological signal or combination of signals fully characterizes silencers, which makes it difficult to select suitable biological signals for identifying them. We therefore propose DeepICSH, a sophisticated deep learning framework based on multiple biological data sources. Specifically, DeepICSH uses a deep convolutional neural network to automatically capture, from a diverse array of biological signals, the signal combinations strongly associated with silencers. Attention mechanisms are used to score and visualize these signal combinations, and skip connections fuse multilevel sequence features with the signal combinations, enabling accurate identification of silencers in specific cells. Extensive experiments on HepG2 and K562 cell line data sets demonstrate that DeepICSH outperforms state-of-the-art methods in silencer identification. Notably, we introduce for the first time a deep learning framework based on multi-omics data for classifying strong and weak silencers, achieving favorable performance. In conclusion, DeepICSH shows great promise for advancing the study and analysis of silencers in complex diseases. The source code is available at https://github.com/lyli1013/DeepICSH.
© The Author(s) 2023. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
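The abstract names three architectural ingredients: convolution over the sequence, attention over biological signals, and skip connections that fuse features from several levels. The sketch below is only a toy illustration of how those pieces can fit together, not the authors' implementation (see the repository linked above for that). Everything in it is an assumption made for illustration: the function names, the layer sizes, the random untrained weights, and the placeholder signal names `signal_a`, `signal_b`, `signal_c`, which stand in for whatever per-region epigenomic tracks a real model would use.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (unknown bases become all zeros)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def conv1d_relu(x, kernels):
    """Valid 1D convolution followed by ReLU.
    x: [length][channels]; kernels: [n_kernels][width][channels]."""
    width, channels = len(kernels[0]), len(x[0])
    out = []
    for start in range(len(x) - width + 1):
        row = []
        for kernel in kernels:
            s = sum(kernel[i][c] * x[start + i][c]
                    for i in range(width) for c in range(channels))
            row.append(max(0.0, s))
        out.append(row)
    return out

def global_max_pool(x):
    """Maximum over positions, separately for each channel."""
    return [max(column) for column in zip(*x)]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_kernels(rng, n, width, channels):
    """Untrained placeholder weights; a real model would learn these."""
    return [[[rng.uniform(-1, 1) for _ in range(channels)]
             for _ in range(width)] for _ in range(n)]

def forward(seq, signals, seed=0):
    """Return (probability, attention weight per signal, fused feature vector)."""
    rng = random.Random(seed)
    x = one_hot(seq)

    # Two stacked convolution layers give low-level and higher-level sequence features.
    h1 = conv1d_relu(x, random_kernels(rng, 4, 3, 4))
    h2 = conv1d_relu(h1, random_kernels(rng, 4, 3, 4))
    low, high = global_max_pool(h1), global_max_pool(h2)

    # Attention: score each signal, normalise with softmax, reweight the signals.
    # The weights can be inspected to see which signals the model emphasises.
    names = sorted(signals)
    scores = [rng.uniform(-1, 1) * signals[name] for name in names]
    weights = softmax(scores)
    attended = [w * signals[name] for w, name in zip(weights, names)]

    # Skip connection: the low-level features bypass the second layer and are
    # concatenated with the high-level features and the attended signals.
    features = low + high + attended

    # Logistic output: probability that the region is a silencer (toy weights).
    out_w = [rng.uniform(-1, 1) for _ in features]
    logit = sum(w * f for w, f in zip(out_w, features))
    return 1.0 / (1.0 + math.exp(-logit)), dict(zip(names, weights)), features

if __name__ == "__main__":
    prob, attn, _ = forward("ACGTTGCAAGCT",
                            {"signal_a": 0.8, "signal_b": 0.1, "signal_c": 0.5})
    print("silencer probability (untrained):", prob)
    print("attention weights:", attn)
```

Because the weights are random, the printed probability carries no biological meaning; the point is the data flow. Concatenating `low`, `high`, and `attended` is one simple way to realise the fusion of multilevel sequence features and signal combinations that the abstract describes, and the same feature vector could feed a second output head for the strong versus weak silencer task.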