A zero-agnostic model for copy number evolution in cancer.
Published: 2023 Nov 09
Authors:
Henri Schmidt, Palash Sashittal, Benjamin J Raphael
Source:
PLoS Computational Biology
Abstract:
New low-coverage single-cell DNA sequencing technologies enable the measurement of copy number profiles from thousands of individual cells within tumors. From these data, one can infer the evolutionary history of the tumor by modeling transformations of the genome via copy number aberrations. Copy number aberrations alter multiple adjacent genomic loci, violating the standard phylogenetic assumption that loci evolve independently. Thus, specialized models to infer copy number phylogenies have been introduced. A widely used model is the copy number transformation (CNT) model, in which a genome is represented by an integer vector and a copy number aberration is an event that either increases or decreases the number of copies of a contiguous segment of the genome. The CNT distance between a pair of copy number profiles is the minimum number of events required to transform one profile into the other. While this distance can be computed efficiently, no efficient algorithm has been developed to find the most parsimonious phylogeny under the CNT model. We introduce the zero-agnostic copy number transformation (ZCNT) model, a simplification of the CNT model that allows the amplification or deletion of regions with zero copies. We derive a closed-form expression for the ZCNT distance between two copy number profiles and show that, unlike the CNT distance, the ZCNT distance forms a metric. We leverage the closed-form expression for the ZCNT distance and an alternative characterization of copy number profiles to derive polynomial-time algorithms for two natural relaxations of the small parsimony problem on copy number profiles. While the alteration of zero copy number regions allowed under the ZCNT model is not biologically realistic, we show on both simulated and real datasets that the ZCNT distance is a close approximation to the CNT distance.
Extending our polynomial-time algorithm for the ZCNT small parsimony problem, we develop an algorithm, Lazac, for solving the large parsimony problem on copy number profiles. We demonstrate that Lazac outperforms existing methods for inferring copy number phylogenies on both simulated and real data.
Copyright: © 2023 Schmidt et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
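To make the event model in the abstract concrete, the sketch below represents a copy number profile as a list of integers and an event as adding +1 or -1 to a contiguous segment, with no special treatment of zero-copy loci. It then counts the minimum number of such unrestricted events between two profiles using a standard adjacent-difference argument: each event changes exactly two adjacent differences of the padded difference vector by one, so half the L1 norm of those differences is the minimum count. This is an illustration only, not the paper's ZCNT formula, which the abstract does not state. The names `apply_event` and `interval_edit_distance` are hypothetical, and the sketch does not consider whether intermediate profiles stay non-negative.

```python
from typing import List


def apply_event(profile: List[int], start: int, end: int, delta: int) -> List[int]:
    """Return a copy of `profile` with `delta` (+1 or -1) added to loci start..end inclusive.

    Hypothetical helper: zero-copy loci are altered like any other locus,
    which is the 'zero-agnostic' relaxation described in the abstract.
    """
    if delta not in (1, -1):
        raise ValueError("delta must be +1 (amplification) or -1 (deletion)")
    if not 0 <= start <= end < len(profile):
        raise ValueError("segment must be a non-empty range inside the profile")
    return [c + delta if start <= i <= end else c for i, c in enumerate(profile)]


def interval_edit_distance(u: List[int], v: List[int]) -> int:
    """Minimum number of unrestricted +/-1 segment events turning `u` into `v`.

    Assumption: events may touch zero-copy loci and order is ignored.
    The sum of absolute adjacent differences is always even because the
    padded vector starts and ends at zero, so integer division is exact.
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of loci")
    # Pad the per-locus difference with zeros so segment boundaries at
    # either end of the genome are counted as well.
    padded = [0] + [b - a for a, b in zip(u, v)] + [0]
    return sum(abs(padded[i] - padded[i - 1]) for i in range(1, len(padded))) // 2


if __name__ == "__main__":
    u = [2, 2, 2, 2]
    step = apply_event(u, 1, 2, +1)     # amplify loci 1..2
    v = apply_event(step, 3, 3, -1)     # delete one copy at locus 3
    print(v, interval_edit_distance(u, v))
    # A single event spanning a zero-copy locus is allowed in this relaxation.
    print(interval_edit_distance([1, 0, 1], [2, 1, 2]))
```

The count is symmetric in its two arguments, which is consistent with the abstract's statement that the ZCNT distance forms a metric while the CNT distance does not.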