Research Updates

An independent evaluation in a CRC patient cohort of microbiome 16S rRNA sequence analysis methods: OTU clustering, DADA2, and Deblur.

Published: 2023
Authors: Guang Liu, Tong Li, Xiaoyan Zhu, Xuanping Zhang, Jiayin Wang
Source: Frontiers in Microbiology

Abstract:

16S rRNA is a universal marker gene in microbes and is often used as a target gene to obtain profiles of microbial communities via next-generation sequencing (NGS). Traditionally, 16S rRNA sequences are clustered into operational taxonomic units (OTUs) at a 97% sequence-identity threshold, and methods for reducing sequencing errors are bypassed, which may produce spurious taxonomic units. Several denoising algorithms, such as DADA2 and Deblur, have been published to solve this problem; they correct sequencing errors at single-nucleotide resolution by generating amplicon sequence variants (ASVs). Because high-resolution ASVs are becoming more popular than OTUs, and because a given study usually selects only one analysis method, a thorough comparison of OTU clustering and denoising pipelines is needed. In this study, three of the most widely used 16S rRNA methods (two denoising algorithms, DADA2 and Deblur, along with de novo OTU clustering) were thoroughly compared using 16S rRNA amplicon sequencing data generated from 358 clinical stool samples from the Colorectal Cancer (CRC) Screening Cohort. Our findings indicated that all approaches led to similar taxonomic profiles (P > 0.05 in PERMANOVA and P < 0.001 in the Mantel test), although the number of ASVs/OTUs and the alpha-diversity indices varied considerably. Although the disease-related markers identified differed considerably, disease-related analysis showed that all methods could lead to similar conclusions. Fusobacterium, Streptococcus, Peptostreptococcus, Parvimonas, Gemella, and Haemophilus were identified by all three methods as enriched in the CRC group, while Roseburia, Faecalibacterium, Butyricicoccus, and Blautia were identified by all three methods as enriched in the healthy group. In addition, disease-diagnostic models generated using machine learning algorithms based on the data from these different methods all achieved good diagnostic efficiency (AUC: 0.87-0.89), with the model based on DADA2 producing the highest AUC (0.8944 and 0.8907 in the training set and test set, respectively). However, there was no significant difference in performance between the models (P > 0.05). In conclusion, this study demonstrates that DADA2, Deblur, and de novo OTU clustering show similar power in taxa assignment and can produce similar conclusions in the case of the CRC cohort.

Copyright © 2023 Liu, Li, Zhu, Zhang and Wang.
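The abstract reports a significant Mantel test (P < 0.001) as evidence that the three pipelines produce similar community profiles. As a rough illustration of what such a test measures, the sketch below correlates two sample-by-sample distance matrices and estimates a one-sided p-value by permuting sample labels. It is not the authors' code, and the abstract does not say which software or settings they used; the function names (`upper_triangle`, `pearson`, `mantel`), the permutation count, and the toy distances are all assumptions made for illustration.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p) for a one-sided Mantel test of positive correlation."""
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Relabel the samples of dist_b: rows and columns are permuted together.
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value


# Toy data (made up): eight samples placed on a line, with the second
# "pipeline" reporting the same distances on a different scale.
positions = [0.0, 1.0, 3.0, 4.5, 7.0, 8.0, 11.0, 15.0]
dist_a = [[abs(p - q) for q in positions] for p in positions]
dist_b = [[2.0 * d for d in row] for row in dist_a]

r, p = mantel(dist_a, dist_b)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

In this reading, a small Mantel p-value means the between-sample distances from two pipelines are correlated, whereas a PERMANOVA P > 0.05 means no significant difference between the pipelines' profiles was detected; the two results point in the same direction.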