Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Machine Learning Study of SNPs in Noncoding Regions to Predict Non-small Cell Lung Cancer Susceptibility

Published: 2023 Sep 02
Authors: Y Huang, T Bao, T Zhang, G Ji, Y Wang, Z Ling, W Li
Source: Environmental Technology & Innovation

Abstract:

Non-small cell lung cancer (NSCLC) is the most common pathological subtype of lung cancer. Both environmental and genetic factors have been reported to influence lung cancer susceptibility. We conducted a genome-wide association study (GWAS) of 287 NSCLC patients and 467 healthy controls in a Chinese population, using the Illumina Genome-Wide Asian Screening Array chip covering 712,095 single nucleotide polymorphisms (SNPs). Using logistic regression modeling, the GWAS identified 17 new SNP loci in noncoding regions associated with NSCLC risk; the top three (rs80040741, rs9568547, rs6010259) met a stringent p-value threshold (p < 3.02e-6). Notably, rs80040741 and rs6010259 are annotated to intronic regions of MUC3A and MLC1, respectively. Combining these loci with five SNPs previously reported in Chinese NSCLC patients and four additional covariates (e.g., smoking status, age, low-dose CT screening, sex), a predictive model built with machine learning methods separated NSCLC patients from healthy controls with an accuracy of 86%. This is the first application of machine learning methods to predicting NSCLC susceptibility using both genetic and clinical characteristics. Our findings provide a promising approach to early NSCLC diagnosis and improve our understanding of how machine learning methods can be applied in precision medicine. Copyright © 2023. Published by Elsevier Ltd.
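The per-SNP logistic regression step mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the genotypes are simulated, additive 0/1/2 allele coding is assumed, `age` is a single stand-in covariate, and the names `fit_logistic` and `wald_p_value` are hypothetical. The abstract does not state which software, genotype coding, or covariates the authors used in the GWAS model, so this is not their pipeline. Only the total sample size (287 + 467 = 754) is borrowed from the text.

```python
import math
import numpy as np


def fit_logistic(features, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson.

    Returns coefficients (intercept first) and their standard errors.
    """
    X = np.column_stack([np.ones(len(y)), features])  # prepend intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        hessian = X.T @ (X * weights[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    weights = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * weights[:, None]))
    return beta, np.sqrt(np.diag(cov))


def wald_p_value(coef, se):
    """Two-sided p-value for the Wald z statistic coef / se."""
    return math.erfc(abs(coef / se) / math.sqrt(2.0))


# Simulated data (NOT the study data): 5 hypothetical SNPs, additive coding.
rng = np.random.default_rng(0)
n_samples, n_snps = 754, 5
genotypes = rng.binomial(2, 0.3, size=(n_samples, n_snps)).astype(float)
age = rng.normal(size=n_samples)  # standardized stand-in covariate

# Only SNP 0 carries a simulated effect on case/control status.
logit = -1.0 + 0.8 * genotypes[:, 0] + 0.3 * age
status = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Test each SNP separately, adjusting for the covariate.
pvals = []
for j in range(n_snps):
    beta, se = fit_logistic(np.column_stack([genotypes[:, j], age]), status)
    pvals.append(wald_p_value(beta[1], se[1]))  # index 1 is the SNP coefficient

for j, p in enumerate(pvals):
    print(f"SNP {j}: p = {p:.3g}")
```

In a real GWAS, each p-value would be compared against a genome-wide threshold such as the p < 3.02e-6 cutoff quoted above, and the selected SNPs would then be passed, together with clinical covariates, to a separate classifier.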