Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.

Lung Cancer Prediction using Electronic Claims Records: A Transformer-based Approach.

Published: 2023 Oct 12
Authors: Huan-Yu Chen, Hui-Min Wang, Ching-Heng Lin, Rob Yang, Chi-Chun Lee
Source: IEEE Journal of Biomedical and Health Informatics

Abstract:

Electronic claims records (ECRs) are large-scale, longitudinal collections of individuals' medical service-seeking actions. Compared with in-hospital medical records (EMRs), ECRs are more standardized and span multiple sites. Recent studies have shown promising results from modeling claims data for a wide range of medical applications. However, few of them address exclusion criteria in cohort selection to extract new incidence without prior signs, and they often lack emphasis on predicting cancer in its early stages. In this work, we aim to design a lung cancer prediction framework using ECRs, with a rigorous exclusion design and a state-of-the-art sequence-based transformer. Furthermore, this work presents one of the first results from applying a disease prediction model to the entire population of Taiwan. On our dataset, the results show a predictive power above 2.1, an average positive predictive value (PPV) of 5, and an area under the curve (AUC) of 0.668 for all-stage lung cancer, and a predictive power of around 2.0, an average PPV of 1, and an AUC of 0.645 for early-stage lung cancer. Sub-cohort analysis could funnel a high-precision selective group into prioritized clinical examination. Onset analysis validates the effect of our exclusion criteria. This work presents comprehensive analyses of lung cancer prediction, and the proposed approach can serve as a state-of-the-art disease risk prediction framework for claims data.