Bayesian semiparametric joint modeling of a count outcome and inconveniently timed longitudinal predictors.

Publication date: 2023 Feb 28

Authors: Woobeen Lim, Michael L Pennell, Michelle J Naughton, Electra D Paskett

Source: STATISTICAL METHODS IN MEDICAL RESEARCH

Abstract:

The Women's Health Initiative (WHI) Life and Longevity After Cancer (LILAC) study is an excellent resource for studying the quality of life following breast cancer treatment. At study entry, women were asked about new symptoms that appeared following their initial cancer treatment. In this article, we were interested in using regression modeling to estimate associations of clinical and lifestyle factors at cancer diagnosis (independent variables) with the number of new symptoms (dependent variable). Although clinical and lifestyle data were collected longitudinally, few measurements were obtained at diagnosis or at a consistent timepoint prior to diagnosis, which complicates the analysis. Furthermore, parametric count models, such as the Poisson and negative binomial, do not fit the symptom data well. Thus, motivated by the issues encountered in LILAC, we propose two Bayesian joint models for longitudinal data and a count outcome. Our two models differ according to the assumption on the outcome distribution: one uses a negative binomial (NB) distribution and the other a nonparametric rounded mixture of Gaussians (RMG). The mean of each count distribution is dependent on imputed values of continuous, binary, and ordinal variables at a time point of interest (e.g. diagnosis). To facilitate imputation, longitudinal variables are modeled jointly using a linear mixed model for a latent underlying normal random variable, and a Dirichlet process prior is assigned to the random subject-specific effects to relax distribution assumptions. In simulation studies, the RMG joint model exhibited superior power and predictive accuracy over the NB model when the data were not NB. The RMG joint model also outperformed an RMG model containing predictors imputed using the last value carried forward, which generated estimates that were biased toward the null. We used our models to examine the relationship between sleep health at diagnosis and the number of new symptoms following breast cancer treatment in LILAC.
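
To make the idea of a rounded mixture of Gaussians concrete, the following is a minimal Python sketch of how such a distribution can generate counts: a latent real value is drawn from a Gaussian mixture and then rounded to a non-negative integer. It is an illustration only, not the authors' implementation. The function names (`round_to_count`, `sample_rmg`), the mixture weights, means, and standard deviations, and the thresholding rule (0 for latent values at or below 0, otherwise the ceiling) are all assumptions; the abstract does not state the thresholds or any parameter values.

```python
import math
import random


def round_to_count(latent):
    """Map a latent real value to a count.

    Assumed thresholding rule: values <= 0 become 0, and a value in
    (j - 1, j] becomes j for j = 1, 2, ...
    """
    return max(0, math.ceil(latent))


def sample_rmg(n, weights, means, sds, seed=0):
    """Draw n counts from a rounded finite mixture of Gaussians.

    weights, means, and sds are hypothetical illustrative parameters,
    one entry per mixture component.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # Pick a mixture component with probability proportional to its weight.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Draw the latent continuous value from that component.
        latent = rng.gauss(means[k], sds[k])
        # Round the latent value to a non-negative integer count.
        counts.append(round_to_count(latent))
    return counts


# Two components: one concentrated near zero (many zero counts) and one
# centered at a higher value, a shape that a single parametric count
# distribution may not capture well.
counts = sample_rmg(1000, weights=[0.6, 0.4], means=[-0.5, 4.0], sds=[1.0, 1.5])
```

A finite mixture with fixed parameters is used here only for simplicity. In the abstract's setting the outcome distribution is described as nonparametric and is fit within a Bayesian joint model, which this sketch does not attempt.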