Public Health Surveillance of Behavioral Cancer Risk Factors During the COVID-19 Pandemic: Sentiment and Emotion Analysis of Twitter Data.

Published: 2023 Nov 02
Authors: Nicolette Christodoulakis, Wael Abdelkader, Cynthia Lokker, Michelle Cotterchio, Lauren E Griffith, Leigh M Vanderloo, Laura N Anderson
Source: JMIR Formative Research

Abstract:

The COVID-19 pandemic and its associated public health mitigation strategies have dramatically changed patterns of daily life activities worldwide, resulting in unintentional consequences on behavioral risk factors, including smoking, alcohol consumption, poor nutrition, and physical inactivity. The infodemic of social media data may provide novel opportunities for evaluating changes related to behavioral risk factors during the pandemic.

We explored the feasibility of conducting a sentiment and emotion analysis using Twitter data to evaluate behavioral cancer risk factors (physical inactivity, poor nutrition, alcohol consumption, and smoking) over time during the first year of the COVID-19 pandemic.

Tweets during 2020 relating to the COVID-19 pandemic and the 4 cancer risk factors were extracted from the George Washington University Libraries Dataverse. Tweets were defined and filtered using keywords to create 4 data sets. We trained and tested a machine learning classifier using a prelabeled Twitter data set. This was applied to determine the sentiment (positive, negative, or neutral) of each tweet. A natural language processing package was used to identify the emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) based on the words contained in the tweets. Sentiments and emotions for each of the risk factors were evaluated over time and analyzed to identify keywords that emerged.

The sentiment analysis revealed that 56.69% (51,479/90,813) of the tweets about physical activity were positive, 16.4% (14,893/90,813) were negative, and 26.91% (24,441/90,813) were neutral. Similar patterns were observed for nutrition, where 55.44% (27,939/50,396), 15.78% (7950/50,396), and 28.79% (14,507/50,396) of the tweets were positive, negative, and neutral, respectively. For alcohol, the proportions of positive, negative, and neutral tweets were 46.85% (34,897/74,484), 22.9% (17,056/74,484), and 30.25% (22,531/74,484), respectively, and for smoking, they were 41.2% (11,628/28,220), 24.23% (6839/28,220), and 34.56% (9753/28,220), respectively. The sentiments were relatively stable over time. The emotion analysis suggests that the most common emotion expressed across physical activity and nutrition tweets was trust (69,495/320,741, 21.67% and 42,324/176,564, 23.97%, respectively); for alcohol, it was joy (49,147/273,128, 17.99%); and for smoking, it was fear (23,066/110,256, 20.92%). The emotions expressed remained relatively constant over the observed period. An analysis of the most frequent words tweeted revealed further insights into common themes expressed in relation to some of the risk factors and possible sources of bias.

This analysis provided insight into behavioral cancer risk factors as expressed on Twitter during the first year of the COVID-19 pandemic. It was feasible to extract tweets relating to all 4 risk factors, and most tweets had a positive sentiment with varied emotions across the different data sets. Although these results can play a role in promoting public health, a deeper dive via qualitative analysis can be conducted to provide a contextual examination of each tweet.

©Nicolette Christodoulakis, Wael Abdelkader, Cynthia Lokker, Michelle Cotterchio, Lauren E Griffith, Leigh M Vanderloo, Laura N Anderson. Originally published in JMIR Formative Research (https://formative.jmir.org), 02.11.2023.
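
The keyword filtering and word-based emotion identification described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the keyword sets, the toy lexicon, and the function names (`filter_by_keywords`, `emotion_counts`, `percent`) are all hypothetical, since the abstract names neither the actual keywords nor the natural language processing package. The sketch counts emotion-word matches rather than tweets, which is one possible reading of why the reported emotion denominators are larger than the tweet counts.

```python
import string
from collections import Counter

# Hypothetical keyword sets; the abstract does not list the real keywords.
RISK_KEYWORDS = {
    "physical_activity": {"exercise", "workout", "gym"},
    "nutrition": {"diet", "snack", "vegetables"},
    "alcohol": {"beer", "wine", "drinking"},
    "smoking": {"cigarette", "smoking", "vaping"},
}

# Toy word-to-emotion lexicon for illustration only (not the study's lexicon).
EMOTION_LEXICON = {
    "healthy": ["trust", "joy"],
    "party": ["joy", "anticipation"],
    "cancer": ["fear", "sadness"],
    "quit": ["anticipation"],
}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return [word.strip(string.punctuation).lower() for word in text.split()]


def filter_by_keywords(tweets, keywords):
    """Keep tweets that contain at least one keyword as a whole token."""
    return [t for t in tweets if keywords & set(tokenize(t))]


def emotion_counts(tweets, lexicon):
    """Tally emotions over every lexicon word found in the tweets."""
    counts = Counter()
    for tweet in tweets:
        for word in tokenize(tweet):
            counts.update(lexicon.get(word, []))
    return counts


def percent(part, total):
    """Share of the total as a percentage rounded to 2 decimals."""
    return round(100 * part / total, 2)


# Made-up example tweets, not data from the study.
tweets = [
    "Home workout done, feeling healthy!",
    "Quarantine wine party tonight",
    "Trying to quit smoking because of cancer risk",
    "Nothing to see here",
]

datasets = {
    name: filter_by_keywords(tweets, keywords)
    for name, keywords in RISK_KEYWORDS.items()
}

for name, subset in datasets.items():
    print(name, len(subset), dict(emotion_counts(subset, EMOTION_LEXICON)))

# Arithmetic check on a figure reported in the abstract:
# positive physical activity tweets, 51,479 of 90,813.
print(percent(51479, 90813))
```

The final line reproduces the reported 56.69% for positive physical activity tweets; the same `percent` helper can be applied to the other counts given in the abstract.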