Public Health Surveillance of Behavioral Cancer Risk Factors During the COVID-19 Pandemic: Sentiment and Emotion Analysis of Twitter Data

JMIR Formative Research (2023)

Abstract
Background: The COVID-19 pandemic and its associated public health mitigation strategies have dramatically changed patterns of daily life activities worldwide, resulting in unintentional consequences on behavioral risk factors, including smoking, alcohol consumption, poor nutrition, and physical inactivity. The infodemic of social media data may provide novel opportunities for evaluating changes related to behavioral risk factors during the pandemic.

Objective: We explored the feasibility of conducting a sentiment and emotion analysis using Twitter data to evaluate behavioral cancer risk factors (physical inactivity, poor nutrition, alcohol consumption, and smoking) over time during the first year of the COVID-19 pandemic.

Methods: Tweets during 2020 relating to the COVID-19 pandemic and the 4 cancer risk factors were extracted from the George Washington University Libraries Dataverse. Tweets were defined and filtered using keywords to create 4 data sets. We trained and tested a machine learning classifier using a prelabeled Twitter data set. This was applied to determine the sentiment (positive, negative, or neutral) of each tweet. A natural language processing package was used to identify the emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) based on the words contained in the tweets. Sentiments and emotions for each of the risk factors were evaluated over time and analyzed to identify keywords that emerged.

Results: The sentiment analysis revealed that 56.69% (51,479/90,813) of the tweets about physical activity were positive, 16.4% (14,893/90,813) were negative, and 26.91% (24,441/90,813) were neutral. Similar patterns were observed for nutrition, where 55.44% (27,939/50,396), 15.78% (7950/50,396), and 28.79% (14,507/50,396) of the tweets were positive, negative, and neutral, respectively. For alcohol, the proportions of positive, negative, and neutral tweets were 46.85% (34,897/74,484), 22.9% (17,056/74,484), and 30.25% (22,531/74,484), respectively, and for smoking, they were 41.2% (11,628/28,220), 24.23% (6839/28,220), and 34.56% (9753/28,220), respectively. The sentiments were relatively stable over time. The emotion analysis suggests that the most common emotion expressed across physical activity and nutrition tweets was trust (69,495/320,741, 21.67% and 42,324/176,564, 23.97%, respectively); for alcohol, it was joy (49,147/273,128, 17.99%); and for smoking, it was fear (23,066/110,256, 20.92%). The emotions expressed remained relatively constant over the observed period. An analysis of the most frequent words tweeted revealed further insights into common themes expressed in relation to some of the risk factors and possible sources of bias.

Conclusions: This analysis provided insight into behavioral cancer risk factors as expressed on Twitter during the first year of the COVID-19 pandemic. It was feasible to extract tweets relating to all 4 risk factors, and most tweets had a positive sentiment with varied emotions across the different data sets. Although these results can play a role in promoting public health, a deeper dive via qualitative analysis can be conducted to provide a contextual examination of each tweet.
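
The keyword filtering and word-based emotion tally described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the keyword lists, the toy lexicon, and the function names (`filter_tweets`, `emotion_counts`, `shares`) are all hypothetical, since the abstract names neither the keywords nor the natural language processing package used. The sketch counts emotions per matching word rather than per tweet, which is one plausible reading of "based on the words contained in the tweets" and would be consistent with the emotion denominators exceeding the tweet counts.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the study's actual keywords are not given in the abstract.
RISK_KEYWORDS = {
    "physical_activity": {"exercise", "workout", "gym"},
    "smoking": {"smoking", "cigarette", "vaping"},
}

# Toy word-to-emotion lexicon for illustration only (not the package used in the study).
TOY_LEXICON = {
    "happy": {"joy"},
    "afraid": {"fear"},
    "safe": {"trust"},
    "love": {"joy", "trust"},
    "sick": {"disgust", "sadness"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def filter_tweets(tweets, keywords):
    """Keep tweets that contain at least one of the given keywords."""
    return [t for t in tweets if keywords & set(tokenize(t))]


def emotion_counts(tweets, lexicon):
    """Tally emotions word by word; one word may contribute several emotions."""
    counts = Counter()
    for tweet in tweets:
        for word in tokenize(tweet):
            counts.update(lexicon.get(word, ()))
    return counts


def shares(counts):
    """Convert raw emotion counts to percentages of all emotion hits."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: round(100 * n / total, 2) for emotion, n in counts.items()}
```

Under these assumptions, a data set for one risk factor would be built with `filter_tweets(tweets, RISK_KEYWORDS["smoking"])`, and the most common emotion read off `shares(emotion_counts(...))`. A real analysis would substitute a validated emotion lexicon and the study's own keyword definitions.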
Keywords
cancer risk factors, Twitter, sentiment analysis, emotion analysis, social media, physical inactivity, poor nutrition, alcohol, smoking