Using Tweets to Understand How COVID-19-Related Health Beliefs Are Affected in the Age of Social Media: Twitter Data Analysis Study
Background: The emergence of SARS-CoV-2 (ie, COVID-19) has given rise to a global pandemic affecting 215 countries and over 40 million people as of October 2020. Meanwhile, we are also experiencing an infodemic induced by the overabundance of information, some accurate and some inaccurate, spreading rapidly across social media platforms. Social media has arguably shifted the information acquisition and dissemination of a considerably large population of internet users toward higher interactivities. Objective: This study aimed to investigate COVID-19-related health beliefs on one of the mainstream social media platforms, Twitter, as well as potential impacting factors associated with fluctuations in health beliefs on social media. Methods: We used COVID-19-related posts from the mainstream social media platform Twitter to monitor health beliefs. A total of 92,687,660 tweets corresponding to 8,967,986 unique users from January 6 to June 21, 2020, were retrieved. To quantify health beliefs, we employed the health belief model (HBM) with four core constructs: perceived susceptibility, perceived severity, perceived benefits, and perceived barriers. We utilized natural language processing and machine learning techniques to automate the process of judging the conformity of each tweet with each of the four HBM constructs. A total of 5000 tweets were manually annotated for training the machine learning architectures. Results: The machine learning classifiers yielded areas under the receiver operating characteristic curves over 0.86 for the classification of all four HBM constructs. Our analyses revealed a basic reproduction number R(0 )of 7.62 for trends in the number of Twitter users posting health belief-related content over the study period. The fluctuations in the number of health belief-related tweets could reflect dynamics in case and death statistics, systematic interventions, and public events. Specifically, we observed that scientific events, such as scientific publications, and nonscientific events, such as politicians' speeches, were comparable in their ability to influence health belief trends on social media through a Kruskal-Wallis test (P=.78 and P=.92 for perceived benefits and perceived barriers, respectively). Conclusions: As an analogy of the classic epidemiology model where an infection is considered to be spreading in a population with an R-0 greater than 1, we found that the number of users tweeting about COVID-19 health beliefs was amplifying in an epidemic manner and could partially intensify the infodemic. It is unhealthy that both scientific and nonscientific events constitute no disparity in impacting the health belief trends on Twitter, since nonscientific events, such as politicians' speeches, might not be endorsed by substantial evidence and could sometimes be misleading.
original_citation: Wang HY, Li YK, Hutch M, Naidech A, Luo Y. Using Tweets to Understand How COVID-19-Related Health Beliefs Are Affected in the Age of Social Media: Twitter Data Analysis Study. Journal of Medical Internet Research. 2021;23(2):15.
HW, YLi, and YLuo conceived of the presented idea. HW, YLi, and MH contributed to the data annotation and validation. HW carried out all the experiments and conducted all the data analyses. HW wrote the manuscript and designed the tables and figures with support from MH and YLi. AN provided clinical insights and contributed to the results interpretation. YLuo advised and supervised the entire project. We thank Twitter for providing the API for developers to retrieve tweets for this research.