This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Timely understanding of public perceptions allows public health agencies to provide up-to-date responses to health crises such as infectious disease outbreaks. Social media such as Twitter offer an unprecedented means of promptly assessing large-scale public response.
The aims of this study were to develop a scheme for a comprehensive public perception analysis of a measles outbreak based on Twitter data and demonstrate the superiority of the convolutional neural network (CNN) models (compared with conventional machine learning methods) on measles outbreak-related tweets classification tasks with a relatively small and highly unbalanced gold standard training set.
We first designed a comprehensive scheme for the analysis of public perception of measles based on tweets, including 3 dimensions: discussion themes, emotions expressed, and attitude toward vaccination. All 1,154,156 tweets containing the word “measles” posted between December 1, 2014, and April 30, 2015, were purchased and downloaded from DiscoverText.com. Two expert annotators curated a gold standard of 1151 tweets (approximately 0.1% of all tweets) based on the 3-dimensional scheme. Next, a tweet classification system based on the CNN framework was developed. We compared the performance of the CNN models to those of 4 conventional machine learning models and another neural network model. We also compared the impact of different word embeddings configurations for the CNN models: (1) Stanford GloVe embedding trained on billions of tweets in the general domain, (2) measles-specific embedding trained on our 1 million measles related tweets, and (3) a combination of the 2 embeddings.
Cohen kappa intercoder reliability values for the annotation were 0.78, 0.72, and 0.80 on the 3 dimensions, respectively. Class distributions within the gold standard were highly unbalanced for all dimensions. The CNN models performed better on all classification tasks than k-nearest neighbors, naïve Bayes, support vector machines, or random forest. Detailed comparison between support vector machines and the CNN models showed that the major contributor to the overall superiority of the CNN models was the improvement in recall, especially for classes with low occurrence. The CNN model with the combination of the 2 embeddings led to better performance on discussion themes and emotions expressed (microaveraging F1 scores of 0.7811 and 0.8592, respectively), while the CNN model with Stanford embedding achieved the best performance on attitude toward vaccination (microaveraging F1 score of 0.8642).
The proposed scheme can successfully classify the public’s opinions and emotions in multiple dimensions, which would facilitate the timely understanding of public perceptions during the outbreak of an infectious disease. Compared with conventional machine learning methods, our CNN models showed superiority on measles-related tweet classification tasks with a relatively small and highly unbalanced gold standard. Given the success of these tasks, our proposed scheme and CNN-based tweet classification system are expected to be useful for the analysis of tweets about other infectious diseases such as influenza and Ebola.
Nearly 40 million cases of measles, caused by a highly contagious virus, lead to over 300,000 deaths worldwide every year [
During an outbreak of an infectious disease such as measles, responsible public health agencies need to send out timely messages to the public during different stages of the crisis [
Social media have been increasingly used by the general public, patients, and health professionals to communicate about health-related issues [
Many studies have used Twitter to assess various public health topics. However, most of the studies thus far have focused on analyzing the frequency of postings rather than on understanding post contents [
Compared with conventional machine learning algorithms, neural network models are advantageous because they save significant time on task-specific feature engineering, achieve higher performance, and scale to large applications [
All tweets including the word “measles” posted between December 1, 2014, and April 30, 2015, were purchased and downloaded from DiscoverText.com. This time frame was chosen because the unidentified Patient Zero of this outbreak visited the Disneyland theme park in California in December 2014. The first few suspected cases of measles were reported on January 5, 2015, and the last case was reported on March 2, 2015. CDC officially declared the outbreak to be over on April 17, 2015 [
Frequency of measles-related tweets by date and type.
In order to understand measles-related contents on Twitter comprehensively, we created an annotation scheme containing 3 dimensions:
Two coders manually coded 0.1% of all tweets, selected through systematic sampling. The first tweet was identified using a random number generator; after this, every 1000th tweet was added to the sample. The Cohen kappa intercoder reliability values for the 3 dimensions were 0.78, 0.72, and 0.80, respectively. Afterward, the 2 coders discussed their results to resolve discrepancies.
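The sampling and agreement steps above can be illustrated with a minimal sketch. It assumes tweets are addressed by position in a list and that each coder assigns exactly one label per tweet; the function names `systematic_sample` and `cohen_kappa` are hypothetical and are not taken from the study's code.

```python
import random
from collections import Counter


def systematic_sample(n_items, step, rng):
    """Pick a random start in [0, step), then every `step`-th index after it."""
    start = rng.randrange(step)
    return list(range(start, n_items, step))


def cohen_kappa(labels_a, labels_b):
    """Cohen kappa for two coders who each gave one label per item."""
    n = len(labels_a)
    # Observed agreement: share of items where both coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) if both coders always use one identical label.
    return (observed - expected) / (1 - expected)


# Index sample over a corpus of the size reported in the paper.
sample_idx = systematic_sample(1_154_156, 1000, random.Random(42))

# Toy agreement example with made-up labels.
coder_1 = ["pro", "pro", "against", "against"]
coder_2 = ["pro", "pro", "against", "pro"]
kappa = cohen_kappa(coder_1, coder_2)
```

In the toy example the coders agree on 3 of 4 items (observed agreement 0.75) while chance agreement is 0.5, which gives a kappa of 0.5.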
The vocabulary used on Twitter differs substantially from general English vocabulary, so user names, URLs, and hashtags need to be normalized. We first replaced each token written entirely in capital letters with its lowercase form followed by the string “<ALLCAPS>”. Then all URLs were replaced with the string “<URL>”. Twitter user names (eg, @twitter) were replaced with the string “<USER>”, and all numbers were replaced with the string “<NUMBER>”. All hashtags were split into tokens at uppercase letters (eg, “#VaccineWork” became “<HASHTAG> Vaccine Work”). Afterward, all tweets were converted to lowercase. Our tweet preprocessing was based on the Stanford GloVe tweets preprocessing script [
Raw tweet text: “RT @KTLA: #BREAKING: At least 9 measles cases linked to visits to @Disneyland from Dec. 15-20 http://t.co/1GRlwFhPgv http://t.co/3Nl15jmqAE”
Cleaned tweet text: “rt <allcaps> <user>: breaking: at least <number> measles cases linked to visits to <user> from dec. <number> <number> <url> <url>”
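The normalization steps described above can be approximated with a few regular expressions. The sketch below is an illustration under stated assumptions, not the Stanford script or the authors' code: the function name `clean_tweet`, the specific patterns, and the order of the substitutions are choices made here for simplicity (placeholders are written in lowercase directly, and URLs are handled first so that their characters are not mistaken for hashtags, user names, or numbers).

```python
import re


def _split_hashtag(match):
    """Split a hashtag body at uppercase letters, e.g. VaccineWork -> Vaccine Work."""
    body = match.group(1)
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", body)
    return "<hashtag> " + " ".join(parts)


def clean_tweet(text):
    """Approximate the normalization steps described in the text (assumed patterns)."""
    text = re.sub(r"https?://\S+", "<url>", text)      # URLs
    text = re.sub(r"@\w+", "<user>", text)             # Twitter user names
    text = re.sub(r"#(\w+)", _split_hashtag, text)     # hashtags
    text = re.sub(r"\d+", "<number>", text)            # numbers
    # Tokens of 2 or more capital letters: lowercase form plus a marker.
    text = re.sub(r"\b([A-Z]{2,})\b",
                  lambda m: m.group(1).lower() + " <allcaps>", text)
    return text.lower()


cleaned = clean_tweet("RT @KTLA: 9 cases http://t.co/abc")
hashtag = clean_tweet("#VaccineWork")
```

Because the patterns are simplified, the output for a real tweet can differ in detail from the cleaned example shown above; for instance, this sketch keeps the hyphen in a range such as “15-20” and always emits the hashtag marker.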
Commonly used in various computer vision tasks [
Measles tweets annotation scheme for different dimensions.
System architecture for measles-related tweets classification using convolutional neural networks.
We used 3 filters of sizes 3, 4, and 5 to generate the convolutional layer on each embedding. The feature maps generated by the filters from each embedding were concatenated and fed to the pooling layer. We adopted a max-pooling strategy with a dropout rate of 0.5 on the pooling layer. The output layer consisted of the classes for each dimension. This CNN system was built using Python and the TensorFlow library [
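To make the data flow of this architecture concrete, the following is a minimal forward-pass sketch in plain Python rather than TensorFlow. It is not the authors' implementation. The assumptions are: one filter per width (3, 4, and 5) on each of 2 embedding channels, a ReLU activation (the text does not name one), max-over-time pooling applied to each feature map before concatenation (one reading of the description), random weights, and toy embedding sizes. Dropout is omitted because it only applies during training, and all names are hypothetical.

```python
import math
import random


def conv_max_pool(seq, filt):
    """Slide one filter over a token sequence and max-pool its feature map.

    seq:  list of token vectors (length must be at least the filter width).
    filt: list of weight vectors, one per position in the window.
    """
    width = len(filt)
    feature_map = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        total = sum(w * x
                    for weights, vec in zip(filt, window)
                    for w, x in zip(weights, vec))
        feature_map.append(max(0.0, total))  # ReLU (assumed activation)
    return max(feature_map)                  # max-over-time pooling


def softmax(scores):
    """Turn raw class scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(channels, filters, out_weights):
    """Pool every (channel, filter) pair, concatenate, and classify."""
    pooled = [conv_max_pool(seq, filt)
              for seq, channel_filters in zip(channels, filters)
              for filt in channel_filters]
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in out_weights]
    return softmax(scores)


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes: 8 tokens, 2 embedding channels of dimension 4 and 3, 5 classes.
seq_len, dims, widths, n_classes = 8, (4, 3), (3, 4, 5), 5
channels = [rand_matrix(seq_len, d) for d in dims]
filters = [[rand_matrix(w, d) for w in widths] for d in dims]
out_weights = rand_matrix(n_classes, len(dims) * len(widths))
probs = forward(channels, filters, out_weights)
```

Using a single channel corresponds to passing only one embedded sequence and its filters, in which case the concatenated feature vector has 3 entries instead of 6.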
For generic tweets embedding, we used pretrained GloVe tweets embedding from Stanford. GloVe is an unsupervised learning algorithm developed by Pennington et al [
For the CNN-based framework, we performed the following experiments: (1) use of the pretrained GloVe tweets embedding only, (2) use of the measles tweets embedding only, and (3) use of a combination of the pretrained GloVe tweets embedding and the measles tweets embedding. When only 1 embedding was used, we used a single channel of the proposed framework. We chose 4 popular machine learning models for comparison as our baselines: KNN [
We used 10-fold cross-validation to evaluate the performance of these models on each classification task. Standard metrics including precision, recall, and F1 score were calculated for each class. We also calculated the microaveraging F score and macroaveraging F score to evaluate overall performance on each classification task. For the microaveraged score, we summed the individual true positives, false positives, and false negatives across all classes before computing the F1 score. For the macroaveraged score, we took the average of the F1 scores of the individual classes.
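The two averaging strategies can be written out as a short sketch. It assumes single-label classification (each tweet has exactly one true and one predicted class per dimension); the function names are hypothetical and the toy labels are made up for illustration.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (microaveraged F1, macroaveraged F1) for single-label data."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
micro, macro = micro_macro_f1(y_true, y_pred)
```

In this toy case class "a" has an F1 of 0.8 and class "b" an F1 of about 0.667, so the macroaveraged score is about 0.733, while the pooled counts (3 true positives, 1 false positive, 1 false negative) give a microaveraged score of 0.75. Because the macroaverage weights every class equally, it is more sensitive to poor performance on rare classes.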
This study received institutional review board approval from the Committee for the Protection of Human Subjects at the University of Texas Health Science Center at Houston. The reference number is HSC-SBMI-16-0291.
In total, 1151 tweets were annotated. Class distributions were highly unbalanced for all 3 tasks (
Comparison of the performances of CNN models and 4 machine learning models on the 3 dimensions can be seen in
Class distribution in the gold standard for 3 dimensions.
Dimension and class | Tweets, n (%)
Discussion themes |
Resource | 718 (62.4)
Personal experience | 21 (1.8)
Personal opinions and interest | 344 (29.9)
Question | 20 (1.7)
Other | 48 (4.2)
Emotions expressed |
Humor or sarcasm | 109 (9.5)
Positive emotion | 39 (3.4)
Anger | 35 (3.0)
Concern | 919 (79.8)
Not applicable | 49 (4.3)
Attitude toward vaccination |
Pro | 202 (17.6)
Against | 36 (3.1)
Not applicable | 913 (79.3)
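As a reading aid for the class imbalance, the counts in this table can be used to compute the score of a trivial classifier that always predicts the most frequent class. This calculation is an illustration added here, not a result reported by the study. It relies on the fact that, when each tweet has exactly one label per dimension, the microaveraged F1 score equals accuracy, so the majority-class share is the microaveraged F1 of that trivial classifier. The variable and function names are hypothetical.

```python
# Class counts copied from the gold standard table (1151 tweets per dimension).
gold_counts = {
    "Discussion themes": {
        "Resource": 718, "Personal experience": 21,
        "Personal opinions and interest": 344, "Question": 20, "Other": 48,
    },
    "Emotions expressed": {
        "Humor or sarcasm": 109, "Positive emotion": 39, "Anger": 35,
        "Concern": 919, "Not applicable": 49,
    },
    "Attitude toward vaccination": {
        "Pro": 202, "Against": 36, "Not applicable": 913,
    },
}


def majority_share(counts):
    """Share of tweets in the most frequent class of one dimension."""
    return max(counts.values()) / sum(counts.values())


majority_baseline = {
    dimension: round(majority_share(counts), 4)
    for dimension, counts in gold_counts.items()
}
```

The resulting shares are roughly 0.62, 0.80, and 0.79 for the 3 dimensions, which gives a floor against which microaveraged scores can be read; macroaveraged scores are the more informative figure for the rare classes.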
Ten-fold cross-validation results of neural network models and 4 conventional machine learning models on 3 dimensions. Italics indicate best performance in that class.
Model | Microaveraging F score | Macroaveraging F score | ||||
Discussion themes | Emotions expressed | Attitude toward vaccination | Discussion themes | Emotions expressed | Attitude toward vaccination | |
KNNa | 0.5143 | 0.6977 | 0.8129 | 0.3223 | 0.4074 | 0.5114 |
Naïve Bayes | 0.6811 | 0.7767 | 0.7171 | 0.4101 | 0.4814 | 0.5343 |
Random forest | 0.7350 | 0.8393 | 0.8085 | 0.4243 | 0.4393 | 0.5356 |
SVMb | 0.7696 | 0.8365 | 0.8211 | 0.3917 | 0.4269 | 0.5345 |
Bi-LSTMc | 0.7315 | 0.8271 | 0.7958 | 0.2899 | 0.3730 | 0.4358 |
CNN_Md | 0.7533 | 0.8480 | 0.8355 | 0.4282 | 0.4849 | 0.5871 |
CNN_Se | 0.8575 | 0.4158 | 0.5419 | |||
CNN_M+Sf | 0.7811 | 0.8254 | 0.6078 |
aKNN: k-nearest neighbor.
bSVM: support vector machines.
cBi-LSTM: bidirectional long short-term memory.
dCNN_M: convolutional neural network using the measles tweets embedding.
eCNN_S: convolutional neural network using the pretrained GloVe tweets embedding from Stanford.
fCNN_M+S: convolutional neural network using the combination of pretrained GloVe tweets embedding and measles tweets embedding.
As shown in
The comparison of SVM and the CNN models on
For dimension 3,
Detailed precision, recall, and F score of each class for
Class | Precision | Recall | F1 score | |||||||
SVMa | CNN_M+Sb | CNN_Sc | SVM | CNN_M+S | CNN_S | SVM | CNN_M+S | CNN_S | ||
Resource (n=718) | 0.7907 | 0.8119 | 0.9318 | 0.9401 | 0.8619 | 0.8677 | ||||
Personal experience (n=21) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Personal opinions and interest (n=344) | 0.7021 | 0.6984 | 0.5773 | 0.6192 | 0.6336 | 0.6564 | ||||
Question (n=20) | 0 | 0.5 | 0 | 0 | 0.0500 | 0 | 0 | 0 | ||
Other (n=48) | 0.8750 | 0.8421 | 0.1458 | 0.2500 | 0.2500 | 0.3871 |
aSVM: support vector machines.
bCNN_M+S: convolutional neural network using the combination of pretrained GloVe tweets embedding and measles tweets embedding.
cCNN_S: convolutional neural network using the pretrained GloVe tweets embedding from Stanford.
Detailed precision, recall and F scores of each class for
Class | Precision | Recall | F1 score | |||||||||
SVMa | CNN_M+Sb | CNN_Sc | SVM | CNN_M+S | CNN_S | SVM | CNN_ M+S | CNN_S | ||||
Humor or sarcasm (n=109) | 0.9388 | 0.8909 | 0.3486 | 0.4220 | 0.5170 | 0.5823 | ||||||
Positive emotion (n=39) | 0.0513 | 0.1282 | 0.0967 | 0.2273 | ||||||||
Anger (n=35) | 0 | 0.6667 | 0 | 0.0286 | 0 | 0.0556 | ||||||
Concern (n=919) | 0.8312 | 0.8538 | 0.9069 | 0.9946 | 0.9069 | 0.9195 | ||||||
Not applicable (n=49) | 0.7500 | 0.8947 | 0.2105 | 0.3469 | 0.2105 | 0.5000 |
aSVM: support vector machines.
bCNN_M+S: convolutional neural network using the combination of pretrained GloVe tweets embedding and measles tweets embedding.
cCNN_S: convolutional neural network using the pretrained GloVe tweets embedding from Stanford.
Detailed precision, recall, and F score of each class for
Class | Precision | Recall | F1 score | ||||||||
SVMa | CNN_M+Sb | CNN_Sc | SVM | CNN_M+S | CNN_S | SVM | CNN_M+S | CNN_S | |||
Pro (n=202) | 0.6458 | 0.7554 | 0.1919 | 0.3069 | 0.3089 | 0.4161 | |||||
Against (n=36) | 0.6667 | 0.8571 | 0.0556 | 0.1026 | 0.2791 | ||||||
Not applicable (n=913) | 0.8228 | 0.8408 | 0.9660 | 0.9682 | 0.8982 | 0.8991 |
aSVM: support vector machines.
bCNN_M+S: convolutional neural network using the combination of pretrained GloVe tweets embedding and measles tweets embedding.
cCNN_S: convolutional neural network using the pretrained GloVe tweets embedding from Stanford.
This study makes 2 primary contributions. First, we designed and implemented a comprehensive scheme for the public perception analysis of measles-related tweets, including
In classifying measles-related tweets in terms of
Although the CNN models can greatly increase the performance for most of the classes with few cases, for some minor classes with extremely low numbers of cases such as personal experience in
Future research could take several directions. Additional hyperparameter tuning (eg, selection of activation functions and pooling strategies) could further improve performance on disease-related tweet classification tasks. In addition, although the Bi-LSTM model did not work well on our tasks (probably because of the limited training data size), other recurrent neural network-based frameworks such as attentive Bi-LSTM [
Timely understanding of public perceptions during the outbreak of an infectious disease such as measles will allow public health agencies to adapt their messages to address the needs, concerns, and emotions of the public. In order to understand the contents of Twitter text regarding measles and vaccination, we designed a classification scheme that contains
Bi-LSTM: bidirectional long short-term memory
CNN: convolutional neural networks
KNN: k-nearest neighbors
SVM: support vector machines
CDC: Centers for Disease Control and Prevention
NLP: natural language processing
This research was partially supported by the National Library of Medicine of the National Institutes of Health under award number R01LM011829, the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under award number R01AI130460, and the UTHealth Innovation for Cancer Prevention Research Training Program Pre-Doctoral Fellowship (Cancer Prevention and Research Institute of Texas grant #RP160015). This study was also partially supported by a University of Alabama System’s Collaborative Grant.
None declared.