This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Electronic cigarette (e-cigarette) use has increased in the United States, leading to active debate in the public health sphere regarding e-cigarette use and regulation. To better understand trends in e-cigarette attitudes and behaviors, public health and communication professionals can turn to the dialogue taking place on popular social media platforms such as Twitter.
The objective of this study was to conduct a content analysis to identify key conversation trends and patterns over time using historical Twitter data.
A 5-category content analysis was conducted on a random sample of tweets chosen from all publicly available tweets sent between May 1, 2013, and April 30, 2014, that matched strategic keywords related to e-cigarettes. Relevant tweets were isolated from the random sample of approximately 10,000 tweets and classified according to sentiment, user description, genre, and theme. Descriptive analyses including univariate and bivariate associations, as well as correlation analyses were performed on all categories in order to identify patterns and trends.
The analysis revealed an increase in e-cigarette–related tweets from May 2013 through April 2014, with tweets generally being positive; 71% of the sample tweets were classified as having a positive sentiment. The top two user categories were everyday people (65%) and individuals who are part of the e-cigarette community movement (16%). These two user groups were responsible for a majority of informational (79%) and news tweets (75%), compared to reputable news sources and foundations or organizations, which combined provided 5% of informational tweets and 12% of news tweets. Personal opinion (28%), marketing (21%), and first person e-cigarette use or intent (20%) were the three most common genres of tweets, which tended to have a positive sentiment. Marketing was the most common theme (26%), and policy and government was the second most common theme (20%), with 86% of these tweets coming from everyday people and the e-cigarette community movement combined, compared to 5% of policy and government tweets coming from government, reputable news sources, and foundations or organizations combined.
Everyday people and the e-cigarette community are dominant forces across several genres and themes, warranting continued monitoring to understand trends and their implications regarding public opinion, e-cigarette use, and smoking cessation. Analyzing social media trends is a meaningful way to inform public health practitioners of current sentiments regarding e-cigarettes, and this study contributes a replicable methodology.
In recent years, electronic cigarette (e-cigarette) use has gained momentum in the United States, with 36.5% of current smokers surveyed in 2013 reporting ever using e-cigarettes, compared to 9.8% in 2010 [
Given that e-cigarettes are still relatively new and the opinions toward them are often divergent, there is increasing dialogue surrounding e-cigarettes on social media. As we aim to understand the health effects of e-cigarettes, we must also attempt to discern the core voices, message frames, and sentiment surrounding e-cigarette discussions. Understanding these conversations allows public health and communication professionals to identify trends in attitudes and behaviors and to develop strategies to disseminate factual information and create culturally relevant cessation interventions for nicotine products, including traditional cigarettes and e-cigarettes.
Analysis of Twitter data has become an active research area, offering insight for the behavioral and social sciences and providing access to demographic groups that are often underrepresented in research, such as minorities. According to the Pew Research Center [
In the case of tobacco use and cessation, examination of social media data can continue to uncover trends in knowledge, attitudes, and behavior; identify marketing strategies; inform public health and public policy; and pave the way for interventions delivered via social media [
Given the depth of data, the breadth of its audience, and its ability to capture real-time trends, this study focuses exclusively on understanding snapshots of dialogue surrounding e-cigarettes captured on Twitter. The objective of this analysis is to conduct a content analysis to identify key conversation trends and patterns over time using historical Twitter data.
To conduct this analysis, we used strategic keywords to collect historical tweets potentially related to e-cigarettes from May 1, 2013, to May 1, 2014. Keywords were selected using an iterative process with incremental addition and subtraction of words. A preliminary search using words such as “e-cigarette” and “vapor” was conducted, followed by refinement to remove terms that captured irrelevant tweets. Addition of words was heavily influenced by a list of previously published keywords [
Data were provided by Gnip, a company with full access to the Twitter Firehose (entire stream of Twitter data) supplying historical tweets not available through the Twitter application program interface (API). Search results garnered 3.7 million potentially relevant tweets. Gnip data utilized for the purposes of this study include time, date, user profile link, tweet content, and tweet link. To facilitate user-friendly evaluation of the tweets among 6 analysts, a database and Web form were developed that prepopulated each tweet along with the coding categories (see
Manual content analysis was used to categorize tweets according to a coding category list developed through previous literature and adapted for the purposes of this research [
Classification for sentiment, user description, genre, and theme in Stage 2 was conducted according to a codebook developed for the classification of tweets that builds on previous research [
Content categories for sentiment.
Category | Definition |
Positive | Tweets that are in favor of e-cigarettes, related products, and use |
Neutral | Tweets not strong in either direction for or against e-cigarettes |
Negative | Tweets that are against e-cigarettes |
Content categories for user description.
Category | Definition |
Celebrity | Famous people in pop culture, people that are Internet famous, people that have accounts verified by Twitter |
Government | National Institutes of Health, CDC, political figures, etc |
Foundation/organization | Reputable organizations such as American Heart Association |
Reputable news source | News sources such as New York Times, Washington Post, Wall Street Journal, Associated Press, etc |
Everyday person | Twitter account with a reasonable amount of posts, followers, and following a reasonable amount of people; timelines span a variety of topics that are not primarily e-cigarette–related |
E-cigarette community movement | Groups or people whose timelines are primarily devoted to e-cigarette conversation (eg, Women Who Vape, The Vape Club, John Doe with entire timeline of e-cigarette tweets) |
Retailer | Outlets that sell e-cigarettes (online or physical) |
Tobacco company | Companies that manufacture e-cigarettes (eg, blu, Apollo, Njoy) |
Bot/hacked | Accounts that appear to be fake/computerized that are primarily promoting e-cigarette products (or other products); most accounts are disguised to appear as “everyday person” |
Content categories for genre.
Category | Definition |
News/update | Update about a current event from a reputable news source, or post from user about relevant news from news source |
Information | Factoid or resource, can be a personal blog or forum, or link to product review (posted by everyday person or e-cigarette community movement) |
First person e-cigarette use or intent | Reports personal use of e-cigarettes, or intent or interest in using them |
Second/third person experience | Reports someone else’s use of e-cigarette |
Personal opinion | Personal opinion related to e-cigarettes |
Marketing | Activities involved in the transfer of goods from the producer or seller to the consumer or buyer, eg, sales of e-cigarette products or accessories, job announcements, review of products posted by e-cigarette company/retailer |
Content categories for theme.
Category | Definition |
Cessation | Mention of using e-cigarettes to quit smoking cigarettes or other non-e-cigarette tobacco products |
Health and safety | Direct or indirect reference to health consequences of e-cigarette use |
Underage usage | E-cig use by minors, especially high-school age or under |
Craving | Desire to use e-cigarettes; eg “Stressful day. Time for my #vapepen” |
Other substances | E-cigarettes mentioned in association with other addictive substances (eg, alcohol, caffeine) |
Illicit substance use in e-cigarettes | Mention of using e-cigarettes for anything other than nicotine (eg, marijuana) |
Policy or government | Mention of government or policy in relation to e-cigarettes including regulation, deeming, bans, and restrictions |
Parental use of e-cigarettes | Tweet mentioning use of e-cigarettes by parents of the poster or parents of a person mentioned in the tweet |
Advertisement/promotion | Ads for e-cigarettes, giveaways, samples, sales, direct links to sellers’ websites, word-of-mouth, and reviews |
Flavors | Tweet discussing e-cigarette flavors (generic or mixed, including menthol) |
Sentiment and user description are mutually exclusive categories, meaning that only one choice could be made per category, whereas genre and theme are not, meaning that more than one choice could be made per category. All categories were mandatory with the exception of theme, given the granularity of the content and because not every topic could realistically be represented. Additionally, during Stage 2, analysts documented media links included in each tweet (eg, image, video, location, website).
After the content analysis was complete, descriptive statistical analyses were performed on the data sample, including one-way frequencies for each category; two-way cross tabulations for categories, temporal trends, and media type, in addition to the chi-square test for intercategory statistical association (using Fisher’s exact test for cell counts <5); and intercategory correlation analysis based on Cramer’s V coefficient (representing each category option as a binary variable). Both the chi-square tests and correlation analyses with Cramer’s V provide a statistically sound assessment of the significance and strength of the relationships between various categories. SAS version 9.3 was used for all analyses. The goal of the current analysis was to identify patterns and trends in the sample of tweets related to the overarching content categories: sentiment, user description, genre, and theme.
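The chi-square statistic and Cramer’s V described above can be illustrated with a minimal Python sketch. This is not the authors' SAS code: the function names and the 2x2 counts are hypothetical and chosen only for illustration. Fisher’s exact test for small cell counts is omitted, and the sketch assumes every expected cell count is greater than zero.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramer's V: chi-square scaled to the range 0-1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))

# Hypothetical counts (not from the study):
# rows = tweet contains an image (yes, no); columns = marketing genre (yes, no)
table = [[30, 10],
         [20, 40]]

chi2 = chi_square(table)
v = cramers_v(table)
print(f"chi-square = {chi2:.3f}, Cramer's V = {v:.3f}")
```

For two binary variables, as in the binary coding of category options described above, Cramer’s V equals the absolute value of the phi coefficient.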
General trends are reported for the entire sample of coded tweets; only statistically significant trends are discussed for each category (
A total of 17,098 tweets were coded during Stage 1, of which 10,128 (59.23%) were found to be relevant and interpretable. The range of interrater reliability was .64-.70 and is reported in
Interrater reliability scores for manual annotation of tweet categories.
Category | Interrater reliability |
Relevance^a | .70 |
Sentiment^b | .65 |
User description^b | .66 |
Genre | .64 |
Theme | .65 |
^a Binary version of this category was created in addition to multiclass version for the purposes of the analysis.
^b Categories were mutually exclusive and thus analyzed as multiclass.
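The text here does not name the agreement statistic behind these scores. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected measure for two coders; treating it as the statistic used in this study is an assumption. The two label lists are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Proportion of items on which the coders agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each coder's label frequencies
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Undefined (division by zero) when expected agreement is exactly 1
    return (observed - expected) / (1 - expected)

# Hypothetical sentiment codes from two analysts for ten tweets
coder_1 = ["positive", "positive", "neutral", "negative", "positive",
           "neutral", "positive", "negative", "positive", "positive"]
coder_2 = ["positive", "positive", "neutral", "negative", "neutral",
           "neutral", "positive", "positive", "positive", "positive"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```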
Between May 2013 and November 2013, each month contributed 4.29-6.53% of the tweets in the overall sample; however, there is a clear increase in the number of relevant e-cigarette tweets in December 2013. The number of tweets in December 2013 (n=1388) is more than twice the number of tweets that occurred in November 2013 (n=631; see
Almost half of the tweets (48.00%) included links that were functional at the time of the content analysis. Tweets with images accounted for 8.30% of the sample.
Frequency of e-cigarette tweets by month from May 2013 to April 2014.
As indicated by
Tweet distribution by sentiment (N=10,128).
Sentiment | N (%) |
Positive | 7202 (71.11) |
Neutral | 1699 (16.78) |
Negative | 1227 (12.11) |
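The percentages in this table are one-way frequencies over the 10,128 coded tweets. A short Python sketch, using the counts from the table and illustrative variable names, reproduces them:

```python
# Counts taken from the sentiment table above
sentiment_counts = {"Positive": 7202, "Neutral": 1699, "Negative": 1227}

total = sum(sentiment_counts.values())

# One-way frequency: each category's share of all coded tweets, in percent
percentages = {
    label: round(100 * count / total, 2)
    for label, count in sentiment_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {sentiment_counts[label]} ({pct:.2f}%)")
```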
Absolute number of tweets by sentiment and month from May 2013 to April 2014.
A majority of the sample consisted of tweets that originated from users identified by analysts as everyday people (64.99%), with the second largest population being the e-cigarette community (15.92%) (see
Tweet distribution by user description (N=10,128).
User description | N (%) |
Celebrity | 45 (0.44) |
Government | 8 (0.08) |
Foundation/organization | 122 (1.20) |
Reputable news source | 73 (0.72) |
Everyday person | 6582 (64.99) |
E-cigarette community movement | 1612 (15.92) |
Retailer | 787 (7.77) |
Tobacco company | 200 (1.97) |
Bot/hacked | 699 (6.90) |
Number of tweets by month and user from May 2013 to April 2014.
The three most common tweet genres were personal opinion-oriented tweets, marketing-related tweets, and personal experience-related tweets (see
Tweet distribution by genre (N=10,128).
Genre | N (%) |
News/update | 828 (8.18) |
Information | 1459 (14.41) |
First person e-cigarette use or intent | 2056 (20.30) |
Second/Third person experience | 797 (7.87) |
Personal opinion | 2850 (28.14) |
Marketing | 2142 (21.15) |
Tweet genre distribution by month, N (%) (N=10,128).
Genre | May | June | July | Aug. | Sept. | Oct. | Nov. | Dec. | Jan. | Feb. | Mar. | Apr. | Total |
Personal experience | 102 (23.50) | 122 (22.90) | 109 (24.49) | 106 (23.35) | 131 (19.12) | 134 (23.63) | 152 (23.00) | 196 (14.12) | 230 (19.97) | 248 (24.27) | 270 (19.74) | 256 (18.04) | 2056 |
Marketing | 130 (29.95) | 140 (26.27) | 128 (28.76) | 115 (25.33) | 157 (22.92) | 130 (22.93) | 167 (25.26) | 239 (17.22) | 225 (19.53) | 174 (17.03) | 272 (19.88) | 262 (18.46) | 2139 |
Personal opinion | 95 (21.89) | 112 (21.01) | 97 (21.80) | 105 (23.13) | 191 (27.88) | 137 (24.16) | 176 (26.63) | 618 (44.52) | 363 (31.51) | 285 (27.89) | 349 (25.51) | 321 (22.62) | 2849 |
Second person | 34 (7.83) | 52 (9.76) | 34 (7.64) | 45 (9.91) | 50 (7.30) | 37 (6.53) | 48 (7.26) | 82 (5.91) | 96 (8.33) | 93 (9.10) | 112 (8.19) | 114 (8.00) | 797 |
Information | 65 (15.00) | 74 (13.88) | 51 (11.46) | 62 (13.66) | 99 (14.45) | 93 (16.40) | 82 (12.41) | 165 (11.89) | 150 (13.02) | 145 (14.19) | 221 (16.15) | 249 (17.55) | 1456 |
News | 8 (1.84) | 33 (6.19) | 26 (5.84) | 21 (4.63) | 57 (8.32) | 36 (6.35) | 35 (5.30) | 87 (6.27) | 88 (7.64) | 77 (7.53) | 143 (10.45) | 216 (15.22) | 827 |
Total | 434 | 533 | 445 | 454 | 685 | 567 | 661 | 1388 | 1152 | 1022 | 1368 | 1419 | 10,128 |
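The percentages in this table are column percentages: each genre count is divided by that month's total. The sketch below, with illustrative variable names, recomputes the personal opinion row from the table's own counts and finds the month in which that genre's share peaked.

```python
months = ["May", "June", "July", "Aug.", "Sept.", "Oct.",
          "Nov.", "Dec.", "Jan.", "Feb.", "Mar.", "Apr."]

# Counts copied from the "Personal opinion" and "Total" rows of the table
personal_opinion = [95, 112, 97, 105, 191, 137, 176, 618, 363, 285, 349, 321]
monthly_totals = [434, 533, 445, 454, 685, 567, 661, 1388, 1152, 1022, 1368, 1419]

# Share of each month's genre codes that were personal opinion, in percent
shares = {
    month: round(100 * count / total, 2)
    for month, count, total in zip(months, personal_opinion, monthly_totals)
}

peak_month = max(shares, key=shares.get)
print(f"Peak share of personal opinion tweets: {peak_month} ({shares[peak_month]:.2f}%)")
```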
Tweet distribution by theme.^a
Theme | N (%) |
Cessation | 638 (6.30) |
Health and safety | 1327 (13.10) |
Underage usage | 423 (4.18) |
Craving | 394 (3.89) |
Other substances | 116 (1.15) |
Illicit substance use in e-cigarettes | 160 (1.58) |
Policy/government | 2042 (20.16) |
Parental use of e-cigarettes | 74 (0.73) |
Advertisement/promotion | 2663 (26.29) |
Flavors | 451 (4.45) |
^a Includes tweets coded with multiple themes.
The bivariate associations reported are statistically significant (
Over 92.27% of tweets containing an image were positive in sentiment. Retailers accounted for 19.74% of tweets containing images and marketing-related tweets are twice as likely to contain an image (17.30%) compared to the average rate at which images occur in tweets (8.30%). E-cigarette community users produced 23.47% of tweets containing a link to a website. Marketing, news, and information-related tweets have much higher rates of website links (60.12%, 70.40%, and 88.51%) than the overall average (35.43%).
Nearly half (49.60%) of information tweets originated from everyday people, and 29.28% were from the e-cigarette community movement. Everyday people accounted for 62.27% of news tweets, whereas reputable news sources accounted for 6.41% of news tweets and 0.96% of information tweets. Foundations/organizations provided 3.98% of information tweets and 5.20% of news tweets. Also, 32.40% of marketing tweets came from everyday people, compared to 26.18% from retailers and 6.40% from tobacco companies. User-related trends in tweet genre are illustrated in
Distribution of tweet content genre for each user category.
As an additional measure of correlation, Cramer’s V statistic was computed for all categories after representing each category option as a binary variable.
This content analysis revealed noteworthy trends about e-cigarette–related tweets from May 2013 through April 2014. The number of these tweets rose during the data collection period, with a peak in December 2013. Tweets were overwhelmingly positive and frequently posted by everyday people and e-cigarette community movement accounts.
The increase in e-cigarette-related tweets coincides with several e-cigarette milestones, ranging from government proposals for policies and regulations [
From May 2013 through April 2014, everyday people dominated the e-cigarette conversation on Twitter by accounting for nearly two-thirds of the tweets in the dataset. The e-cigarette community movement represented the second most common user type. As expected, everyday people accounted for the majority of personal opinion and personal experience tweets, though e-cigarette community movement accounts represented a sizeable proportion of personal opinion tweets as well. These two user groups accounted for approximately 79% of information tweets, with minimal information coming from government, public health non-governmental organizations, and reputable news sources. Future research may look into the legitimacy of the information shared, its origins, and how it is shared across Twitter. This will help us understand the degree to which e-cigarette information spreads and how that might impact beliefs and opinions surrounding e-cigarettes. Everyday people also tweeted nearly a third of the marketing-related tweets, which is equivalent to the percentage of marketing tweets from retailers and tobacco companies combined. It is important to note that the volume of tweets from a particular user group does not reflect the reach or number of impressions their tweets made, as this analysis did not take into account the number of followers a Twitter account has, nor the number of retweets or favorites a tweet received. Although this is a limitation of the current study, it presents the opportunity for future research to determine which Twitter voices are the “loudest” in the sense that their tweets are being seen and shared most often, and how these visible tweets influence perceptions and use of e-cigarettes.
E-cigarette community tweets spiked in December 2013, which represents a four-fold increase in tweet volume from the prior month. The cause of the sharp rise in e-cigarette community movement tweets remains unknown, but there were several e-cigarette milestones during this time. For example, in December 2013 Philip Morris International Inc. announced its partnership with Altria Group Inc. to sell e-cigarettes [
Tweets originating from reputable news sources and government agencies comprised less than 1% of the sample. There continues to be debate regarding how to regulate e-cigarettes within the United States and many countries. Our analysis from Twitter suggests that the uncertainty expressed within the field of public health is not reflected in the nature of the ongoing social media dialogues. In the absence of informative dialogue from public health authorities, personal opinion and marketing content surrounding e-cigarettes have become the most common themes. This sample shows a decisive dip in tweets originating from accounts that were clearly marketers of e-cigarette products, but a large amount of marketing content continues to be posted by individuals and e-cigarette communities.
In addition to understanding who is talking on Twitter, it is necessary to dissect what is being said. Most tweets were determined to have positive sentiment, indicating that Twitter dialogue skews favorably toward e-cigarettes, although the proportion of positive tweets declined during the analysis period. This trend warrants further monitoring with specific consideration for what fuels opinion over time. Furthermore, this research establishes an e-cigarette sentiment baseline that serves as a valuable starting point for public health professionals to develop campaigns and interventions. A majority of marketing-related tweets had positive sentiment, while approximately one-fifth of news-related tweets were positive. The most prevalent genre uncovered was personal opinion, followed closely by marketing; together, these two genres comprised nearly half of the sample. It can be expected that a platform like Twitter is conducive to sharing personal opinion; however, 32% of marketing tweets came from everyday people, while 32% came from retailers and tobacco companies combined.
As with any research study, our study has limitations. It must be noted that our analysis is quite specific to the topic of e-cigarettes, and thus our keyword list was limited to terms directly related to e-cigarettes. It is possible that we have overlooked conversations around topics that are socially similar to, but not exactly the same as, e-cigarettes, such as e-hookah. In addition, the vocabulary surrounding e-cigarettes is continuously growing and changing. This is due to the expanding range of products, brands, and vaping-related activities that people engage in. As a result, the list of keywords used in this study would need to be reconsidered and almost certainly expanded to accommodate the changing e-cigarette and vaping terminology. Furthermore, calculation of precision and recall for the search would have provided a better understanding of the validity of terms retrieved by our search. However, we are confident that our methodology is replicable with appropriate resources and thus would allow for expansion to explore other emerging trends. We recommend calculation of precision and recall to refine the search and report validity using a systematic quantitative method such as that described by Stryker et al [
Additionally, our exclusion methodology for relevance, which included eliminating retweets without additional information and duplicate tweets from suspended accounts, may have led to an underestimation of the true prevalence of these types of tweets. However, we believe that even if our study provides a conservative estimate of the information available on Twitter in relation to e-cigarettes, the information remains useful to gain an understanding of the general trends. Future studies may be interested in utilizing less restrictive relevance criteria and using methods such as social network analysis to determine the structure of the network and how this relates to dissemination of e-cigarette attitudes and perspectives.
Twitter users do not represent the general population, and thus findings from this study must be considered in the context of people who use this specific social media platform [
Additionally, in our analysis, we did not apply an explicit weighting or correction methodology to adjust for changes in tweet volumes over time because any approach to making such an adjustment would potentially bias results. Given that most of our descriptive metrics compare fractions of tweets of a specific classification across points in time rather than absolute numbers of tweets, we believe that the comparative picture presented of the e-cig–related tweet landscape as it evolves over time is valid.
Despite a few limitations, there were many strengths of this analysis. Our study accessed data from the Twitter Firehose (ie, access to all of the daily tweets on Twitter) and utilized a large sample of tweets. Moreover, analyses were carried out over a critical period of time in the e-cigarette landscape and expanded on previously established methodologies for thematic analysis of Twitter.
An additional key strength of this work is the significant amount of time and effort spent manually building a dataset that is sampled from the Twitter Firehose (rather than the free API). The dataset of 10,128 manually coded tweets for this study is a much larger sample than previous work on Twitter and emerging tobacco products [
For future research, the data from this content analysis can be used as a training dataset to build supervised machine learning algorithms. These algorithms can be used to implement automated surveillance of e-cigarette-related conversations on Twitter. This would allow more data to be analyzed with less manpower and also allow observation and analysis of Twitter trends for e-cigarette conversations over a greater period of time. This form of infoveillance lends itself to several aspects of tobacco control, including marketing regulations, underage use, cessation, and health outcomes.
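As a rough illustration of how manually coded tweets could train a supervised classifier, the sketch below implements a small multinomial naive Bayes model with add-one smoothing in plain Python. It is a toy example, not a method used or evaluated in this study: the class name, the tokenizer, and the six training tweets are all hypothetical, and a real system would need the full coded dataset, held-out evaluation, and more careful text processing.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split into simple word tokens (keeps hyphens, # and @)."""
    return re.findall(r"[a-z0-9#@'-]+", text.lower())

class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical hand-coded tweets (not from the study dataset)
train_texts = [
    "love my new vape pen",
    "vaping helped me quit smoking",
    "best e-cig flavor ever",
    "e-cigs are dangerous for kids",
    "ban vaping in restaurants",
    "hate the smell of vape clouds",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("vaping helped me quit"))
print(model.predict("ban e-cigs for kids"))
```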
Continuing snapshots of the social media landscape around e-cigarettes may help policymakers and public health professionals assess changing trends and inform interventions for tobacco cessation. Identifying means to integrate these types of assessments and analyses into data collected by traditional epidemiology and surveillance methods may prove especially valuable [
Tweet filter keywords.
Data collection Web form.
Definitions of annotation categories.
Sample tweets by annotation category.
Correlation matrix for content categories.
API: Application Program Interface
CDC: Centers for Disease Control and Prevention
FDA: Food and Drug Administration
HINTS: Health Information National Trends Survey
SAS: Statistical Analysis System
This work was funded by the National Cancer Institute’s Tobacco Control Research Branch—National Institutes of Health, National Cancer Institute HHSN261200900022C, Subcontract Number D6-ICF-1. The authors would like to thank Shinett Boggan, Alex Feith-Tiongson, Samantha Letizia, Thomas Madden, Delsie Sequiera, Nick Ngugi, and Emily Grenen for their contribution to the study.
None declared.