Ethical Issues in Using Twitter for Public Health Surveillance and Research: Developing a Taxonomy of Ethical Concepts From the Research Literature

doi:10.2196/jmir.3617

Short Paper

Mike Conway, PhD

University of California San Diego, Department of Family and Preventive Medicine, La Jolla, CA, United States

Corresponding Author:

Mike Conway, PhD

University of California San Diego

Department of Family and Preventive Medicine

9500 Gilman Drive # 0905

La Jolla, CA, 92093

United States

Phone: 1 858 8224478

Fax: 1 858 3001099

Email: mike.conway@utah.edu

Background: The rise of social media and microblogging platforms in recent years, in conjunction with the development of techniques for the processing and analysis of “big data”, has provided significant opportunities for public health surveillance using user-generated content. However, relatively little attention has been focused on developing ethically appropriate approaches to working with these new data sources.

Objective: Based on a review of the literature, this study seeks to develop a taxonomy of public health surveillance-related ethical concepts that emerge when using Twitter data, with a view to: (1) explicitly identifying a set of potential ethical issues and concerns that may arise when researchers work with Twitter data, and (2) providing a starting point for the formation of a set of best practices for public health surveillance through the development of an empirically derived taxonomy of ethical concepts.

Methods: We searched Medline, Compendex, PsycINFO, and the Philosopher’s Index using a set of keywords selected to identify Twitter-related research papers that reference ethical concepts. Our initial set of queries identified 342 references across the four bibliographic databases. We screened titles and abstracts of these references using our inclusion/exclusion criteria, eliminating duplicates and unavailable papers, until 49 references remained. We then read the full text of these 49 articles and discarded 36, resulting in a final inclusion set of 13 articles. Ethical concepts were then identified in each of these 13 articles. Finally, based on a close reading of the text, we constructed a taxonomy from the ethical concepts discovered in these papers.

Results: From these 13 articles, we iteratively generated a taxonomy of ethical concepts consisting of 10 top level categories: privacy, informed consent, ethical theory, institutional review board (IRB)/regulation, traditional research vs Twitter research, geographical information, researcher lurking, economic value of personal information, medical exceptionalism, and benefit of identifying socially harmful medical conditions.

Conclusions: In summary, based on a review of the literature, we present a provisional taxonomy of public health surveillance-related ethical concepts that emerge when using Twitter data.

J Med Internet Res 2014;16(12):e290


Keywords

social media; twitter messaging; ethics

Since its inception in 2006, the microblog platform Twitter has become a key resource for understanding, and sometimes predicting, mass behavior, particularly in the areas of marketing [1] and politics [2]. More recently, the public health community has recognized Twitter’s potential for public health surveillance [3,4], with applications including monitoring the prevalence of infectious diseases in the community [5,6], identifying early-stage disease outbreaks [7], detecting disease outbreaks in mass gatherings [8], and recognizing and understanding health behaviors, for example temporal variability in problem drinking [9] and attitudes toward emerging tobacco products such as electronic cigarettes and hookah [10]. Despite the clear utility of using Twitter to augment current public health surveillance, there remains doubt among regulatory authorities, ethics committees, and individual researchers regarding ethically appropriate conduct in this kind of large-scale research, where a single researcher can automatically process hundreds of millions of public tweets. Adding to this difficulty is the fact that many Twitter researchers are based in university computer science and engineering departments, environments that often do not share as long a tradition of ethical and regulatory oversight as health-related fields.

While there has been significant research effort in developing ethical guidelines for conducting Internet discussion forum-based research generally [11] and for developing ethical guidelines on appropriate use of social media for clinicians [12], there is little current work addressing ethical problems in large-scale automatic Twitter-based public health research. In this paper, we attempt to address this problem by systematically reviewing ethical content in Twitter-based public health surveillance papers with a view to: (1) explicitly identifying a set of potential ethical issues and concerns that may arise when researchers work with Twitter data, and (2) providing a starting point for the formation of a set of best practices for public health surveillance through the development of a taxonomy of ethical concepts derived from the research literature.

In this review, we are focused on exploring ethical issues that have been identified in published Twitter-based public health surveillance research papers. Relevant research is dispersed across several broad research areas, including biomedicine (Medline), computer science and engineering (Compendex), philosophy (Philosopher’s Index), and psychology (PsycINFO). As we are primarily interested in ethical and regulatory issues and how they relate to public health surveillance, with the aid of a biomedical librarian, we designed a complex set of queries for each indexing service to identify those papers that included ethics-related terms in their titles or abstracts, such as “IRB” (institutional review board), “ethics”, and “privacy”. See Figure 1 for a complete list of keywords.

As the focus of this review is on ethical issues in large-scale automatic Twitter-based research for public health surveillance, we excluded work centered on non-microblog social media platforms (eg, Facebook). We also excluded work on policy and clinician/student professionalism (eg, proposed guidelines for governing clinician interaction with patients via Twitter), and research focused on non-health related topics (eg, marketing) with the exception of those articles concentrating on automatically identifying personality variables from Twitter feeds.

After searching the four databases with queries shown in Figure 1, we screened articles by titles and abstracts, discarding papers that were not available on an open-access basis or via the University of California San Diego library system. We began identifying ethical concepts by carefully reading two papers that, through our initial review, we identified as being especially rich in ethical content [13,14]. From these two initial papers, we highlighted sections of the text discussing ethical content and iteratively constructed an initial ethical taxonomy. We then carefully reviewed the remaining 11 papers, adding to and refining the taxonomy. Our methodology was inspired by, but is not identical to, that used by Strech et al [15], who used a rigorous grounded theory methodology to comprehensively investigate ethical issues in the dementia literature. Our aim in this short paper is limited to producing an outline of the ethical issues identified in the Twitter-based public health surveillance research literature.

Figure 1. Literature search queries for PubMed, Compendex, Philosopher's Index, and PsycINFO.

Overview

Our initial set of queries identified 342 references across the four databases (see Figure 2). After title and abstract screening, 49 references remained. After further full-text screening of these 49 references, 13 remained. Five of the papers were from biomedical journals [13,16-19] and six were from computer science and engineering conference proceedings [20-25]. One paper appeared in a journal dedicated to the social and cultural impact of technology [14]. Finally, one paper was published in the proceedings of a collaborative technology conference [26]. All articles were peer reviewed and written in English.

Figure 2. Inclusion/exclusion flowchart.
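The screening counts reported above can be checked with a short script. The following is only a minimal sketch in Python, not part of the original study: the `Stage` class, the stage labels, and the venue labels are hypothetical names chosen for illustration, while the numbers themselves are the ones given in the text.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One step of the screening process and the references left after it."""
    name: str
    remaining: int


# Counts as reported in the text; the stage labels are illustrative.
STAGES = [
    Stage("Database queries", 342),
    Stage("Title and abstract screening", 49),
    Stage("Full-text screening", 13),
]

# Venue breakdown of the included papers; the labels are illustrative.
VENUES = {
    "biomedical journals": 5,
    "computer science and engineering proceedings": 6,
    "technology and society journal": 1,
    "collaborative technology proceedings": 1,
}


def excluded_per_stage(stages):
    """Return (stage name, number of references excluded at that stage)."""
    return [
        (later.name, earlier.remaining - later.remaining)
        for earlier, later in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    for name, excluded in excluded_per_stage(STAGES):
        print(f"{name}: {excluded} excluded")
    print("Included papers by venue:", sum(VENUES.values()))
```

The full-text stage excludes 36 references, matching the figure given in the abstract, and the venue counts sum to the 13 included papers.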

Taxonomy

Our iterative taxonomy construction process identified 10 broad ethical categories (eg, privacy, IRB/regulation). Six of these categories included several subcategories (eg, privacy has subcategories including the concept of privacy; IRB/regulation has subcategories including data protection legislation). The most prevalent and complex ethical category we found was privacy, with 16 subcategories covering such important ethical concepts as unintended revelation of personal information and population level monitoring vs individual diagnosis (Textbox 1). Multimedia Appendix 1 shows example sentences and paragraphs for each ethical concept in the taxonomy.


Privacy
  a. Concept of privacy
    i. Public vs private - doubts about the location of Twitter data on the public/private spectrum [13,14,18,21,24]
    ii. Fluidity in the concept of privacy - rapidly changing concept of privacy [13,14,18,21,24]
    iii. Generational differences in the concept of privacy [13,18]
    iv. Panopticon effect - risk that public health monitoring will change user behavior on Twitter [24]
  b. Confidentiality
    i. Data linkage - risk of privacy loss due to linking data from different sources [13,21]
    ii. Confidentiality - appropriate storage of Twitter data by researchers [14]
    iii. Right to/desire for anonymity - research using Twitter challenges a participant’s right to (and desire for) anonymity [13,14,16]
  c. Stigmatized medical conditions - concerns about protecting the privacy of those with stigmatized medical conditions (eg, epilepsy, depression) [13,20]
  d. Twitter’s privacy policy - implications of Twitter’s privacy policy and how it is understood by users [14,18,19,21,22]
  e. Twitter is publicly accessible by default - emphasizes that Twitter is a broadcast medium. Unless a user changes privacy settings, tweets are public [14,19,21,22,24]
  f. Reliability of user provided personal details - reliability of information derived from Twitter when some users use false or whimsical personal details to maintain anonymity [19,21,24]
  g. Interpreting decontextualized Twitter data as fully representative of users who are in fact multifaceted - possibility that the user might be experimenting with self-presentation or exhibit a belief or behavior in their historical tweets that they no longer adhere to (eg, illegal drug use) [13,24]
  h. Unintended revelation of personal information - potential for a user to unintentionally provide insights into their mental health, health behaviors, etc, through information garnered from their tweets [20,22-24]
  i. Personal responsibility of Twitter users - emphasizes the responsibility users have for their posts [22,24]
  j. Twitter users have no expectation of privacy - emphasizes the researchers’ belief that Twitter users have no reasonable expectation of privacy [19,21]
  k. Identifying users’ mental health status or personality traits to:
    i. identify those in need of treatment [22,25]
    ii. job placement [20,22-24]
    iii. targeted marketing [21,23]
    iv. system interface design (eg, introverts prefer data presented in a certain way) [21,23]
    v. law enforcement (eg, identifying psychopaths) [22]
  l. Population level monitoring vs individual diagnosis - difference between using Twitter to identify broad, population level changes and diagnosing individuals [20]
  m. Potential for discrimination based on health status as garnered from social media [20]
  n. Danger of inaccurately labeling a user as suffering from a particular health problem [20,22]
  o. Traceability of Twitter data - risk that tweets can be traced back to the original tweeter if reproduced verbatim in research work, threatening anonymity [13,17]
  p. Intended audience for tweets - some Twitter users use Twitter as a communication tool for a small group of family and friends and do not expect their tweets to be widely read (ie, hidden in plain sight). Other Twitter users aim to broadcast to the world and gain the maximum number of followers [14,21,24]
Informed Consent
  a. Twitter users are oblivious or unwilling research participants [13]
  b. Informed consent is difficult (or impossible) to gain (or not required) for large-scale Twitter work [13,14,24]
Ethical Theory
  a. Difficulties in applying current ethical theories to mass Twitter research [13]
  b. Ethical theories:
    i. Deontology [13,26]
    ii. Utilitarianism [13,26]
    iii. Feminism [13]
    iv. Communitarianism [13]
    v. Application of the “golden rule” [13]
    vi. Agile/situational ethics [14]
    vii. Rawls’ theory of justice [26]
IRB/Regulation
  a. Citizens’ rights to communicate and share information [24,26]
  b. Researcher belief that regulatory oversight is not required when using Twitter data [19]
  c. Discussion of IRB/ethics committees, generally [14,18]
  d. Data protection legislation [14,18]
  e. Professional codes of conduct [14]
  f. Need for regulatory control, generally [18,20]
  g. Privacy regulation by country [14,24]
Traditional research vs Twitter research
  a. Apomediation - shifting from hierarchical models of research to a situation where the researcher is a potential participant [18]
  b. Scale of Twitter-based research - research norms that were developed for small-scale research do not scale to millions of Twitter users [14]
  c. Greater distance between researcher and participants - mass Twitter-based research increases the distance between researchers and participants [14]
  d. Ambiguous status of participants - the status of participants is more ambiguous than in traditional research (ie, are they consumers, participants, patients, service users, journalists, etc) [14]
  e. Increase in researcher power - in mass Twitter research, a single researcher has access to millions of Twitter users, hence increasing researcher power [14]
Geographical Information
  a. Tracking physical location - potential loss of privacy in tracking Twitter users’ physical locations [13,14,19,21]
  b. Appropriate geographical granularity - potential loss of privacy in reporting a Twitter user’s precise location, compared to their general location. For example, reporting that a Twitter user is somewhere in Los Angeles is very different to reporting their precise location in Los Angeles [14]
Researcher Lurking [13,14]
Economic Value of Personal Information [14]
Medical Exceptionalism - health-related matters are qualitatively different from other, non-medical areas and require special attention (and perhaps regulation) [25]
Benefit of Identifying Socially Harmful Medical Conditions [22]

Textbox 1. Ethical categories identified during the iterative, taxonomy construction process.
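To see which of the reviewed papers touch on the privacy category, the bracketed citation lists attached to its subcategories in Textbox 1 can be expanded and merged. The Python sketch below is illustrative only and is not part of the original analysis: the function name `expand_citations` and the list name `PRIVACY_CITATIONS` are hypothetical, and the citation strings are transcribed from the privacy entries of Textbox 1.

```python
def expand_citations(text):
    """Expand a citation string such as '[20,22-24]' into a set of integers."""
    numbers = set()
    for part in text.strip("[]").split(","):
        if "-" in part:
            first, last = part.split("-")
            numbers.update(range(int(first), int(last) + 1))
        else:
            numbers.add(int(part))
    return numbers


# Citation strings for the privacy subcategories (a.i through p), in order,
# transcribed from Textbox 1.
PRIVACY_CITATIONS = [
    "[13,14,18,21,24]", "[13,14,18,21,24]", "[13,18]", "[24]",   # a.i-a.iv
    "[13,21]", "[14]", "[13,14,16]",                             # b.i-b.iii
    "[13,20]", "[14,18,19,21,22]", "[14,19,21,22,24]",           # c-e
    "[19,21,24]", "[13,24]", "[20,22-24]", "[22,24]", "[19,21]",  # f-j
    "[22,25]", "[20,22-24]", "[21,23]", "[21,23]", "[22]",       # k.i-k.v
    "[20]", "[20]", "[20,22]", "[13,17]", "[14,21,24]",          # l-p
]

# Union of every paper cited under any privacy subcategory.
privacy_papers = set().union(*(expand_citations(c) for c in PRIVACY_CITATIONS))

if __name__ == "__main__":
    print(sorted(privacy_papers))
```

Under these assumptions the union covers references 13, 14, and 16 to 25, that is, 12 of the 13 included papers.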

Normative Rules

In several of the papers under review, explicit normative rules were presented (or suggested) for conducting and reporting mass Twitter public health surveillance work (Textbox 2). Note that these rules are discussed but not necessarily endorsed.


When reporting research, avoid quoting directly from users’ Twitter streams. Paraphrases should be used [13].
Informed consent should be gained from participants [13].
Metadata (usernames, location data, etc) should not be disclosed [13].
Twitter-based work is human subjects research and requires that some form of appropriate IRB/ethical review take place [14].
Data collection should be logged and justified [14].
There should be parity between the researcher and participants (ie, the researcher’s tweets and their associated locations, if appropriate, should be public) [14].
Employment-related profiling for mental health conditions should only be performed in exceptional circumstances (eg, security critical roles) [24].
Consent should be gained from potential employees before employment-related profiling for mental health conditions is performed [24].

Textbox 2. Explicit normative rules for mass Twitter public health surveillance work.

The main output of this research is a taxonomy of ethical concepts derived from close reading of the literature. The taxonomy will be used to help frame future interview-based qualitative research focused on Twitter users’ attitudes to the use of microblog data for public health surveillance and, in due course, inform the generation of a set of ethical guidelines for using Twitter for public health surveillance and research. We found that ethical theory was rarely mentioned in the reviewed papers and, when it was discussed, that discussion was typically brief. Only two papers [13,26] explicitly discuss the application of traditional ethical theories (eg, deontology, utilitarianism) to mass Twitter-based public health surveillance. As expected, the bulk of the ethical concepts we discovered were concerned with privacy [13,14,16-25], including frequent references to the fluid and changing nature of the concept of privacy [13,14,18,21,24], and more concretely, to Twitter’s privacy policies [14,18,19,21,22]. Discussion of IRBs and regulation (or the lack thereof) was also widespread in the literature [14,18-20,24,26]. Some topics were raised by a single research paper, for example, the idea that the ability to automatically process millions of tweets increases researcher power compared to traditional research methodologies [14], and the idea that the benefits of using Twitter for public health purposes are so great that they mitigate any ethical doubts that would apply to other, non-health-related uses of Twitter data (eg, for commercial gain) [25].

Although inspired by the ethics-oriented qualitative literature review methodology proposed by Strech et al [15], the approach taken in this review is substantially different, in particular in its use of a single reviewer (author MC) rather than a group of reviewers, and its use of close reading in place of a theoretically grounded qualitative methodology. A further limitation of this review is that our search strategy was confined to papers indexed in PubMed, Compendex, PsycINFO, and the Philosopher’s Index. Papers that were not available via the University of California San Diego library system or on an open-access basis were excluded. It is likely that we “missed” relevant papers in business disciplines or in those computer science and engineering conferences and journals not indexed by Compendex. However, our purpose in this review was the identification of a broad taxonomy of ethical concepts relevant to Twitter-based public health research using a systematic, reproducible methodology, and thus comprehensiveness, while desirable, is not a necessity.

In conclusion, this short paper provides a taxonomy of ethical concepts derived from the research literature. Future work will involve interview-based qualitative research exploring Twitter users’ attitudes toward the mining of their data for public health purposes, and ultimately the formation of best practice guidelines for public health surveillance using Twitter data.

Acknowledgments

We would like to thank Dr Dan O’Connor (Head of Medical Humanities at the Wellcome Trust) and Dr Samantha Hurst in the Department of Family and Preventive Medicine at the University of California San Diego for advice and guidance in the design and execution of this research. This work was supported by a grant from the National Library of Medicine K99LM011393.

Conflicts of Interest

None declared.


Multimedia Appendix 1

Taxonomy.

PDF File (Adobe PDF File), 99KB

1. Bulearca M, Bulearca S. Twitter: a viable marketing tool for SMEs. Global Business and Management Research: An International Journal 2012;2(4):296-309.
2. Tumasjan A, Sprenger T, Sandner P, Welpe I. Predicting elections with Twitter: what 140 characters reveal about political sentiment. In: Proceedings of the Fourth International Conference on Weblogs and Social Media. 2010 Presented at: Fourth International Conference on Weblogs and Social Media; 2010; Washington, DC p. 178-185.
3. Eysenbach G. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J Med Internet Res 2009;11(1):e11.
4. Stoové MA, Pedrana AE. Making the most of a brave new world: opportunities and considerations for using Twitter as a public health monitoring tool. Prev Med 2014 Jun;63:109-111.
5. Collier N, Son NT, Nguyen NM. OMG U got flu? Analysis of shared health messages for bio-surveillance. J Biomed Semantics 2011;2 Suppl 5:S9.
6. Chunara R, Andrews JR, Brownstein JS. Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak. Am J Trop Med Hyg 2012 Jan;86(1):39-45.
7. Collier N, Doan S. Syndromic classification of Twitter messages. In: Szomszor M, Kostkova P, editors. Electronic Healthcare. Berlin: Springer; 2012:186-195.
8. Yom-Tov E, Borsa D, Cox IJ, McKendry RA. Detecting disease outbreaks in mass gatherings using Internet data. J Med Internet Res 2014;16(6):e154.
9. West JH, Hall P, Hanson C, Prier K, Giraud-Carrier C, Neeley E, et al. Temporal variability of problem drinking on Twitter. OJPM 2012;02(01):43-48.
10. Myslín M, Zhu SH, Chapman W, Conway M. Using Twitter to examine smoking behavior and perceptions of emerging tobacco products. J Med Internet Res 2013;15(8):e174.
11. Eysenbach G, Till JE. Ethical issues in qualitative research on internet communities. BMJ 2001 Nov 10;323(7321):1103-1105.
12. Gholami-Kordkheili F, Wild V, Strech D. The impact of social media on medical professionalism: a systematic qualitative review of challenges and opportunities. J Med Internet Res 2013;15(8):e184.
13. McKee R. Ethical issues in using social media for health and health care research. Health Policy 2013 May;110(2-3):298-301.
14. Neuhaus F, Webmoor T. Agile ethics for massified research and visualization. Information, Communication & Society 2012 Feb;15(1):43-65.
15. Strech D, Mertz M, Knüppel H, Neitzke G, Schmidhuber M. The full spectrum of ethical issues in dementia care: systematic qualitative review. Br J Psychiatry 2013 Jun;202:400-406.
16. Sugawara Y, Narimatsu H, Hozawa A, Shao L, Otani K, Fukao A. Cancer patients on Twitter: a novel patient community on social media. BMC Res Notes 2012;5:699.
17. Heaivilin N, Gerbert B, Page JE, Gibbs JL. Public health surveillance of dental pain via Twitter. J Dent Res 2011 Sep;90(9):1047-1051.
18. O'Connor D. The apomediated world: regulating research when social media has changed research. J Law Med Ethics 2013;41(2):470-483.
19. Burton SH, Tanner KW, Giraud-Carrier CG, West JH, Barnes MD. "Right time, right place" health communication on Twitter: value and accuracy of location information. J Med Internet Res 2012;14(6):e156.
20. Sumner C, Byers A, Boochever R, Park GJ. Predicting dark triad personality traits from Twitter usage and a linguistic analysis of tweets. In: Proceedings of the 11th IEEE International Conference on Machine Learning and Applications. 2012 Presented at: 11th IEEE International Conference on Machine Learning and Applications; 2012; Florida.
21. Quercia D, Kosinski M, Stillwell D, Crowcroft J. Our Twitter profiles, our selves: Predicting personality with Twitter. In: Proceedings of the IEEE International Conference on Privacy, Security, Risk and Trust. 2011 Presented at: IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT); 2011; Massachusetts p. 180-185.
22. Wald R, Khoshgoftaar TM, Napolitano A, Sumner C. Using Twitter content to predict psychopathy. In: Proceedings of the 11th IEEE International Conference on Machine Learning and Applications. 2012 Presented at: 11th IEEE International Conference on Machine Learning and Applications; 2012; Florida p. 394-401.
23. Golbeck J, Robles C, Edmondson M, Turner K. Predicting personality from Twitter. In: Proceedings of the IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT). 2011 Presented at: IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT); 2011; Massachusetts p. 149-156.
24. Kandias M, Galbogini K, Mitrou L, Gritzalis D. Insiders trapped in the mirror reveal themselves in social media. In: Network and System Security: Proceedings of the 7th International Conference on Network and System Security. Berlin: Springer; 2013 Presented at: 7th International Conference on Network and System Security; 2013; Madrid p. 220-235.
25. De Choudhury M, Counts S, Horvitz E. Predicting postpartum changes in emotion and behavior via social media. In: Proceedings of the 31st Annual CHI Conference on Human Factors in Computing Systems. 2013 Presented at: 31st Annual CHI Conference on Human Factors in Computing Systems; 2012; France p. 3267-3276.
26. Heverin T. Ethical concerns of Twitter use for collective crisis response. In: Proceedings of the 12th International Conference on Collaboration Technologies and Systems. 2011 Presented at: 12th International Conference on Collaboration Technologies and Systems; 2011; Colorado p. 625-626.


IRB: institutional review board

Edited by G Eysenbach; submitted 18.06.14; peer-reviewed by E Vayena, HP Lee; comments to author 19.08.14; accepted 28.10.14; published 22.12.14

This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
