Published in Vol 18, No 2 (2016): February

Garbage in, Garbage Out: Data Collection, Quality Assessment and Reporting Standards for Social Media Data Use in Health Research, Infodemiology and Digital Disease Detection

Authors of this article:

Yoonsang Kim; Jidong Huang; Sherry Emery


  1. Kostygina G, Tran H, Shi Y, Kim Y, Emery S. ‘Sweeter Than a Swisher’: amount and themes of little cigar and cigarillo content on Twitter. Tobacco Control 2016;25(Suppl 1):i75
  2. Zhou L, Zhang D, Yang C, Wang Y. Harnessing social media for health information management. Electronic Commerce Research and Applications 2018;27:139
  3. Allem J, Ferrara E. The Importance of Debiasing Social Media Data to Better Understand E-Cigarette-Related Attitudes and Behaviors. Journal of Medical Internet Research 2016;18(8):e219
  4. Lienemann B, Unger J, Cruz T, Chu K. Methods for Coding Tobacco-Related Twitter Data: A Systematic Review. Journal of Medical Internet Research 2017;19(3):e91
  5. Ramos C, Casado-Molina A, Ignácio-Peláez J. An Innovative Management Perspective for Organizations through a Reputation Intelligence Management Model. International Journal of Information Systems in the Service Sector 2019;11(4):1
  6. Paul M, Dredze M. Social Monitoring for Public Health. Synthesis Lectures on Information Concepts, Retrieval, and Services 2017;9(5):1
  7. Kim Y, Nordgren R, Emery S. The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure. International Journal of Environmental Research and Public Health 2020;17(3):864
  8. Jeong S, Kuk S, Kim H. A Smartphone Magnetometer-Based Diagnostic Test for Automatic Contact Tracing in Infectious Disease Epidemics. IEEE Access 2019;7:20734
  9. Dyda A, Shah Z, Surian D, Martin P, Coiera E, Dey A, Leask J, Dunn A. HPV vaccine coverage in Australia and associations with HPV vaccine information exposure among Australian Twitter users. Human Vaccines & Immunotherapeutics 2019;15(7-8):1488
  10. Aramburu M, Berlanga R, Lanza I. Social Media Multidimensional Analysis for Intelligent Health Surveillance. International Journal of Environmental Research and Public Health 2020;17(7):2289
  11. Kedzior S, Bianco-Miotto T, Breen J, Diener K, Donnelley M, Dunning K, Penno M, Schjenken J, Sharkey D, Hodyl N, Fullston T, Gardiner M, Brown H, Rumbold A. It takes a community to conceive: an analysis of the scope, nature and accuracy of online sources of health information for couples trying to conceive. Reproductive Biomedicine & Society Online 2019;9:48
  12. Mirtalaie M, Hussain O. Sentiment aggregation of targeted features by capturing their dependencies: Making sense from customer reviews. International Journal of Information Management 2020;53:102097
  13. Zinsstag J, Crump L, Schelling E, Hattendorf J, Maidane Y, Ali K, Muhummed A, Umer A, Aliyi F, Nooh F, Abdikadir M, Ali S, Hartinger S, Mäusezahl D, de White M, Cordon-Rosales C, Castillo D, McCracken J, Abakar F, Cercamondi C, Emmenegger S, Maier E, Karanja S, Bolon I, de Castañeda R, Bonfoh B, Tschopp R, Probst-Hensch N, Cissé G. Climate change and One Health. FEMS Microbiology Letters 2018;365(11)
  14. Albalawi Y, Nikolov N, Buckley J. Trustworthy Health-Related Tweets on Social Media in Saudi Arabia: Tweet Metadata Analysis. Journal of Medical Internet Research 2019;21(10):e14731
  15. Adams N, Artigiani E, Wish E. Choosing Your Platform for Social Media Drug Research and Improving Your Keyword Filter List. Journal of Drug Issues 2019;49(3):477
  16. Bunyan A, Venuturupalli S, Reuter K. Expressed Symptoms and Attitudes Toward Using Twitter for Health Care Engagement Among Patients With Lupus on Social Media: Protocol for a Mixed Methods Study. JMIR Research Protocols 2021;10(5):e15716
  17. Kim Y, Huang J, Emery S. The Research Topic Defines “Noise” in Social Media Data – a Response from the Authors. Journal of Medical Internet Research 2017;19(6):e165
  18. Cho H, Silver N, Na K, Adams D, Luong K, Song C. Visual Cancer Communication on Social Media: An Examination of Content and Effects of #Melanomasucks. Journal of Medical Internet Research 2018;20(9):e10501
  19. O'Connor K, Sarker A, Perrone J, Gonzalez Hernandez G. Promoting Reproducible Research for Characterizing Nonmedical Use of Medications Through Data Annotation: Description of a Twitter Corpus and Guidelines. Journal of Medical Internet Research 2020;22(2):e15861
  20. Stens O, Weisman M, Simard J, Reuter K. Insights From Twitter Conversations on Lupus and Reproductive Health: Protocol for a Content Analysis. JMIR Research Protocols 2020;9(8):e15623
  21. Guiñazú M, Cortés V, Ibáñez C, Velásquez J. Employing online social networks in precision-medicine approach using information fusion predictive model to improve substance use surveillance: A lesson from Twitter and marijuana consumption. Information Fusion 2020;55:150
  22. Daniulaityte R, Chen L, Lamy F, Carlson R, Thirunarayan K, Sheth A. “When ‘Bad’ is ‘Good’”: Identifying Personal Communication and Sentiment in Drug-Related Tweets. JMIR Public Health and Surveillance 2016;2(2):e162
  23. Chang Y, Chiang W, Wang W, Lin C, Hung L, Tsai Y, Suen J, Chen Y. Google Trends-based non-English language query data and epidemic diseases: a cross-sectional study of the popular search behaviour in Taiwan. BMJ Open 2020;10(7):e034156
  24. Delir Haghighi P, Kang Y, Buchbinder R, Burstein F, Whittle S. Investigating Subjective Experience and the Influence of Weather Among Individuals With Fibromyalgia: A Content Analysis of Twitter. JMIR Public Health and Surveillance 2017;3(1):e4
  25. Rose S, Jo C, Binns S, Buenger M, Emery S, Ribisl K. Perceptions of Menthol Cigarettes Among Twitter Users: Content and Sentiment Analysis. Journal of Medical Internet Research 2017;19(2):e56
  26. Goncalves M, Cornelius Smith E. Social media as a data gathering tool for international business qualitative research: opportunities and challenges. Journal of Transnational Management 2018;23(2-3):66
  27. Rashid M, Wang D. CovidSens: a vision on reliable social sensing for COVID-19. Artificial Intelligence Review 2021;54(1):1
  28. Scarborough W. Feminist Twitter and Gender Attitudes: Opportunities and Limitations to Using Twitter in the Study of Public Opinion. Socius: Sociological Research for a Dynamic World 2018;4:237802311878076
  29. Thrasher J, Brewer N, Niederdeppe J, Peters E, Strasser A, Grana R, Kaufman A. Advancing Tobacco Product Warning Labels Research Methods and Theory: A Summary of a Grantee Meeting Held by the US National Cancer Institute. Nicotine & Tobacco Research 2019;21(7):855
  30. Cabitza F, Locoro A, Alderighi C, Rasoini R, Compagnone D, Berjano P. The elephant in the record: On the multiplicity of data recording work. Health Informatics Journal 2019;25(3):475
  31. Reuter K, Angyan P, Le N, MacLennan A, Cole S, Bluthenthal R, Lane C, El-Khoueiry A, Buchanan T. Monitoring Twitter Conversations for Targeted Recruitment in Cancer Trials in Los Angeles County: Protocol for a Mixed-Methods Pilot Study. JMIR Research Protocols 2018;7(9):e177
  32. Chae S, Kim Y, Lee K, Park H. Development and Clinical Evaluation of a Web-Based Upper Limb Home Rehabilitation System Using a Smartwatch and Machine Learning Model for Chronic Stroke Survivors: Prospective Comparative Study. JMIR mHealth and uHealth 2020;8(7):e17216
  33. Lazard A, Wilcox G, Tuttle H, Glowacki E, Pikowski J. Public reactions to e-cigarette regulations on Twitter: a text mining analysis. Tobacco Control 2017;26(e2):e112
  34. Colditz J, Chu K, Emery S, Larkin C, James A, Welling J, Primack B. Toward Real-Time Infoveillance of Twitter Health Messages. American Journal of Public Health 2018;108(8):1009
  35. Roeseler A, Meaney M, Riordan M, Solomon M, Herndon S, Hallett C. NCI's state and community research initiative: a model for future tobacco control research. Tobacco Control 2016;25(Suppl 1):i1
  36. Syamsuddin M, Fakhruddin M, Sahetapy-Engel J, Soewono E. Causality Analysis of Google Trends and Dengue Incidence in Bandung, Indonesia With Linkage of Digital Data Modeling: Longitudinal Observational Study. Journal of Medical Internet Research 2020;22(7):e17633
  37. Sullivan M, Robinson S, Littnan C, Lepczyk C. Social media as a data resource for #monkseal conservation. PLOS ONE 2019;14(10):e0222627
  38. Hand R, Kenne D, Wolfram T, Abram J, Fleming M. Assessing the Viability of Social Media for Disseminating Evidence-Based Nutrition Practice Guideline Through Content Analysis of Twitter Messages and Health Professional Interviews: An Observational Study. Journal of Medical Internet Research 2016;18(11):e295
  39. Karafillakis E, Martin S, Simas C, Olsson K, Takacs J, Dada S, Larson H. Methods for Social Media Monitoring Related to Vaccination: Systematic Scoping Review. JMIR Public Health and Surveillance 2021;7(2):e17149
  40. Phengsuwan J, Shah T, Thekkummal N, Wen Z, Sun R, Pullarkatt D, Thirugnanam H, Ramesh M, Morgan G, James P, Ranjan R. Use of Social Media Data in Disaster Management: A Survey. Future Internet 2021;13(2):46
  41. Reuter K, Deodhar A, Makri S, Zimmer M, Berenbaum F, Nikiphorou E. The impact of the COVID-19 pandemic on people with rheumatic and musculoskeletal diseases: insights from patient-generated data on social media. Rheumatology 2021;60(SI):SI77
  42. Bulcock A, Hassan L, Giles S, Sanders C, Nenadic G, Campbell S, Dixon W. Public Perspectives of Using Social Media Data to Improve Adverse Drug Reaction Reporting: A Mixed-Methods Study. Drug Safety 2021;44(5):553
  43. He L, Yin T, Hu Z, Chen Y, Hanauer D, Zheng K. Developing a standardized protocol for computational sentiment analysis research using health-related social media data. Journal of the American Medical Informatics Association 2021;28(6):1125
  44. Reuter K, Rocca E. Correspondence on ‘Mining social media data to investigate patient perceptions regarding DMARD pharmacotherapy for rheumatoid arthritis’. Annals of the Rheumatic Diseases 2023;82(4):e91
  45. Reuter K, Lee D. Perspectives Toward Seeking Treatment Among Patients With Psoriasis: Protocol for a Twitter Content Analysis. JMIR Research Protocols 2021;10(2):e13731
  46. Kim Y, Emery S, Vera L, David B, Huang J. At the speed of Juul: measuring the Twitter conversation related to ENDS and Juul across space and time (2017–2018). Tobacco Control 2021;30(2):137
  47. Radwan E, Radwan A, Radwan W. The role of social media in spreading panic among primary and secondary school students during the COVID-19 pandemic: An online questionnaire study from the Gaza Strip, Palestine. Heliyon 2020;6(12):e05807
  48. Teng S, Khong K, Pahlevan Sharif S, Ahmed A. YouTube Video Comments on Healthy Eating: Descriptive and Predictive Analysis. JMIR Public Health and Surveillance 2020;6(4):e19618
  49. He L, He C, Reynolds T, Bai Q, Huang Y, Li C, Zheng K, Chen Y. Why do people oppose mask wearing? A comprehensive analysis of U.S. tweets during the COVID-19 pandemic. Journal of the American Medical Informatics Association 2021;28(7):1564
  50. Bour C, Ahne A, Schmitz S, Perchoux C, Dessenne C, Fagherazzi G. The Use of Social Media for Health Research Purposes: Scoping Review. Journal of Medical Internet Research 2021;23(5):e25736
  51. Ewald V, Sridaran Venkat R, Asokkumar A, Benedictus R, Boller C, Groves R. Perception modelling by invariant representation of deep learning for automated structural diagnostic in aircraft maintenance: A study case using DeepSHM. Mechanical Systems and Signal Processing 2022;165:108153
  52. Purnat T, Vacca P, Czerniak C, Ball S, Burzo S, Zecchin T, Wright A, Bezbaruah S, Tanggol F, Dubé È, Labbé F, Dionne M, Lamichhane J, Mahajan A, Briand S, Nguyen T. Infodemic Signal Detection During the COVID-19 Pandemic: Development of a Methodology for Identifying Potential Information Voids in Online Conversations. JMIR Infodemiology 2021;1(1):e30971
  53. Duan Z, Li J, Lukito J, Yang K, Chen F, Shah D, Yang S. Algorithmic Agents in the Hybrid Media System: Social Bots, Selective Amplification, and Partisan News about COVID-19. Human Communication Research 2022;48(3):516
  54. Himelboim I, Golan G. A Social Network Approach to Social Media Influencers on Instagram: The Strength of Being a Nano-Influencer in Cause Communities. Journal of Interactive Advertising 2023;23(1):1
  55. Jenrette J, Liu Z, Chimote P, Hastie T, Fox E, Ferretti F. Shark detection and classification with machine learning. Ecological Informatics 2022;69:101673
  56. Ma S, Bergan D, Ahn S, Carnahan D, Gimby N, McGraw J, Virtue I. Fact-checking as a deterrent? A conceptual replication of the influence of fact-checking on the sharing of misinformation by political elites. Human Communication Research 2023;49(3):321
  57. Schmidt A, Rodriguez-Esteban R, Gottowik J, Leddin M. Applications of quantitative social media listening to patient-centric drug development. Drug Discovery Today 2022;27(5):1523
  58. Gupta R, Mohanty V, Balappanavar A, Chahar P, Rijhwani K, Bhatia S. Infodemiology for oral health and disease: A scoping review. Health Information & Libraries Journal 2022;39(3):207
  59. Reuter K, Angyan P, Le N, Buchanan T. Using Patient-Generated Health Data From Twitter to Identify, Engage, and Recruit Cancer Survivors in Clinical Trials in Los Angeles County: Evaluation of a Feasibility Study. JMIR Formative Research 2021;5(11):e29958
  60. Kostygina G, Tran H, Schillo B, Silver N, Emery S. Industry response to strengthened regulations: amount and themes of flavoured electronic cigarette promotion by product vendors and manufacturers on Instagram. Tobacco Control 2022;31(Suppl 3):s249
  61. Robinson F, Wilkes S, Schaefer N, Goldstein M, Rice M, Gray J, Meyers S, Valentino L. Patient-centered pharmacovigilance: priority actions from the inherited bleeding disorders community. Therapeutic Advances in Drug Safety 2023;14:204209862211464
  62. Venuturupalli S, Kumar A, Bunyan A, Davuluri N, Fortune N, Reuter K. Using Patient‐Reported Health Data From Social Media to Identify Diverse Lupus Patients and Assess Their Symptom and Medication Expressions: A Feasibility Study. Arthritis Care & Research 2023;75(2):365
  63. Cheatham S, Kummervold P, Parisi L, Lanfranchi B, Croci I, Comunello F, Rota M, Filia A, Tozzi A, Rizzo C, Gesualdo F. Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model. Frontiers in Public Health 2022;10
  64. Kostygina G, Tran H, Czaplicki L, Perks S, Vallone D, Emery S, Hair E. Developing a theoretical marketing framework to analyse JUUL and compatible e-cigarette product promotion on Instagram. Tobacco Control 2023;32(e2):e192
  65. Malik A, Berggren W, Al-Busaidi A. Instagram as a research tool for examining tobacco-related content: A methodological review. Technology in Society 2022;70:102008
  66. Love E. Going Viral: The Role of Social Media in Decision-Making. Journal of Cases in Educational Leadership 2022;25(3):292
  67. Charbonneau E, Mellouli S, Chouikh A, Couture L, Desroches S. The Information Sharing Behaviors of Dietitians and Twitter Users in the Nutrition and COVID-19 Infodemic: Content Analysis Study of Tweets. JMIR Infodemiology 2022;2(2):e38573
  68. Rocco G. Garbage in, garbage out. European Journal of Cardio-Thoracic Surgery 2022;61(5):1020
  69. Matenga T, Zulu J, Moonzwe Davis L, Chavula M. Motivating factors for and barriers to the COVID-19 vaccine uptake: A review of social media data in Zambia. Cogent Public Health 2022;9(1)
  70. Chen K, Duan Z, Yang S. Twitter as research data. Politics and the Life Sciences 2022;41(1):114
  71. Toner T, Pancholi R, Miller P, Forster T, Coleman H, Overton I. Strategies and techniques for quality control and semantic enrichment with multimodal data: a case study in colorectal cancer with eHDPrep. GigaScience 2022;12
  72. Isip Tan I, Cleofas J, Solano G, Pillejera J, Catapang J. Interdisciplinary Approach to Identify and Characterize COVID-19 Misinformation on Twitter: Mixed Methods Study. JMIR Formative Research 2023;7:e41134
  73. Uhawenimana T, Musabwasoni M, Nsengiyumva R, Mukamana D. Sexuality and Sexual and Reproductive Health Depiction in Social Media: Content Analysis of Kinyarwanda YouTube Channels. Journal of Medical Internet Research 2023;25:e46488
  74. Silver N, Pearson G, Kucherlapaty P, Kalla S, Schillo B. To Tweet or Not to Tweet: Tweets About Tobacco Regulation can Help Disseminate Anti-regulatory Messages. Nicotine and Tobacco Research 2023;25(9):1603
  75. Devyani Chaudhari, Neha Vinson. Garbage Reporting Application. International Journal of Advanced Research in Science, Communication and Technology 2023:507
  76. Zhang Y, Chen F, Suk J, Yue Z. WordPPR: A Researcher-Driven Computational Keyword Selection Method for Text Data Retrieval from Digital Media. Communication Methods and Measures 2023:1
  77. Sedlakova J, Daniore P, Horn Wintsch A, Wolf M, Stanikic M, Haag C, Sieber C, Schneider G, Staub K, Alois Ettlin D, Grübner O, Rinaldi F, von Wyl V, Sarmiento R. Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review. PLOS Digital Health 2023;2(10):e0000347
  78. Arillotta D, Floresta G, Guirguis A, Corkery J, Catalani V, Martinotti G, Sensi S, Schifano F. GLP-1 Receptor Agonists and Related Mental Health Issues; Insights from a Range of Social Media Platforms Using a Mixed-Methods Approach. Brain Sciences 2023;13(11):1503
  79. Donaldson S, Dormanesh A, Majmundar A, Pérez C, Lopez H, Saghian M, Beard T, Unger J, Allem J. Examining the Peer-Reviewed Literature on Tobacco-Related Social Media Data: Scoping Review. Nicotine and Tobacco Research 2024;26(4):413
  80. Lin J, Chen C. The epistemic ethical concerns involving algorithms in intelligent communication. Teknokultura. Revista de Cultura Digital y Movimientos Sociales 2023;20(Special Issue):27
  81. Rhee J, Huang Y, Soroosh A, Alsudais S, Ni S, Kumar A, Paredes J, Li C, Timberlake D. The Marketing and Perceptions of Non-Tobacco Blunt Wraps on Twitter. Substance Use & Misuse 2024;59(4):469
  82. Rashid M, Wang D. CovidTrak: A Vision on Social Intelligence-Empowered COVID-19 Contact Tracing. IEEE Transactions on Computational Social Systems 2023;10(6):3385
  83. Kostygina G, Kim Y, Seeskin Z, LeClere F, Emery S. Disclosure Standards for Social Media and Generative Artificial Intelligence Research: Toward Transparency and Replicability. Social Media + Society 2023;9(4)
  84. Kresovich A, Norris A, Carter C, Kim Y, Kostygina G, Emery S. Deciphering Influence on Social Media: A Comparative Analysis of Influential Account Detection Metrics in the Context of Tobacco Promotion. Social Media + Society 2024;10(1)
  85. Doras C, Özcelik R, Abakar M, Issa R, Kimala P, Youssouf S, Bolon I, Dürr S. Community-based symptom reporting among agro-pastoralists and their livestock in Chad in a One Health approach. Acta Tropica 2024;253:107167
  86. Wang Y, Koffman J, Gao W, Zhou Y, Chukwusa E, Curcin V. Social media for palliative and end-of-life care research: a systematic review. BMJ Supportive & Palliative Care 2024;14(2):149
  87. Kapoor S, Cantrell E, Peng K, Pham T, Bail C, Gundersen O, Hofman J, Hullman J, Lones M, Malik M, Nanayakkara P, Poldrack R, Raji I, Roberts M, Salganik M, Serra-Garcia M, Stewart B, Vandewiele G, Narayanan A. REFORMS: Consensus-based Recommendations for Machine-learning-based Science. Science Advances 2024;10(18)
  88. Jung S, Murthy D, Bateineh B, Loukas A, Wilkinson A. Normalization of Vaping on TikTok: A Mixed-Methods Approach Using Computer Vision, Natural Language Processing, and Qualitative Thematic Analysis (Preprint). Journal of Medical Internet Research 2023
  89. Ayers J, Poliak A, Beros N, Paul M, Dredze M, Hogarth M, Smith D. A Digital Cohort Approach for Social Media Monitoring: A Cohort Study of People Who Vape E-Cigarettes. American Journal of Preventive Medicine 2024;67(1):147
  90. Moon T, Lee J, Son J. Accurate Imputation of Greenhouse Environment Data for Data Integrity Utilizing Two-Dimensional Convolutional Neural Networks. Sensors 2021;21(6):2187

Books/Policy Documents

  1. Torous J, Namiri N, Keshavan M. Personalized Psychiatry.
  2. Udgata S, Suryadevara N. Internet of Things and Sensor Network for COVID-19.
  3. Gray K, Gilbert C. Advances in Biomedical Informatics.
  4. Trifan A, Antunes R, Oliveira J. Practical Applications of Computational Biology & Bioinformatics, 14th International Conference (PACBB 2020).
  5. Kasthurirathne S, Grannis S. Clinical Informatics Study Guide.
  6. Ramos C. Encyclopedia of Data Science and Machine Learning.
  7. Mays T. Handbook of Open, Distance and Digital Education.
  8. Mays T. Handbook of Open, Distance and Digital Education.
  9. Brehm-Stecher B. Encyclopedia of Food Safety.
  10. Shoenbill K, Kasturi S, Mendonca E. Chronic Illness Care.