<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "journalpublishing.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" article-type="review-article"><front><journal-meta><journal-id journal-id-type="nlm-ta">J Med Internet Res</journal-id><journal-id journal-id-type="publisher-id">jmir</journal-id><journal-id journal-id-type="index">1</journal-id><journal-title>Journal of Medical Internet Research</journal-title><abbrev-journal-title>J Med Internet Res</abbrev-journal-title><issn pub-type="epub">1438-8871</issn><publisher><publisher-name>JMIR Publications</publisher-name><publisher-loc>Toronto, Canada</publisher-loc></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">v28i1e86200</article-id><article-id pub-id-type="doi">10.2196/86200</article-id><article-categories><subj-group subj-group-type="heading"><subject>Review</subject></subj-group></article-categories><title-group><article-title>Thematic Mapping and Evolution of Social Media Mining in Health Research: Hybrid Bibliometric Synthesis</article-title></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name name-style="western"><surname>Yang</surname><given-names>Mia Jiming</given-names></name><degrees>MSc</degrees><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name name-style="western"><surname>Bohnet-Joschko</surname><given-names>Sabine</given-names></name><degrees>Prof Dr</degrees><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff id="aff1"><institution>Chair of Management and Innovation in Healthcare, Faculty of Management, Economics and Society, Witten/Herdecke University</institution><addr-line>Alfred-Herrhausen-Str. 
50</addr-line><addr-line>Witten</addr-line><addr-line>North Rhine-Westphalia</addr-line><country>Germany</country></aff><contrib-group><contrib contrib-type="editor"><name name-style="western"><surname>Brini</surname><given-names>Stefano</given-names></name></contrib></contrib-group><contrib-group><contrib contrib-type="reviewer"><name name-style="western"><surname>Xiang</surname><given-names>Bo</given-names></name></contrib><contrib contrib-type="reviewer"><name name-style="western"><surname>Zikos</surname><given-names>Dimitrios</given-names></name></contrib></contrib-group><author-notes><corresp>Correspondence to Mia Jiming Yang, MSc, Chair of Management and Innovation in Healthcare, Faculty of Management, Economics and Society, Witten/Herdecke University, Alfred-Herrhausen-Str. 50, Witten, North Rhine-Westphalia, 58455, Germany, 49 2302-926-38043; <email>mia.yang@uni-wh.de</email></corresp></author-notes><pub-date pub-type="collection"><year>2026</year></pub-date><pub-date pub-type="epub"><day>8</day><month>5</month><year>2026</year></pub-date><volume>28</volume><elocation-id>e86200</elocation-id><history><date date-type="received"><day>20</day><month>10</month><year>2025</year></date><date date-type="rev-recd"><day>27</day><month>03</month><year>2026</year></date><date date-type="accepted"><day>02</day><month>04</month><year>2026</year></date></history><copyright-statement>&#x00A9; Mia Jiming Yang, Sabine Bohnet-Joschko. Originally published in the Journal of Medical Internet Research (<ext-link ext-link-type="uri" xlink:href="https://www.jmir.org">https://www.jmir.org</ext-link>), 8.5.2026. 
</copyright-statement><copyright-year>2026</copyright-year><license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on <ext-link ext-link-type="uri" xlink:href="https://www.jmir.org/">https://www.jmir.org/</ext-link>, as well as this copyright and license information must be included.</p></license><self-uri xlink:type="simple" xlink:href="https://www.jmir.org/2026/1/e86200"/><abstract><sec><title>Background</title><p>Social media platforms offer extensive data, as they are widely used globally. Social media mining (SMM) enables real-time monitoring of user-reported health information and serves as a supplement to traditional health data analytics. However, the rapid proliferation of literature has produced fragmentation, and a comprehensive knowledge map regarding SMM is lacking. Further, existing bibliometric reviews in health fields are easily undermined by synonym fragmentation and parameter settings, reducing their robustness. 
Thus, a more robust, reproducible, and decision-oriented bibliometric framework is required.</p></sec><sec><title>Objective</title><p>This study aimed to (1) outline key thematic clusters in health-related SMM and map their dynamic evolution, and (2) methodologically demonstrate how machine learning&#x2013;based bibliometric analysis can strengthen the robustness, transparency, and foresight capacity of evidence synthesis.</p></sec><sec sec-type="methods"><title>Methods</title><p>This study designed a fully automated and reproducible bibliometric analysis of PubMed journal articles published from 2015 to 2025 (n=250) and analyzed records with both abstracts and keywords (n=189). We performed cleaning and standardization for titles, abstracts, author keywords, and MeSH terms, and carried out an exploratory descriptive analysis to obtain preliminary insights into publication patterns. Subsequently, we used SPECTER2 and PubMedBERT embeddings with keywords and abstracts to construct a hybrid similarity matrix. Then, we applied Uniform Manifold Approximation and Projection for dimensionality reduction, followed by Hierarchical Density-Based Spatial Clustering of Applications with Noise for thematic clustering, and visualized the results in a 3D strategic coordinate system (maturity, influence, and recency). We performed intercluster relationship analysis and time-slice analysis to examine thematic intersections and evolution. 
To ensure robustness and enhance interpretability, we implemented dual-level validation.</p></sec><sec sec-type="results"><title>Results</title><p>We identified 6 thematic clusters: cluster 1 (candidate incubator pool of peripheral cross-cutting topics in health-related SMM), cluster 2 (computational methods in health informatics), cluster 3 (public attitudes and sociopsychological determinants), cluster 4 (infodemiology and the COVID-19 information ecosystem), cluster 5 (health communication and public health engagement), and cluster 6 (social media analysis and network methods). Strategic 3D mapping revealed that methodological clusters (clusters 2 and 6) occupied high-maturity and high-influence positions, while application-driven clusters (clusters 3 and 4) occupied high-influence and high-recency positions, representing rapidly expanding frontiers. Clusters 1 and 5 demonstrated strong potential for further growth. Temporal slicing confirmed a trajectory moving from methodological consolidation and thematic diversification to a renewed focus on convergence and problem-solving. Validation showed strong semantic coherence and robustness of the methods and findings.</p></sec><sec sec-type="conclusions"><title>Conclusions</title><p>We developed a semantic-structural hybrid bibliometric framework with dual-level validation, reducing synonym fragmentation and parameter sensitivity inherent in traditional approaches. 
The resulting decision-oriented knowledge map offers strategic guidance for infodemiology-informed and audience-segmented public health communication, research priority settings, and the deployment and evaluation of real-world surveillance and pharmacovigilance workflows while supporting evidence-driven and patient-centered decision-making in public health and health care.</p></sec></abstract><kwd-group><kwd>social media mining</kwd><kwd>bibliometrics</kwd><kwd>machine learning</kwd><kwd>strategic mapping</kwd><kwd>health research</kwd></kwd-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Social media has become a ubiquitous component of modern life, with over 5 billion people worldwide using social platforms as of February 2025 [<xref ref-type="bibr" rid="ref1">1</xref>]. The value of social media data in health research lies in its volume, velocity, and variety, and the insights derived from user-reported content. Unlike traditional historical data, social media offers a large-scale, real-time representation of the natural state of public discourse and behavior. Content from platforms, such as Twitter and Reddit, constitutes a rich data source for gaining insights into population health trends, behaviors, and opinions [<xref ref-type="bibr" rid="ref2">2</xref>-<xref ref-type="bibr" rid="ref4">4</xref>]. Researchers have recognized that the computational analysis of social media data&#x2014;often referred to as social media mining (SMM)&#x2014;enables real-time health monitoring and health-related interactions that are unattainable through traditional data sources [<xref ref-type="bibr" rid="ref5">5</xref>]. 
In fact, SMM is flourishing as an emerging science aimed at providing health care stakeholders with additional evidence for decision-making.</p><p>Over the past 5 years, there has been explosive growth in digital health research utilizing social media data, spanning areas such as epidemiological studies and pharmacovigilance [<xref ref-type="bibr" rid="ref5">5</xref>-<xref ref-type="bibr" rid="ref8">8</xref>]. Moreover, SMM techniques have been applied in health communication and promotion (eg, evaluating public responses to health campaigns), mental health monitoring (eg, identifying depression-related discourse online), and behavioral research (eg, analyzing lifestyle-related topics) [<xref ref-type="bibr" rid="ref2">2</xref>,<xref ref-type="bibr" rid="ref9">9</xref>,<xref ref-type="bibr" rid="ref10">10</xref>]. Mapping such a fragmented and rapidly evolving field has become especially important. Therefore, a literature review in this area can provide an overview of existing research.</p><p>Four reviews have previously addressed the use of social media and the analysis of social media data for health purposes. A review of the application of SMM in health outcomes included 19 papers [<xref ref-type="bibr" rid="ref11">11</xref>]. However, it was published in 2019, prior to the major expansion of social media data mining, and only provided an overview of literature related to medication and treatment side effects. The authors suggested at the time that adverse drug events and health-related quality of life would be future research directions. Another review on social text mining conducted in the early stages of the pandemic focused on the lack of ethical consensus and guidelines in this emerging field, thus emphasizing ethical standards in the use of social media text data analysis [<xref ref-type="bibr" rid="ref12">12</xref>]. 
Subsequently, a 2021 review of the literature on social media for health purposes identified at least 10 applications in the health domain&#x2014;spanning health interventions, public communication, surveillance, education, and research&#x2014;highlighting the breadth of the field [<xref ref-type="bibr" rid="ref13">13</xref>]. While this review did not provide an overview of SMM, it illustrated the diverse potential application scenarios of SMM. Another scoping review on the analysis of social media data in health care searched 5 academic databases, including PubMed, Web of Science, and Embase, and ultimately included 134 papers on social media data analysis [<xref ref-type="bibr" rid="ref14">14</xref>]. However, only 55 of these papers fell under the category of SMM, with the rest being traditional qualitative studies. Although this review only briefly described the methods used in SMM in the section on computational tools, it pointed out that future analysis of social media data will inevitably move in a technology-driven direction.</p><p>In terms of research methodology, existing bibliometric reviews on health research and medical innovation predominantly follow 2 typical approaches. The first approach uses co-word networks and thematic mapping based on keyword co-occurrence, combined with descriptive metrics, such as publication volume, country- and institution-level collaboration, and core journals, to summarize the research landscape [<xref ref-type="bibr" rid="ref15">15</xref>-<xref ref-type="bibr" rid="ref22">22</xref>]. The second approach uses topic modeling techniques, such as latent Dirichlet allocation (LDA) and structured topic modeling, to extract latent themes from textual data [<xref ref-type="bibr" rid="ref15">15</xref>,<xref ref-type="bibr" rid="ref21">21</xref>,<xref ref-type="bibr" rid="ref23">23</xref>-<xref ref-type="bibr" rid="ref28">28</xref>]. 
While valuable, these methods often exhibit 2 limitations when applied to rapidly evolving research fields. First, pure co-occurrence statistics rely excessively on string sequences and frequency counts, frequently overlooking synonymy, near-synonymy, and conceptual fragmentation caused by variations in expression, and the results are highly sensitive to threshold selection [<xref ref-type="bibr" rid="ref24">24</xref>,<xref ref-type="bibr" rid="ref29">29</xref>-<xref ref-type="bibr" rid="ref31">31</xref>]. Second, traditional topic models typically rely on the bag-of-words assumption and require predefined topic counts, potentially compromising comparability and robustness across different time windows [<xref ref-type="bibr" rid="ref23">23</xref>,<xref ref-type="bibr" rid="ref24">24</xref>,<xref ref-type="bibr" rid="ref29">29</xref>]. To address these methodological gaps, this study proposes a hybrid bibliometric framework integrating semantic and structural analysis [<xref ref-type="bibr" rid="ref30">30</xref>]. We introduce semantic similarity while preserving the co-occurrence network structure to mitigate synonym fragmentation [<xref ref-type="bibr" rid="ref30">30</xref>]. Aligned with this approach, a recent study has integrated structural and semantic information within citation networks to more precisely characterize the influence of publications and their capacity for interdisciplinary knowledge diffusion [<xref ref-type="bibr" rid="ref32">32</xref>]. Density-based clustering enables topic cluster identification without prespecifying the number of clusters [<xref ref-type="bibr" rid="ref33">33</xref>]. By integrating time slicing, burst detection, and multidimensional strategic coordinates (maturity, influence, and recency), we systematically characterize a topic&#x2019;s knowledge accumulation, real-world impact, and recent growth potential. This provides a more decision-oriented, reproducible evidentiary landscape for the field. 
To ensure the robustness and reproducibility of this hybrid framework, we further detail the rationale for key parameter settings in the Methods section and demonstrate the validity and stability of the methodology and results through multiple validation tests.</p><p>In summary, prior studies have covered highly diverse application scenarios of social media data, and their findings remain fragmented. While previous reviews have cataloged applications and summarized themes, they have rarely integrated advanced computational approaches for a structured synthesis. In particular, bibliometric reviews often rely on co-occurrence statistics or topic models with limited robustness, which limits their ability to support a forward-looking strategic view of the field. To address this, our study developed an automated, reproducible pipeline for bibliometric analysis of SMM in health care, introducing several methodological innovations.</p><p>Furthermore, this study aimed to not only identify hot topics but also trace their temporal evolution. Additionally, this study attempted to explore the future hotspots of research innovation and reveal the composition of the innovation incubation pool, thereby helping researchers, funding agencies, and policymakers to anticipate future directions and foster interdisciplinary integration. 
Thus, this study addressed the following research questions:</p><list list-type="order"><list-item><p>What are the main topics of research on SMM in health care and their latent patterns?</p></list-item><list-item><p>What are their developmental stages, their strategic positioning, and the future potential research themes?</p></list-item></list><p>To answer these questions, we pursued the following goals: (1) substantively outline key thematic clusters in health-related SMM, map their evolution before, during, and after the period of explosive growth, and evaluate the implications for fostering social equity, effective health management, and informed policy, and (2) methodologically demonstrate how advanced bibliometric methods can strengthen the robustness, transparency, and foresight capacity of evidence synthesis.</p></sec><sec id="s2" sec-type="methods"><title>Methods</title><sec id="s2-1"><title>Study Design</title><p>This study aimed to systematically review and analyze the research progress and thematic evolution of SMM in the context of health by constructing a fully automated and reproducible bibliometric analysis framework. To achieve this objective, we designed a multistage methodological pathway in accordance with BIBLIO (Preliminary Guideline for Reporting Bibliometric Reviews of the Biomedical Literature) [<xref ref-type="bibr" rid="ref34">34</xref>]. Figure S1 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref> illustrates the research process. The methods are described in more detail in Figure S2 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref>. Additionally, the BIBLIO checklist is provided in <xref ref-type="supplementary-material" rid="app8">Checklist 1</xref>.</p></sec><sec id="s2-2"><title>Data Retrieval and Processing</title><p>The first phase of this study involved constructing a comprehensive literature dataset. 
We selected the PubMed database as the primary data source, as it is one of the most extensive and authoritative bibliographic databases in the field of health research. We summarized the search strategy, record identification, and inclusion process using an adapted PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 flow diagram. Following the PRISMA-S extension for reporting literature searches, we present the search process in <xref ref-type="supplementary-material" rid="app9">Checklist 2</xref> to strengthen the transparency and reproducibility of the search [<xref ref-type="bibr" rid="ref35">35</xref>]. To ensure the comprehensiveness of the search, we used Boolean operators to combine free-text terms and Medical Subject Headings (MeSH) terms. The query included terms for social media platforms (eg, &#x201C;social media,&#x201D; &#x201C;Twitter,&#x201D; and &#x201C;Reddit&#x201D;), combined with terms related to data mining and analysis (eg, &#x201C;mining&#x201D; and &#x201C;data mining&#x201D;). We also applied filters to include only journal articles published from January 1, 2015, to July 31, 2025 (excluding news and retracted articles). The literature download process was automated using the Entrez module. Necessary parameters, including <italic>Entrez.email</italic> and <italic>Entrez.api_key</italic>, were configured to comply with National Institutes of Health application programming interface (API) access guidelines and prevent connection errors during high-frequency requests. All retrieved results were stored to ensure transparency and traceability throughout the research process. 
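</p><p>The retrieval step described above can be illustrated with a minimal sketch. It assumes that the Entrez module referred to is Biopython's Bio.Entrez; the helper names (build_query, batch_starts, and download), the Title/Abstract field tags, the batch size, and the two short term lists are hypothetical simplifications and do not reproduce the full search strategy reported in Checklist 2. The download function requires Biopython and network access and is therefore defined but not called here.</p><preformat>
```python
def build_query(platform_terms, mining_terms, start="2015/01/01", end="2025/07/31"):
    """Combine two OR-groups with AND and add a publication-date range."""
    platforms = " OR ".join(f'"{t}"[Title/Abstract]' for t in platform_terms)
    mining = " OR ".join(f'"{t}"[Title/Abstract]' for t in mining_terms)
    dates = f'("{start}"[Date - Publication] : "{end}"[Date - Publication])'
    return f"({platforms}) AND ({mining}) AND {dates}"


def batch_starts(total, batch_size=100):
    """Return the retstart offsets needed to page through `total` records."""
    return list(range(0, total, batch_size))


def download(query, email, api_key=None, batch_size=100):
    """Fetch MEDLINE-format records in batches (needs Biopython and network)."""
    from Bio import Entrez  # lazy import so the helpers above work without it

    Entrez.email = email
    if api_key:
        Entrez.api_key = api_key
    # Run the search once and keep the result set on the NCBI history server.
    handle = Entrez.esearch(db="pubmed", term=query, usehistory="y")
    search = Entrez.read(handle)
    handle.close()
    chunks = []
    for start in batch_starts(int(search["Count"]), batch_size):
        handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text",
                               retstart=start, retmax=batch_size,
                               webenv=search["WebEnv"],
                               query_key=search["QueryKey"])
        chunks.append(handle.read())
        handle.close()
    return "".join(chunks)


# Illustrative terms only (assumption); the real strategy also uses MeSH terms.
QUERY = build_query(["social media", "Twitter"], ["data mining"])
```
</preformat><p>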
Furthermore, to mitigate data loss due to API updates or network instability, we implemented comprehensive logging after each batch download and recorded completion timestamps, the search strategy used, the number of records returned, and other relevant metadata, thereby guaranteeing data integrity and reproducibility.</p><p>In addition to the metadata obtained from PubMed, this study integrated supplementary metrics from external databases and official journal websites. Specifically, we collected the H-index of relevant journals from the SCImago database to reflect their academic prestige and long-term impact. Concurrently, the most recently released impact factor (IF) and 5-year IF were acquired from official journal websites and matched with the publication records. Then, we retrieved the relative citation ratio (RCR) of the included articles, which is a normalized measure of citation influence across fields and time. The RCR was used only for post-hoc external comparison and was not included in embedding, clustering, parameter selection, or labeling. The integration of these supplementary data laid a solid foundation for subsequent external validity checks.</p><p>During data processing, we extracted the core metadata, including title, keywords, abstract, publication year, journal, etc. Then, we performed keyword standardization by unifying plurals and synonyms, expanding abbreviations, and mapping entry terms to MeSH descriptors with the NLM MeSH descriptor dataset (accessed on July 31, 2025; latest record revision date: July 15, 2025).</p></sec><sec id="s2-3"><title>Exploratory Descriptive Analysis</title><p>After processing the bibliographic dataset, we conducted an exploratory descriptive analysis to characterize SMM in health research. This stage provided a phenomenon-level overview to guide subsequent structural modeling. 
We explored annual publication trends with autoregressive integrated moving average (ARIMA)-based exploratory projection, collaboration networks between institutions and researchers, countries, citation impact, and keyword dynamics (keyword statistics and Kleinberg burst detection). Overall, this multifaceted descriptive analysis established an overview of the research. The methods of keyword burst detection and parameter robustness assessment are described in <xref ref-type="supplementary-material" rid="app2">Multimedia Appendix 2</xref>.</p></sec><sec id="s2-4"><title>Thematic Clustering and Strategic Mapping</title><p>This study proposes an automated bibliometric pipeline specifically designed for literature on SMM in the health domain. Although we performed normalization, residual terminological variations may still exist. Therefore, we developed an automated process integrating structural and semantic similarity, where semantic proximity is derived from contextual embeddings (PubMedBERT and SPECTER2) computed from titles and abstracts to help reduce fragmentation caused by near-synonymous or variably expressed terms and to capture contextual semantics. A hybrid similarity matrix was reduced with Uniform Manifold Approximation and Projection (UMAP) and clustered via Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) under multiple parameter profiles. Outputs included interpretable topic clusters with representative keywords and their positions in a 3D strategic diagram (maturity, influence, and recency). In this study, the number of papers published for a keyword represents maturity, the number of connections a keyword has in co-occurrence networks represents influence, and the proportion of papers published within the last 5 years represents recency. Details of parameter tuning, thresholds, and ablation tests are provided in <xref ref-type="supplementary-material" rid="app3">Multimedia Appendix 3</xref>. 
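</p><p>The 3 strategic coordinates defined above can be illustrated with a minimal sketch. It assumes raw, unscaled values, a recency window that ends in 2025 and covers 5 calendar years, and influence measured as the number of distinct co-occurring keywords; the function name strategic_coordinates and the toy records are hypothetical, and any scaling or normalization applied in the actual pipeline (described in Multimedia Appendix 3) is not reproduced.</p><preformat>
```python
from collections import defaultdict
from itertools import combinations


def strategic_coordinates(records, current_year=2025, window=5):
    """Return {keyword: (maturity, influence, recency)}.

    maturity  = number of papers that list the keyword
    influence = number of distinct keywords it co-occurs with (network degree)
    recency   = share of its papers published within the last `window` years
    """
    papers = defaultdict(int)
    recent = defaultdict(int)
    neighbours = defaultdict(set)
    first_recent_year = current_year - window + 1  # assumption: inclusive window
    for year, keywords in records:
        unique = sorted(set(keywords))  # count each keyword once per paper
        for kw in unique:
            papers[kw] += 1
            if year >= first_recent_year:
                recent[kw] += 1
        for a, b in combinations(unique, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {kw: (papers[kw], len(neighbours[kw]), recent[kw] / papers[kw])
            for kw in papers}


# Toy records (hypothetical): (publication year, author keywords)
records = [
    (2016, ["twitter", "pharmacovigilance"]),
    (2021, ["twitter", "covid-19", "infodemiology"]),
    (2023, ["covid-19", "sentiment analysis"]),
]
coords = strategic_coordinates(records)
print(coords["twitter"])  # (2, 3, 0.5)
```
</preformat><p>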
After clustering, we performed an intercluster relationship analysis on the weighted keyword co-occurrence network to quantify cross-theme connectivity. We computed a cluster-to-cluster coupling metric based on the total cross-cluster co-occurrence strength, ranked cluster pairs to identify the most strongly connected themes, and interpreted these links (reported in the Results section). To explain what drives these cross-cluster connections, we further identified pair-specific bridging keywords contributing the most to the coupling between each highly coupled cluster pair.</p></sec><sec id="s2-5"><title>Dynamic Temporal Slicing</title><p>For the dynamic analysis, the study period was divided into 3 slices around the period of explosive growth: 2015&#x2010;2019, 2020&#x2010;2023, and 2024&#x2010;2025. Within each slice, we performed spectral clustering on the keyword co-occurrence network. This allowed us to observe the emergence, growth, and decline of research themes over time. Details of the process are provided in <xref ref-type="supplementary-material" rid="app3">Multimedia Appendix 3</xref>.</p></sec><sec id="s2-6"><title>Validation</title><p>This study conducted both macro-level and micro-level validation. At the macro level, to assess the robustness of our methods, we conducted internal validation through a series of sensitivity analyses. Moreover, we validated the findings with external checks using RCR alignment.</p><p>In addition, we conducted micro-level interpretive triangulation through evidence mapping and expert content analysis. The 3D strategic diagram was divided into 8 quadrants based on a reference point (x=0, y=3.51, z=0.5). From each cluster in each quadrant, we sampled 1 representative keyword that was the farthest from the reference point. For each sampled keyword, we retrieved 2 full-text articles: 1 with the highest RCR and 1 published in the journal with the highest IF. 
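</p><p>The keyword sampling rule described above can be illustrated with a minimal sketch. It assumes Euclidean distance and assigns points lying exactly on a dividing plane to the upper side; both choices, as well as the names octant and sample_farthest and the toy coordinates, are hypothetical and are not taken from the study protocol.</p><preformat>
```python
import math

REFERENCE = (0.0, 3.51, 0.5)  # reference point reported in the text


def octant(point, ref=REFERENCE):
    """Label a point by the side of the reference it lies on along each axis."""
    return tuple(coord >= r for coord, r in zip(point, ref))


def sample_farthest(keywords, ref=REFERENCE):
    """Pick, per (cluster, region), the keyword farthest from the reference.

    `keywords` is an iterable of (name, cluster_id, (x, y, z)) tuples.
    """
    best = {}
    for name, cluster, point in keywords:
        key = (cluster, octant(point, ref))
        dist = math.dist(point, ref)  # Euclidean distance (assumption)
        if key not in best or dist > best[key][1]:
            best[key] = (name, dist)
    return {key: name for key, (name, _) in best.items()}


# Toy coordinates (hypothetical)
keywords = [
    ("a", 1, (1.0, 4.0, 0.9)),
    ("b", 1, (2.0, 5.0, 0.8)),
    ("c", 1, (-1.0, 2.0, 0.1)),
    ("d", 2, (1.0, 4.0, 0.9)),
]
sampled = sample_farthest(keywords)
```
</preformat><p>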
Two reviewers examined the full texts to extract evidence on research introduction, methods, social media sites, contributions, challenges, and future outlook. The protocols for the analysis of the selected articles are described in <xref ref-type="supplementary-material" rid="app4">Multimedia Appendix 4</xref>. This interpretive triangulation was performed after the mathematical analyses were finalized and was not used to fine-tune parameters, select models, modify cluster assignments, or remove records and keywords.</p></sec><sec id="s2-7"><title>Tools and Software Utilized</title><p>All analyses were conducted primarily in Python (via PyCharm IDE), with the exception of burst detection, which was implemented in RStudio (Posit). Microsoft Excel was used for supplementary manual inspection and verification of selected outputs, and Microsoft Office Visio and Draw.io were used to draw flow charts.</p></sec><sec id="s2-8"><title>Ethical Considerations</title><p>This study was a bibliometric and secondary analysis of bibliographic data about academic publications from an open-access database. Neither humans nor animals were involved in this work. Therefore, ethical approval was not required. Moreover, the original data for this study consisted entirely of publicly available bibliographic information. During the research process, we did not collect any individual-level participant data or recruit human participants. Therefore, informed consent was not required, and no compensation was provided. Furthermore, the study exclusively processed publicly available article-level metadata and reported the results in summary form. None of the data collected, processed, or included in the manuscript text and supplementary materials contain any personally identifiable information. 
Consequently, there is no risk of individual privacy disclosure, and no additional authorization or consent documents are required.</p></sec></sec><sec id="s3" sec-type="results"><title>Results</title><sec id="s3-1"><title>Publications Identified</title><p>In the initial stage, we retrieved 250 publications from PubMed covering the period between January 1, 2015, and July 31, 2025. After the exclusion of articles without an abstract or author-supplied keywords, a total of 189 publications were retained for systematic analysis (<xref ref-type="supplementary-material" rid="app5">Multimedia Appendix 5</xref>). The PRISMA flow diagram is presented in <xref ref-type="fig" rid="figure1">Figure 1</xref>. Moreover, details of the search strategy and variations of the key search terms are presented in <xref ref-type="fig" rid="figure1">Figure 1</xref>.</p><fig position="float" id="figure1"><label>Figure 1.</label><caption><p>Adapted PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram for bibliometric dataset construction using PubMed. Exclusions were based on objective metadata completeness (missing author keywords/abstract), without manual title/abstract screening or full-text eligibility assessment. The image has been reproduced from Page et al [<xref ref-type="bibr" rid="ref36">36</xref>], which is published under Creative Commons Attribution 4.0 International License [<xref ref-type="bibr" rid="ref37">37</xref>].</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="jmir_v28i1e86200_fig01.png"/></fig></sec><sec id="s3-2"><title>Exploratory Descriptive Analysis</title><p>As shown in <xref ref-type="fig" rid="figure2">Figure 2</xref> and Table S1 in <xref ref-type="supplementary-material" rid="app6">Multimedia Appendix 6</xref>, annual publication trends in SMM exhibited a consistent upward trajectory over the past decade. 
Between 2015 and 2024, the annual number of PubMed-indexed articles averaged 17.4. By July 31, 2025, 15 publications had already been recorded. Using an ARIMA-based exploratory projection fitted to the annual series, we estimated approximately 32 publications for 2025, with a wide 95% prediction interval (18-45) reflecting uncertainty due to the short annual time series. Overall, these findings indicate sustained growth and underscore the growing scholarly interest in the field.</p><fig position="float" id="figure2"><label>Figure 2.</label><caption><p>Annual publication trends and autoregressive integrated moving average (ARIMA)-based exploratory projection (January 1, 2015, to July 31, 2025, including the predicted number for 2025). In this study, each year is treated as a single observation, resulting in a short time series based on annual aggregation. The ARIMA model serves only as an exploratory projection, yielding a broad prediction interval. The predicted counts for 2025 should be interpreted with caution.</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="jmir_v28i1e86200_fig02.png"/></fig><p>The distributions of countries, institutions, and journals are summarized in <xref ref-type="table" rid="table1">Table 1</xref>. The United States was the leading contributor (54/189, 28.6%), followed by China (27/189, 14.3%), Italy (11/189, 5.8%), and France (10/189, 5.3%). Among institutions, the University of Texas at Austin (5/189, 2.7%) and Kazan Federal University (3/189, 1.6%) appeared most frequently. The Journal of Medical Internet Research was the dominant journal (21/189, 11.1%), followed by the International Journal of Environmental Research and Public Health (7/189, 3.7%) and JMIR Public Health and Surveillance (7/189, 3.7%). 
Beyond these core contributors, a long tail of less-represented countries, institutions, and journals reflects the field&#x2019;s broadening boundaries and thematic heterogeneity.</p><table-wrap id="t1" position="float"><label>Table 1.</label><caption><p>Top 10 entities (countries, affiliations, and journals) by frequency in social media mining publications.</p></caption><table id="table1" frame="hsides" rules="groups"><thead><tr><td align="left" valign="top">Variable</td><td align="left" valign="top">Value (N=189), n (%)</td></tr></thead><tbody><tr><td align="left" valign="top">Country</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>United States</td><td align="left" valign="top">54 (28.6)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>China</td><td align="left" valign="top">27 (14.3)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Italy</td><td align="left" valign="top">11 (5.8)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>France</td><td align="left" valign="top">10 (5.3)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>United Kingdom</td><td align="left" valign="top">7 (3.7)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Canada</td><td align="left" valign="top">6 (3.2)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Korea</td><td align="left" valign="top">6 (3.2)</td></tr><tr><td align="left" valign="top"><named-content 
content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Republic of China (Taiwan)</td><td align="left" valign="top">5 (2.7)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Spain</td><td align="left" valign="top">5 (2.7)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Japan</td><td align="left" valign="top">4 (2.1)</td></tr><tr><td align="left" valign="top">Affiliation</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Center for Health Communication, The University of Texas at Austin, Austin, Texas, United States</td><td align="left" valign="top">5 (2.7)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Kazan Federal University, Kazan, Russian Federation</td><td align="left" valign="top">3 (1.6)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Center R&#x00E9;gional de Pharmacovigilance, H&#x00F4;pital Europ&#x00E9;en Georges-Pompidou, Paris, France</td><td align="left" valign="top">2 (1.1)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>College of Nursing and Public Health, Adelphi University, Garden City, New York, United States</td><td align="left" valign="top">2 (1.1)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Department of Psychiatry, Singapore General Hospital, Singapore</td><td align="left" valign="top">2 (1.1)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Division of General Internal 
Medicine and Primary Care, Brigham and Women&#x2019;s Hospital, Boston, Massachusetts, United States</td><td align="left" valign="top">2 (1.1)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Division of Rheumatology, Allergy and Immunology, Massachusetts General Hospital, Boston, Massachusetts, United States</td><td align="left" valign="top">2 (1.1)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Institute of Informatics and Telematics (IIT), National Research Council (CNR), Pisa, Italy</td><td align="left" valign="top">2 (1.1)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Kap Code, Paris, France</td><td align="left" valign="top">2 (1.1)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>National Research University Higher School of Economics, Moscow, Russian Federation</td><td align="left" valign="top">2 (1.1)</td></tr><tr><td align="left" valign="top">Journal</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Journal of Medical Internet Research</td><td align="left" valign="top">21 (11.1)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>International Journal of Environmental Research and Public Health</td><td align="left" valign="top">7 (3.7)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>JMIR Public Health and Surveillance</td><td align="left" valign="top">7 (3.7)</td></tr><tr><td align="left" valign="top"><named-content 
content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Journal of Biomedical Informatics</td><td align="left" valign="top">7 (3.7)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Frontiers in Psychology</td><td align="left" valign="top">6 (3.2)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Journal of the American Medical Informatics Association: JAMIA</td><td align="left" valign="top">6 (3.2)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>PeerJ Computer Science</td><td align="left" valign="top">6 (3.2)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Studies in Health Technology and Informatics</td><td align="left" valign="top">6 (3.2)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>JMIR Infodemiology</td><td align="left" valign="top">4 (2.1)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Social Network Analysis and Mining</td><td align="left" valign="top">4 (2.1)</td></tr></tbody></table></table-wrap><p>The author collaboration network in Figure S3 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref> displays a pattern in which a few highly connected researchers, such as Graciela Gonzalez-Hernandez, Davy Weissenbacher, and Abeed Sarker, occupy central hub positions, maintaining multiple cross-institutional collaborations. 
Their collaborative studies are summarized in Tables S2 and S3 in <xref ref-type="supplementary-material" rid="app6">Multimedia Appendix 6</xref>, which also highlight broader collaboration portfolios, notably those of Nathalie Texier and colleagues. Overall, the field is shaped by a small number of core investigators complemented by a long tail of occasional contributors. Importantly, this authorship structure is mirrored in citation patterns, which further reflect the influence and reach of these core groups.</p><p>Citation analyses highlighted both globally influential work and highly interconnected contributions within the field. As shown in Figure S4 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref>, the ranking of publications by RCR identified a set of cornerstone articles, with those by Ayyoubzadeh et al [<xref ref-type="bibr" rid="ref38">38</xref>] on COVID-19 trend prediction and Nikfarjam et al [<xref ref-type="bibr" rid="ref39">39</xref>] on adverse drug reactions consistently leading the ranking. Other high-RCR papers addressed vaccine hesitancy, mental health, and methodological advances, such as machine learning (ML) and natural language processing (NLP), reflecting both topical diversity and methodological innovation among the most impactful studies. Within the internal citation network (Figure S5 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref>), the article by Nikfarjam et al [<xref ref-type="bibr" rid="ref39">39</xref>] again occupied the central position with the most citations, followed by articles by Tapi Nzali et al [<xref ref-type="bibr" rid="ref40">40</xref>] on breast cancer and Lazard et al [<xref ref-type="bibr" rid="ref41">41</xref>] on public communication. 
Several COVID-19&#x2013;related studies (eg, Li et al [<xref ref-type="bibr" rid="ref42">42</xref>] and Zhang et al [<xref ref-type="bibr" rid="ref43">43</xref>]) also ranked prominently, underscoring the pandemic&#x2019;s role in driving recent scholarly influence.</p><p>The keyword statistics in <xref ref-type="table" rid="table2">Table 2</xref> and Figure S6 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref> revealed both stable core terms and dynamically emerging concepts. As shown in Figure S6 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref>, &#x201C;social media,&#x201D; &#x201C;data mining,&#x201D; and &#x201C;natural language processing&#x201D; consistently ranked among the most frequent keywords, forming the foundation of SMM research. In contrast, the occurrence of terms, such as &#x201C;COVID-19,&#x201D; &#x201C;vaccine hesitancy,&#x201D; and &#x201C;mental health,&#x201D; increased sharply in specific years, highlighting the responsiveness of the literature to external events and public health concerns. 
Certain topics (eg, &#x201C;sentiment analysis&#x201D;) extended beyond 2025, suggesting potential growth.</p><table-wrap id="t2" position="float"><label>Table 2.</label><caption><p>Top 20 keywords in social media mining publications (2015&#x2010;2025).</p></caption><table id="table2" frame="hsides" rules="groups"><thead><tr><td align="left" valign="top">Rank</td><td align="left" valign="top">Keyword</td><td align="left" valign="top">Overall frequency, n</td></tr></thead><tbody><tr><td align="left" valign="top">1</td><td align="left" valign="top">Social media</td><td align="left" valign="top">84</td></tr><tr><td align="left" valign="top">2</td><td align="left" valign="top">Data mining</td><td align="left" valign="top">58</td></tr><tr><td align="left" valign="top">3</td><td align="left" valign="top">COVID-19</td><td align="left" valign="top">43</td></tr><tr><td align="left" valign="top">4</td><td align="left" valign="top">Natural language processing</td><td align="left" valign="top">36</td></tr><tr><td align="left" valign="top">5</td><td align="left" valign="top">Twitter</td><td align="left" valign="top">34</td></tr><tr><td align="left" valign="top">6</td><td align="left" valign="top">Sentiment analysis</td><td align="left" valign="top">25</td></tr><tr><td align="left" valign="top">7</td><td align="left" valign="top">Social media mining</td><td align="left" valign="top">24</td></tr><tr><td align="left" valign="top">8</td><td align="left" valign="top">Machine learning</td><td align="left" valign="top">22</td></tr><tr><td align="left" valign="top">9</td><td align="left" valign="top">Infodemiology</td><td align="left" valign="top">14</td></tr><tr><td align="left" valign="top">10</td><td align="left" valign="top">Topic modeling</td><td align="left" valign="top">10</td></tr><tr><td align="left" valign="top">11</td><td align="left" valign="top">Deep learning</td><td align="left" valign="top">9</td></tr><tr><td align="left" valign="top">12</td><td align="left" 
valign="top">Pandemics</td><td align="left" valign="top">9</td></tr><tr><td align="left" valign="top">13</td><td align="left" valign="top">Pharmacovigilance</td><td align="left" valign="top">9</td></tr><tr><td align="left" valign="top">14</td><td align="left" valign="top">Coronavirus</td><td align="left" valign="top">8</td></tr><tr><td align="left" valign="top">15</td><td align="left" valign="top">Mental health</td><td align="left" valign="top">7</td></tr><tr><td align="left" valign="top">16</td><td align="left" valign="top">Big data</td><td align="left" valign="top">6</td></tr><tr><td align="left" valign="top">17</td><td align="left" valign="top">Latent Dirichlet allocation</td><td align="left" valign="top">6</td></tr><tr><td align="left" valign="top">18</td><td align="left" valign="top">Public health</td><td align="left" valign="top">6</td></tr><tr><td align="left" valign="top">19</td><td align="left" valign="top">Reddit</td><td align="left" valign="top">6</td></tr><tr><td align="left" valign="top">20</td><td align="left" valign="top">Social networking</td><td align="left" valign="top">6</td></tr></tbody></table></table-wrap><p>We applied the Kleinberg burst detection algorithm to capture topic dynamics. The resulting heatmap (<xref ref-type="fig" rid="figure3">Figure 3</xref>) and timeline (<xref ref-type="fig" rid="figure4">Figure 4</xref>) identified multiple high-intensity bursts, with peak activity observed in 2020&#x2010;2021. Burst events were not limited to pandemic-related terms; method-focused keywords, such as &#x201C;topic modeling&#x201D; and &#x201C;ML,&#x201D; also showed bursts, indicating that methodological innovation interacts dynamically with real-world demands.</p><fig position="float" id="figure3"><label>Figure 3.</label><caption><p>Heatmap of keyword burst events in PubMed literature. Each row represents a keyword identified from PubMed articles, and columns denote the years 2015 to 2025. 
The central heatmap displays burst periods detected by the Kleinberg burst detection algorithm under parameter configuration G0.5_W3_L1. Burst levels are shown in 10 discrete tiers with specific hex colors, where darker shades indicate stronger bursts (eg, level &#x2265;18 in navy). Blue stars mark individual burst periods. The top bar plot shows the total number of keyword bursts per year, with counts labeled above each bar. The right bar plot indicates the total number of burst occurrences per keyword across all years. These bursts often correspond to emerging or intensifying topics in biomedical and social media research.</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="jmir_v28i1e86200_fig03.png"/></fig><fig position="float" id="figure4"><label>Figure 4.</label><caption><p>Burst timelines of keywords with burst levels &#x2265;2 in social media mining from 2015 to 2027. The image shows the temporal burst patterns of 37 keywords identified as most significant in terms of burst strength. Each keyword is represented by a horizontal timeline from the start of 2015 to the end of 2027 (blue lines), with red solid lines indicating the duration of burst periods and red dots marking the onset of each burst. If a burst extends beyond the year 2025, the line becomes dashed to signify projected or ongoing trends. The visualization highlights the emergence and persistence of research interests within the field of digital health and social media analytics.</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="jmir_v28i1e86200_fig04.png"/></fig></sec><sec id="s3-3"><title>Static Thematic Clustering, Strategic Positioning, and Intercluster Relationships</title><sec id="s3-3-1"><title>Overview</title><p>A total of 1157 author keywords were present in the included literature. 
First, low-frequency words with an occurrence frequency of &#x2264;2 were removed, along with the 2 most frequently occurring words. Removing low-frequency words prevents noise from affecting the co-occurrence network and clustering results, whereas removing the 2 most frequent words prevents them from dominating the network. Then, to prevent isolated or incidentally collinear edges from affecting structural similarity, a minimum co-occurrence threshold of 1 was set. Only keywords with edges present in both the structural and semantic spaces were retained. Finally, 78 high-quality keywords (<xref ref-type="table" rid="table3">Table 3</xref>) were included for subsequent analysis.</p><table-wrap id="t3" position="float"><label>Table 3.</label><caption><p>Detailed metrics of keywords and clusters.</p></caption><table id="table3" frame="hsides" rules="groups"><thead><tr><td align="left" valign="top">Cluster and keywords</td><td align="left" valign="top">Freq<sup><xref ref-type="table-fn" rid="table3fn1">a</xref></sup>, n</td><td align="left" valign="top">Overall degree</td><td align="left" valign="top">Network strength</td><td align="left" valign="top">Centroid distance<sup><xref ref-type="table-fn" rid="table3fn2">b</xref></sup></td><td align="left" valign="top">Quadrant</td></tr></thead><tbody><tr><td align="left" valign="top" colspan="6">Cluster 1<sup><xref ref-type="table-fn" rid="table3fn3">c</xref></sup> (noise; n=19)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Mental health</td><td align="left" valign="top">7</td><td align="left" valign="top">11</td><td align="left" valign="top">4.195</td><td align="left" valign="top">2.019</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Big data</td><td align="left" valign="top">6</td><td align="left" 
valign="top">10</td><td align="left" valign="top">3.829</td><td align="left" valign="top">1.903</td><td align="left" valign="top">Potential breakthrough theme (x&#x003C;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Public health</td><td align="left" valign="top">6</td><td align="left" valign="top">24</td><td align="left" valign="top">9.121</td><td align="left" valign="top">0.872</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Depression</td><td align="left" valign="top">3</td><td align="left" valign="top">5</td><td align="left" valign="top">1.722</td><td align="left" valign="top">2.099</td><td align="left" valign="top">Peripheral mature theme (x&#x2265;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Google trend</td><td align="left" valign="top">3</td><td align="left" valign="top">7</td><td align="left" valign="top">2.460</td><td align="left" valign="top">1.330</td><td align="left" valign="top">Peripheral mature theme (x&#x2265;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Perception</td><td align="left" valign="top">3</td><td align="left" valign="top">15</td><td align="left" valign="top">5.166</td><td align="left" valign="top">1.918</td><td align="left" valign="top">Potential breakthrough theme (x&#x003C;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Cyberincivility</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td 
align="left" valign="top">1.705</td><td align="left" valign="top">1.758</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Digital health</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">1.199</td><td align="left" valign="top">1.822</td><td align="left" valign="top">Peripheral emerging theme (x&#x2265;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Drug repositioning</td><td align="left" valign="top">2</td><td align="left" valign="top">6</td><td align="left" valign="top">0.983</td><td align="left" valign="top">2.100</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>LSTM</td><td align="left" valign="top">2</td><td align="left" valign="top">7</td><td align="left" valign="top">2.198</td><td align="left" valign="top">0.980</td><td align="left" valign="top">Peripheral emerging theme (x&#x2265;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Medium</td><td align="left" valign="top">2</td><td align="left" valign="top">1</td><td align="left" valign="top">0.492</td><td align="left" valign="top">0.989</td><td align="left" valign="top">Peripheral obsolete theme (x&#x003C;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Neoplasms</td><td align="left" valign="top">2</td><td align="left" valign="top">5</td><td align="left" 
valign="top">1.979</td><td align="left" valign="top">1.688</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Nurse</td><td align="left" valign="top">2</td><td align="left" valign="top">6</td><td align="left" valign="top">2.240</td><td align="left" valign="top">1.669</td><td align="left" valign="top">Peripheral emerging theme (x&#x2265;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Republic of Korea</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">0.944</td><td align="left" valign="top">1.542</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Social networking site</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">1.705</td><td align="left" valign="top">1.661</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Suicide</td><td align="left" valign="top">2</td><td align="left" valign="top">4</td><td align="left" valign="top">1.426</td><td align="left" valign="top">2.450</td><td align="left" valign="top">Peripheral emerging theme (x&#x2265;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Thematic analysis</td><td align="left" valign="top">2</td><td align="left" valign="top">6</td><td align="left" 
valign="top">2.165</td><td align="left" valign="top">1.401</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Web scraping</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">0.757</td><td align="left" valign="top">1.685</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>YouTube</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">1.127</td><td align="left" valign="top">1.599</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top" colspan="6">Cluster 2<sup><xref ref-type="table-fn" rid="table3fn4">d</xref></sup> (n=15)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Natural language processing</td><td align="left" valign="top">36</td><td align="left" valign="top">41</td><td align="left" valign="top">13.465</td><td align="left" valign="top">0.752</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Social media mining</td><td align="left" valign="top">24</td><td align="left" valign="top">27</td><td align="left" valign="top">8.489</td><td align="left" valign="top">0.067</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content 
content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Machine learning</td><td align="left" valign="top">22</td><td align="left" valign="top">28</td><td align="left" valign="top">8.820</td><td align="left" valign="top">0.473</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Pharmacovigilance</td><td align="left" valign="top">9</td><td align="left" valign="top">8</td><td align="left" valign="top">2.776</td><td align="left" valign="top">0.511</td><td align="left" valign="top">Peripheral mature theme (x&#x2265;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Information storage and retrieval</td><td align="left" valign="top">5</td><td align="left" valign="top">6</td><td align="left" valign="top">2.311</td><td align="left" valign="top">0.511</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Drug-related side effects and adverse reactions</td><td align="left" valign="top">4</td><td align="left" valign="top">6</td><td align="left" valign="top">1.841</td><td align="left" valign="top">0.404</td><td align="left" valign="top">Peripheral emerging theme (x&#x2265;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Rare disease</td><td align="left" valign="top">3</td><td align="left" valign="top">4</td><td align="left" valign="top">1.080</td><td align="left" valign="top">0.452</td><td align="left" valign="top">Peripheral obsolete theme (x&#x003C;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td 
align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Computational social science</td><td align="left" valign="top">2</td><td align="left" valign="top">2</td><td align="left" valign="top">0.360</td><td align="left" valign="top">0.551</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Epidemiology</td><td align="left" valign="top">2</td><td align="left" valign="top">4</td><td align="left" valign="top">1.285</td><td align="left" valign="top">0.788</td><td align="left" valign="top">Peripheral obsolete theme (x&#x003C;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Patient forum</td><td align="left" valign="top">2</td><td align="left" valign="top">5</td><td align="left" valign="top">1.318</td><td align="left" valign="top">0.469</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Patient-reported outcome measures</td><td align="left" valign="top">2</td><td align="left" valign="top">6</td><td align="left" valign="top">1.959</td><td align="left" valign="top">0.840</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Post-acute COVID-19 syndrome</td><td align="left" valign="top">2</td><td align="left" valign="top">5</td><td align="left" valign="top">1.755</td><td align="left" valign="top">0.925</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, 
y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Psychological</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">0.786</td><td align="left" valign="top">1.153</td><td align="left" valign="top">Peripheral emerging theme (x&#x2265;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Quality of life</td><td align="left" valign="top">2</td><td align="left" valign="top">1</td><td align="left" valign="top">0.181</td><td align="left" valign="top">0.507</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Symptom</td><td align="left" valign="top">2</td><td align="left" valign="top">5</td><td align="left" valign="top">2.282</td><td align="left" valign="top">0.939</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top" colspan="6">Cluster 3<sup><xref ref-type="table-fn" rid="table3fn5">e</xref></sup> (n=8)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Pandemics</td><td align="left" valign="top">9</td><td align="left" valign="top">23</td><td align="left" valign="top">8.704</td><td align="left" valign="top">0.567</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Vaccination</td><td align="left" valign="top">6</td><td align="left" valign="top">18</td><td align="left" valign="top">7.074</td><td 
align="left" valign="top">0.347</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Vaccination hesitancy</td><td align="left" valign="top">4</td><td align="left" valign="top">7</td><td align="left" valign="top">2.706</td><td align="left" valign="top">0.163</td><td align="left" valign="top">Peripheral emerging theme (x&#x2265;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Attitude</td><td align="left" valign="top">3</td><td align="left" valign="top">14</td><td align="left" valign="top">5.578</td><td align="left" valign="top">0.340</td><td align="left" valign="top">Potential breakthrough theme (x&#x003C;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>SARS-CoV-2</td><td align="left" valign="top">3</td><td align="left" valign="top">13</td><td align="left" valign="top">5.132</td><td align="left" valign="top">0.150</td><td align="left" valign="top">Potential breakthrough theme (x&#x003C;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Human</td><td align="left" valign="top">2</td><td align="left" valign="top">4</td><td align="left" valign="top">1.927</td><td align="left" valign="top">0.480</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Influenza</td><td align="left" valign="top">2</td><td align="left" valign="top">4</td><td align="left" valign="top">1.927</td><td align="left" 
valign="top">0.480</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Twitter mining</td><td align="left" valign="top">2</td><td align="left" valign="top">4</td><td align="left" valign="top">1.447</td><td align="left" valign="top">0.411</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top" colspan="6">Cluster 4<sup><xref ref-type="table-fn" rid="table3fn6">f</xref></sup> (n=13)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>COVID-19</td><td align="left" valign="top">43</td><td align="left" valign="top">49</td><td align="left" valign="top">16.294</td><td align="left" valign="top">0.326</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Twitter</td><td align="left" valign="top">34</td><td align="left" valign="top">37</td><td align="left" valign="top">13.168</td><td align="left" valign="top">0.829</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Infodemiology</td><td align="left" valign="top">14</td><td align="left" valign="top">25</td><td align="left" valign="top">8.831</td><td align="left" valign="top">0.220</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Topic modeling</td><td align="left" 
valign="top">10</td><td align="left" valign="top">18</td><td align="left" valign="top">6.685</td><td align="left" valign="top">0.678</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Coronavirus</td><td align="left" valign="top">8</td><td align="left" valign="top">13</td><td align="left" valign="top">5.151</td><td align="left" valign="top">0.173</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Tweet</td><td align="left" valign="top">6</td><td align="left" valign="top">15</td><td align="left" valign="top">5.508</td><td align="left" valign="top">0.603</td><td align="left" valign="top">Immature but declining theme (x&#x003C;0, y&#x2265;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Public opinion</td><td align="left" valign="top">4</td><td align="left" valign="top">5</td><td align="left" valign="top">1.906</td><td align="left" valign="top">0.800</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Communicable diseases</td><td align="left" valign="top">3</td><td align="left" valign="top">4</td><td align="left" valign="top">1.704</td><td align="left" valign="top">0.291</td><td align="left" valign="top">Peripheral mature theme (x&#x2265;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Internet</td><td align="left" valign="top">3</td><td 
align="left" valign="top">5</td><td align="left" valign="top">2.005</td><td align="left" valign="top">0.673</td><td align="left" valign="top">Peripheral mature theme (x&#x2265;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Communication</td><td align="left" valign="top">2</td><td align="left" valign="top">7</td><td align="left" valign="top">2.648</td><td align="left" valign="top">0.245</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Electronic nicotine delivery systems</td><td align="left" valign="top">2</td><td align="left" valign="top">4</td><td align="left" valign="top">1.250</td><td align="left" valign="top">0.668</td><td align="left" valign="top">Peripheral obsolete theme (x&#x003C;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Geolocation</td><td align="left" valign="top">2</td><td align="left" valign="top">6</td><td align="left" valign="top">1.965</td><td align="left" valign="top">0.770</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Neural network</td><td align="left" valign="top">2</td><td align="left" valign="top">6</td><td align="left" valign="top">2.124</td><td align="left" valign="top">0.273</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top" colspan="6">Cluster 5<sup><xref ref-type="table-fn" rid="table3fn7">g</xref></sup> (n=10)</td></tr><tr><td align="left" 
valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Content analysis</td><td align="left" valign="top">5</td><td align="left" valign="top">28</td><td align="left" valign="top">10.239</td><td align="left" valign="top">0.354</td><td align="left" valign="top">Potential breakthrough theme (x&#x003C;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Facebook</td><td align="left" valign="top">5</td><td align="left" valign="top">10</td><td align="left" valign="top">2.980</td><td align="left" valign="top">0.519</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Adolescent</td><td align="left" valign="top">3</td><td align="left" valign="top">9</td><td align="left" valign="top">3.117</td><td align="left" valign="top">0.618</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Association rule mining</td><td align="left" valign="top">3</td><td align="left" valign="top">7</td><td align="left" valign="top">2.490</td><td align="left" valign="top">0.518</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Health communication</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">1.296</td><td align="left" valign="top">0.175</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td 
align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Health promotion</td><td align="left" valign="top">2</td><td align="left" valign="top">5</td><td align="left" valign="top">1.930</td><td align="left" valign="top">0.876</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Smoking</td><td align="left" valign="top">2</td><td align="left" valign="top">9</td><td align="left" valign="top">2.878</td><td align="left" valign="top">0.387</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Social media analysis</td><td align="left" valign="top">2</td><td align="left" valign="top">6</td><td align="left" valign="top">1.984</td><td align="left" valign="top">0.450</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Tobacco</td><td align="left" valign="top">2</td><td align="left" valign="top">8</td><td align="left" valign="top">2.620</td><td align="left" valign="top">0.498</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>User engagement</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">1.255</td><td align="left" valign="top">0.311</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" 
valign="top" colspan="6">Cluster 6<sup><xref ref-type="table-fn" rid="table3fn8">h</xref></sup> (n=13)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Sentiment analysis</td><td align="left" valign="top">25</td><td align="left" valign="top">21</td><td align="left" valign="top">7.476</td><td align="left" valign="top">0.462</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Deep learning</td><td align="left" valign="top">9</td><td align="left" valign="top">15</td><td align="left" valign="top">5.526</td><td align="left" valign="top">0.351</td><td align="left" valign="top">Potential breakthrough theme (x&#x003C;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Latent Dirichlet allocation</td><td align="left" valign="top">6</td><td align="left" valign="top">19</td><td align="left" valign="top">7.123</td><td align="left" valign="top">0.805</td><td align="left" valign="top">Core emerging hotspot (x&#x2265;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Reddit</td><td align="left" valign="top">6</td><td align="left" valign="top">14</td><td align="left" valign="top">4.554</td><td align="left" valign="top">0.699</td><td align="left" valign="top">Potential breakthrough theme (x&#x003C;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Social networking</td><td align="left" valign="top">6</td><td align="left" valign="top">9</td><td align="left" valign="top">3.350</td><td align="left" 
valign="top">0.627</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Social network analysis</td><td align="left" valign="top">5</td><td align="left" valign="top">3</td><td align="left" valign="top">1.074</td><td align="left" valign="top">0.382</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Vaccine</td><td align="left" valign="top">5</td><td align="left" valign="top">19</td><td align="left" valign="top">7.156</td><td align="left" valign="top">1.018</td><td align="left" valign="top">Potential breakthrough theme (x&#x003C;0, y&#x2265;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Crowdsourcing</td><td align="left" valign="top">4</td><td align="left" valign="top">9</td><td align="left" valign="top">2.570</td><td align="left" valign="top">0.603</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Network analysis</td><td align="left" valign="top">3</td><td align="left" valign="top">8</td><td align="left" valign="top">2.729</td><td align="left" valign="top">0.637</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>HPV</td><td align="left" valign="top">2</td><td align="left" valign="top">4</td><td align="left" valign="top">1.603</td><td align="left" 
valign="top">0.453</td><td align="left" valign="top">Peripheral obsolete theme (x&#x003C;0, y&#x003C;3.51, z&#x003C;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Recurrent neural networks</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">0.789</td><td align="left" valign="top">0.382</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Social media data</td><td align="left" valign="top">2</td><td align="left" valign="top">3</td><td align="left" valign="top">0.746</td><td align="left" valign="top">0.202</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Word embedding</td><td align="left" valign="top">2</td><td align="left" valign="top">2</td><td align="left" valign="top">0.671</td><td align="left" valign="top">0.639</td><td align="left" valign="top">Exploratory emerging theme (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5)</td></tr></tbody></table><table-wrap-foot><fn id="table3fn1"><p><sup>a</sup>Freq: frequency.</p></fn><fn id="table3fn2"><p><sup>b</sup>Representative terms can be identified from centroid distance values, providing both structural and semantic insights into each cluster. 
A smaller centroid distance indicates that a keyword is closer to the cluster center and thus more representative of the cluster&#x2019;s overall theme.</p></fn><fn id="table3fn3"><p><sup>c</sup>Cluster 1: Candidate incubator pool of peripheral cross-cutting topics in health-related social media mining.</p></fn><fn id="table3fn4"><p><sup>d</sup>Cluster 2: Computational methods in health informatics.</p></fn><fn id="table3fn5"><p><sup>e</sup>Cluster 3: Public attitudes and sociopsychological determinants.</p></fn><fn id="table3fn6"><p><sup>f</sup>Cluster 4: Infodemiology and the COVID-19 information ecosystem.</p></fn><fn id="table3fn7"><p><sup>g</sup>Cluster 5: Health communication and public health engagement.</p></fn><fn id="table3fn8"><p><sup>h</sup>Cluster 6: Social media analysis and network methods.</p></fn></table-wrap-foot></table-wrap><p>The UMAP-HDBSCAN clustering of the author keywords yielded 6 clusters, comprising 5 thematic clusters and 1 unassigned noise set (cluster 1; <xref ref-type="fig" rid="figure5">Figure 5</xref> and <xref ref-type="table" rid="table3">Table 3</xref>); detailed statistics are provided in Table S4 in <xref ref-type="supplementary-material" rid="app6">Multimedia Appendix 6</xref>. Their 3D strategic positioning is summarized in <xref ref-type="fig" rid="figure6">Figure 6</xref> and <xref ref-type="table" rid="table3">Table 3</xref> (2D projections are shown in Figures S7-S9 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref>).</p><fig position="float" id="figure5"><label>Figure 5.</label><caption><p>Keyword clusters identified by Uniform Manifold Approximation and Projection (UMAP)&#x2013;Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) with cluster-level metrics. This image illustrates the thematic structure obtained by performing semantic-structural integration of keywords, followed by UMAP to project high-dimensional similarity relationships into 2D space and HDBSCAN for density-based clustering. 
The horizontal and vertical axes represent the 2 UMAP embedding coordinates (UMAP-1 and UMAP-2), which are precomputed as 2D representations of distances. The relative positions of points on the plane indicate the overall similarity between keywords (shorter distances indicate greater semantic and co-occurrence similarity). Different colors denote distinct cluster labels (clusters 2&#x2010;6 represent assigned thematic clusters). The blue cross labeled cluster 1 (noise; HDBSCAN label=&#x2212;1) indicates a set of keywords not stably assigned to any high-density cluster by HDBSCAN (unassigned/peripheral set); this does not imply that these keywords are meaningless. Point size is proportional to keyword frequency (Freq) in the sample literature, reflecting each keyword&#x2019;s relative importance and coverage within the dataset. Six labels were identified under this parameter configuration: 5 nonnoise thematic clusters and 1 unassigned set.</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="jmir_v28i1e86200_fig05.png"/></fig><fig position="float" id="figure6"><label>Figure 6.</label><caption><p>Three-dimensional thematic map of research on social media mining from January 1, 2015, to July 31, 2025. The image maps the keyword thematic structure identified by Uniform Manifold Approximation and Projection (UMAP)&#x2013;Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) onto a 3D strategic coordinate system to assess the historical accumulation, structural influence, and recent growth potential of different research directions. The x-axis (maturity) represents the degree of research accumulation associated with each keyword, measured as the normalized number of papers containing the keyword (ie, its occurrence frequency). 
The y-axis (influence) represents the structural importance of the keywords within the co-occurrence network, measured as normalized network connection strength (ie, centrality metrics). The z-axis (recency) represents the recent activity level of the keywords, measured with normalized recency metrics, such as the proportion of publications in the last 5 years. Each point in the image represents a keyword, with color denoting its assigned cluster. Clusters 2&#x2010;6 represent assigned thematic clusters. Cluster 1 (noise) includes peripheral keywords that are not stably assigned, and it is used to present cross-thematic or sparse yet potentially valuable research signals. The red coordinate axes and central red dot denote the reference origin in 3D space. The black dashed planes mark the overall mean thresholds for the 3 axes (eg, mean of x=0.00, mean of y=3.51, and mean of z=0.66), which are used to divide the space into multiple strategic quadrants. This division distinguishes research directions such as core hotspots, potential breakthroughs, emerging frontiers, and declining themes.</p></caption><graphic alt-version="no" mimetype="image" position="float" xlink:type="simple" xlink:href="jmir_v28i1e86200_fig06.png"/></fig></sec><sec id="s3-3-2"><title>Cluster 1 (Noise): Candidate Incubator Pool of Peripheral Cross-Cutting Topics in Health-Related SMM</title><p>The UMAP-HDBSCAN analysis identified 6 clusters, 1 of which was an unassigned peripheral noise set (cluster 1). We retained this unassigned set as a cluster for structural interpretation, as it contained meaningful cross-thematic elements, keywords disrupted by major event windows, and early-stage keywords. Retaining it prevented information loss and helped capture potentially meaningful cross-cutting and early-stage signals that did not form stable dense regions in the embedding space. 
The supporting rationale is presented in <xref ref-type="supplementary-material" rid="app7">Multimedia Appendix 7</xref>. This cluster is a collection of heterogeneous, peripheral, and thematically varied research that cannot be assigned to any single specialized cluster. The themes in this group are highly fragmented, with representative keywords ranging from &#x201C;public health,&#x201D; &#x201C;mental health,&#x201D; and &#x201C;perception&#x201D; to specific technologies like &#x201C;big data&#x201D; and &#x201C;LSTM.&#x201D; However, this &#x201C;noise&#x201D; cluster is not without value. It represents the emerging and cross-disciplinary signals of the field, serving as an &#x201C;incubator&#x201D; for potential research directions and an important reference for defining the boundaries of current core research areas. The majority of keywords in this group fell within the <italic>exploratory emerging theme</italic> quadrant (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5), representing recently active, immature research topics such as &#x201C;neoplasms&#x201D; and &#x201C;YouTube.&#x201D; Therefore, we treated cluster 1 as a candidate incubator pool of peripheral cross-cutting topics, some of which may evolve into coherent thematic clusters as the field matures; its detailed interpretation is provided in <xref ref-type="supplementary-material" rid="app7">Multimedia Appendix 7</xref>.</p></sec><sec id="s3-3-3"><title>Cluster 2: Computational Methods in Health Informatics</title><p>Cluster 2 represents the technical and methodological foundations of the entire research field. It focuses on the application of core computational technologies, particularly &#x201C;SMM,&#x201D; &#x201C;NLP,&#x201D; and &#x201C;ML,&#x201D; to diverse health scenarios. 
Research in this cluster involves developing and applying these technologies to address specific issues such as &#x201C;drug-related side effects and adverse reactions,&#x201D; &#x201C;pharmacovigilance,&#x201D; identification of &#x201C;rare diseases,&#x201D; and analysis of &#x201C;patient-reported outcomes&#x201D; and &#x201C;quality of life.&#x201D; It aims to efficiently extract valuable health insights from vast amounts of unstructured text data, serving as the technical foundation for all other applied clusters. Most topics in this cluster were located in the <italic>exploratory emerging theme</italic> quadrant (x&#x003C;0, y&#x003C;3.51, z&#x2265;0.5), particularly the keywords related to patient-reported outcomes and symptoms.</p></sec><sec id="s3-3-4"><title>Cluster 3: Public Attitudes and Sociopsychological Determinants</title><p>Cluster 3 represents a mature field dedicated to analyzing public psychology and reactions during health crises, particularly &#x201C;pandemics&#x201D; and &#x201C;vaccination hesitancy.&#x201D; Using methods like &#x201C;Twitter mining,&#x201D; it examines public &#x201C;attitudes,&#x201D; sentiments, and concerns. This cluster aims to uncover the deep-seated sociopsychological drivers behind public health attitudes, providing scientific evidence for developing targeted communication strategies and public policies.</p></sec><sec id="s3-3-5"><title>Cluster 4: Infodemiology and the COVID-19 Information Ecosystem</title><p>Cluster 4 represents the most critical and dynamic research frontier, driven by the major global health crisis of &#x201C;COVID-19.&#x201D; This cluster focuses on &#x201C;infodemiology,&#x201D; examining how various types of information, including misinformation, spread, evolved, and influenced public opinion on social media platforms like Twitter during the pandemic. This is a quintessential crisis-driven research domain characterized by high timeliness and strategic importance. 
It aims to provide real-time monitoring tools and response strategies for addressing information challenges during public health emergencies.</p></sec><sec id="s3-3-6"><title>Cluster 5: Health Communication and Public Health Engagement</title><p>Cluster 5 represents a highly application-oriented proactive intervention field, centered on designing and evaluating effective digital health communication campaigns. With platforms like Facebook as its primary arena and a focus on specific issues such as tobacco control, it uses &#x201C;user engagement&#x201D; as a key metric. Through methods like &#x201C;content analysis,&#x201D; it explores which information frameworks most effectively reach and influence target audiences (eg, adolescents). Its findings offer direct practical guidance for public health institutions to optimize routine online health promotion efforts. All the themes in this cluster fell within recently active quadrants (exploratory emerging or potential breakthrough themes; <xref ref-type="table" rid="table3">Table 3</xref>), suggesting promising directions for future research.</p></sec><sec id="s3-3-7"><title>Cluster 6: Social Media Analysis and Network Methods</title><p>Cluster 6 represents a group of advanced analytical techniques that deepen and extend the general computational methods explored in cluster 2. It incorporates established technologies like &#x201C;sentiment analysis&#x201D; and &#x201C;topic modeling&#x201D; while introducing more sophisticated approaches such as &#x201C;deep learning&#x201D; and &#x201C;social network analysis.&#x201D; Its core distinction lies in focusing not merely on the content of the text but also on the mechanisms of information dissemination. 
By analyzing user connections and the network structures of information flow, it uncovers the transmission dynamics of complex digital discourse.</p></sec><sec id="s3-3-8"><title>Global Thematic Landscape and Cross-Cluster Bridging</title><p>Overall, the representative keywords in each quadrant clearly outline the stratified research landscape: flagship core themes, such as NLP, ML, and COVID-19, drive the development of the entire field; traditional pillars like content analysis and health communication maintain high influence but have waned in popularity; emerging potential topics, such as vaccination hesitancy and deep learning, are on the rise; and marginal incubation and dormant areas provide a window for observing future dynamic evolution.</p><p>To further explore the relationships between thematic clusters, we measured the cross-cluster coupling strength in the keyword co-occurrence network, defined as the sum of the weights of the edges connecting 2 distinct clusters, and identified the bridging keywords contributing the most to this coupling (Figure S10 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref> and Tables S5-S8 in <xref ref-type="supplementary-material" rid="app6">Multimedia Appendix 6</xref>). The results indicated that cluster 4 occupied a central hub position in cross-cluster interactions, exhibiting strong coupling with multiple thematic clusters.</p><p>Among all cluster pairs, the highest coupling strength was observed between cluster 1 and cluster 4 (cross-cluster strength=10.649), followed by cluster 4 and cluster 6 (10.404), cluster 3 and cluster 4 (7.989), cluster 2 and cluster 4 (7.216), and cluster 2 and cluster 6 (6.967; Table S6 in <xref ref-type="supplementary-material" rid="app6">Multimedia Appendix 6</xref>). Overall, the strongest couplings primarily involve co-occurrence links between cluster 4 and the methodological clusters (clusters 2 and 6). 
It should be noted that cluster 1 consists of a heterogeneous set of peripheral keywords labeled as noise by HDBSCAN. Therefore, its high coupling with core clusters more likely reflects conceptual convergence and enhanced cross-domain connections within specific event windows, such as the peak of pandemic-related research, rather than a single cohesive theme.</p><p>Furthermore, we examined the 5 cluster pairs with the highest cross-cluster strength by identifying their pair-specific bridging keywords (<xref ref-type="table" rid="table4">Table 4</xref> and Table S7 in <xref ref-type="supplementary-material" rid="app6">Multimedia Appendix 6</xref>). For the cluster 1-cluster 4 pair, the primary bridging keywords were COVID-19 (contribution=3.052), public health (2.104), Twitter (2.018), topic modeling (1.549), and coronavirus (1.429). For cluster 4&#x2010;cluster 6, the bridging keywords were Twitter (2.724), COVID-19 (2.540), deep learning (2.271), sentiment analysis (1.650), and topic modeling (1.646). For cluster 3-cluster 4, the bridging keywords were COVID-19 (3.013), pandemics (2.182), Twitter (1.973), attitude (1.532), and tweet (1.440). For cluster 2-cluster 4, the bridging keywords primarily included NLP (3.031), COVID-19 (2.081), information epidemiology (1.541), Twitter (1.208), and ML (1.135). For cluster 2-cluster 6, the bridging keywords included NLP (2.476), SMM (2.436), deep learning (1.360), sentiment analysis (1.359), and LDA (1.288). 
These findings reflect significant tool and algorithm sharing across methodological clusters.</p><table-wrap id="t4" position="float"><label>Table 4.</label><caption><p>Top 5 intercluster pairs (ranked by cross-cluster strength) with their top 5 pair-specific bridging keywords.</p></caption><table id="table4" frame="hsides" rules="groups"><thead><tr><td align="left" valign="top">Cluster pair and keyword</td><td align="left" valign="top">Source cluster/keyword cluster</td><td align="left" valign="top">Keyword pair contribution<sup><xref ref-type="table-fn" rid="table4fn1">a</xref></sup></td><td align="left" valign="top">Cross-cluster strength<sup><xref ref-type="table-fn" rid="table4fn2">b</xref></sup></td></tr></thead><tbody><tr><td align="left" valign="top" colspan="3">Cluster 1<sup><xref ref-type="table-fn" rid="table4fn3">c</xref></sup> (noise)-cluster 4<sup><xref ref-type="table-fn" rid="table4fn4">d</xref></sup></td><td align="left" valign="top">10.649</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>COVID-19</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">3.052</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Public health</td><td align="left" valign="top">Cluster 1 (noise)</td><td align="left" valign="top">2.104</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Twitter</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">2.018</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Topic modeling</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">1.549</td><td align="left" 
valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Coronavirus</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">1.429</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top" colspan="3">Cluster 4-cluster 6<sup><xref ref-type="table-fn" rid="table4fn5">e</xref></sup></td><td align="left" valign="top">10.404</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Twitter</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">2.724</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>COVID-19</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">2.540</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Deep learning</td><td align="left" valign="top">Cluster 6</td><td align="left" valign="top">2.271</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Sentiment analysis</td><td align="left" valign="top">Cluster 6</td><td align="left" valign="top">1.650</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Topic modeling</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">1.646</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top" colspan="3">Cluster 3<sup><xref ref-type="table-fn" rid="table4fn6">f</xref></sup>-cluster 4</td><td align="left" valign="top">7.989</td></tr><tr><td align="left" valign="top"><named-content 
content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>COVID-19</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">3.013</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Pandemics</td><td align="left" valign="top">Cluster 3</td><td align="left" valign="top">2.182</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Twitter</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">1.973</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Attitude</td><td align="left" valign="top">Cluster 3</td><td align="left" valign="top">1.532</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Tweet</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">1.440</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top" colspan="3">Cluster 2<sup><xref ref-type="table-fn" rid="table4fn7">g</xref></sup>-cluster 6</td><td align="left" valign="top">7.216</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Natural language processing</td><td align="left" valign="top">Cluster 2</td><td align="left" valign="top">2.476</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Social media mining</td><td align="left" valign="top">Cluster 2</td><td align="left" valign="top">2.436</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content 
content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Deep learning</td><td align="left" valign="top">Cluster 6</td><td align="left" valign="top">1.360</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Sentiment analysis</td><td align="left" valign="top">Cluster 6</td><td align="left" valign="top">1.359</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Latent Dirichlet allocation</td><td align="left" valign="top">Cluster 6</td><td align="left" valign="top">1.288</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top" colspan="3">Cluster 2-cluster 4</td><td align="left" valign="top">6.967</td></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Natural language processing</td><td align="left" valign="top">Cluster 2</td><td align="left" valign="top">3.031</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>COVID-19</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">2.081</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Infodemiology</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">1.541</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Twitter</td><td align="left" valign="top">Cluster 4</td><td align="left" valign="top">1.208</td><td align="left" valign="top"/></tr><tr><td align="left" valign="top"><named-content 
content-type="indent">&#x00A0;&#x00A0;&#x00A0;&#x00A0;</named-content>Machine learning</td><td align="left" valign="top">Cluster 2</td><td align="left" valign="top">1.135</td><td align="left" valign="top"/></tr></tbody></table><table-wrap-foot><fn id="table4fn1"><p><sup>a</sup>Keyword pair contribution represents the contribution of the keyword to the cross-cluster coupling strength of the cluster, which is part of the cumulative contribution to cross-cluster edge weights.</p></fn><fn id="table4fn2"><p><sup>b</sup>Cross-cluster strength is the sum of weights of cross-cluster edges connecting keywords from 2 distinct thematic clusters, which is used to quantify the degree of overlap between thematic clusters.</p></fn><fn id="table4fn3"><p><sup>c</sup>Cluster 1: Candidate incubator pool of peripheral cross-cutting topics in health-related social media mining.</p></fn><fn id="table4fn4"><p><sup>d</sup>Cluster 4: Infodemiology and the COVID-19 information ecosystem.</p></fn><fn id="table4fn5"><p><sup>e</sup>Cluster 6: Social media analysis and network methods.</p></fn><fn id="table4fn6"><p><sup>f</sup>Cluster 3: Public attitudes and sociopsychological determinants.</p></fn><fn id="table4fn7"><p><sup>g</sup>Cluster 2: Computational methods in health informatics.</p></fn></table-wrap-foot></table-wrap><p>At the global level, keywords with the highest bridging strength clustered around event-driven themes and generic methodological/platform elements (Table S8 in <xref ref-type="supplementary-material" rid="app6">Multimedia Appendix 6</xref>): COVID-19 (cross-cluster strength=12.520), NLP (10.642), Twitter (9.408), public health (7.669), content analysis (7.472), and pandemics (7.035). 
These bridging keywords jointly constructed a cross-cluster connection framework centered on event topics, platform data, and computational methods.</p></sec></sec><sec id="s3-4"><title>Dynamic Temporal Analysis</title><p>We divided the study period into 3 slices using the explosive growth time stamp as the temporal boundary: 2015&#x2010;2019, 2020&#x2010;2023, and 2024&#x2010;2025 (through July 31, 2025). For each time slice, we scanned K&#x2208;[2,20]. The Silhouette and Calinski-Harabasz (CH) indices were noninformative in the spectral embedding space (yielding values near zero or remaining constant); thus, we prioritized modularity and used the Davies-Bouldin Index elbow as a safeguard against overfragmentation. We fixed the number of spectral clusters at K=3 (2015&#x2010;2019), K=8 (2020&#x2010;2023), and K=5 (2024&#x2010;2025) because modularity consistently exhibited clear local maxima at these values and the Davies-Bouldin curves indicated elbows in the same vicinity. Selecting K at these points avoided both undersegmentation and overfragmentation while maximizing structural separation and interpretability. This choice ensured that each slice captured the dominant thematic structure of the period: technology-driven methodological consolidation before 2020, problem-driven thematic diversification from 2020 to 2023, and thematic convergence and strategic priorities after 2023. Figures S11-S14 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref> show the thematic changes across the 3 time slices.</p><sec id="s3-4-1"><title>Technology-Driven Methodological Consolidation (2015-2019)</title><p>During this phase, research themes emerged from the convergence of ML and NLP technological advancements with specific needs in pharmacovigilance and rare diseases. 
Although studies were relatively fragmented, they were clearly purpose-driven, with technology driving applications. While this marked only the foundational stage of methodology and emerging applications, core methodological frameworks began to come into focus.</p></sec><sec id="s3-4-2"><title>Problem-Driven Thematic Diversification (2020-2023)</title><p>The COVID-19 pandemic, as a black swan event, provided unprecedented data, attention, and application scenarios for the field. Consequently, COVID-19 became the dominant focus of the field, leading to explosive growth in research scope and rapid diversification of research themes. This phase was problem-driven, and specifically pandemic-driven, mobilizing all methodological approaches (ML, NLP, and sentiment analysis) and application directions (vaccines, information monitoring, and public attitudes) to meet its requirements. This process fostered a mature, complex, and highly differentiated research ecosystem.</p></sec><sec id="s3-4-3"><title>Thematic Convergence and Strategic Priorities (2024-2025)</title><p>As the pandemic&#x2019;s impact began to recede, research themes started to reconsolidate. Methods like ML, NLP, and content analysis remained central. This indicates that technologies validated and refined during the pandemic, such as sentiment analysis and deep learning, are maturing and being systematically applied to health research. Furthermore, the latest research no longer pursues isolated hot topics; instead, it applies validated methodologies to long-standing public health challenges like &#x201C;vaccines&#x201D; and &#x201C;mental health.&#x201D;</p></sec><sec id="s3-4-4"><title>Temporal Dynamics and Their Alignment With the Global Structure</title><p>Temporal spectral clusters aligned strongly with the global UMAP-HDBSCAN structure, with most spectral clusters mapping to a single global cluster at Jaccard scores above 0.3. 
The methodological cluster consistently matched the global &#x201C;ML/NLP-driven social media analysis&#x201D; domain. On the other hand, COVID-19 and mental health clusters overlapped with the global &#x201C;Public health and pandemic response&#x201D; and &#x201C;Psychosocial health&#x201D; domains, respectively.</p><p>Compared with the 3D strategic coordinate map, temporal slices provided a dynamic counterpart. The 2015&#x2010;2019 clusters coincided with the map&#x2019;s high-maturity but low-recency quadrant, representing methodological foundations. The 2020&#x2010;2023 diversification overlapped with clusters in the high-influence and high-recency space, reflecting crisis-driven impact. The 2024&#x2010;2025 convergence mapped onto the high-maturity and moderate-recency zone, indicating thematic stabilization.</p></sec></sec><sec id="s3-5"><title>Validation</title><p>We conducted sensitivity analyses on burst detection and clustering hyperparameters to validate the methods and findings at the macro-level.</p><p>In the parameter sensitivity analysis for burst detection, we selected the configuration with <italic>&#x03B3;</italic>=0.5, a slice width of 3 years, and a minimum burst length of 1, which demonstrated strong overall performance. This setting achieved optimal values in coverage (1.0) and realism (1.8), ensuring that detected bursts were both comprehensive and plausible. Although its standalone stability scores (Jaccard=0.108; Spearman=0.259) appeared modest, cross-parameter comparisons (Table S9 in <xref ref-type="supplementary-material" rid="app6">Multimedia Appendix 6</xref>) revealed that alternative configurations yielded Jaccard similarities &#x2265;0.97 and Spearman correlations of approximately 0.70 with the baseline, confirming that the results are consistently reproducible across the parameter space. 
Thus, G0.5_W3_L1 was chosen as the primary setting.</p><p>The internal metrics for clustering evaluation are presented in Table S10 in <xref ref-type="supplementary-material" rid="app6">Multimedia Appendix 6</xref>. The balanced version with K=6 showed slightly better internal separation. In contrast, the fine version maintained &#x201C;good&#x201D; performance across major metrics while improving interpretability by an order of magnitude. Additionally, we conducted 3 independent runs of fine_K6, all yielding nearly identical scores and key metrics. This indicates that the performance of fine_K6 is robust and insensitive to minor feature enhancements.</p><p>To assess the external validity of our clustering, we examined the relationship between keyword network centrality and citation impact and compared RCR distributions across clusters. As shown in Figure S15 in <xref ref-type="supplementary-material" rid="app1">Multimedia Appendix 1</xref>, keyword strength was significantly positively correlated with mean RCR (Spearman &#x03C1;=0.49; <italic>P</italic>&#x003C;.05), indicating that keywords occupying more central positions in the co-occurrence network tended to be associated with a higher citation impact. Moreover, RCR distributions varied substantially across clusters. Cluster 2 exhibited the highest citation impact (median RCR of approximately 1.5, with maximum values approaching 3), clearly exceeding other clusters. In contrast, clusters 1 and 4 showed lower mean RCR values (approximately 0.5&#x2010;0.7), while noise (cluster 1) had the lowest citation impact (mean RCR of approximately 0.45). 
Together, these findings demonstrate that the identified clusters not only reflect structural differences within the keyword network but also align with external citation impact, thereby validating the robustness and scientific relevance of our approach.</p><p>To enhance the interpretability of the clusters beyond quantitative indicators, we performed micro-level interpretive triangulation by systematically analyzing the introduction, contribution, limitations, and future directions reported in 28 representative studies (<xref ref-type="supplementary-material" rid="app4">Multimedia Appendix 4</xref>). The content analysis suggested that the thematic boundaries identified by clustering correspond closely to coherent conceptual domains. First, studies assigned to cluster 2 (NLP/ML-driven social media analysis) consistently emphasized methodological innovation. This validates the cluster&#x2019;s positioning as a methodological core in the strategic coordinate map where ML and NLP occupy high-influence and high-novelty regions [<xref ref-type="bibr" rid="ref39">39</xref>,<xref ref-type="bibr" rid="ref44">44</xref>]. Second, studies linked to clusters 1 and 2 highlighted empirical applications such as psychological health and suicide-related discussions [<xref ref-type="bibr" rid="ref9">9</xref>,<xref ref-type="bibr" rid="ref45">45</xref>]. Their reported contributions emphasized the utility of social media for monitoring population health behaviors, aligning with our dynamic slicing results where such content clusters emerged strongly during the COVID-19 period (2020&#x2010;2023) and then diversified into related subthemes. Third, the limitations articulated in these studies mirror the structural signals captured by clustering. 
Frequent reliance on Twitter and US-based data explains why Twitter appeared as a central hub in cluster 2 [<xref ref-type="bibr" rid="ref7">7</xref>,<xref ref-type="bibr" rid="ref46">46</xref>], while non-Twitter platforms (eg, Weibo, Facebook, and Google Maps reviews) were less frequently studied, corresponding to their marginal or emerging placement in the coordinate map. Similarly, methodological constraints in sentiment analysis, cross-lingual generalizability, and demographic representativeness corroborated the lower maturity scores of several peripheral clusters. Finally, the future research directions proposed by the authors, including cross-platform data integration [<xref ref-type="bibr" rid="ref7">7</xref>,<xref ref-type="bibr" rid="ref46">46</xref>], deep learning methods, population stratification, and longitudinal prediction, directly overlap with the potential breakthroughs located in the high-influence but low-maturity quadrants of our strategic map (eg, vaccine attitudes, digital health, and Weibo-based research). This convergence suggests that our framework not only captures the current structure of the field but also anticipates trajectories already envisioned by domain experts.</p><p>Collectively, this micro-level evidence mapping supports the interpretability of the thematic clusters and provides qualitative context that complements the quantitative indicators and external validation results, without being used as confirmatory evidence of model correctness. It also suggests that the clusters are semantically meaningful and consistent with the intellectual agenda of the field, thereby strengthening confidence in both the static and dynamic analyses.</p></sec></sec><sec id="s4" sec-type="discussion"><title>Discussion</title><sec id="s4-1"><title>Principal Findings</title><p>This study used a hybrid methodology that combined ML-based clustering with qualitative evidence mapping to systematically map the strategic research landscape of SMM. 
The findings clearly demonstrated that this field has evolved into a complex and coherent ecosystem. This ecosystem revolves around &#x201C;infodemiology&#x201D; as its absolute core, powered by a dual engine of &#x201C;computational methods&#x201D; and &#x201C;analytical techniques,&#x201D; extending its influence toward multiple critical application frontiers, including pharmacovigilance, public listening, and health management. Our findings not only depict the current state of the field but also reveal its underlying driving mechanisms and future developmental trajectory.</p><p>This study systematically analyzed health-related SMM literature collected in PubMed from 2015 to mid-2025, revealing significant evolutionary trends over the past decade. Overall, publication output showed steady growth, with explosive increases from 2019 to 2020, indicating SMM&#x2019;s rising strategic importance as a tool for digital health and public health research. By integrating natural language model embeddings with network algorithm clustering, we identified 6 relatively independent yet interconnected thematic clusters. These spanned methodological anchors (NLP, ML, and social network analysis), socially driven topics (vaccine hesitancy and public listening), and priority themes reconverging in the postexplosive growth phase (health communication, mental health, and oncology). 
The distribution of these themes across a strategic 3D coordinate system (maturity, influence, and recency) further revealed the historical accumulation, current impact, and future potential of different research directions.</p><p>Time-slice analysis indicated that the knowledge structure of this field underwent three phased transitions: (1) the methodological consolidation from 2015 to 2019 laid the technical foundation; (2) the pandemic-driven surge from 2020 to 2023 accelerated thematic diversification, concentrating on various public health issues; and (3) by 2024&#x2010;2025, research refocused and converged on long-term strategic issues like cancer care and mental health. Through multiple validations, this study not only confirmed the robustness of clustering and strategic positioning but also demonstrated that these patterns align closely with the field&#x2019;s actual development trajectory. Collectively, the study&#x2019;s key findings provide a dynamic knowledge map for the digital health and health management field, spanning macro trends to micro evidence and offering a robust strategic reference for academic research, policy formulation, and practical applications.</p></sec><sec id="s4-2"><title>Comparison With Prior Work</title><p>Our finding that NLP and ML constitute the backbone of SMM aligns with recent reviews cataloging the rapid growth of computational methods in digital health [<xref ref-type="bibr" rid="ref47">47</xref>-<xref ref-type="bibr" rid="ref49">49</xref>]. However, some overviews have framed NLP as experimental in health care [<xref ref-type="bibr" rid="ref50">50</xref>,<xref ref-type="bibr" rid="ref51">51</xref>]. Our findings empirically demonstrate that, from a structural perspective, these technological pipelines have become firmly established as core architectures for social media&#x2013;based health analytics (<xref ref-type="table" rid="table3">Table 3</xref> and <xref ref-type="fig" rid="figure6">Figure 6</xref>). 
Additional empirical studies reinforce this interpretation. For instance, Ren et al [<xref ref-type="bibr" rid="ref52">52</xref>] demonstrated the effectiveness of NLP and emotion-based deep learning (ML) for depression detection on Reddit, while Low et al [<xref ref-type="bibr" rid="ref53">53</xref>] used NLP to identify vulnerable mental health support groups in online communities. These applications illustrate that the methods are not only conceptually important but also practically operationalized in diverse health contexts. However, such research is often constrained by single-platform dependency and insufficient sample representativeness [<xref ref-type="bibr" rid="ref52">52</xref>,<xref ref-type="bibr" rid="ref54">54</xref>,<xref ref-type="bibr" rid="ref55">55</xref>]. This is why our strategic map positions this cluster as mature but still requiring methodological improvement and extended applications. From a sociological perspective, the dominance of single-language Twitter, Facebook, Reddit, or Weibo corpora may structurally exclude speakers of other languages and marginalized voices, thereby exacerbating health inequalities [<xref ref-type="bibr" rid="ref56">56</xref>-<xref ref-type="bibr" rid="ref58">58</xref>]. From a health management perspective, mature NLP and ML pipelines are now deployable for real-time pharmacovigilance and attitude monitoring (eg, vaccine hesitancy dashboards), shifting management challenges from feasibility to governance [<xref ref-type="bibr" rid="ref59">59</xref>-<xref ref-type="bibr" rid="ref62">62</xref>]. Public health practice can benefit from earlier detection of emerging risks, enabling targeted, cost-effective preventive measures that reduce systemic burdens like disease incidence costs [<xref ref-type="bibr" rid="ref63">63</xref>,<xref ref-type="bibr" rid="ref64">64</xref>]. 
Collectively, the fundamental methods serve as strategic infrastructure in SMM.</p><p>Moreover, our analysis captured the explosion of crisis-driven research themes since 2020. Research during this period exhibited pronounced diversification and fragmentation, characterized by the rapid emergence of numerous exploratory subfields. Previous research has noted this explosive but fragmented growth in SMM [<xref ref-type="bibr" rid="ref65">65</xref>]. However, our burst detection and time-slice analysis provide data-driven evidence of an evolutionary trend in this field: a shift from fragmentation toward a focus on solving priority problems. Starting in 2024, as shown in our time-slice analysis, SMM research entered an era of integration characterized by a focus on addressing critical health issues. This signifies the field&#x2019;s transition from an exploratory phase of data-driven research to a mission-driven phase oriented toward solving priority problems. While acknowledging the fragmented explosion as a necessary phase, we highlight the ongoing structural shift toward a more mature research paradigm.</p></sec><sec id="s4-3"><title>Methodological Highlights</title><p>Another major contribution of this study lies in methodological innovation. Unlike previous bibliometric research that relied primarily on co-occurrence&#x2013;based keyword analysis, we proposed a fully automated, reproducible multilevel hybrid bibliometric methodology [<xref ref-type="bibr" rid="ref17">17</xref>,<xref ref-type="bibr" rid="ref18">18</xref>,<xref ref-type="bibr" rid="ref24">24</xref>,<xref ref-type="bibr" rid="ref29">29</xref>]. 
This embedding-based mapping approach enhances the interpretability and strategic value of the results while supporting transparency and robustness [<xref ref-type="bibr" rid="ref66">66</xref>].</p><p>First, at the semantic level, we used 2 natural language models (SPECTER2 and PubMedBERT) to vectorize titles and abstracts, aggregating them at the keyword level [<xref ref-type="bibr" rid="ref66">66</xref>-<xref ref-type="bibr" rid="ref68">68</xref>]. This approach mitigates the limitations of traditional methods that rely solely on manual synonym merging, enabling topic identification to capture latent semantic patterns and contextual associations [<xref ref-type="bibr" rid="ref30">30</xref>]. In addition, at the structural level, we combined keyword co-occurrence networks with embedding similarity. Building on this foundation, we constructed a hybrid similarity matrix that better aligns with actual disciplinary structures, integrating semantics and structure.</p><p>Second, we used the nonparametric UMAP-HDBSCAN method to classify research themes. Systematic parameter grid searches and multiple random seed runs supported the stability of clustering outcomes [<xref ref-type="bibr" rid="ref69">69</xref>,<xref ref-type="bibr" rid="ref70">70</xref>]. Unlike most studies relying solely on single clustering outputs, we further conducted internal (Jaccard, Silhouette, and Adjusted Rand Index) and external (RCR) validation. This was supplemented by micro-level evidence mapping and content analysis of representative papers. This dual validation strengthens confidence in the clustering results.</p><p>Finally, this study proposes a 3D strategic coordinate map. By mapping clustering results onto the axes of maturity, influence, and recency, this framework transforms clustering outcomes into a knowledge map with strategic implications. 
This approach provides an intuitive tool for identifying hotspots, emerging frontiers, and declining fields.</p></sec><sec id="s4-4"><title>Future Research Trends</title><p>Based on the above analysis, we identified several key research gaps and propose the following future research directions:</p><list list-type="bullet"><list-item><p>Research direction 1: While some studies have explored the use of SMM to understand cancer patients&#x2019; use of and perceptions toward traditional, complementary, and integrative medicine through micro-level content analysis, systematic evaluations of SMM research in chronic disease fields&#x2014;represented by cancer&#x2014;remain lacking [<xref ref-type="bibr" rid="ref20">20</xref>,<xref ref-type="bibr" rid="ref71">71</xref>,<xref ref-type="bibr" rid="ref72">72</xref>]. The presence of &#x201C;neoplasms&#x201D; in cluster 1 of the static cluster analysis further corroborates this conclusion.</p></list-item><list-item><p>Research direction 2: Most studies focus on correlational descriptions, with some literature exploring responses to public health initiatives among different populations. However, causal inference remains lacking, leaving unanswered questions such as &#x201C;Can social media content truly demonstrate the impact of health promotion initiatives on changing health behaviors?&#x201D; and &#x201C;How do these behaviors change?&#x201D; [<xref ref-type="bibr" rid="ref73">73</xref>]. Future research could use mixed methods, integrating qualitative approaches to explore the underlying logic of health behavior change [<xref ref-type="bibr" rid="ref61">61</xref>].</p></list-item><list-item><p>Research direction 3: Future potential hotspots lie within cluster 1, such as SMM research on mental health and psychological disorders. This has been confirmed by the 3D coordinate map. 
Proactively leveraging new technologies like large language models to explore this domain using social media data will drive a shift from retrospective summarization to prospective prevention [<xref ref-type="bibr" rid="ref74">74</xref>,<xref ref-type="bibr" rid="ref75">75</xref>].</p></list-item><list-item><p>Research direction 4: The 3D strategic coordinate map showed topics like adolescent and tobacco positioned in niche yet high-potential zones. This indicates that targeted health communication and health behavior&#x2013;related issues, particularly those addressing specific populations, may evolve into core hotspots in future research [<xref ref-type="bibr" rid="ref76">76</xref>].</p></list-item><list-item><p>Research direction 5: Keywords like vaccination and attitude occupied either the high-impact or high-novelty quadrant in the 3D strategic coordinate map, marking a strategic frontier hotspot. Topics, such as vaccination hesitancy and public listening, experienced short-term bursts during the pandemic but landed closer to the peripheral emerging zone, indicating limited long-term stability [<xref ref-type="bibr" rid="ref60">60</xref>,<xref ref-type="bibr" rid="ref77">77</xref>]. Nevertheless, vaccine-related issues and public listening remain core academic focuses and will continue to evolve.</p></list-item></list><p>The outlined future directions can be consolidated into a transformable research framework. At the data level, comparative analyses involving cross-databases, multilingual datasets, and cross-platform sources should be encouraged to mitigate representational biases arising from single-platform or single-language sources while testing the transferability of thematic structures [<xref ref-type="bibr" rid="ref78">78</xref>]. 
At the methodological level, semantic-structural integration can expand from static co-occurrence networks to dynamic networks to better capture thematic evolution and cross-cluster migration [<xref ref-type="bibr" rid="ref79">79</xref>]. At the validation level, time slices and burst detection results can be compared against external reference data, such as policy timelines and significant events, while exploring event study methods to enhance interpretability and real-world applicability [<xref ref-type="bibr" rid="ref73">73</xref>,<xref ref-type="bibr" rid="ref75">75</xref>,<xref ref-type="bibr" rid="ref80">80</xref>].</p></sec><sec id="s4-5"><title>Limitations</title><p>This study has several limitations. First, our analysis examined only English literature from PubMed, excluding other databases and other languages, which may not fully capture SMM applications across the entire field. In particular, this study relied on PubMed as its sole primary data source, resulting in a bias toward biomedical and clinical research. Consequently, representation from other fields was insufficient, potentially underestimating the scale of methodological innovations or interdisciplinary research. Second, due to the time lag in literature collection, recency may be underestimated. Third, the clustering analysis itself relied on keywords. While it considered the meaning of keywords within abstracts, it could not capture all the fine distinctions across studies. Moreover, we used keywords as analytical units in this study. Keywords themselves exhibit variations in naming and granularity, and have nonuniform usage. Hence, synonymous expressions may not be fully normalized. Although this study used PubMedBERT and SPECTER2 to learn structural and semantic similarity from title and abstract contexts, this approach only partially mitigates the fragmentation caused by relying on string matching. 
It cannot guarantee complete unification of all synonyms and polysemous terms, particularly when interdisciplinary terminology and task definitions are inconsistent. In addition, we did not conduct stratified external validation for each of the 3 time periods, such as comparing RCRs stage by stage. Such analyses may be statistically unstable when samples are small and when reference metrics need time to accumulate, potentially introducing systematic bias for the most recent stage, particularly the short and incomplete window of 2024&#x2010;2025 (July 2025). Future studies may explore stage-by-stage external validation with longer follow-up intervals. Furthermore, SMM research can be influenced by structural biases associated with social media platforms and their user demographics [<xref ref-type="bibr" rid="ref81">81</xref>]. For example, regional disparities and the digital divide may affect the visibility of research topics. The included SMM literature often relied on data from platforms such as Twitter, Reddit, and Weibo, which differ in user profiles, languages, and geographic distributions. Consequently, which topics receive more research attention depends largely on data accessibility and user activity on these platforms. Therefore, the findings presented here should primarily be understood as reflecting the knowledge structure at the level of academic publications rather than as directly representing the complete picture in the real world. Finally, as a bibliometric analysis, this study aimed to reveal correlations and trends rather than validate strict causal relationships.</p></sec><sec id="s4-6"><title>Conclusion</title><p>This study systematically revealed the knowledge structure, strategic positioning, and evolutionary trajectory of health-related SMM from 2015 to 2025 by developing a fully automated, reproducible hybrid bibliometric methodology.
The innovation of this study lies in constructing a semantic-structural hybrid similarity matrix while enhancing method robustness through dual-level validation to reduce synonym fragmentation and parameter sensitivity. Unlike traditional bibliometric reviews that primarily rely on co-occurrence statistics or single-pass clustering, this study offers a decision-oriented, comprehensive analysis integrating interpretable thematic clustering, 3D strategic positioning, intercluster relationships, and time-slice analysis. Substantively, we found that the research evolved through 3 distinct periods: an initial phase of methodological consolidation, a middle phase in which themes rapidly diversified and fragmented, and a later phase focused on long-term priority problems, such as mental health and cancer care. The 3D strategic coordinate map further indicates that differences in maturity, influence, and recency among themes shape their potential and challenges for future development.</p><p>Our findings hold practical implications for public health, health research, and health care. For public health, the findings confirm the necessity of infodemiology and provide methodological support for developing precise, audience-segmented public health communication strategies. For health research, the knowledge map offered by this study can be used to strategically set research priorities and optimize the allocation of research resources. For health care, technologies in this field can be applied to support real-world health surveillance systems and data-driven decision-making. In particular, mature analytic pipelines in SMM can support not only the development but also the validation of real-world surveillance and pharmacovigilance workflows and a better understanding of the patient journey through analysis of social media datasets, thereby supporting evidence-informed decision-making.
Although this study provides an organized framework, further efforts are needed to support real-world decision-making. Regarding future work, the hybrid analytical framework proposed in this study will undergo continuous iteration across 3 dimensions. At the data level, we will conduct cross-database, multilingual, and cross-platform comparative analyses. Methodologically, we will extend semantic-structural integration from static co-occurrence networks to dynamic networks. For validation, we will combine temporal slicing analysis and burst detection with event analysis to enhance explainability and practical utility.</p></sec></sec></body><back><ack><p>The authors declare the use of generative artificial intelligence (GAI) in the research and writing process. According to the GAIDeT taxonomy (2025), the following tasks were delegated to GAI tools under full human supervision: proofreading and editing, adapting and adjusting emotional tone, translation, and reformatting. The GAI tools used were ChatGPT-4o, ChatGPT-5.2, and DeepSeek-V3. Responsibility for the final manuscript lies entirely with the authors. GAI tools are not listed as authors and do not bear responsibility for the final outcomes. For manuscript preparation, given that the authors are nonnative English speakers, writing assistance tools were used: DeepL for translation, Grammarly and QuillBot for grammar refinement, and large language models (GPT-4o, GPT-5, GPT-5.2, and DeepSeek-V3) for language polishing. 
These tools were used only for language editing.</p></ack><notes><sec><title>Funding</title><p>This study was conducted as part of the ATLAS project "Innovation and Digital Transformation in Healthcare," funded by the State of North Rhine-Westphalia, Germany (grant number: ITG-1-1).</p></sec><sec><title>Data Availability</title><p>The original code is not publicly available because the repository contains personal API keys/credentials, local computer paths, and project-specific configuration information. The code can be obtained from the corresponding author upon reasonable request.</p></sec></notes><fn-group><fn fn-type="con"><p>Conceptualization: MJY, SB-J</p><p>Data analysis: MJY</p><p>Supervision: SB-J</p><p>Writing &#x2013; original draft: MJY</p><p>Writing &#x2013; review &#x0026; editing: MJY, SB-J</p></fn><fn fn-type="conflict"><p>None declared.</p></fn></fn-group><glossary><title>Abbreviations</title><def-list><def-item><term id="abb1">API</term><def><p>application programming interface</p></def></def-item><def-item><term id="abb2">ARIMA</term><def><p>autoregressive integrated moving average</p></def></def-item><def-item><term id="abb3">BIBLIO</term><def><p>Preliminary Guideline for Reporting Bibliometric Reviews of the Biomedical Literature</p></def></def-item><def-item><term id="abb4">CH</term><def><p>Calinski-Harabasz</p></def></def-item><def-item><term id="abb5">HDBSCAN</term><def><p>Hierarchical Density-Based Spatial Clustering of Applications with Noise</p></def></def-item><def-item><term id="abb6">IF</term><def><p>impact factor</p></def></def-item><def-item><term id="abb7">LDA</term><def><p>latent Dirichlet allocation</p></def></def-item><def-item><term id="abb8">MeSH</term><def><p>Medical Subject Headings</p></def></def-item><def-item><term id="abb9">ML</term><def><p>machine learning</p></def></def-item><def-item><term id="abb10">NLP</term><def><p>natural language processing</p></def></def-item><def-item><term 
id="abb11">PRISMA</term><def><p>Preferred Reporting Items for Systematic Reviews and Meta-Analyses</p></def></def-item><def-item><term id="abb12">RCR</term><def><p>relative citation ratio</p></def></def-item><def-item><term id="abb13">SMM</term><def><p>social media mining</p></def></def-item><def-item><term id="abb14">UMAP</term><def><p>Uniform Manifold Approximation and Projection</p></def></def-item></def-list></glossary><ref-list><title>References</title><ref id="ref1"><label>1</label><nlm-citation citation-type="web"><article-title>Number of internet and social media users worldwide</article-title><source>Statista</source><access-date>2025-05-10</access-date><comment><ext-link ext-link-type="uri" xlink:href="https://www.statista.com/statistics/617136/digital-population-worldwide">https://www.statista.com/statistics/617136/digital-population-worldwide</ext-link></comment></nlm-citation></ref><ref id="ref2"><label>2</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Shah</surname><given-names>HA</given-names> </name><name name-style="western"><surname>Househ</surname><given-names>M</given-names> </name></person-group><article-title>Understanding loneliness through analysis of Twitter and Reddit data: comparative study</article-title><source>Interact J Med Res</source><year>2025</year><month>03</month><day>14</day><volume>14</volume><fpage>e49464</fpage><pub-id pub-id-type="doi">10.2196/49464</pub-id><pub-id pub-id-type="medline">40085832</pub-id></nlm-citation></ref><ref id="ref3"><label>3</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Valdez</surname><given-names>D</given-names> </name><name name-style="western"><surname>Mena-Mel&#x00E9;ndez</surname><given-names>L</given-names> </name><name name-style="western"><surname>Crawford</surname><given-names>BL</given-names> </name><name 
name-style="western"><surname>Jozkowski</surname><given-names>KN</given-names> </name></person-group><article-title>Analyzing Reddit forums specific to abortion that yield diverse dialogues pertaining to medical information seeking and personal worldviews: data mining and natural language processing comparative study</article-title><source>J Med Internet Res</source><year>2024</year><month>02</month><day>14</day><volume>26</volume><fpage>e47408</fpage><pub-id pub-id-type="doi">10.2196/47408</pub-id><pub-id pub-id-type="medline">38354044</pub-id></nlm-citation></ref><ref id="ref4"><label>4</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Sinnenberg</surname><given-names>L</given-names> </name><name name-style="western"><surname>Buttenheim</surname><given-names>AM</given-names> </name><name name-style="western"><surname>Padrez</surname><given-names>K</given-names> </name><name name-style="western"><surname>Mancheno</surname><given-names>C</given-names> </name><name name-style="western"><surname>Ungar</surname><given-names>L</given-names> </name><name name-style="western"><surname>Merchant</surname><given-names>RM</given-names> </name></person-group><article-title>Twitter as a tool for health research: a systematic review</article-title><source>Am J Public Health</source><year>2017</year><month>01</month><volume>107</volume><issue>1</issue><fpage>143</fpage><lpage>143</lpage><pub-id pub-id-type="doi">10.2105/AJPH.2016.303512a</pub-id></nlm-citation></ref><ref id="ref5"><label>5</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Koss</surname><given-names>J</given-names> </name><name name-style="western"><surname>Rheinlaender</surname><given-names>A</given-names> </name><name name-style="western"><surname>Truebel</surname><given-names>H</given-names> </name><name 
name-style="western"><surname>Bohnet-Joschko</surname><given-names>S</given-names> </name></person-group><article-title>Social media mining in drug development-fundamentals and use cases</article-title><source>Drug Discov Today</source><year>2021</year><month>12</month><volume>26</volume><issue>12</issue><fpage>2871</fpage><lpage>2880</lpage><pub-id pub-id-type="doi">10.1016/j.drudis.2021.08.012</pub-id><pub-id pub-id-type="medline">34481080</pub-id></nlm-citation></ref><ref id="ref6"><label>6</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Li</surname><given-names>J</given-names> </name><name name-style="western"><surname>Xu</surname><given-names>Q</given-names> </name><name name-style="western"><surname>Cuomo</surname><given-names>R</given-names> </name><name name-style="western"><surname>Purushothaman</surname><given-names>V</given-names> </name><name name-style="western"><surname>Mackey</surname><given-names>T</given-names> </name></person-group><article-title>Data mining and content analysis of the Chinese social media platform Weibo during the early COVID-19 outbreak: retrospective observational infoveillance study</article-title><source>JMIR Public Health Surveill</source><year>2020</year><month>04</month><day>21</day><volume>6</volume><issue>2</issue><fpage>e18700</fpage><pub-id pub-id-type="doi">10.2196/18700</pub-id><pub-id pub-id-type="medline">32293582</pub-id></nlm-citation></ref><ref id="ref7"><label>7</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Lazard</surname><given-names>AJ</given-names> </name><name name-style="western"><surname>Saffer</surname><given-names>AJ</given-names> </name><name name-style="western"><surname>Wilcox</surname><given-names>GB</given-names> </name><name name-style="western"><surname>Chung</surname><given-names>AD</given-names> </name><name 
name-style="western"><surname>Mackert</surname><given-names>MS</given-names> </name><name name-style="western"><surname>Bernhardt</surname><given-names>JM</given-names> </name></person-group><article-title>E-cigarette social media messages: a text mining analysis of marketing and consumer conversations on Twitter</article-title><source>JMIR Public Health Surveill</source><year>2016</year><month>12</month><day>12</day><volume>2</volume><issue>2</issue><fpage>e171</fpage><pub-id pub-id-type="doi">10.2196/publichealth.6551</pub-id><pub-id pub-id-type="medline">27956376</pub-id></nlm-citation></ref><ref id="ref8"><label>8</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Koss</surname><given-names>J</given-names> </name><name name-style="western"><surname>Bohnet-Joschko</surname><given-names>S</given-names> </name></person-group><article-title>Social media mining of long-COVID self-medication reported by Reddit users: feasibility study to support drug repurposing</article-title><source>JMIR Form Res</source><year>2022</year><month>10</month><day>3</day><volume>6</volume><issue>10</issue><fpage>e39582</fpage><pub-id pub-id-type="doi">10.2196/39582</pub-id><pub-id pub-id-type="medline">36007131</pub-id></nlm-citation></ref><ref id="ref9"><label>9</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Park</surname><given-names>A</given-names> </name><name name-style="western"><surname>Conway</surname><given-names>M</given-names> </name><name name-style="western"><surname>Chen</surname><given-names>AT</given-names> </name></person-group><article-title>Examining thematic similarity, difference, and membership in three online mental health communities from Reddit: a text mining and visualization approach</article-title><source>Comput Human 
Behav</source><year>2018</year><month>01</month><volume>78</volume><fpage>98</fpage><lpage>112</lpage><pub-id pub-id-type="doi">10.1016/j.chb.2017.09.001</pub-id><pub-id pub-id-type="medline">29456286</pub-id></nlm-citation></ref><ref id="ref10"><label>10</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Chi</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Chen</surname><given-names>HY</given-names> </name></person-group><article-title>Investigating substance use via Reddit: systematic scoping review</article-title><source>J Med Internet Res</source><year>2023</year><month>10</month><day>25</day><volume>25</volume><fpage>e48905</fpage><pub-id pub-id-type="doi">10.2196/48905</pub-id><pub-id pub-id-type="medline">37878361</pub-id></nlm-citation></ref><ref id="ref11"><label>11</label><nlm-citation citation-type="book"><person-group person-group-type="author"><name name-style="western"><surname>Ru</surname><given-names>B</given-names> </name><name name-style="western"><surname>Yao</surname><given-names>L</given-names> </name></person-group><person-group person-group-type="editor"><name name-style="western"><surname>Bian</surname><given-names>J</given-names> </name><name name-style="western"><surname>Guo</surname><given-names>Y</given-names> </name><name name-style="western"><surname>He</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Hu</surname><given-names>X</given-names> </name></person-group><article-title>A literature review of social media-based data mining for health outcomes research</article-title><source>Social Web and Health Research</source><year>2019</year><publisher-name>Springer</publisher-name><fpage>1</fpage><lpage>14</lpage><pub-id pub-id-type="doi">10.1007/978-3-030-14714-3_1</pub-id><pub-id pub-id-type="other">9783030147136</pub-id></nlm-citation></ref><ref 
id="ref12"><label>12</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Ford</surname><given-names>E</given-names> </name><name name-style="western"><surname>Shepherd</surname><given-names>S</given-names> </name><name name-style="western"><surname>Jones</surname><given-names>K</given-names> </name><name name-style="western"><surname>Hassan</surname><given-names>L</given-names> </name></person-group><article-title>Toward an ethical framework for the text mining of social media for health research: a systematic review</article-title><source>Front Digit Health</source><year>2020</year><volume>2</volume><fpage>592237</fpage><pub-id pub-id-type="doi">10.3389/fdgth.2020.592237</pub-id><pub-id pub-id-type="medline">34713062</pub-id></nlm-citation></ref><ref id="ref13"><label>13</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Chen</surname><given-names>J</given-names> </name><name name-style="western"><surname>Wang</surname><given-names>Y</given-names> </name></person-group><article-title>Social media use for health purposes: systematic review</article-title><source>J Med Internet Res</source><year>2021</year><month>05</month><day>12</day><volume>23</volume><issue>5</issue><fpage>e17917</fpage><pub-id pub-id-type="doi">10.2196/17917</pub-id><pub-id pub-id-type="medline">33978589</pub-id></nlm-citation></ref><ref id="ref14"><label>14</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Fu</surname><given-names>J</given-names> </name><name name-style="western"><surname>Li</surname><given-names>C</given-names> </name><name name-style="western"><surname>Zhou</surname><given-names>C</given-names> </name><etal/></person-group><article-title>Methods for analyzing the contents of social media for health care: scoping review</article-title><source>J Med Internet 
Res</source><year>2023</year><month>06</month><day>26</day><volume>25</volume><fpage>e43349</fpage><pub-id pub-id-type="doi">10.2196/43349</pub-id><pub-id pub-id-type="medline">37358900</pub-id></nlm-citation></ref><ref id="ref15"><label>15</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Cobelli</surname><given-names>N</given-names> </name><name name-style="western"><surname>Blasi</surname><given-names>S</given-names> </name></person-group><article-title>Combining topic modeling and bibliometric analysis to understand the evolution of technological innovation adoption in the healthcare industry</article-title><source>Eur J Innov Manag</source><year>2024</year><month>12</month><day>16</day><volume>27</volume><issue>9</issue><fpage>127</fpage><lpage>149</lpage><pub-id pub-id-type="doi">10.1108/EJIM-06-2023-0497</pub-id></nlm-citation></ref><ref id="ref16"><label>16</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Chien</surname><given-names>SC</given-names> </name><name name-style="western"><surname>Yen</surname><given-names>CM</given-names> </name><name name-style="western"><surname>Chang</surname><given-names>YH</given-names> </name><etal/></person-group><article-title>Use of artificial intelligence, internet of things, and edge intelligence in long-term care for older people: comprehensive analysis through bibliometric, Google trends, and content analysis</article-title><source>J Med Internet Res</source><year>2025</year><month>03</month><day>4</day><volume>27</volume><fpage>e56692</fpage><pub-id pub-id-type="doi">10.2196/56692</pub-id><pub-id pub-id-type="medline">40053718</pub-id></nlm-citation></ref><ref id="ref17"><label>17</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Hassan</surname><given-names>W</given-names> </name><name 
name-style="western"><surname>Duarte</surname><given-names>AE</given-names> </name></person-group><article-title>Bibliometric analysis: a few suggestions</article-title><source>Curr Probl Cardiol</source><year>2024</year><month>08</month><volume>49</volume><issue>8</issue><fpage>102640</fpage><pub-id pub-id-type="doi">10.1016/j.cpcardiol.2024.102640</pub-id><pub-id pub-id-type="medline">38740289</pub-id></nlm-citation></ref><ref id="ref18"><label>18</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Marzi</surname><given-names>G</given-names> </name><name name-style="western"><surname>Balzano</surname><given-names>M</given-names> </name><name name-style="western"><surname>Caputo</surname><given-names>A</given-names> </name><name name-style="western"><surname>Pellegrini</surname><given-names>MM</given-names> </name></person-group><article-title>Guidelines for bibliometric&#x2010;systematic literature reviews: 10 steps to combine analysis, synthesis and theory development</article-title><source>Int J Management Reviews</source><year>2025</year><month>01</month><volume>27</volume><issue>1</issue><fpage>81</fpage><lpage>103</lpage><pub-id pub-id-type="doi">10.1111/ijmr.12381</pub-id></nlm-citation></ref><ref id="ref19"><label>19</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Petit</surname><given-names>P</given-names> </name><name name-style="western"><surname>Vuillerme</surname><given-names>N</given-names> </name></person-group><article-title>Leveraging administrative health databases to address health challenges in farming populations: scoping review and bibliometric analysis (1975-2024)</article-title><source>JMIR Public Health Surveill</source><year>2025</year><month>01</month><day>9</day><volume>11</volume><fpage>e62939</fpage><pub-id pub-id-type="doi">10.2196/62939</pub-id><pub-id 
pub-id-type="medline">39787587</pub-id></nlm-citation></ref><ref id="ref20"><label>20</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Xie</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Guo</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Zeng</surname><given-names>X</given-names> </name><etal/></person-group><article-title>Social media-based cancer education: bibliometric and thematic analysis</article-title><source>JMIR Cancer</source><year>2025</year><month>10</month><day>6</day><volume>11</volume><fpage>e77214</fpage><pub-id pub-id-type="doi">10.2196/77214</pub-id><pub-id pub-id-type="medline">41052420</pub-id></nlm-citation></ref><ref id="ref21"><label>21</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Alhashmi</surname><given-names>SM</given-names> </name><name name-style="western"><surname>Hashem</surname><given-names>IAT</given-names> </name><name name-style="western"><surname>Al-Qudah</surname><given-names>I</given-names> </name></person-group><article-title>Artificial Intelligence applications in healthcare: a bibliometric and topic model-based analysis</article-title><source>Intelligent Systems with Applications</source><year>2024</year><month>03</month><volume>21</volume><fpage>200299</fpage><pub-id pub-id-type="doi">10.1016/j.iswa.2023.200299</pub-id></nlm-citation></ref><ref id="ref22"><label>22</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Hu</surname><given-names>J</given-names> </name><name name-style="western"><surname>Li</surname><given-names>C</given-names> </name><name name-style="western"><surname>Ge</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Yang</surname><given-names>J</given-names> </name><name 
name-style="western"><surname>Zhu</surname><given-names>S</given-names> </name><name name-style="western"><surname>He</surname><given-names>C</given-names> </name></person-group><article-title>Mapping the evolution of digital health research: bibliometric overview of research hotspots, trends, and collaboration of publications in JMIR (1999-2024)</article-title><source>J Med Internet Res</source><year>2024</year><month>10</month><day>17</day><volume>26</volume><fpage>e58987</fpage><pub-id pub-id-type="doi">10.2196/58987</pub-id><pub-id pub-id-type="medline">39419496</pub-id></nlm-citation></ref><ref id="ref23"><label>23</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Hankar</surname><given-names>M</given-names> </name><name name-style="western"><surname>Kasri</surname><given-names>M</given-names> </name><name name-style="western"><surname>Beni-Hssane</surname><given-names>A</given-names> </name></person-group><article-title>A comprehensive overview of topic modeling: techniques, applications and challenges</article-title><source>Neurocomputing</source><year>2025</year><month>05</month><volume>628</volume><fpage>129638</fpage><pub-id pub-id-type="doi">10.1016/j.neucom.2025.129638</pub-id></nlm-citation></ref><ref id="ref24"><label>24</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Ogunleye</surname><given-names>B</given-names> </name><name name-style="western"><surname>Lancho Barrantes</surname><given-names>BS</given-names> </name><name name-style="western"><surname>Zakariyyah</surname><given-names>KI</given-names> </name></person-group><article-title>Topic modelling through the bibliometrics lens and its technique</article-title><source>Artif Intell Rev</source><year>2025</year><volume>58</volume><issue>3</issue><pub-id pub-id-type="doi">10.1007/s10462-024-11011-x</pub-id></nlm-citation></ref><ref 
id="ref25"><label>25</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Kukreja</surname><given-names>V</given-names> </name></person-group><article-title>Comic exploration and Insights: recent trends in LDA-based recognition studies</article-title><source>Expert Syst Appl</source><year>2024</year><month>12</month><volume>255</volume><fpage>124732</fpage><pub-id pub-id-type="doi">10.1016/j.eswa.2024.124732</pub-id></nlm-citation></ref><ref id="ref26"><label>26</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Sheng</surname><given-names>B</given-names> </name><name name-style="western"><surname>Wang</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Qiao</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Xie</surname><given-names>SQ</given-names> </name><name name-style="western"><surname>Tao</surname><given-names>J</given-names> </name><name name-style="western"><surname>Duan</surname><given-names>C</given-names> </name></person-group><article-title>Detecting latent topics and trends of digital twins in healthcare: a structural topic model-based systematic review</article-title><source>Digit Health</source><year>2023</year><volume>9</volume><fpage>20552076231203672</fpage><pub-id pub-id-type="doi">10.1177/20552076231203672</pub-id><pub-id pub-id-type="medline">37846404</pub-id></nlm-citation></ref><ref id="ref27"><label>27</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Yu</surname><given-names>JH</given-names> </name><name name-style="western"><surname>Chauhan</surname><given-names>D</given-names> </name></person-group><article-title>Trends in NLP for personalized learning: LDA and sentiment analysis insights</article-title><source>Educ Inf 
Technol</source><year>2025</year><month>03</month><volume>30</volume><issue>4</issue><fpage>4307</fpage><lpage>4348</lpage><pub-id pub-id-type="doi">10.1007/s10639-024-12988-2</pub-id></nlm-citation></ref><ref id="ref28"><label>28</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Lee</surname><given-names>S</given-names> </name><name name-style="western"><surname>Kim</surname><given-names>L</given-names> </name><name name-style="western"><surname>Shim</surname><given-names>MS</given-names> </name><name name-style="western"><surname>Kim</surname><given-names>GS</given-names> </name></person-group><article-title>Identifying health care services offered in the HIV care continuum via a machine learning-based topic modeling approach: exploratory literature review</article-title><source>JMIR Public Health Surveill</source><year>2025</year><month>07</month><day>9</day><volume>11</volume><fpage>e65081</fpage><pub-id pub-id-type="doi">10.2196/65081</pub-id><pub-id pub-id-type="medline">40632764</pub-id></nlm-citation></ref><ref id="ref29"><label>29</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Xie</surname><given-names>Q</given-names> </name><name name-style="western"><surname>Waltman</surname><given-names>L</given-names> </name></person-group><article-title>A comparison of citation-based clustering and topic modeling for science mapping</article-title><source>Scientometrics</source><year>2025</year><month>05</month><volume>130</volume><issue>5</issue><fpage>2497</fpage><lpage>2522</lpage><pub-id pub-id-type="doi">10.1007/s11192-025-05324-z</pub-id></nlm-citation></ref><ref id="ref30"><label>30</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Chen</surname><given-names>G</given-names> </name><name 
name-style="western"><surname>Hong</surname><given-names>S</given-names> </name><name name-style="western"><surname>Du</surname><given-names>C</given-names> </name><name name-style="western"><surname>Wang</surname><given-names>P</given-names> </name><name name-style="western"><surname>Yang</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Xiao</surname><given-names>L</given-names> </name></person-group><article-title>Comparing semantic representation methods for keyword analysis in bibliometric research</article-title><source>J Informetr</source><year>2024</year><month>08</month><volume>18</volume><issue>3</issue><fpage>101529</fpage><pub-id pub-id-type="doi">10.1016/j.joi.2024.101529</pub-id></nlm-citation></ref><ref id="ref31"><label>31</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Deka</surname><given-names>P</given-names> </name><name name-style="western"><surname>Kalita</surname><given-names>H</given-names> </name><name name-style="western"><surname>Sarmah</surname><given-names>M</given-names> </name></person-group><article-title>Author keyword occurrence and trends in digital media research: a bibliometric study</article-title><source>Alexandria: The Journal of National and International Library and Information Issues</source><year>2025</year><pub-id pub-id-type="doi">10.1177/09557490251406852</pub-id></nlm-citation></ref><ref id="ref32"><label>32</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Kong</surname><given-names>M</given-names> </name><name name-style="western"><surname>Zhang</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Sheng</surname><given-names>L</given-names> </name><name name-style="western"><surname>Hong</surname><given-names>K</given-names> </name></person-group><article-title>Citation structural diversity: a novel metric 
combining structure and semantics for literature evaluation</article-title><source>Scientometrics</source><year>2025</year><month>07</month><volume>130</volume><issue>7</issue><fpage>4027</fpage><lpage>4060</lpage><pub-id pub-id-type="doi">10.1007/s11192-025-05356-5</pub-id></nlm-citation></ref><ref id="ref33"><label>33</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Feng</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Chen</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Zhang</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Huang</surname><given-names>W</given-names> </name><name name-style="western"><surname>Zhang</surname><given-names>X</given-names> </name><name name-style="western"><surname>He</surname><given-names>S</given-names> </name></person-group><article-title>BERTopic_Teen: a multi-module optimization approach for short text topic modeling in adolescent health</article-title><source>Front Public Health</source><year>2025</year><volume>13</volume><pub-id pub-id-type="doi">10.3389/fpubh.2025.1608241</pub-id></nlm-citation></ref><ref id="ref34"><label>34</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Montazeri</surname><given-names>A</given-names> </name><name name-style="western"><surname>Mohammadi</surname><given-names>S</given-names> </name><name name-style="western"><surname>M Hesari</surname><given-names>P</given-names> </name><name name-style="western"><surname>Ghaemi</surname><given-names>M</given-names> </name><name name-style="western"><surname>Riazi</surname><given-names>H</given-names> </name><name name-style="western"><surname>Sheikhi-Mobarakeh</surname><given-names>Z</given-names> </name></person-group><article-title>Preliminary guideline for reporting bibliometric reviews of the biomedical 
literature (BIBLIO): a minimum requirements</article-title><source>Syst Rev</source><year>2023</year><month>12</month><day>15</day><volume>12</volume><issue>1</issue><fpage>239</fpage><pub-id pub-id-type="doi">10.1186/s13643-023-02410-2</pub-id><pub-id pub-id-type="medline">38102710</pub-id></nlm-citation></ref><ref id="ref35"><label>35</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Rethlefsen</surname><given-names>ML</given-names> </name><name name-style="western"><surname>Kirtley</surname><given-names>S</given-names> </name><name name-style="western"><surname>Waffenschmidt</surname><given-names>S</given-names> </name><etal/></person-group><article-title>PRISMA-S: an extension to the PRISMA statement for reporting literature searches in systematic reviews</article-title><source>Syst Rev</source><year>2021</year><month>01</month><day>26</day><volume>10</volume><issue>1</issue><fpage>39</fpage><pub-id pub-id-type="doi">10.1186/s13643-020-01542-z</pub-id><pub-id pub-id-type="medline">33499930</pub-id></nlm-citation></ref><ref id="ref36"><label>36</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Page</surname><given-names>MJ</given-names> </name><name name-style="western"><surname>McKenzie</surname><given-names>JE</given-names> </name><name name-style="western"><surname>Bossuyt</surname><given-names>PM</given-names> </name><etal/></person-group><article-title>The PRISMA 2020 statement: an updated guideline for reporting systematic reviews</article-title><source>BMJ</source><year>2021</year><month>03</month><day>29</day><volume>372</volume><fpage>n71</fpage><pub-id pub-id-type="doi">10.1136/bmj.n71</pub-id><pub-id pub-id-type="medline">33782057</pub-id></nlm-citation></ref><ref id="ref37"><label>37</label><nlm-citation citation-type="web"><article-title>Attribution 4.0 International (CC BY 
4.0)</article-title><source>Creative Commons</source><access-date>2026-04-30</access-date><comment><ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></comment></nlm-citation></ref><ref id="ref38"><label>38</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Ayyoubzadeh</surname><given-names>SM</given-names> </name><name name-style="western"><surname>Ayyoubzadeh</surname><given-names>SM</given-names> </name><name name-style="western"><surname>Zahedi</surname><given-names>H</given-names> </name><name name-style="western"><surname>Ahmadi</surname><given-names>M</given-names> </name><name name-style="western"><surname>R Niakan Kalhori</surname><given-names>S</given-names> </name></person-group><article-title>Predicting COVID-19 incidence through analysis of Google trends data in Iran: data mining and deep learning pilot study</article-title><source>JMIR Public Health Surveill</source><year>2020</year><month>04</month><day>14</day><volume>6</volume><issue>2</issue><fpage>e18828</fpage><pub-id pub-id-type="doi">10.2196/18828</pub-id><pub-id pub-id-type="medline">32234709</pub-id></nlm-citation></ref><ref id="ref39"><label>39</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Nikfarjam</surname><given-names>A</given-names> </name><name name-style="western"><surname>Sarker</surname><given-names>A</given-names> </name><name name-style="western"><surname>O&#x2019;Connor</surname><given-names>K</given-names> </name><name name-style="western"><surname>Ginn</surname><given-names>R</given-names> </name><name name-style="western"><surname>Gonzalez</surname><given-names>G</given-names> </name></person-group><article-title>Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster 
features</article-title><source>J Am Med Inform Assoc</source><year>2015</year><month>05</month><volume>22</volume><issue>3</issue><fpage>671</fpage><lpage>681</lpage><pub-id pub-id-type="doi">10.1093/jamia/ocu041</pub-id><pub-id pub-id-type="medline">25755127</pub-id></nlm-citation></ref><ref id="ref40"><label>40</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Tapi Nzali</surname><given-names>MD</given-names> </name><name name-style="western"><surname>Bringay</surname><given-names>S</given-names> </name><name name-style="western"><surname>Lavergne</surname><given-names>C</given-names> </name><name name-style="western"><surname>Mollevi</surname><given-names>C</given-names> </name><name name-style="western"><surname>Opitz</surname><given-names>T</given-names> </name></person-group><article-title>What patients can tell us: topic analysis for social media on breast cancer</article-title><source>JMIR Med Inform</source><year>2017</year><month>07</month><day>31</day><volume>5</volume><issue>3</issue><fpage>e23</fpage><pub-id pub-id-type="doi">10.2196/medinform.7779</pub-id><pub-id pub-id-type="medline">28760725</pub-id></nlm-citation></ref><ref id="ref41"><label>41</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Lazard</surname><given-names>AJ</given-names> </name><name name-style="western"><surname>Wilcox</surname><given-names>GB</given-names> </name><name name-style="western"><surname>Tuttle</surname><given-names>HM</given-names> </name><name name-style="western"><surname>Glowacki</surname><given-names>EM</given-names> </name><name name-style="western"><surname>Pikowski</surname><given-names>J</given-names> </name></person-group><article-title>Public reactions to e-cigarette regulations on Twitter: a text mining analysis</article-title><source>Tob 
Control</source><year>2017</year><month>12</month><volume>26</volume><issue>e2</issue><fpage>e112</fpage><lpage>e116</lpage><pub-id pub-id-type="doi">10.1136/tobaccocontrol-2016-053295</pub-id><pub-id pub-id-type="medline">28341768</pub-id></nlm-citation></ref><ref id="ref42"><label>42</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Li</surname><given-names>D</given-names> </name><name name-style="western"><surname>Chaudhary</surname><given-names>H</given-names> </name><name name-style="western"><surname>Zhang</surname><given-names>Z</given-names> </name></person-group><article-title>Modeling spatiotemporal pattern of depressive symptoms caused by COVID-19 using social media data mining</article-title><source>Int J Environ Res Public Health</source><year>2020</year><month>07</month><day>10</day><volume>17</volume><issue>14</issue><fpage>4988</fpage><pub-id pub-id-type="doi">10.3390/ijerph17144988</pub-id><pub-id pub-id-type="medline">32664388</pub-id></nlm-citation></ref><ref id="ref43"><label>43</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Zhang</surname><given-names>J</given-names> </name><name name-style="western"><surname>Wang</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Shi</surname><given-names>M</given-names> </name><name name-style="western"><surname>Wang</surname><given-names>X</given-names> </name></person-group><article-title>Factors driving the popularity and virality of COVID-19 vaccine discourse on Twitter: text mining and data visualization study</article-title><source>JMIR Public Health Surveill</source><year>2021</year><month>12</month><day>3</day><volume>7</volume><issue>12</issue><fpage>e32814</fpage><pub-id pub-id-type="doi">10.2196/32814</pub-id><pub-id pub-id-type="medline">34665761</pub-id></nlm-citation></ref><ref id="ref44"><label>44</label><nlm-citation 
citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Blumenthal</surname><given-names>KG</given-names> </name><name name-style="western"><surname>Topaz</surname><given-names>M</given-names> </name><name name-style="western"><surname>Zhou</surname><given-names>L</given-names> </name><etal/></person-group><article-title>Mining social media data to assess the risk of skin and soft tissue infections from allergen immunotherapy</article-title><source>J Allergy Clin Immunol</source><year>2019</year><month>07</month><volume>144</volume><issue>1</issue><fpage>129</fpage><lpage>134</lpage><pub-id pub-id-type="doi">10.1016/j.jaci.2019.01.029</pub-id><pub-id pub-id-type="medline">30721764</pub-id></nlm-citation></ref><ref id="ref45"><label>45</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Cheng</surname><given-names>Q</given-names> </name><name name-style="western"><surname>Li</surname><given-names>TM</given-names> </name><name name-style="western"><surname>Kwok</surname><given-names>CL</given-names> </name><name name-style="western"><surname>Zhu</surname><given-names>T</given-names> </name><name name-style="western"><surname>Yip</surname><given-names>PS</given-names> </name></person-group><article-title>Assessing suicide risk and emotional distress in Chinese social media: a text mining and machine learning study</article-title><source>J Med Internet Res</source><year>2017</year><month>07</month><day>10</day><volume>19</volume><issue>7</issue><fpage>e243</fpage><pub-id pub-id-type="doi">10.2196/jmir.7276</pub-id><pub-id pub-id-type="medline">28694239</pub-id></nlm-citation></ref><ref id="ref46"><label>46</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Daughton</surname><given-names>AR</given-names> </name><name 
name-style="western"><surname>Shelley</surname><given-names>CD</given-names> </name><name name-style="western"><surname>Barnard</surname><given-names>M</given-names> </name><etal/></person-group><article-title>Mining and validating social media data for COVID-19-related human behaviors between January and July 2020: infodemiology study</article-title><source>J Med Internet Res</source><year>2021</year><month>05</month><day>25</day><volume>23</volume><issue>5</issue><fpage>e27059</fpage><pub-id pub-id-type="doi">10.2196/27059</pub-id><pub-id pub-id-type="medline">33882015</pub-id></nlm-citation></ref><ref id="ref47"><label>47</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Chen</surname><given-names>G</given-names> </name><name name-style="western"><surname>Tan</surname><given-names>B</given-names> </name><name name-style="western"><surname>Laham</surname><given-names>N</given-names> </name><name name-style="western"><surname>Tracey</surname><given-names>TJG</given-names> </name><name name-style="western"><surname>Lapinski</surname><given-names>S</given-names> </name><name name-style="western"><surname>Liu</surname><given-names>Y</given-names> </name></person-group><article-title>A bibliometric review of natural language processing applications in psychology from 1991 to 2023</article-title><source>Basic Appl Soc Psych</source><year>2025</year><month>03</month><day>4</day><volume>47</volume><issue>2</issue><fpage>105</fpage><lpage>119</lpage><pub-id pub-id-type="doi">10.1080/01973533.2024.2433720</pub-id></nlm-citation></ref><ref id="ref48"><label>48</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Habehh</surname><given-names>H</given-names> </name><name name-style="western"><surname>Gohel</surname><given-names>S</given-names> </name></person-group><article-title>Machine learning in healthcare</article-title><source>Curr 
Genomics</source><year>2021</year><month>12</month><day>16</day><volume>22</volume><issue>4</issue><fpage>291</fpage><lpage>300</lpage><pub-id pub-id-type="doi">10.2174/1389202922666210705124359</pub-id><pub-id pub-id-type="medline">35273459</pub-id></nlm-citation></ref><ref id="ref49"><label>49</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Kim</surname><given-names>J</given-names> </name><name name-style="western"><surname>Lee</surname><given-names>D</given-names> </name><name name-style="western"><surname>Park</surname><given-names>E</given-names> </name></person-group><article-title>Machine learning for mental health in social media: bibliometric study</article-title><source>J Med Internet Res</source><year>2021</year><month>03</month><day>8</day><volume>23</volume><issue>3</issue><fpage>e24870</fpage><pub-id pub-id-type="doi">10.2196/24870</pub-id><pub-id pub-id-type="medline">33683209</pub-id></nlm-citation></ref><ref id="ref50"><label>50</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Cunha Reis</surname><given-names>T</given-names> </name></person-group><article-title>Artificial intelligence and natural language processing for improved telemedicine: before, during and after remote consultation</article-title><source>Aten Primaria</source><year>2025</year><month>08</month><volume>57</volume><issue>8</issue><fpage>103228</fpage><pub-id pub-id-type="doi">10.1016/j.aprim.2025.103228</pub-id><pub-id pub-id-type="medline">39955812</pub-id></nlm-citation></ref><ref id="ref51"><label>51</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Casey</surname><given-names>A</given-names> </name><name name-style="western"><surname>Davidson</surname><given-names>E</given-names> </name><name 
name-style="western"><surname>Poon</surname><given-names>M</given-names> </name><etal/></person-group><article-title>A systematic review of natural language processing applied to radiology reports</article-title><source>BMC Med Inform Decis Mak</source><year>2021</year><month>06</month><day>3</day><volume>21</volume><issue>1</issue><fpage>179</fpage><pub-id pub-id-type="doi">10.1186/s12911-021-01533-7</pub-id><pub-id pub-id-type="medline">34082729</pub-id></nlm-citation></ref><ref id="ref52"><label>52</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Ren</surname><given-names>L</given-names> </name><name name-style="western"><surname>Lin</surname><given-names>H</given-names> </name><name name-style="western"><surname>Xu</surname><given-names>B</given-names> </name><name name-style="western"><surname>Zhang</surname><given-names>S</given-names> </name><name name-style="western"><surname>Yang</surname><given-names>L</given-names> </name><name name-style="western"><surname>Sun</surname><given-names>S</given-names> </name></person-group><article-title>Depression detection on Reddit with an emotion-based attention network: algorithm development and validation</article-title><source>JMIR Med Inform</source><year>2021</year><month>07</month><day>16</day><volume>9</volume><issue>7</issue><fpage>e28754</fpage><pub-id pub-id-type="doi">10.2196/28754</pub-id><pub-id pub-id-type="medline">34269683</pub-id></nlm-citation></ref><ref id="ref53"><label>53</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Low</surname><given-names>DM</given-names> </name><name name-style="western"><surname>Rumker</surname><given-names>L</given-names> </name><name name-style="western"><surname>Talkar</surname><given-names>T</given-names> </name><name name-style="western"><surname>Torous</surname><given-names>J</given-names> </name><name 
name-style="western"><surname>Cecchi</surname><given-names>G</given-names> </name><name name-style="western"><surname>Ghosh</surname><given-names>SS</given-names> </name></person-group><article-title>Natural language processing reveals vulnerable mental health support groups and heightened health anxiety on Reddit during COVID-19: observational study</article-title><source>J Med Internet Res</source><year>2020</year><month>10</month><day>12</day><volume>22</volume><issue>10</issue><fpage>e22635</fpage><pub-id pub-id-type="doi">10.2196/22635</pub-id><pub-id pub-id-type="medline">32936777</pub-id></nlm-citation></ref><ref id="ref54"><label>54</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Shankar</surname><given-names>K</given-names> </name><name name-style="western"><surname>Chandrasekaran</surname><given-names>R</given-names> </name><name name-style="western"><surname>Jeripity Venkata</surname><given-names>P</given-names> </name><name name-style="western"><surname>Miketinas</surname><given-names>D</given-names> </name></person-group><article-title>Investigating the role of nutrition in enhancing immunity during the COVID-19 pandemic: Twitter text-mining analysis</article-title><source>J Med Internet Res</source><year>2023</year><month>07</month><day>10</day><volume>25</volume><fpage>e47328</fpage><pub-id pub-id-type="doi">10.2196/47328</pub-id><pub-id pub-id-type="medline">37428522</pub-id></nlm-citation></ref><ref id="ref55"><label>55</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Zhang</surname><given-names>C</given-names> </name><name name-style="western"><surname>Xu</surname><given-names>S</given-names> </name><name name-style="western"><surname>Li</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Hu</surname><given-names>S</given-names> 
</name></person-group><article-title>Understanding concerns, sentiments, and disparities among population groups during the COVID-19 pandemic via Twitter data mining: large-scale cross-sectional study</article-title><source>J Med Internet Res</source><year>2021</year><month>03</month><day>5</day><volume>23</volume><issue>3</issue><fpage>e26482</fpage><pub-id pub-id-type="doi">10.2196/26482</pub-id><pub-id pub-id-type="medline">33617460</pub-id></nlm-citation></ref><ref id="ref56"><label>56</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Sameto&#x011F;lu</surname><given-names>S</given-names> </name><name name-style="western"><surname>Pelt</surname><given-names>DHM</given-names> </name><name name-style="western"><surname>Eichstaedt</surname><given-names>JC</given-names> </name><name name-style="western"><surname>Ungar</surname><given-names>LH</given-names> </name><name name-style="western"><surname>Bartels</surname><given-names>M</given-names> </name></person-group><article-title>The value of social media language for the assessment of wellbeing: a systematic review and meta-analysis</article-title><source>J Posit Psychol</source><year>2024</year><month>05</month><day>3</day><volume>19</volume><issue>3</issue><fpage>471</fpage><lpage>489</lpage><pub-id pub-id-type="doi">10.1080/17439760.2023.2218341</pub-id></nlm-citation></ref><ref id="ref57"><label>57</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Western</surname><given-names>MJ</given-names> </name><name name-style="western"><surname>Smit</surname><given-names>ES</given-names> </name><name name-style="western"><surname>G&#x00FC;ltzow</surname><given-names>T</given-names> </name><etal/></person-group><article-title>Bridging the digital health divide: a narrative review of the causes, implications, and solutions for digital health 
inequalities</article-title><source>Health Psychol Behav Med</source><year>2025</year><volume>13</volume><issue>1</issue><fpage>2493139</fpage><pub-id pub-id-type="doi">10.1080/21642850.2025.2493139</pub-id><pub-id pub-id-type="medline">40276490</pub-id></nlm-citation></ref><ref id="ref58"><label>58</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Wu</surname><given-names>M</given-names> </name><name name-style="western"><surname>Xue</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Ma</surname><given-names>C</given-names> </name></person-group><article-title>The association between the digital divide and health inequalities among older adults in China: nationally representative cross-sectional survey</article-title><source>J Med Internet Res</source><year>2025</year><month>01</month><day>15</day><volume>27</volume><fpage>e62645</fpage><pub-id pub-id-type="doi">10.2196/62645</pub-id><pub-id pub-id-type="medline">39813666</pub-id></nlm-citation></ref><ref id="ref59"><label>59</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Karapetiantz</surname><given-names>P</given-names> </name><name name-style="western"><surname>Audeh</surname><given-names>B</given-names> </name><name name-style="western"><surname>Redjdal</surname><given-names>A</given-names> </name><name name-style="western"><surname>Tiffet</surname><given-names>T</given-names> </name><name name-style="western"><surname>Bousquet</surname><given-names>C</given-names> </name><name name-style="western"><surname>Jaulent</surname><given-names>MC</given-names> </name></person-group><article-title>Monitoring adverse drug events in web forums: evaluation of a pipeline and use case study</article-title><source>J Med Internet Res</source><year>2024</year><month>06</month><day>18</day><volume>26</volume><fpage>e46176</fpage><pub-id 
pub-id-type="doi">10.2196/46176</pub-id><pub-id pub-id-type="medline">38888956</pub-id></nlm-citation></ref><ref id="ref60"><label>60</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Boatman</surname><given-names>D</given-names> </name><name name-style="western"><surname>Starkey</surname><given-names>A</given-names> </name><name name-style="western"><surname>Acciavatti</surname><given-names>L</given-names> </name><name name-style="western"><surname>Jarrett</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Allen</surname><given-names>A</given-names> </name><name name-style="western"><surname>Kennedy-Rea</surname><given-names>S</given-names> </name></person-group><article-title>Using social listening for digital public health surveillance of human papillomavirus vaccine misinformation online: exploratory study</article-title><source>JMIR Infodemiology</source><year>2024</year><month>03</month><day>8</day><volume>4</volume><fpage>e54000</fpage><pub-id pub-id-type="doi">10.2196/54000</pub-id><pub-id pub-id-type="medline">38457224</pub-id></nlm-citation></ref><ref id="ref61"><label>61</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Timotin</surname><given-names>A</given-names> </name><name name-style="western"><surname>Paladi</surname><given-names>A</given-names> </name><name name-style="western"><surname>Mita</surname><given-names>V</given-names> </name><name name-style="western"><surname>Chihai</surname><given-names>V</given-names> </name><name name-style="western"><surname>Lozan</surname><given-names>O</given-names> </name></person-group><article-title>Social listening applied to tailor communication on immunization in the Republic of Moldova</article-title><source>Eur J Public 
Health</source><year>2025</year><month>04</month><day>1</day><volume>35</volume><issue>2</issue><fpage>270</fpage><lpage>275</lpage><pub-id pub-id-type="doi">10.1093/eurpub/ckae161</pub-id><pub-id pub-id-type="medline">39470446</pub-id></nlm-citation></ref><ref id="ref62"><label>62</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Wehrli</surname><given-names>S</given-names> </name><name name-style="western"><surname>Irrgang</surname><given-names>C</given-names> </name><name name-style="western"><surname>Scott</surname><given-names>M</given-names> </name><name name-style="western"><surname>Arnrich</surname><given-names>B</given-names> </name><name name-style="western"><surname>Boender</surname><given-names>TS</given-names> </name></person-group><article-title>The role of the (in)accessibility of social media data for infodemic management: a public health perspective on the situation in the European Union in March 2024</article-title><source>Front Public Health</source><year>2024</year><volume>12</volume><fpage>1378412</fpage><pub-id pub-id-type="doi">10.3389/fpubh.2024.1378412</pub-id><pub-id pub-id-type="medline">38651120</pub-id></nlm-citation></ref><ref id="ref63"><label>63</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Liscano</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Anillo Arrieta</surname><given-names>LA</given-names> </name><name name-style="western"><surname>Montenegro</surname><given-names>JF</given-names> </name><name name-style="western"><surname>Prieto-Alvarado</surname><given-names>D</given-names> </name><name name-style="western"><surname>Ordo&#x00F1;ez</surname><given-names>J</given-names> </name></person-group><article-title>Early warning of infectious disease outbreaks using social media and digital data: a scoping review</article-title><source>Int J Environ Res 
Public Health</source><year>2025</year><month>07</month><day>13</day><volume>22</volume><issue>7</issue><fpage>1104</fpage><pub-id pub-id-type="doi">10.3390/ijerph22071104</pub-id><pub-id pub-id-type="medline">40724171</pub-id></nlm-citation></ref><ref id="ref64"><label>64</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Amin</surname><given-names>S</given-names> </name><name name-style="western"><surname>Zeb</surname><given-names>MA</given-names> </name><name name-style="western"><surname>Alshahrani</surname><given-names>H</given-names> </name><name name-style="western"><surname>Hamdi</surname><given-names>M</given-names> </name><name name-style="western"><surname>Alsulami</surname><given-names>M</given-names> </name><name name-style="western"><surname>Shaikh</surname><given-names>A</given-names> </name></person-group><article-title>Social media-based surveillance systems for health informatics using machine and deep learning techniques: a comprehensive review and open challenges</article-title><source>CMES</source><year>2024</year><volume>139</volume><issue>2</issue><fpage>1167</fpage><lpage>1202</lpage><pub-id pub-id-type="doi">10.32604/cmes.2023.043921</pub-id></nlm-citation></ref><ref id="ref65"><label>65</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Resnik</surname><given-names>P</given-names> </name><name name-style="western"><surname>De Choudhury</surname><given-names>M</given-names> </name><name name-style="western"><surname>Musacchio Schafer</surname><given-names>K</given-names> </name><name name-style="western"><surname>Coppersmith</surname><given-names>G</given-names> </name></person-group><article-title>Bibliometric studies and the discipline of social media mental health research. 
Comment on &#x201C;Machine learning for mental health in social media: bibliometric study&#x201D;</article-title><source>J Med Internet Res</source><year>2021</year><month>06</month><day>17</day><volume>23</volume><issue>6</issue><fpage>e28990</fpage><pub-id pub-id-type="doi">10.2196/28990</pub-id><pub-id pub-id-type="medline">34137722</pub-id></nlm-citation></ref><ref id="ref66"><label>66</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Gonz&#x00E1;lez-M&#x00E1;rquez</surname><given-names>R</given-names> </name><name name-style="western"><surname>Schmidt</surname><given-names>L</given-names> </name><name name-style="western"><surname>Schmidt</surname><given-names>BM</given-names> </name><name name-style="western"><surname>Berens</surname><given-names>P</given-names> </name><name name-style="western"><surname>Kobak</surname><given-names>D</given-names> </name></person-group><article-title>The landscape of biomedical research</article-title><source>Patterns (N Y)</source><year>2024</year><month>06</month><day>14</day><volume>5</volume><issue>6</issue><fpage>100968</fpage><pub-id pub-id-type="doi">10.1016/j.patter.2024.100968</pub-id><pub-id pub-id-type="medline">39005482</pub-id></nlm-citation></ref><ref id="ref67"><label>67</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Menke</surname><given-names>JD</given-names> </name><name name-style="western"><surname>Kilicoglu</surname><given-names>H</given-names> </name><name name-style="western"><surname>Smalheiser</surname><given-names>NR</given-names> </name></person-group><article-title>Publication type tagging using transformer models and multi-label classification</article-title><source>AMIA Annu Symp Proc</source><year>2024</year><volume>2024</volume><fpage>818</fpage><lpage>827</lpage><pub-id pub-id-type="medline">40417522</pub-id></nlm-citation></ref><ref 
id="ref68"><label>68</label><nlm-citation citation-type="confproc"><person-group person-group-type="author"><name name-style="western"><surname>Singh</surname><given-names>A</given-names> </name><name name-style="western"><surname>D&#x2019;Arcy</surname><given-names>M</given-names> </name><name name-style="western"><surname>Cohan</surname><given-names>A</given-names> </name><name name-style="western"><surname>Downey</surname><given-names>D</given-names> </name><name name-style="western"><surname>Feldman</surname><given-names>S</given-names> </name></person-group><article-title>SciRepEval: a multi-format benchmark for scientific document representations</article-title><conf-name>2023 Conference on Empirical Methods in Natural Language Processing</conf-name><conf-date>Dec 6-10, 2023</conf-date><pub-id pub-id-type="doi">10.18653/v1/2023.emnlp-main.338</pub-id></nlm-citation></ref><ref id="ref69"><label>69</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Lezhnina</surname><given-names>O</given-names> </name></person-group><article-title>Depression, anxiety, and burnout in academia: topic modeling of PubMed abstracts</article-title><source>Front Res Metr Anal</source><year>2023</year><volume>8</volume><fpage>1271385</fpage><pub-id pub-id-type="doi">10.3389/frma.2023.1271385</pub-id><pub-id pub-id-type="medline">38090103</pub-id></nlm-citation></ref><ref id="ref70"><label>70</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Bonakala</surname><given-names>S</given-names> </name><name name-style="western"><surname>Aupetit</surname><given-names>M</given-names> </name><name name-style="western"><surname>Bensmail</surname><given-names>H</given-names> </name><name name-style="western"><surname>El-Mellouhi</surname><given-names>F</given-names> </name></person-group><article-title>A human-in-the-loop approach for visual clustering of 
overlapping materials science data</article-title><source>Digital Discovery</source><year>2024</year><month>03</month><day>13</day><volume>3</volume><issue>3</issue><fpage>502</fpage><lpage>513</lpage><pub-id pub-id-type="doi">10.1039/D3DD00179B</pub-id></nlm-citation></ref><ref id="ref71"><label>71</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Lam</surname><given-names>CS</given-names> </name><name name-style="western"><surname>Zhou</surname><given-names>K</given-names> </name><name name-style="western"><surname>Loong</surname><given-names>HHF</given-names> </name><name name-style="western"><surname>Chung</surname><given-names>VCH</given-names> </name><name name-style="western"><surname>Ngan</surname><given-names>CK</given-names> </name><name name-style="western"><surname>Cheung</surname><given-names>YT</given-names> </name></person-group><article-title>The use of traditional, complementary, and integrative medicine in cancer: data-mining study of 1 million web-based posts from health forums and social media platforms</article-title><source>J Med Internet Res</source><year>2023</year><month>04</month><day>21</day><volume>25</volume><fpage>e45408</fpage><pub-id pub-id-type="doi">10.2196/45408</pub-id><pub-id pub-id-type="medline">37083752</pub-id></nlm-citation></ref><ref id="ref72"><label>72</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Wu</surname><given-names>X</given-names> </name><name name-style="western"><surname>Lam</surname><given-names>CS</given-names> </name><name name-style="western"><surname>Hui</surname><given-names>KH</given-names> </name><etal/></person-group><article-title>Perceptions in 3.6 million web-based posts of online communities on the use of cancer immunotherapy: data mining using BERTopic</article-title><source>J Med Internet 
Res</source><year>2025</year><month>02</month><day>10</day><volume>27</volume><fpage>e60948</fpage><pub-id pub-id-type="doi">10.2196/60948</pub-id><pub-id pub-id-type="medline">39928933</pub-id></nlm-citation></ref><ref id="ref73"><label>73</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Dahabreh</surname><given-names>IJ</given-names> </name><name name-style="western"><surname>Bibbins-Domingo</surname><given-names>K</given-names> </name></person-group><article-title>Causal inference about the effects of interventions from observational studies in medical journals</article-title><source>JAMA</source><year>2024</year><month>06</month><day>4</day><volume>331</volume><issue>21</issue><fpage>1845</fpage><lpage>1853</lpage><pub-id pub-id-type="doi">10.1001/jama.2024.7741</pub-id><pub-id pub-id-type="medline">38722735</pub-id></nlm-citation></ref><ref id="ref74"><label>74</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Hua</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Na</surname><given-names>H</given-names> </name><name name-style="western"><surname>Li</surname><given-names>Z</given-names> </name><etal/></person-group><article-title>A scoping review of large language models for generative tasks in mental health care</article-title><source>NPJ Digit Med</source><year>2025</year><month>04</month><day>30</day><volume>8</volume><issue>1</issue><fpage>230</fpage><pub-id pub-id-type="doi">10.1038/s41746-025-01611-4</pub-id><pub-id pub-id-type="medline">40307331</pub-id></nlm-citation></ref><ref id="ref75"><label>75</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Deiner</surname><given-names>MS</given-names> </name><name name-style="western"><surname>Deiner</surname><given-names>NA</given-names> </name><name 
name-style="western"><surname>Hristidis</surname><given-names>V</given-names> </name><etal/></person-group><article-title>Use of large language models to assess the likelihood of epidemics from the content of tweets: infodemiology study</article-title><source>J Med Internet Res</source><year>2024</year><month>03</month><day>1</day><volume>26</volume><fpage>e49139</fpage><pub-id pub-id-type="doi">10.2196/49139</pub-id><pub-id pub-id-type="medline">38427404</pub-id></nlm-citation></ref><ref id="ref76"><label>76</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Vogel</surname><given-names>EA</given-names> </name><name name-style="western"><surname>Ranker</surname><given-names>LR</given-names> </name><name name-style="western"><surname>Harrell</surname><given-names>PT</given-names> </name><etal/></person-group><article-title>Characteristics of adolescents&#x2019; and young adults&#x2019; exposure to and engagement with nicotine and tobacco product content on social media</article-title><source>J Health Commun</source><year>2024</year><month>06</month><day>2</day><volume>29</volume><issue>6</issue><fpage>383</fpage><lpage>393</lpage><pub-id pub-id-type="doi">10.1080/10810730.2024.2355291</pub-id><pub-id pub-id-type="medline">38775659</pub-id></nlm-citation></ref><ref id="ref77"><label>77</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Xu</surname><given-names>J</given-names> </name><name name-style="western"><surname>Wu</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Wass</surname><given-names>L</given-names> </name><name name-style="western"><surname>Larson</surname><given-names>HJ</given-names> </name><name name-style="western"><surname>Lin</surname><given-names>L</given-names> </name></person-group><article-title>Mapping global public perspectives on mRNA vaccines and 
therapeutics</article-title><source>NPJ Vaccines</source><year>2024</year><month>11</month><day>14</day><volume>9</volume><issue>1</issue><fpage>218</fpage><pub-id pub-id-type="doi">10.1038/s41541-024-01019-3</pub-id><pub-id pub-id-type="medline">39543153</pub-id></nlm-citation></ref><ref id="ref78"><label>78</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Shen</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Luo</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Song</surname><given-names>X</given-names> </name><name name-style="western"><surname>Liu</surname><given-names>C</given-names> </name></person-group><article-title>Research on the evolution of cross-platform online public opinion for public health emergencies considering stakeholders</article-title><source>PLoS ONE</source><year>2024</year><volume>19</volume><issue>6</issue><fpage>e0304877</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0304877</pub-id></nlm-citation></ref><ref id="ref79"><label>79</label><nlm-citation citation-type="confproc"><person-group person-group-type="author"><name name-style="western"><surname>Ma</surname><given-names>X</given-names> </name><name name-style="western"><surname>Strube</surname><given-names>M</given-names> </name><name name-style="western"><surname>Zhao</surname><given-names>W</given-names> </name></person-group><article-title>Graph-based clustering for detecting semantic change across time and languages</article-title><conf-name>18th Conference of the European Chapter of the Association for Computational Linguistics</conf-name><conf-date>Mar 17-22, 2024</conf-date><pub-id pub-id-type="doi">10.18653/v1/2024.eacl-long.93</pub-id></nlm-citation></ref><ref id="ref80"><label>80</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name 
name-style="western"><surname>Yue</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Yu</surname><given-names>G</given-names> </name></person-group><article-title>Effects of policy communication changes on social media: before and after policy adjustment</article-title><source>Systems</source><year>2025</year><volume>13</volume><issue>4</issue><fpage>248</fpage><pub-id pub-id-type="doi">10.3390/systems13040248</pub-id></nlm-citation></ref><ref id="ref81"><label>81</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Mourtzouchou</surname><given-names>A</given-names> </name><name name-style="western"><surname>Marin</surname><given-names>AL</given-names> </name><name name-style="western"><surname>Laveneziana</surname><given-names>L</given-names> </name><etal/></person-group><article-title>Comparative analysis of public and expert perceptions of electrified vehicles in the European Union</article-title><source>Sci Rep</source><year>2025</year><month>07</month><day>1</day><volume>15</volume><issue>1</issue><fpage>21695</fpage><pub-id pub-id-type="doi">10.1038/s41598-025-06071-0</pub-id><pub-id pub-id-type="medline">40594306</pub-id></nlm-citation></ref></ref-list><app-group><supplementary-material id="app1"><label>Multimedia Appendix 1</label><p>Supplementary figures to support the study.</p><media xlink:href="jmir_v28i1e86200_app1.pdf" xlink:title="PDF File, 4182 KB"/></supplementary-material><supplementary-material id="app2"><label>Multimedia Appendix 2</label><p>Keyword burst detection (Kleinberg bursts) and parameter robustness assessment.</p><media xlink:href="jmir_v28i1e86200_app2.pdf" xlink:title="PDF File, 236 KB"/></supplementary-material><supplementary-material id="app3"><label>Multimedia Appendix 3</label><p>Mathematical derivation and computational process documentation for the hybrid semantic-structural bibliometric analysis pipeline.</p><media 
xlink:href="jmir_v28i1e86200_app3.pdf" xlink:title="PDF File, 353 KB"/></supplementary-material><supplementary-material id="app4"><label>Multimedia Appendix 4</label><p>Micro-level interpretive triangulation with selected articles (evidence mapping).</p><media xlink:href="jmir_v28i1e86200_app4.pdf" xlink:title="PDF File, 632 KB"/></supplementary-material><supplementary-material id="app5"><label>Multimedia Appendix 5</label><p>List of included publications.</p><media xlink:href="jmir_v28i1e86200_app5.xlsx" xlink:title="XLSX File, 157 KB"/></supplementary-material><supplementary-material id="app6"><label>Multimedia Appendix 6</label><p>Supplementary tables to support the study.</p><media xlink:href="jmir_v28i1e86200_app6.pdf" xlink:title="PDF File, 394 KB"/></supplementary-material><supplementary-material id="app7"><label>Multimedia Appendix 7</label><p>Interpretation and rationale for retaining the unassigned noise set from Hierarchical Density-Based Spatial Clustering of Applications with Noise (cluster 1: candidate incubator pool of peripheral cross-cutting topics in health-related social media mining).</p><media xlink:href="jmir_v28i1e86200_app7.pdf" xlink:title="PDF File, 188 KB"/></supplementary-material><supplementary-material id="app8"><label>Checklist 1</label><p>BIBLIO checklist.</p><media xlink:href="jmir_v28i1e86200_app8.pdf" xlink:title="PDF File, 162 KB"/></supplementary-material><supplementary-material id="app9"><label>Checklist 2</label><p>PRISMA-S checklist.</p><media xlink:href="jmir_v28i1e86200_app9.pdf" xlink:title="PDF File, 277 KB"/></supplementary-material></app-group></back></article>