Published in Vol 26 (2024)

This is a member publication of National University of Singapore

Comparing Open-Access Database and Traditional Intensive Care Studies Using Machine Learning: Bibliometric Analysis Study


Authors of this article:

Yuhe Ke1; Rui Yang2; Nan Liu2

Original Paper

1Division of Anesthesiology and Perioperative Medicine, Singapore General Hospital, Singapore, Singapore

2Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore

*these authors contributed equally

Corresponding Author:

Nan Liu, PhD

Centre for Quantitative Medicine

Duke-NUS Medical School

National University of Singapore

8 College Road

Singapore, 169857


Phone: 65 66016503


Background: Intensive care research has predominantly relied on conventional methods like randomized controlled trials. However, the increasing popularity of open-access, free databases in the past decade has opened new avenues for research, offering fresh insights. Leveraging machine learning (ML) techniques enables the analysis of trends in a vast number of studies.

Objective: This study aims to conduct a comprehensive bibliometric analysis using ML to compare trends and research topics in traditional intensive care unit (ICU) studies and those done with open-access databases (OADs).

Methods: We used ML to analyze publications indexed in the Web of Science database. Articles were categorized into “OAD” and “traditional intensive care” (TIC) studies. OAD studies were those using the Medical Information Mart for Intensive Care (MIMIC), eICU Collaborative Research Database (eICU-CRD), Amsterdam University Medical Centers Database (AmsterdamUMCdb), High Time Resolution ICU Dataset (HiRID), or Pediatric Intensive Care database. TIC studies included all other intensive care studies. Uniform manifold approximation and projection was used to visualize the corpus distribution. The BERTopic technique was used to generate 30 unique topic identification numbers (IDs), and the topics were grouped into 22 topic families.

Results: A total of 227,893 records were extracted. After exclusions, 145,426 articles were identified as TIC and 1301 articles as OAD studies. TIC studies experienced exponential growth over the last 2 decades, culminating in a peak of 16,378 articles in 2021, while OAD studies demonstrated a consistent upsurge since 2018. Sepsis, ventilation-related research, and pediatric intensive care were the most frequently discussed topics. TIC studies exhibited broader coverage than OAD studies, suggesting a more extensive research scope.

Conclusions: This large-scale analysis of ICU research publications highlights the differing scopes of OAD and TIC studies. OAD studies complement TIC studies, focusing on predictive modeling, while TIC studies capture essential qualitative information. Integrating both approaches in a complementary manner is the future direction for ICU research. Additionally, natural language processing techniques offer a transformative alternative for literature review and bibliometric analysis.

J Med Internet Res 2024;26:e48330

Introduction

The start of critical care as a medical subspecialty can be traced back to a polio epidemic during which a substantial number of patients needed prolonged mechanical ventilation [1]. Over time, the field of critical care has experienced significant growth and continual evolution. Research in this field has played a pivotal role in unraveling the complexities of numerous diseases and treatment modalities, driving substantial advancements in clinical practice over the past decades [2]. Groundbreaking studies have investigated critical areas such as sepsis, mechanical ventilation, acute lung and kidney injuries, intensive care unit (ICU) delirium, and sedation in critically ill patients [3].

Such research has traditionally relied on designs such as prospective and randomized controlled trials [4], cohort and observational studies, clinical trials [5], and clinical and translational research [6]. These traditional methods have revolutionized patient care and significantly improved outcomes. For instance, the implementation of protocol-driven, goal-directed management of sepsis and appropriate fluid therapy has led to remarkable reductions in mortality rates [7,8], and these findings have been integral in developing evidence-based practice guidelines that are now the gold standard [9,10].

Despite their undeniable merits, traditional research methods in intensive care also come with several limitations [11]. Clinical trials are known for their high costs [12], stringent standardization requirements, and ethical oversight [13]. Data collection can be laborious, prone to human errors, and constrained in terms of quantity and granularity [14]. Moreover, obtaining patient consent for most randomized controlled trials in the ICU poses challenges [15], necessitating alternative consent models. These limitations have become increasingly apparent as medical complexity continues to grow exponentially [16].

The advent of electronic health records (EHRs) has heralded a new era in clinical research by facilitating the digitization of health care systems [17]. In this era of data science, a more integrated approach can be adopted, using machine learning (ML) algorithms to tackle the complexity of critical illness [18]. Open-access databases (OADs), such as the Medical Information Mart for Intensive Care (MIMIC) database [19] and the Philips eICU Collaborative Research Database (eICU-CRD) [20], have played a transformative role by enabling free data sharing.

Free and open databases play a pivotal role in promoting data sharing and advancing medical knowledge in accordance with the FAIR guiding principles, which hold that data should be findable, accessible, interoperable, and reusable. These principles are essential for fostering a collaborative and transparent scientific research environment [21,22]. By removing barriers to access, free and open databases allow researchers, regardless of their affiliations or resources, to contribute to and benefit from the collective pool of information. Such accessibility fosters inclusivity and diversity in research, promoting a broader range of perspectives and approaches to medical challenges, and this democratization of knowledge leads to a more equitable distribution of information. Researchers can now leverage these vast repositories for ML and artificial intelligence studies, marking a departure from traditional intensive care (TIC) research approaches.

Conducting a literature review [23] to investigate the differences between traditional ICU research and studies based on open-access data sets is important because it provides a comprehensive understanding of the strengths and limitations of the latter. However, conventional literature reviews and bibliometric analyses have limitations when dealing with large-scale literature, owing to computational complexity and the labor-intensive nature of manual interpretation [24-26]. Natural language processing (NLP) offers a promising way to address these challenges; in particular, topic modeling techniques can extract topic themes from extensive data sets [27,28].

Built on the foundations of bidirectional encoder representations from transformers (BERT), BERTopic introduces a novel approach to topic modeling [29,30]. Unlike traditional unsupervised models such as latent Dirichlet allocation, which rely on a “bag-of-words” representation [31], BERTopic addresses the problem of semantic information loss. This significantly improves the accuracy of the generated topics and yields more interpretable compositions for each topic, which greatly facilitates topic classification.
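
As a minimal illustration of the semantic information loss mentioned above (a toy example, not code from this study), the Python sketch below builds bag-of-words counts for 2 sentences that describe opposite clinical sequences. Because word order is discarded, both sentences receive the same representation; contextual embeddings such as those produced by BERT depend on the surrounding words and would generally distinguish them.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count tokens (word order is lost)."""
    return Counter(text.lower().split())


a = "Sepsis developed after mechanical ventilation"
b = "Mechanical ventilation developed after sepsis"

# Different clinical sequences, identical bag-of-words representations
print(bag_of_words(a) == bag_of_words(b))  # True
```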

With the aid of BERTopic, this study aims to shed light on the disparities and commonalities between studies conducted through OADs and TIC research. By analyzing the overall trends and patterns in these 2 groups, we seek to identify knowledge gaps and explore avenues for complementary contributions between these research approaches.

Methods

Data Filtering

We performed an ML-based analysis of research abstracts in the Web of Science (WoS) database, automatically categorizing the research papers to conduct this literature mapping analysis. There was no restriction on the year of publication. To identify all studies published under the umbrella of intensive care, the search query used the following keywords: (“ICU” OR “intensive care”). The search terms were deliberately kept broad to cover the full spectrum of journals in the field.

The inclusion criteria were as follows: (1) written in English, (2) containing keywords related to intensive care, and (3) having the article type “article” or “review.” We excluded articles with incomplete data fields (eg, title, abstract, publication year, and paper citation). The included articles were then further processed to identify whether they used OADs. These articles were labeled as “open-access database,” while the remaining articles were labeled as “traditional intensive care.”
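
The inclusion and exclusion rules above can be expressed as a simple record filter. The Python sketch below is a hypothetical reconstruction, not the study's code (which is available on GitHub [62]): the `Record` fields, whole-word keyword matching, and searching only titles and abstracts for the intensive care keywords are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified stand-in for one exported WoS record;
# field names are illustrative, not the WoS export schema.
@dataclass
class Record:
    title: Optional[str]
    abstract: Optional[str]
    year: Optional[int]
    citations: Optional[int]
    language: str
    doc_type: str


# Assumption: keywords are matched as whole words in the title and abstract.
ICU_PATTERN = re.compile(r"\b(?:icu|intensive care)\b", re.IGNORECASE)
ALLOWED_TYPES = {"article", "review"}


def has_icu_keyword(rec: Record) -> bool:
    text = f"{rec.title or ''} {rec.abstract or ''}"
    return ICU_PATTERN.search(text) is not None


def is_complete(rec: Record) -> bool:
    # Exclusion rule: drop records missing title, abstract, year, or citation data
    return None not in (rec.title, rec.abstract, rec.year, rec.citations)


def include(rec: Record) -> bool:
    return (
        rec.language.lower() == "english"
        and rec.doc_type.lower() in ALLOWED_TYPES
        and is_complete(rec)
        and has_icu_keyword(rec)
    )


records = [
    Record("Delirium in the ICU", "Prospective cohort.", 2020, 12, "English", "Article"),
    Record("Sedation practice", None, 2019, 3, "English", "Article"),  # no abstract
    Record("Intensive care staffing", "Survey.", 2018, 5, "English", "Editorial Material"),
]
kept = [r for r in records if include(r)]
print(len(kept))  # 1
```

Whole-word matching matters here: a plain substring test for "icu" would also match words such as "particular".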

The search for this study was performed in WoS on January 18, 2023. It generated 227,893 results, which were subsequently filtered using Python. To check the accuracy of the results, we also performed an advanced PubMed search based on the broad ICU search terms used in a previous Cochrane ICU literature review [32]. The counts agreed closely, with a discrepancy of 4.9% (227,893 WoS keyword search results vs 239,748 PubMed ICU keyword search results).
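
The paper does not state which count served as the denominator for the 4.9% discrepancy. A quick check (our reading, not a reported calculation) shows that the figure matches the difference divided by the PubMed count; dividing by the WoS count would give about 5.2%.

```python
wos_hits = 227_893     # WoS keyword search, January 18, 2023
pubmed_hits = 239_748  # PubMed ICU keyword search

difference = pubmed_hits - wos_hits  # 11,855 records
vs_pubmed = 100 * difference / pubmed_hits
vs_wos = 100 * difference / wos_hits

print(f"{vs_pubmed:.1f}% of PubMed count, {vs_wos:.1f}% of WoS count")
# 4.9% of PubMed count, 5.2% of WoS count
```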

Selection Criteria for OADs

To identify OAD studies, we conducted a title search using keywords for all currently existing OADs: (1) MIMIC [19], (2) eICU-CRD [20], (3) Amsterdam University Medical Centers Database (AmsterdamUMCdb) [33], (4) High Time Resolution ICU Dataset (HiRID) [34], and (5) Pediatric Intensive Care database [35]. Rather than relying on keywords alone, we also restricted the publication years searched for each OAD according to the year the database was made publicly available, to reduce the inadvertent inclusion of unrelated articles that happened to match a keyword. For instance, the search for MIMIC studies used the title keywords (“MIMIC-IV” OR “MIMIC-III” OR “MIMIC-II” OR “MIMIC Dataset” OR “medical information mart for intensive care” OR “MIMIC IV” OR “MIMIC III” OR “MIMIC II”) and was limited to studies published after 2003. The title keywords and the cutoff year for each OAD are presented in Multimedia Appendix 1.
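
The title-plus-year rule can be sketched for MIMIC alone, using the title terms listed above. This is an illustrative Python sketch, not the study's implementation: the terms and cutoff years for the other 4 OADs (Multimedia Appendix 1) are not reproduced, whole-phrase case-insensitive matching is an assumption, and treating the 2003 cutoff as inclusive is also an assumption, since "after 2003" could be read either way.

```python
import re

# MIMIC title terms as listed in the text. Terms and cutoff years for eICU-CRD,
# AmsterdamUMCdb, HiRID, and the Pediatric Intensive Care database are in
# Multimedia Appendix 1 and are not reproduced here.
MIMIC_TERMS = [
    "MIMIC-IV", "MIMIC-III", "MIMIC-II", "MIMIC Dataset",
    "medical information mart for intensive care",
    "MIMIC IV", "MIMIC III", "MIMIC II",
]
MIMIC_FIRST_YEAR = 2003  # assumption: cutoff treated as inclusive


def title_pattern(terms):
    """Case-insensitive regex matching any term as a whole phrase."""
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


MIMIC_PATTERN = title_pattern(MIMIC_TERMS)


def label(title, year):
    """Return "OAD" for a MIMIC title published in or after the cutoff year, else "TIC"."""
    if year >= MIMIC_FIRST_YEAR and MIMIC_PATTERN.search(title):
        return "OAD"
    return "TIC"


print(label("Predicting AKI with MIMIC-III data", 2019))  # OAD
print(label("Cells that mimic immune responses", 2019))   # TIC
print(label("A MIMIC-II cohort", 2001))                   # TIC
```

The year filter is what rejects the last example: the title matches a MIMIC term, but the publication predates the cutoff.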

Furthermore, to verify the accuracy of the supervised keyword classification, 2 critical care physicians manually reviewed 100 randomly selected articles from each category. The physicians independently labeled the extracted publications as OAD or TIC studies. The independent reviews achieved an accuracy of 99%, and full agreement was reached after discussion of the discrepancies. The final results were matched with the supervised keyword classification.
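
The validation step can be sketched as a stratified random sample followed by an agreement calculation. The sketch below uses synthetic labels; the seed, the ID format, and simple percentage agreement as the accuracy metric are assumptions, since the paper reports only the resulting 99% accuracy.

```python
import random


def sample_for_review(labels, per_class=100, seed=0):
    """Draw up to `per_class` article IDs at random from each keyword-assigned class."""
    rng = random.Random(seed)
    by_class = {}
    for article_id, cls in labels.items():
        by_class.setdefault(cls, []).append(article_id)
    sample = []
    for cls in sorted(by_class):
        ids = sorted(by_class[cls])
        sample.extend(rng.sample(ids, min(per_class, len(ids))))
    return sample


def accuracy(keyword_labels, reviewer_labels):
    """Share of reviewed articles where the reviewer agrees with the keyword label."""
    agree = sum(keyword_labels[a] == lab for a, lab in reviewer_labels.items())
    return agree / len(reviewer_labels)


# Synthetic example: 150 OAD and 500 TIC keyword labels
keyword_labels = {f"oad-{i}": "OAD" for i in range(150)}
keyword_labels.update({f"tic-{i}": "TIC" for i in range(500)})

sample = sample_for_review(keyword_labels)
reviewer_labels = {a: keyword_labels[a] for a in sample}
for a in sample[:2]:  # pretend the reviewer disagreed on 2 OAD articles
    reviewer_labels[a] = "TIC"

print(len(sample), accuracy(keyword_labels, reviewer_labels))  # 200 0.99
```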

We performed a bibliometric analysis by directly extracting publication details from the WoS database using Python (Python Software Foundation). The analysis involved assessing the number of articles published per year, calculating total citation counts, and identifying the top journals that published intensive care-related articles. Comprehensive results are presented in Multimedia Appendix 2.
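
These bibliometric summaries reduce to simple aggregations. The sketch below assumes that each WoS record has already been parsed into a (year, journal, citations) tuple (a hypothetical intermediate format). Following Multimedia Appendix 2, journals are ranked by total citations, and the average citations per article is total citations divided by the number of articles.

```python
from collections import Counter, defaultdict


def bibliometrics(records, top_n=20):
    """Summarize one study group.

    records: iterable of (year, journal, citations) tuples.
    Returns articles per year, total citations, and the top journals ranked by
    total citations, each with its average citations per article.
    """
    per_year = Counter()
    citations_by_journal = defaultdict(int)
    articles_by_journal = Counter()
    for year, journal, citations in records:
        per_year[year] += 1
        citations_by_journal[journal] += citations
        articles_by_journal[journal] += 1

    total_citations = sum(citations_by_journal.values())
    ranked = sorted(citations_by_journal, key=citations_by_journal.get, reverse=True)
    top_journals = [
        (j, citations_by_journal[j], citations_by_journal[j] / articles_by_journal[j])
        for j in ranked[:top_n]
    ]
    return per_year, total_citations, top_journals


# Toy records (not study data)
toy = [(2020, "Critical Care", 10), (2021, "Critical Care", 4), (2021, "Scientific Reports", 3)]
per_year, total, top = bibliometrics(toy)
print(dict(per_year), total, top)
# {2020: 1, 2021: 2} 17 [('Critical Care', 14, 7.0), ('Scientific Reports', 3, 3.0)]
```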

Data Analysis

Uniform Manifold Approximation and Projection

Uniform manifold approximation and projection (UMAP) is a manifold learning technique for dimension reduction that identifies key structures in high-dimensional data and maps them to a low-dimensional space. Compared with other dimensionality reduction algorithms, such as principal component analysis [36], UMAP can retain more global features [37]. In this paper, we constructed a corpus consisting of the words in the abstracts of all studies. Because this corpus is massive and high-dimensional, visualizing it directly to explore differences in vocabulary patterns between OAD and TIC studies is challenging. We therefore used the UMAP package in Python to project the high-dimensional corpus to 4 dimensions. By cross plotting each dimension, we were able to investigate underlying differences in corpus distribution between OAD and TIC studies.
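
The paper does not state how the abstract corpus was vectorized before projection. One plausible input, shown below as an assumption, is a bag-of-words document-term matrix built with the standard library. Its rows could then be passed to a UMAP implementation, for example `umap.UMAP(n_components=4).fit_transform(rows)` from the umap-learn package, to obtain a 4-dimensional projection. That call is left out of the sketch so that it runs without third-party packages.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z][a-z0-9-]+")  # assumption: lowercase words of 2+ characters


def tokenize(text):
    return TOKEN.findall(text.lower())


def document_term_matrix(abstracts, min_df=1):
    """Return (vocabulary, rows), one count vector per abstract over a shared vocabulary.

    Terms appearing in fewer than `min_df` abstracts are dropped.
    """
    counts = [Counter(tokenize(a)) for a in abstracts]
    doc_freq = Counter(term for c in counts for term in c)  # abstracts containing each term
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_df)
    rows = [[c.get(t, 0) for t in vocab] for c in counts]
    return vocab, rows


vocab, rows = document_term_matrix([
    "Sepsis in the ICU",
    "Machine learning predicts sepsis mortality",
])
print(vocab)
# ['icu', 'in', 'learning', 'machine', 'mortality', 'predicts', 'sepsis', 'the']
print(rows)
# [[1, 1, 0, 0, 0, 0, 1, 1], [0, 0, 1, 1, 1, 1, 1, 0]]
```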


Topic modeling can help us explore the similarities and differences between research topics in OAD and TIC studies. Unlike conventional topic models, BERTopic uses the BERT framework for embeddings, enabling a deeper understanding of semantic relationships [30]. We implemented the model with the BERTopic package in Python, which divided the 146,727 studies into 30 topic IDs. For comparison, we also performed latent Dirichlet allocation topic modeling using the LdaModel implementation in Python. When 2 critical care physicians reviewed the topic keywords, BERTopic showed greater accuracy and sophistication in topic identification than latent Dirichlet allocation, with enhanced interpretability and scientific rigor.

Consequently, the BERTopic model was used for the final analysis. Each of these topics was given a corresponding clinical research category. The overlapping categories were merged into topic families for easier comparisons. By using these advanced techniques, we were able to uncover hidden patterns and relationships within the literature and provide insights into the current state of intensive care research.

Results

A total of 227,893 records were identified from the WoS database on January 18, 2023, of which 195,463 full records were subsequently processed. Records were excluded if they were not of the type “article” or “review” or if they did not contain keywords related to intensive care. After exclusions, 145,426 articles were identified as TIC studies and 1301 articles were categorized as OAD studies (Figure 1).

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram of the study. The final studies were divided into open-access database (OAD) and traditional intensive care (TIC) studies.

We examined the number of articles published per year to analyze the trends in TIC and OAD studies (Figure 2). Over the past 2 decades, TIC studies have experienced exponential growth, culminating in a peak of 16,378 articles in 2021. The number of publications subsequently declined in 2022, likely because of delayed indexing within the WoS database and a reduction in COVID-19–related studies as the pandemic stabilized [38]. In contrast, the first OAD study appeared in 2003, and OAD publications have increased steadily since 2018. Nonetheless, the number of OAD publications remains markedly lower than that of TIC publications.

Figure 2. Number of publications of open-access database (OAD) and traditional intensive care (TIC) studies by year. The first study in the OAD category started in the year 2003.

OAD studies were most frequently published in newer open-access journals such as Frontiers in Medicine, Frontiers in Cardiovascular Medicine, and Scientific Reports, while TIC studies were most frequently published in established journals such as Critical Care Medicine, Intensive Care Medicine, and Critical Care (Multimedia Appendix 2). Further keyword analysis of the abstracts showed that 2.4% (3492/145,426) of TIC studies were meta-analyses or systematic reviews, compared with only 0.08% (1/1301) of OAD studies. The keyword “cost” appeared in 5.61% (73/1301) of OAD studies and 7.43% (10,799/145,426) of TIC studies. Examples of the data fields available within OADs such as MIMIC and eICU-CRD are listed in Textbox 1. Some information fields, such as end-of-life goals and values and health care provider psychology, are not available within the current EHRs extracted for OADs.
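
The reported proportions follow directly from the counts. The sketch below shows a hypothetical keyword flag (the actual search terms for meta-analyses, systematic reviews, and cost are not reported) and checks that the published percentages are reproduced from the published numerators and denominators.

```python
import re

# Hypothetical keyword patterns; the exact terms used in the study are not reported.
META_OR_SR = re.compile(r"\bmeta-analys[ie]s\b|\bsystematic reviews?\b", re.IGNORECASE)
COST = re.compile(r"\bcosts?\b", re.IGNORECASE)


def flag_share(abstracts, pattern):
    """Return (count, percentage) of abstracts matching `pattern`."""
    if not abstracts:
        return 0, 0.0
    hits = sum(1 for a in abstracts if pattern.search(a))
    return hits, 100 * hits / len(abstracts)


def pct(part, whole):
    return round(100 * part / whole, 2)


# Reproduce the reported shares from the published counts
print(pct(3492, 145_426), pct(1, 1301), pct(73, 1301), pct(10_799, 145_426))
# 2.4 0.08 5.61 7.43
```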

Textbox 1. Examples of information available in the current open-access databases (OADs) and examples of information not available in OADs.

Examples of information that is available in current OADs

  • Patient information: demographics and social set-up
  • Hospital context: admission time and discharge time, intensive care unit (ICU) and hospital admissions, and pre-ICU admission
  • Diagnosis: physician-curated ICU diagnosis and data-driven phenotypes
  • Intervention: medications, procedures, and organ support
  • Diagnostics: blood tests, microbiology, and scans
  • Clinical texts: clinical notes and diagnostic reports
  • Physiological monitoring: basic monitoring and waveforms

Examples of information that is not readily available in current OADs

  • Patient information: family set-up and visiting, financial information, and special populations
  • Hospital context: post-ICU discharge details, delayed admission or discharge, and health personnel psychology
  • Diagnosis: pre-ICU history and diagnosis requiring clinical symptoms
  • Intervention: indications for interventions, complications, and intraoperative and postoperative
  • Diagnostics: pathology photographs, imaging, and molecular or genetic studies
  • Clinical texts: patient narratives, end-of-life goals and patient value, and health personnel behavior
  • Physiological monitoring: advanced monitoring

The UMAP algorithm was used to project the high-dimensional corpus to 4 dimensions, allowing exploration of the vocabulary patterns of OAD and TIC studies (Figure 3). Projection values are shown on the x-axis and densities on the y-axis. The considerable overlap between TIC and OAD studies suggests that they share many common terms, which may correspond to similar research topics. Nonetheless, TIC studies exhibit more extensive coverage than OAD studies, which may stem from a broader research scope and a longer research history.

Figure 3. Corpus distribution of open-access database and traditional intensive care studies along dimensions 1-4. OAD: open-access database; TIC: traditional intensive care.
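
Figure 3 plots, for each UMAP dimension, the density of projection values by study group. A minimal standard-library sketch of that comparison is given below. It uses a normalized histogram as a stand-in for whatever density estimator the figure used; the bin count and the shared x-range across groups are assumptions.

```python
def density_histogram(values, lo, hi, bins=20):
    """Histogram normalized so that bar heights times bin width sum to 1."""
    if hi == lo:  # constant dimension: use a unit-width range to avoid dividing by 0
        hi = lo + 1.0
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # v == hi goes in the last bin
        counts[idx] += 1
    return [c / (len(values) * width) for c in counts]


def compare_groups(embedding, groups, bins=20):
    """Per-dimension densities for each group on a shared x-range.

    embedding: list of equal-length points (eg, 4-dimensional UMAP output);
    groups: parallel list of labels such as "OAD" or "TIC".
    Returns {(dimension number, group): densities}.
    """
    result = {}
    for d in range(len(embedding[0])):
        column = [p[d] for p in embedding]
        lo, hi = min(column), max(column)  # shared range makes the groups comparable
        for g in sorted(set(groups)):
            vals = [p[d] for p, lab in zip(embedding, groups) if lab == g]
            result[(d + 1, g)] = density_histogram(vals, lo, hi, bins)
    return result


emb = [(0.0, 1.0, 2.0, 3.0), (1.0, 1.0, 2.0, 3.0), (2.0, 0.0, 2.0, 3.0), (3.0, 2.0, 2.0, 3.0)]
grp = ["TIC", "TIC", "OAD", "TIC"]
dens = compare_groups(emb, grp, bins=3)
print(dens[(1, "OAD")])  # [0.0, 0.0, 1.0]
```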

The BERTopic model was then used to generate 30 topic IDs (Figure 4). Critical care physicians reviewed the common themes within each topic ID and assigned each a specific subtopic in intensive care research. The model classified the topics automatically with high interpretability, and the topic components were easy to interpret. For instance, the components of topic ID 5 are, in decreasing order of weight: “learning,” “model,” “machine,” “machine learning,” “models,” “data,” “prediction,” and “performance.” This topic was consequently labeled “predictive model” (topic ID 5 in Multimedia Appendix 3).

Figure 4. The ratio of open-access database and traditional intensive care studies within each topic identified by BERTopic.

The overall topic distribution in TIC studies was more uniform, whereas OAD studies tended to concentrate on a few topics, including topic IDs 2 (kidney injury), 5 (predictive model), and 13 (sepsis). Topics missing from OAD studies included topic IDs 6 (pediatric care), 21 (viral infections), 23 (health personnel and psychology), and 28 (nutrition and rehabilitation).
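
Figure 4 and the list of topics missing from OAD studies both derive from a cross-tabulation of topic assignments by study group. The sketch below assumes that each study has already been assigned a single topic ID (for example, by BERTopic) and uses toy data; the topic IDs echo those above, but the counts are illustrative only.

```python
from collections import Counter


def topic_composition(assignments):
    """Count OAD and TIC studies per topic and the OAD share within each topic.

    assignments: iterable of (topic_id, group) pairs, group being "OAD" or "TIC".
    """
    counts = Counter(assignments)
    table = {}
    for topic in sorted({t for t, _ in counts}):
        oad, tic = counts[(topic, "OAD")], counts[(topic, "TIC")]
        table[topic] = {"OAD": oad, "TIC": tic, "oad_share": oad / (oad + tic)}
    return table


def missing_in(table, group):
    """Topics with no studies from `group`."""
    return [t for t, row in table.items() if row[group] == 0]


toy = [(2, "OAD"), (2, "TIC"), (5, "OAD"), (5, "OAD"), (5, "TIC"), (6, "TIC"), (6, "TIC")]
table = topic_composition(toy)
print(round(table[5]["oad_share"], 3), missing_in(table, "OAD"))  # 0.667 [6]
```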

The similarity matrix shows that there was little overlap between the topics (Multimedia Appendix 4). To facilitate the interpretability of the categories, the overlapping topic IDs were merged to form the final 22 topic families (Multimedia Appendix 3).
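
In this study, overlapping topic IDs were merged into families through review. The sketch below only illustrates, under stated assumptions, how a topic similarity matrix like the one in Multimedia Appendix 4 could be computed and used to flag merge candidates. It assumes that each topic is represented by a numeric vector (for example, a topic embedding), uses cosine similarity, and applies a hypothetical threshold of 0.8 with union-find grouping.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    return [[cosine(u, v) for v in vectors] for u in vectors]


def merge_candidates(topic_ids, vectors, threshold=0.8):
    """Group topics linked by pairwise similarity >= threshold (union-find)."""
    parent = {t: t for t in topic_ids}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    sim = similarity_matrix(vectors)
    for i in range(len(topic_ids)):
        for j in range(i + 1, len(topic_ids)):
            if sim[i][j] >= threshold:
                parent[find(topic_ids[j])] = find(topic_ids[i])
    groups = {}
    for t in topic_ids:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(merge_candidates([0, 1, 2], vecs))  # [[0, 1], [2]]
```

Flagged groups would still need clinical review, as in the study; a similarity threshold alone cannot decide whether 2 topics describe the same research area.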

Topic families such as “healthcare associated infection,” “thoracic surgeries,” and “pregnancy related” research were among the 15 most frequently discussed in TIC studies but had few publications in OAD studies. Conversely, “predictive model,” “obesity,” and “fungal infections” were popular in OAD studies but not in TIC studies. Overall, TIC studies were distributed more evenly across topic families, with the sepsis family accounting for a quarter of the studies, whereas OAD publications were heavily skewed toward the predictive model (>40%) and sepsis (>30%) families (Figure 5).

Figure 5. The top 15 topic families represented in open-access database and traditional intensive care studies.
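
Figure 5 normalizes within each group rather than within each topic. It reports the share of a group's studies in each topic family, which is what allows 1301 OAD studies to be compared with 145,426 TIC studies. The sketch below rests on assumptions: the topic-to-family mapping is hypothetical (the study's mapping is in Multimedia Appendix 3), and the assignments are toy data.

```python
from collections import Counter


def family_distribution(assignments, topic_to_family):
    """Percentage of each group's studies that fall in each topic family.

    assignments: iterable of (topic_id, group) pairs;
    topic_to_family: dict mapping topic ID to topic family name.
    """
    per_group = {}
    for topic, group in assignments:
        per_group.setdefault(group, Counter())[topic_to_family[topic]] += 1
    result = {}
    for group, counts in per_group.items():
        total = sum(counts.values())
        result[group] = {fam: 100 * n / total for fam, n in counts.most_common()}
    return result


# Hypothetical mapping and toy assignments (not study data)
topic_to_family = {2: "kidney injury", 5: "predictive model", 13: "sepsis", 14: "sepsis"}
toy = [(5, "OAD"), (5, "OAD"), (13, "OAD"), (14, "OAD"), (2, "OAD"), (13, "TIC"), (2, "TIC")]
print(family_distribution(toy, topic_to_family)["OAD"])
# {'predictive model': 40.0, 'sepsis': 40.0, 'kidney injury': 20.0}
```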

Discussion

Principal Results

This study conducted a comprehensive review and bibliometric analysis of OAD and TIC studies. NLP was used to facilitate this large-scale literature review. Studies using OADs mainly concentrated on a few topics, such as predictive modeling, while TIC studies covered a wider range of topics with a more balanced distribution.

Advantages of OAD Studies

OAD studies offer several advantages that have contributed to their increasing popularity in intensive care research. The granularity of data and easy access to large-cohort databases, such as MIMIC [39], have enabled researchers to perform predictive modeling and conduct various secondary analyses efficiently [40,41]. This accessibility has provided valuable opportunities for exploring specific aspects of patient care, as evident in studies investigating phenomena such as “weekend effects” and circadian rhythms in ICU patients before discharge [42-46]. The vast amount of longitudinal and time series data available in OADs has also facilitated the implementation of complex ML and deep learning methods [47].

Limitations of OAD Studies

However, it is crucial to acknowledge the retrospective nature of OAD data, which inherently limits the assessment of confounding factors and the ability to draw strong causal conclusions. The observational design of OAD studies may result in lower-quality evidence according to the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework [48,49], and thus, the research from OAD studies has yet to be fully integrated into existing evidence-based guidelines, as exemplified by the omission of OAD studies in the 2021 sepsis guidelines [50]. Nevertheless, OADs remain a valuable resource for supplementing and complementing TIC studies, providing unique insights and enhanced predictive scores for intensive care settings.

Furthermore, approximately 50% of published OAD studies focused on predictive modeling. The increasing use of ML methods for predictive modeling has not been without critique. Some medical prediction problems inherently possess linear characteristics, and the selection of features may predominantly focus on already known strong predictors, leading to limited improvements in prediction accuracy with ML [51]. Additionally, interstudy heterogeneity poses a challenge in comparing results obtained from different ML models applied to the same data sets [52]. The ethical implications of relying solely on ML models to make high-risk health care decisions instead of involving clinical expertise are also relevant considerations [51,53].

While OADs provide comprehensive patient data, there are certain limitations in their ability to capture specific information essential for certain critical care research areas. Notably, data fields related to qualitative aspects such as ethics and end-of-life care [54,55], and health care personnel psychology [56] may be challenging, if not impossible, to obtain through OADs generated from EHRs. Consequently, TIC studies have played a pivotal role in addressing these limitations by capturing critical information that is integral to understanding ethical considerations, patient experiences, and health care provider psychology in intensive care [57,58].

Synergy Between OAD and TIC Studies

The synergy between OAD and TIC studies is a promising approach to enhance the comprehensiveness and robustness of intensive care research. OADs, with their large cohort sizes, can serve as external validation cohorts for ML models developed from TIC studies, potentially reducing the sample sizes required for prospective research. Furthermore, OAD studies can corroborate the results of TIC studies, benefiting from larger sample sizes and real-world data, thus providing more practical insights for implementation in intensive care settings [43]. The integration of OAD and TIC studies presents an opportunity to bridge the gaps in data availability and research methodologies, ultimately enriching the understanding and practice of critical care medicine.

Potential Impact of NLP

Topic modeling built on transformer language models, such as BERTopic, has proven to be a valuable tool for large-scale literature review and topic extraction [58]. This approach has enabled accurate, reliable, and granular topic generation, offering clinicians a more effective means of interpreting data than traditional bag-of-words models [59]. The potential of NLP to analyze scientific articles and identify trends and knowledge gaps holds promise for shaping the future of research in critical care medicine. As the volume of critical care publications continues to grow and large language models continue to advance in health care [60], artificial intelligence (AI) technology will be crucial for efficiently identifying and predicting emerging trends.

Future Directions

Future research in the field of critical care can explore novel applications of ML beyond predictive modeling. For instance, using ML to study patterns in how papers are cited, shared, and discussed on the web could help predict their potential impact on the scientific community. This analysis can aid in identifying highly influential papers and understanding the factors that contribute to their recognition. Additionally, investigations into methods for enhancing the interpretability and transparency of ML algorithms in critical care research would further facilitate the ethical and responsible use of AI technologies.

Strengths and Limitations

The study’s use of NLP to analyze scientific articles and identify trends highlights the potential of AI technologies to streamline literature reviews and detect emerging trends more efficiently.

Another notable strength of this study is the usage of the WoS database, the world’s oldest and most extensively used repository of research publications and citations, encompassing approximately 34,000 journals [61]. The comprehensiveness of this database provides a robust representation of the literature in the field of intensive care research. Nevertheless, it is essential to acknowledge that some articles published in nonindexed journals might not have been captured, and future studies could benefit from considering additional databases to supplement our findings.

Another limitation lies in the classification of OAD and TIC studies, which may be affected by variations in the interpretation of keywords. However, we optimized the keyword combinations during the WoS search and applied Python filtering, resulting in relatively high classification accuracy. The number of studies was further corroborated with a manual search on PubMed, and critical care physicians reviewed a sample of the study classifications.

Because the search terms were in English and the inclusion criteria were limited to articles written in English, valuable contributions from non-English research were excluded. This may limit the generalizability of our findings to a broader international audience. In future investigations, the inclusion of articles in other languages could offer a more comprehensive and diverse perspective on intensive care research.

Conclusions

This study has provided valuable insights into the expanding landscape of intensive care research through a comprehensive bibliometric analysis of a large number of publications by leveraging NLP technologies. While OAD studies have demonstrated significant promise, it is essential to view them as a complementary approach rather than a replacement for TIC studies. The unique strength of TIC studies lies in their ability to capture crucial qualitative information, which is essential for comprehensive and ethical decision-making. The integration of both OAD and TIC studies offers a synergistic approach to enriching our understanding of critical care medicine and advancing patient care outcomes. As NLP technology continues to advance, it holds the potential to offer a feasible and transformative alternative for literature review and bibliometric analysis.

Acknowledgments

We thank Dr Nicholas Brian Shannon for assistance with the manual review of the supervised keyword classification. This work was supported by the Duke-NUS Signature Research Programme, funded by the Ministry of Health, Singapore.

Data Availability

The data sets generated and/or analyzed during this study are available from the corresponding author on reasonable request. The complete code used in this study is available for download on GitHub [62].

Authors' Contributions

YK and NL played key roles in the conceptualization of the project. RY formalized the methodology and conducted data curation with advice from YK. YK validated the data, ensuring its relevance to the research objectives. RY led the data visualization. YK and RY drafted the original manuscript. NL supervised the project, overseeing its implementation and providing valuable input during writing, review, and editing.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Search terms for open-access database (OAD) studies, with the publication-year cutoff for each database.


Multimedia Appendix 2

Top 20 journals ranked by total citation in which the open-access database and traditional intensive care studies were published. The average citation per article was obtained with the total citation/total number of articles. The citation counts were obtained from Web of Science.


Multimedia Appendix 3

Topic IDs and topic families, with the components and their weights in each category.


Multimedia Appendix 4

Similarity matrix of 30 topics.


References

1. Kelly FE, Fong K, Hirsch N, Nolan JP. Intensive care medicine is 60 years old: the history and future of the intensive care unit. Clin Med (Lond). 2014;14(4):376-379.
2. Cook D, Brower R, Cooper J, Brochard L, Vincent JL. Multicenter clinical research in adult critical care. Crit Care Med. 2002;30(7):1636-1643.
3. Rosenberg AL, Tripathi RS, Blum J. The most influential articles in critical care medicine. J Crit Care. 2010;25(1):157-170.
4. Granholm A, Alhazzani W, Derde LPG, Angus DC, Zampieri FG, Hammond NE, et al. Randomised clinical trials in critical care: past, present and future. Intensive Care Med. 2022;48(2):164-178.
5. Markey KA, Ottridge R, Mitchell JL, Rick C, Woolley R, Ives N, et al. Assessing the efficacy and safety of an 11β-hydroxysteroid dehydrogenase type 1 inhibitor (AZD4017) in the idiopathic intracranial hypertension drug trial, IIH:DT: clinical methods and design for a phase II randomized controlled trial. JMIR Res Protoc. 2017;6(9):e181.
6. Verdonk F, Feyaerts D, Badenes R, Bastarache JA, Bouglé A, Ely W, et al. Upcoming and urgent challenges in critical care research based on COVID-19 pandemic experience. Anaesth Crit Care Pain Med. 2022;41(5):101121.
7. Gurnani PK, Patel GP, Crank CW, Vais D, Lateef O, Akimov S, et al. Impact of the implementation of a sepsis protocol for the management of fluid-refractory septic shock: a single-center, before-and-after study. Clin Ther. 2010;32(7):1285-1293.
8. Wang JL, Chin CS, Chang MC, Yi CY, Shih SJ, Hsu JY, et al. Key process indicators of mortality in the implementation of protocol-driven therapy for severe sepsis. J Formos Med Assoc. 2009;108(10):778-787.
9. Levy MM, Evans LE, Rhodes A. The surviving sepsis campaign bundle: 2018 update. Crit Care Med. 2018;46(6):997-1000.
10. Fan E, Del Sorbo L, Goligher EC, Hodgson CL, Munshi L, Walkey AJ, et al. An official American Thoracic Society/European Society of Intensive Care Medicine/Society of Critical Care Medicine clinical practice guideline: mechanical ventilation in adult patients with acute respiratory distress syndrome. Am J Respir Crit Care Med. 2017;195(9):1253-1263.
11. Goldfrad C, Vella K, Bion JF, Rowan KM, Black NA. Research priorities in critical care medicine in the UK. Intensive Care Med. 2000;26(10):1480-1488.
12. Moore TJ, Heyward J, Anderson G, Alexander GC. Variation in the estimated costs of pivotal clinical benefit trials supporting the US approval of new therapeutic agents, 2015-2017: a cross-sectional study. BMJ Open. 2020;10(6):e038863.
13. Umscheid CA, Margolis DJ, Grossman CE. Key concepts of clinical trials: a narrative review. Postgrad Med. 2011;123(5):194-204.
14. Maré IA, Kramer B, Hazelhurst S, Nhlapho MD, Zent R, Harris PA, et al. Electronic data capture system (REDCap) for health care research and training in a resource-constrained environment: technology adoption case study. JMIR Med Inform. 2022;10(8):e33402.
15. O'Hearn K, Gibson J, Krewulak K, Porteous R, Saigle V, Sampson M, et al. Consent models in Canadian critical care randomized controlled trials: a scoping review. Can J Anaesth. 2022;69(4):513-526.
16. Ghassemi M, Celi LA, Stone DJ. State of the art review: the data revolution in critical care. Crit Care. 2015;19(1):118.
17. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff (Millwood). 2014;33(7):1123-1131.
18. Mlodzinski E, Wardi G, Viglione C, Nemati S, Crotty Alexander L, Malhotra A. Assessing barriers to implementation of machine learning and artificial intelligence-based tools in critical care: web-based survey study. JMIR Perioper Med. 2023;6:e41056.
19. Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10(1):1.
20. Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU collaborative research database, a freely available multi-center database for critical care research. Sci Data. 2018;5:180178.
21. Inau ET, Sack J, Waltemath D, Zeleke AA. Initiatives, concepts, and implementation practices of FAIR (findable, accessible, interoperable, and reusable) data principles in health data stewardship practice: protocol for a scoping review. JMIR Res Protoc. 2021;10(2):e22505.
22. Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018.
23. Bahadoran Z, Mirmiran P, Kashfi K, Ghasemi A. Importance of systematic reviews and meta-analyses of animal studies: challenges for animal-to-human translation. J Am Assoc Lab Anim Sci. 2020;59(5):469-477.
24. Haidich AB. Meta-analysis in medical research. Hippokratia. 2010;14(Suppl 1):29-37.
25. Thompson DF, Walker CK. A descriptive and historical review of bibliometrics with applications to medical sciences. Pharmacotherapy. 2015;35(6):551-559.
26. Donthu N, Kumar S, Mukherjee D, Pandey N, Lim WM. How to conduct a bibliometric analysis: an overview and guidelines. J Bus Res. 2021;133:285-296.
27. Zhao W, Chen JJ, Perkins R, Liu Z, Ge W, Ding Y, et al. A heuristic approach to determine an appropriate number of topics in topic modeling. BMC Bioinformatics. 2015;16(Suppl 13):S8.
28. Doanvo A, Qian X, Ramjee D, Piontkivska H, Desai A, Majumder M. Machine learning maps research needs in COVID-19 literature. Patterns (N Y). 2020;1(9):100123.
29. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. ArXiv. Preprint posted online on May 24, 2019. 2018.
30. Grootendorst M. BERTopic: neural topic modeling with a class-based TF-IDF procedure. ArXiv. Preprint posted online on March 11, 2022. 2022.
31. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993-1022.
32. Delaney A, Bagshaw SM, Ferland A, Manns B, Laupland KB, Doig CJ. A systematic evaluation of the quality of meta-analyses in the critical care literature. Crit Care. 2005;9(5):R575-R582.
33. Thoral PJ, Peppink JM, Driessen RH, Sijbrands EJG, Kompanje EJO, Kaplan L, et al. Sharing ICU patient data responsibly under the Society of Critical Care Medicine/European Society of Intensive Care Medicine joint data science collaboration: the Amsterdam university medical centers database (AmsterdamUMCdb) example. Crit Care Med. 2021;49(6):e563-e577.
34. Faltys M, Zimmermann M, Lyu X, Hüser M, Hyland S, Rätsch G, et al. HiRID, a high time-resolution ICU dataset. PhysioNet. 2021. URL: [accessed 2024-04-02]
35. Zeng X, Yu G, Lu Y, Tan L, Wu X, Shi S, et al. PIC, a paediatric-specific intensive care database. Sci Data. 2020;7(1):14.
36. Maćkiewicz A, Ratajczak W. Principal components analysis (PCA). Comput Geosci. 1993;19(3):303-342.
37. McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. ArXiv. Preprint posted online on September 18, 2020. 2018.
38. Murray CJL. COVID-19 will continue but the end of the pandemic is near. Lancet. 2022;399(10323):417-419.
39. Mark R. The story of MIMIC. In: Secondary Analysis of Electronic Health Records. Cham, Switzerland: Springer International Publishing; 2016:43-49.
40. Alghatani K, Ammar N, Rezgui A, Shaban-Nejad A. Predicting intensive care unit length of stay and mortality using patient vital signs: machine learning model development and validation. JMIR Med Inform. 2021;9(5):e21347.
41. Liu D, Zheng M, Sepulveda NA. Using artificial neural network condensation to facilitate adaptation of machine learning in medical settings by reducing computational burden: model design and evaluation study. JMIR Form Res. 2021;5(12):e20767.
42. Zhang Z, Ho KM, Hong Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit Care. 2019;23(1):112.
43. Meyer A, Zverinski D, Pfahringer B, Kempfert J, Kuehne T, Sündermann SH, et al. Machine learning for real-time prediction of complications in critical care: a retrospective study. Lancet Respir Med. 2018;6(12):905-914.
44. Chen H, Zhu Z, Zhao C, Guo Y, Chen D, Wei Y, et al. Central venous pressure measurement is associated with improved outcomes in septic patients: an analysis of the MIMIC-III database. Crit Care. 2020;24(1):433.
45. Faust L, Feldman K, Chawla NV. Examining the weekend effect across ICU performance metrics. Crit Care. 2019;23(1):207.
46. Davidson S, Villarroel M, Harford M, Finnegan E, Jorge J, Young D, et al. Vital-sign circadian rhythms in patients prior to discharge from an ICU: a retrospective observational analysis of routinely recorded physiological data. Crit Care. 2020;24(1):181.
47. Xie F, Yuan H, Ning Y, Ong MEH, Feng M, Hsu W, et al. Deep learning for temporal data representation in electronic health records: a systematic review of challenges and methodologies. J Biomed Inform. 2022;126:103980.
48. Guyatt GH, Oxman AD, Kunz R, Vist GE, Falck-Ytter Y, Schünemann HJ, et al. GRADE Working Group. What is "quality of evidence" and why is it important to clinicians? BMJ. 2008;336(7651):995-998.
49. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924-926.
50. Evans L, Rhodes A, Alhazzani W, Antonelli M, Coopersmith CM, French C, et al. Surviving sepsis campaign: international guidelines for management of sepsis and septic shock 2021. Crit Care Med. 2021;49(11):e1063-e1143.
51. Volovici V, Syn NL, Ercole A, Zhao JJ, Liu N. Steps to avoid overuse and misuse of machine learning in clinical research. Nat Med. 2022;28(10):1996-1999.
52. Fleuren LM, Klausch TLT, Zwager CL, Schoonmade LJ, Guo T, Roggeveen LF, et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 2020;46(3):383-400.
53. Yoon CH, Torrance R, Scheinerman N. Machine learning in medicine: should the pursuit of enhanced interpretability be abandoned? J Med Ethics. 2022;48(9):581-585.
54. Gillett GR. Intensive care unit research ethics and trials on unconscious patients. Anaesth Intensive Care. 2015;43(3):309-312.
55. Aulisio MP, Chaitin E, Arnold RM. Ethics and palliative care consultation in the intensive care unit. Crit Care Clin. 2004;20(3):505-523, x-xi.
56. Raudenská J, Steinerová V, Javůrková A, Urits I, Kaye AD, Viswanath O, et al. Occupational burnout syndrome and post-traumatic stress among healthcare professionals during the novel coronavirus disease 2019 (COVID-19) pandemic. Best Pract Res Clin Anaesthesiol. 2020;34(3):553-560.
57. Davidson JE, Jones C, Bienvenu OJ. Family response to critical illness: postintensive care syndrome-family. Crit Care Med. 2012;40(2):618-624.
58. White DB, Angus DC, Shields AM, Buddadhumaruk P, Pidro C, Paner C, et al. A randomized trial of a family-support intervention in intensive care units. N Engl J Med. 2018;378(25):2365-2375.
59. Popoff B, Occhiali É, Grangé S, Bergis A, Carpentier D, Tamion F, et al. Trends in major intensive care medicine journals: a machine learning approach. J Crit Care. 2022;72:154163.
60. Yang R, Tan TF, Lu W, Thirunavukarasu AJ, Ting DSW, Liu N. Large language models in health care: development, applications, and challenges. Health Care Sci. 2023;2(4):255-263.
61. Birkle C, Pendlebury DA, Schnell J, Adams J. Web of science as a data source for research on scientific and scholarly activity. Quant Sci Stud. 2020;1(1):363-376.
62. GitHub. URL:

AmsterdamUMCdb: Amsterdam University Medical Centers Database
BERT: bidirectional encoder representations from transformers
EHR: electronic health record
eICU-CRD: eICU Collaborative Research Database
FAIR: findable, accessible, interoperable, and reusable
GRADE: Grading of Recommendations, Assessment, Development, and Evaluations
HiRID: High Time Resolution ICU Dataset
ICU: intensive care unit
MIMIC: Medical Information Mart for Intensive Care
ML: machine learning
NLP: natural language processing
OAD: open-access database
TIC: traditional intensive care
UMAP: uniform manifold approximation and projection
WoS: Web of Science

Edited by A Mavragani; submitted 19.04.23; peer-reviewed by D Chrimes, S Pesälä; comments to author 14.07.23; revised version received 01.08.23; accepted 14.01.24; published 17.04.24.


©Yuhe Ke, Rui Yang, Nan Liu. Originally published in the Journal of Medical Internet Research, 17.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication, and this copyright and license information must be included.