This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Although the potential of big data analytics for health care is well recognized, evidence is lacking on its effects on public health.
The aim of this study was to assess the impact of the use of big data analytics on people’s health based on the health indicators and core priorities in the World Health Organization (WHO) General Programme of Work 2019/2023 and the European Programme of Work (EPW), approved and adopted by its Member States, in addition to SARS-CoV-2–related studies. Furthermore, we sought to identify the most relevant challenges and opportunities of these tools with respect to people’s health.
Six databases (MEDLINE, Embase, Cochrane Database of Systematic Reviews via Cochrane Library, Web of Science, Scopus, and Epistemonikos) were searched from the inception date to September 21, 2020. Systematic reviews assessing the effects of big data analytics on health indicators were included. Two authors independently performed screening, selection, data extraction, and quality assessment using the AMSTAR 2 (A Measurement Tool to Assess Systematic Reviews 2) checklist.
The literature search retrieved 1536 records; 185 underwent full-text screening, and 35 met the inclusion criteria, involving more than 5,000,000 patients. Most of the included studies used patient data collected from electronic health records, hospital information systems, private patient databases, and imaging datasets, and involved the use of big data analytics for noncommunicable diseases. “Probability of dying from any of cardiovascular, cancer, diabetes or chronic respiratory disease” and “suicide mortality rate” were the most commonly assessed health indicators and core priorities within the WHO General Programme of Work 2019/2023 and the EPW 2020/2025. Big data analytics have shown moderate to high accuracy for the diagnosis and prediction of complications of diabetes mellitus as well as for the diagnosis and classification of mental disorders; prediction of suicide attempts and behaviors; and the diagnosis, treatment, and prediction of important clinical outcomes of several chronic diseases. Confidence in the results was rated as “critically low” for 25 reviews, as “low” for 7 reviews, and as “moderate” for 3 reviews. The most frequently identified challenges were establishment of a well-designed and structured data source, and a secure, transparent, and standardized database for patient data.
Although the overall quality of included studies was limited, big data analytics has shown moderate to high accuracy for the diagnosis of certain diseases, improvement in managing chronic diseases, and support for prompt and real-time analyses of large sets of varied input data to diagnose and predict disease outcomes.
International Prospective Register of Systematic Reviews (PROSPERO) CRD42020214048; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=214048
Big data analytics tools handle complex datasets that traditional data processing systems cannot efficiently and economically store, manage, or process. Through the application of artificial intelligence (AI) algorithms and machine learning (ML), big data analytics has the potential to revolutionize health care, supporting clinicians, providers, and policymakers in planning or implementing interventions [
In 2018, the World Health Organization (WHO) proposed the expedited 13th General Programme of Work (GPW13), which was approved and adopted by its 194 Member States, focusing on measurable impacts on people’s health at the state level to transform public health with three core features: enhanced universal health coverage, health emergencies protection, and better health and well-being [
Therefore, the aim of this study was to provide an overview of systematic reviews that assessed the effects of the use of big data analytics on people’s health according to the WHO core features defined in the GPW13 and the EPW. We included complex reviews that assessed multiple interventions, different populations, and differing outcomes resulting from big data analytics on people’s health, and identified the challenges, opportunities, and best practices for future research.
This study was designed to provide an overview of systematic reviews in accordance with guidelines from the Cochrane Handbook for Systematic Reviews of Interventions, along with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and the QUOROM (Quality of Reporting of Meta-analyses) guidelines [
To identify records assessing the effect of big data analytics on people’s health, aligned with the WHO health indicators defined in the GPW13 (
References were imported into reference management software (EndNote X9) and duplicates were removed. Unique records were uploaded onto the Covidence Platform (Veritas Health Innovation) for screening, data extraction, and quality assessment. A manual search of reference lists was performed to supplement the search.
Number of persons affected by disasters (per 100,000 population)
Domestic general government health expenditure (% of general government expenditure)
Prevalence of stunting in children under 5 (%)
Prevalence of wasting in children under 5 (%)
Prevalence of overweight in children under 5 (%)
Maternal mortality ratio (per 100,000 live births)
Proportion of births attended by skilled health personnel (%)
Under 5 mortality rate (per 1000 live births)
Neonatal mortality rate (per 1000 live births)
New HIV infections (per 1000 uninfected population)
Tuberculosis incidence (per 100,000 population)
Malaria incidence (per 1000 population at risk)
Hepatitis B incidence (measured by surface antigen [HBsAg] prevalence among children under 5 years)
Number of people requiring interventions against neglected tropical diseases (NTDs)
Probability of dying from any of cardiovascular disease (CVD), cancer, diabetes, or chronic respiratory disease (CRD) (aged 30-70 years) (%)
Suicide mortality rate (per 100,000 population)
Coverage of treatment interventions for substance-use disorders (%)
Total alcohol per capita consumption in adults aged >15 years (liters of pure alcohol)
Road traffic mortality rate (per 100,000 population)
Proportion of women (aged 15-49 years) having need for family planning satisfied with modern methods (%)
Universal Health Coverage (UHC) Service Coverage Index
Population with household expenditures on health >10% of total household expenditure or income (%)
Mortality rate attributed to air pollution (per 100,000 population)
Mortality rate attributed to exposure to unsafe water, sanitation, and hygiene (WASH) services (per 100,000 population)
Mortality rate from unintentional poisoning (per 100,000 population)
Prevalence of tobacco use in adults aged ≥15 years (%)
Proportion of population covered by all vaccines included in national programs (diphtheria-tetanus-pertussis vaccine, measles-containing-vaccine second dose, pneumococcal conjugated vaccine) (%)
Proportion of health facilities with essential medicines available and affordable on a sustainable basis (%)
Density of health workers (doctors, nurses and midwives, pharmacists, dentists per 10,000 population)
International Health Regulations capacity and health emergency preparedness
Proportion of bloodstream infections due to antimicrobial-resistant organisms (%)
Proportion of children under 5 years developmentally on track (health, learning, and psychosocial well-being) (%)
Proportion of women (aged 15-49 years) subjected to violence by current or former intimate partner (%)
Proportion of women (aged 15-49 years) who make their own decisions regarding sexual relations, contraceptive use, and reproductive health care (%)
Proportion of population using safely managed drinking-water services (%)
Proportion of population using safely managed sanitation services and hand-washing facilities (%)
Proportion of population with primary reliance on clean fuels (%)
Annual mean concentrations of fine particulate matter (PM2.5) in urban areas (μg/m³)
Proportion of children (aged 1-17 years) experiencing physical or psychological aggression (%)
Vaccine coverage for epidemic-prone diseases
Proportion of vulnerable people in fragile settings provided with essential health services (%)
Prevalence of raised blood pressure in adults aged ≥18 years
Effective policy/regulation for industrially produced trans-fatty acids
Prevalence of obesity (%)
Number of cases of poliomyelitis caused by wild poliovirus
Patterns of antibiotic consumption at the national level
Peer-reviewed publications categorized as systematic reviews assessing the effects of big data analytics on any of the GPW13 and EPW health indicators and core priorities were included, regardless of language and study design. We only considered studies in which the search was performed in at least two databases, and included a description of the search strategy and the methodology used for study selection and data extraction. We only included studies that evaluated concrete relationships between the use of big data analytics and its effect on people’s lives, according to the WHO strategic priorities and indicators. Along with the 46 indicators listed in
Although big data analysis is capable of handling large volumes of data, rather than focusing on the data volume/size, we focused on the process that defines big data analytics, which includes the following phases [
The following data were extracted from the retrieved articles: publication information, journal name and impact factor, study characteristics, big data characteristics, outcomes, lessons and barriers for implementation, and main limitations. Data were individually extracted by team members and cross-checked for accuracy by a second investigator.
Two researchers independently assessed the studies using the AMSTAR 2 (A Measurement Tool to Assess Systematic Reviews 2) checklist, which includes the following critical domains, assessed in 16 items: protocol registered prior to review, adequacy of literature search, justification for excluded studies, risk of bias in included studies, appropriateness of meta-analytic methods, consideration of bias risk when interpreting results, and assessing the presence and likely impact of publication bias [
Results are reported in summary tables and through a narrative synthesis, grouping studies assessing the same disease or condition, and identifying challenges and opportunities. We also schematically represent the evidence and gaps from these reviews as an overall synthesis.
The search retrieved 1536 publications, 112 of which were duplicates. Most of the studies were excluded after title and abstract analysis (n=1237), leaving 185 selected for full-text screening, and 35 [
Flow chart of the different phases of article retrieval.
No standard critical appraisal tools were mentioned. Among the 12 reviews that performed any quality assessment, the Quality Assessment of Diagnostic Accuracy Studies 2 tool was used in four reviews demonstrating an overall low risk of bias [
Summary features and main findings of the 35 systematic reviews are presented in
Many reviews included data collected from electronic medical records, hospital information systems, or any databank that used individual patient data to create predictive models or evaluate collective patterns [
The purposes of the reviews varied broadly. Generally, they (1) outlined AI applications in different medical specialties; (2) analyzed features for the detection, prediction, or diagnosis of multiple diseases or conditions; or (3) pinpointed challenges and opportunities.
Most of the studies assessed the effects of big data analytics on noncommunicable diseases [
AI tools associated with big data analytics in the care of patients with diabetes mellitus (DM) were assessed in six reviews that included 345 primary studies [
Various studies assessed the ability of big data analytics to predict individual DM complications such as hypoglycemia, nephropathy, and others [
Five reviews reported on AI, data mining, and ML in psychiatry/psychology [
The use of ML algorithms for early detection of psychiatric conditions was also reported [
Only one review used social media to generate analyzable data on the prevention, recognition, and support for severe mental illnesses [
Two reviews reported the application of big data analytics and ML to better understand the current novel coronavirus pandemic [
Another review focused on SARS-CoV-2 immunization, and proposed that AI could expedite vaccine discovery through studying the virus’s capabilities, virulence, and genome using genetic databanks. That study merged discussions of deep learning–based drug screening for predicting the interaction between protein and ligands, and using imaging results linked to AI tools for detecting SARS-CoV-2 infections.
Four studies described the utility of ML, computerized clinical decision systems, and deep learning in oncology [
One study evaluated ML techniques in a genomic study of head and neck cancers, and found a wide range of accuracy rates (56.7% to 99.4%) based on the use of genomic data in prognostic prediction. Lastly, two studies reported accuracy levels ranging from 68% to 99.6% when using deep learning algorithms in the automatic detection of pulmonary nodules in computerized tomography images.
Six studies described the effect of big data analytics in cardiology [
Similarly, two studies assessed the use of intelligent systems in diagnosing acute coronary syndrome and heart failure [
Scores to identify patients at higher risk to develop QT-interval prolongation have been developed, and predictive analytics incorporated into clinical decision support tools have been tested for their ability to alert physicians of individuals who are at risk of or have QT-interval prolongation [
Regarding stroke, two systematic reviews evaluated using ML models for predicting outcomes and diagnosing cerebral ischemic events [
Several studies reported significant improvement in disease diagnosis and event prediction using big data analytics tools, including remarkable enhancement of sepsis prediction using ML techniques [
One review focused on the diagnostic accuracy of AI systems in analyzing radiographic images for pulmonary tuberculosis, mostly referring to development instead of clinical evaluation [
One review also assessed multiple sclerosis diagnosis. Among detection methodologies, rule-based and natural language processing methods were deemed to have superior diagnostic performance based on elevated accuracy and positive predictive value [
Asthma exacerbation events and predictive models for early detection were evaluated in one review, which reached a pooled diagnostic ability of 77% (95% CI 73%-80%) [
Two reviews analyzed the use of big data analytics and AI in public health [
Two systematic reviews provided narrative evaluations of the challenges of big data analytics in health care [
This overview is the first to assess the effects of big data analytics on the prioritized WHO indicators, which offers utility for noncommunicable diseases and the ongoing COVID-19 pandemic. Although the research question focused on the impact of big data analytics on people’s health, studies assessing the impact on clinical outcomes are still scarce. Most of the reviews assessed performance values using big data tools and ML techniques, and demonstrated their applications in medical practice. Most of the reviews were associated with the GPW13 indicator “probability of dying from any cardiovascular disease, cancer, diabetes, chronic respiratory disease.” This indicator outranks others because of the incidence, prevalence, premature mortality, and economic impact of these diseases [
Evidence of low to moderate quality suggests that big data analytics has moderate to high accuracy for the (1) diagnosis and prediction of complications of DM, (2) diagnosis of mental diseases, (3) prediction of suicidal behaviors, and (4) diagnosis of chronic diseases. Most studies presented performance values, although no study assessed whether big data analytics or ML could improve the early detection of specific diseases.
Clinical research and clinical trials contribute significantly to understanding the patterns and characteristics of diseases, improving the detection of acute or chronic pathologies, and guiding the development of novel medical interventions [
Many systematic reviews reported simple or inappropriate evaluation measures for the task at hand. The most common metric used to evaluate the performance of a classification predictive model is accuracy, which is calculated as the number of correct predictions on the test set divided by the total number of predictions made on the test set. This metric is easy to use and to interpret, as a single number summarizes the model capability. However, accuracy and the error rate, which is simply the complement of accuracy, are not adequate for skewed or imbalanced classification tasks (ie, when the observations in the training dataset are not equally distributed across the classes), because they are biased toward the majority class. When the distribution is only slightly skewed, accuracy can still be a useful metric; however, when the distribution is severely skewed, accuracy becomes an unreliable measure of model performance.
For instance, in a binary classification task with a distribution of (95%, 5%) for the classes (eg, healthy vs sick), a “dumb classifier” that simply chooses the class “healthy” for all instances will achieve 95% accuracy, although the most important issue in this task would be correctly classifying the “sick” class. Precision (also called the positive predictive value), which captures the fraction of correctly classified instances among the instances predicted for a given class (eg, “sick”); recall or sensitivity, which captures the fraction of instances of a class (eg, “sick”) that were correctly classified; and F-measure, the harmonic mean of precision and recall calculated per class of interest, are more robust metrics for several practical situations. The proper choice of an evaluation metric should be carefully determined, as these indices ought to be used by regulatory bodies for screening tests and not for diagnostic reasoning [
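The (95%, 5%) example above can be reproduced in a few lines. The following is a minimal sketch in plain Python, not code from any of the included reviews; the function names (`confusion_counts`, `evaluate`) and the convention of reporting 0 when a ratio is undefined are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="sick"):
    """Accuracy plus precision, recall, and F-measure for the positive class.

    Assumption: undefined ratios (zero denominator) are reported as 0.0.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# 95 healthy and 5 sick individuals; the classifier always answers "healthy".
y_true = ["healthy"] * 95 + ["sick"] * 5
y_pred = ["healthy"] * 100
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.95, whereas precision, recall, and F-measure for the “sick” class are all 0, which makes the failure on the minority class visible.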
Another pitfall identified among the included reviews was the lack of reporting the precise experimental protocols used for testing ML algorithms and the specific type of replication performed.
There is no formal tool for assessing quality and risk of bias in big data studies. This is an area that is ripe for development. In
High variability in the results was evident across different ML techniques and approaches among the 35 reviews, even for those assessing the same disease or condition. Indeed, designing big data analysis and ML experiments involves elevated model complexity and commonly requires testing of several modeling algorithms [
Only two published systematic reviews evaluated the impact of big data analytics on the COVID-19 pandemic. Primary studies on COVID-19 are lacking, which indicates an opportunity to apply big data and ML to this and future epidemics/pandemics [
Although DSS are an important application of big data analytics and may benefit patient care [
This overview of systematic reviews updates the available evidence from multiple primary studies intersecting computer science, engineering, medicine, and public health. We used a comprehensive search strategy (performed by an information specialist) with a predefined published protocol, precise inclusion criteria, rigorous data extraction, and quality assessment of retrieved records. We avoided reporting bias through the dual and blinded examination of systematic reviews and by having one review author standardizing the extracted data.
Different evaluation measures such as accuracy, area under the receiver operating characteristic curve, precision, recall, and F-measure capture different aspects of the task and are influenced by data characteristics such as skewness (ie, imbalance), sampling bias, etc. Choose your measures wisely and justify your choice based on the aforementioned aspects of the task and the data.
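To illustrate how two of these measures can diverge on the same predictions, the sketch below computes the area under the receiver operating characteristic curve through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half) alongside accuracy at a fixed cutoff. The scores, labels, and the 0.5 cutoff are hypothetical values chosen for illustration only.

```python
def auroc(scores, labels):
    """Rank-based AUROC: share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def accuracy_at_threshold(scores, labels, threshold):
    """Accuracy when every score at or above the threshold is called positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    hits = sum(1 for p, y in zip(predictions, labels) if p == y)
    return hits / len(labels)


# Hypothetical model scores for 8 negative and 2 positive cases.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.65, 0.9]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print("AUROC:", auroc(scores, labels))
print("Accuracy at 0.5:", accuracy_at_threshold(scores, labels, 0.5))
```

With these values the ranking quality (AUROC 0.875) looks considerably better than accuracy at the chosen cutoff (0.6), which is one reason the choice of measure needs to be justified against the task.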
Authors should use experimental protocols based on cross-validation or multiple training/validation/test splits of the employed datasets with more than one repetition of the experimental procedure. The objective of this criterion is to analyze whether the study assesses the capacity of generalization of each method compared in the experiments. The use of a single default split of the input dataset with only one training/test split does not fit this requirement. Repetitions are essential to demonstrate the generalization of the investigated methods for multiple training and test sets, and to avoid any suspicion of a “lucky” (single) partition that favors the authors’ method.
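A repeated, stratified k-fold protocol of this kind could be sketched as follows. This is an illustrative outline in plain Python under stated assumptions: the toy one-dimensional data, the midpoint-threshold "model", and all function names are hypothetical and stand in for whatever method a study actually compares.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, k, rng):
    """Split indices into k folds while roughly preserving class proportions."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def repeated_cv(X, y, fit, predict, score, k=5, repeats=3, seed=0):
    """Return one score per test fold, over `repeats` independent reshuffles."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            predictions = predict(model, [X[i] for i in test_idx])
            scores.append(score([y[i] for i in test_idx], predictions))
    return scores


# Hypothetical toy model: threshold halfway between the two class means.
def fit_midpoint(X, y):
    positives = [x for x, t in zip(X, y) if t == 1]
    negatives = [x for x, t in zip(X, y) if t == 0]
    return (mean(positives) + mean(negatives)) / 2


def predict_midpoint(threshold, X):
    return [1 if x > threshold else 0 for x in X]


def recall_score(y_true, y_pred):
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return true_positives / positives if positives else 0.0


# Synthetic, imbalanced data: 40 negative and 10 positive cases.
data_rng = random.Random(42)
X = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
X += [data_rng.gauss(2.0, 1.0) for _ in range(10)]
y = [0] * 40 + [1] * 10

scores = repeated_cv(X, y, fit_midpoint, predict_midpoint, recall_score)
print(f"recall: mean={mean(scores):.2f}, SD={stdev(scores):.2f}, folds={len(scores)}")
```

Reporting the spread across all 15 test folds, rather than a single number from one split, is what allows a reader to judge whether a result depends on a favorable partition.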
The effectiveness of big data solutions and machine-learning methods is highly affected by the choice of the parameters of these methods (ie, parameter tuning). The wrong or improper choice of parameters may make a highly effective method exhibit very poor behavior in a given task. Ideally, the parameters should be chosen for each specific task and dataset using a partition of the training set (ie, validation), which is different from the dataset used to train and to test the model. This procedure is known as cross-validation on the training set or nested cross-validation.
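The nested procedure described above can be sketched as two loops: an inner loop that selects the parameter using only the training portion, and an outer loop that scores the tuned model once on data it has never seen. The example below is a minimal illustration in plain Python, assuming a toy one-dimensional k-nearest-neighbor classifier and balanced synthetic data; the names, candidate values, and fold counts are hypothetical.

```python
import random


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (labels are 0/1)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)


def make_folds(n, n_folds, rng):
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def take(values, indices):
    return [values[i] for i in indices]


def nested_cv(X, y, candidates=(1, 3, 5), outer=5, inner=3, seed=0):
    """Return (chosen k, test accuracy) for each outer fold."""
    rng = random.Random(seed)
    results = []
    for test_idx in make_folds(len(y), outer, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        tr_X, tr_y = take(X, train_idx), take(y, train_idx)
        inner_folds = make_folds(len(tr_y), inner, rng)

        def inner_score(k):
            # Tuning uses validation folds carved out of the training data only.
            total = 0.0
            for val_idx in inner_folds:
                validation = set(val_idx)
                fit_idx = [i for i in range(len(tr_y)) if i not in validation]
                total += accuracy(take(tr_X, fit_idx), take(tr_y, fit_idx),
                                  take(tr_X, val_idx), take(tr_y, val_idx), k)
            return total / len(inner_folds)

        best_k = max(candidates, key=inner_score)
        # The outer test fold is touched exactly once, after tuning is finished.
        test_accuracy = accuracy(tr_X, tr_y, take(X, test_idx), take(y, test_idx), best_k)
        results.append((best_k, test_accuracy))
    return results


# Balanced synthetic data, so that accuracy is a reasonable score here.
data_rng = random.Random(7)
X = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
X += [data_rng.gauss(2.0, 1.0) for _ in range(30)]
y = [0] * 30 + [1] * 30

results = nested_cv(X, y)
for fold, (best_k, acc) in enumerate(results, start=1):
    print(f"outer fold {fold}: chosen k={best_k}, test accuracy={acc:.2f}")
```

Printing the chosen value per fold also serves the reporting purpose: it documents which parameter values were tried and which were selected.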
Even if the tuning of all methods is properly executed, this should be explicitly reported in the paper, with the exact values (or range of values) tested for each parameter and the values finally selected. When the tuning information is missing, it is impossible to determine whether the methods have been implemented appropriately and whether they have achieved their maximum potential in a given task. It is also impossible to assess whether the comparison is fair, as some methods may have been used at their maximum capacity and others not.
Authors should employ statistical significance tests to contrast the compared strategies in their experimental evaluation. Statistical tests are essential to assess whether the performance of the analyzed methods in the sample (ie, the considered datasets) is likely to reflect, with certain confidence, their actual performance in the whole population. As such, they are key to support any claim of superiority of a particular method over others. Without such tests, the relative performance observed in the sample cannot, by any means, be extrapolated to the population. The choice of the tests should also reflect the characteristics of the data (eg, whether the data follow a normal distribution).
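As one possible illustration, a paired sign-flip (randomization) test compares two methods on the same test folds without assuming normally distributed differences. The sketch below is an assumption-laden example rather than a prescribed procedure: the per-fold scores are hypothetical, the function name is invented for illustration, and because scores from overlapping cross-validation folds are not fully independent, the resulting P value should be read as approximate.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided P value for the summed paired difference.

    Under the null hypothesis that the two methods perform equally, each
    paired difference is equally likely to be positive or negative, so all
    2**n sign assignments are enumerated (practical for small n only).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("Both methods must be scored on the same folds")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(differences))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, differences)))
        if flipped >= observed - 1e-12:  # small tolerance for rounding error
            extreme += 1
    return extreme / total


# Hypothetical per-fold scores of two methods on the same 10 test folds.
method_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85, 0.80, 0.82]
method_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.79, 0.77, 0.81, 0.78, 0.80]

p_value = paired_sign_flip_test(method_a, method_b)
print(f"two-sided P = {p_value:.4f}")
```

Because method A is better on every fold in this made-up example, only the two sign assignments that keep all differences aligned are as extreme as the observed one, giving P = 2/1024.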
One of the issues that hampers reproducibility of studies, and therefore scientific progress, is the lack of original implementation (with proper documentation) of the methods and techniques, and the unavailability of the original data used to test the methods. Therefore, it is important to make all data, models, code, documentation, and other digital artifacts used in the research available for others to reuse. The artifacts made available must be sufficient to ensure that published results can be accurately reproduced.
Effectiveness of the solutions, as captured by accuracy-oriented measures, is not the only dimension that should be evaluated. Indeed, if the effectiveness of the studied models is similar and sufficient for a given health-related application, other dimensions such as time efficiency (or the costs) to train and deploy (test) the models are essential to evaluate the practical applicability of such solutions. Another dimension that may influence the decision for the practical use of a big data or a machine-learning method in a real practical situation is the ability to understand why the model has produced certain outputs (ie, explainability). Solutions such as those based on neural networks may be highly effective when presented with huge amounts of data, but their training and deployment costs as well as their opaqueness may not make them the best choice for a given health-related application.
However, limitations exist. The inferior quality scores based on the AMSTAR 2 tool might reflect incomplete reporting and lack of adherence to standardized review methods. There is neither an established bias risk tool specifically for big data or ML studies nor any systematic way of presenting the findings of such studies. Furthermore, most studies provided a narrative description of results, requiring summarization. Nevertheless, all of the reviews were inspected by most authors, and the most relevant data were condensed in the text or in descriptive tables.
Big data analytics provide public health and health care with powerful instruments to gather and analyze large volumes of heterogeneous data. Although research in this field has been growing exponentially in the last decade, the overall quality of evidence is low to moderate. High variability of results was observed across different ML techniques and approaches, even for the same disease or condition. The diversity of big data tools and ML algorithms requires proper standardization of protocols and comparative approaches, and the process of tuning the hyperparameters of the algorithms is not uniformly reported. Important characteristics essential for replicability and external validation were not frequently available.
Additionally, the reviews included in this overview addressed different health-related tasks; however, studies assessing the impact on clinical outcomes remain scarce. Thus, evidence of applicability in daily medical practice is still needed. Further studies should focus on how big data analytics impact clinical outcomes and on creating proper methodological guidelines for reporting big data/ML studies, as well as using robust performance metrics to assess accuracy.
Search strategy used in the research.
Quality assessment judgment using the AMSTAR 2 tool.
Main characteristics of included studies.
Results and limitations of included systematic reviews.
AI: artificial intelligence
AUROC: area under the receiver operating characteristic curve
AMSTAR 2: A Measurement Tool to Assess Systematic Reviews 2
CNN: convolutional neural network
DM: diabetes mellitus
DSS: decision support system
EPW: European Programme of Work
GPW13: Thirteenth General Programme of Work
ML: machine learning
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-analyses
QUOROM: Quality of Reporting of Meta-analyses
RF: random forest
SVM: support vector machine
WHO: World Health Organization
We highly appreciate the efforts provided by our experienced librarian Maria Björklund from Lund University, who kindly prepared the search strategy used in this research. In addition, we thank Anneliese Arno (University College of London and Covidence Platform) for providing guidance in performing this research through Covidence. We also thank Raisa Eda de Resende, Edson Amaro Júnior, and Kaíque Amâncio Alvim for helping the group with data extraction and double-checking the input data.
IJBdN, MM, MG, NAM, and DNO designed the study. HA, IW, and IJBdN performed first- and second-stage screening, and extracted the presented data. MM solved any disagreements. HA, IW, and IJBdN carried out the quality assessment. IJBdN, MM, MG, and DNO drafted the manuscript and its final version. DNO and NAM are staff members of the WHO. The authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy, or views of the WHO.
None declared.