This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.

Machine learning applications in health care have increased considerably in the recent past, and this review focuses on an important application in psychiatry related to the detection of depression. Since the advent of computational psychiatry, research based on functional magnetic resonance imaging has yielded remarkable results, but these tools tend to be too expensive for everyday clinical use.

This review focuses on an affordable data-driven approach based on electroencephalographic recordings. Web-based applications via public or private cloud-based platforms would be a logical next step. We aim to compare several different approaches to the detection of depression from electroencephalographic recordings using various features and machine learning models.

For the detection of depression, we reviewed published studies that applied machine learning to resting-state electroencephalographic recordings; for the prediction of therapy outcomes, we reviewed a set of interventional studies that used some form of stimulation in their methodology.

We reviewed 14 detection studies and 12 interventional studies published between 2008 and 2019. As direct comparison was not possible due to the large diversity of theoretical approaches and methods used, we compared them based on the steps in the analysis and the accuracies yielded. We also compared possible drawbacks in terms of sample size, feature extraction, feature selection, classification, internal and external validation, and possible unwarranted optimism and reproducibility. Finally, we suggested desirable practices to avoid misinterpretation of results and unwarranted optimism.

This review shows the need for larger data sets and more systematic procedures before this approach can be used in clinical diagnostics. Regulation of the analysis pipeline and standard methodological requirements should therefore become mandatory to increase the reliability and accuracy of the complete methodology so that it can be translated to modern psychiatry.

As the World Health Organization has warned since 2007, depression may become the most frequent cause of global disability by 2030 [

Combining the knowledge and methodology used in computational neuroscience and psychiatry results in a discipline known as computational psychiatry. This field aims to determine the neurobiological underpinnings behind clusters of clinical symptoms, making it easier to adjust the treatment to patients on an individual level [

Computational psychiatry may be divided into 2 approaches: theory driven and data driven. The data-driven approach typically involves some type of machine learning and appears to be much more applicable than the theory-driven approach owing to its comparatively lower data collection costs. Although the most popular recent work applies the data-driven approach to MRI or functional MRI (fMRI) data, the drawbacks of this approach are the subject of debate among researchers. In our opinion, it would be much more appropriate to rely on electroencephalographic data, given the lower costs and higher patient accessibility. Electroencephalogram (EEG) is the oldest form of neuroimaging (1924, Hans Berger); it is noninvasive and solidly grounded in neurology and neuroscience. In psychiatry, it is only used to confirm the existence of epileptiform activity. Compared with fMRI, for example, EEG is more suitable for frequent testing owing to the shorter recording time and the lower cost of processing. Witten and Frank [

Another research area, physiological complexity, continues to be considered novel by many medical professionals. It is based on a complex systems dynamic theory (commonly called the

Over the past 10 years, the number of research studies applying some form of machine learning to EEG data sets to detect depression or predict treatment outcomes has grown rapidly. This study reviews the literature to offer a cross-section that is useful for determining current best practices. We have chosen to focus on the combination of physiological complexity (the application of nonlinear measures of analysis of EEG) and data-driven computational psychiatry approaches, as we believe that this combination may offer faster improvement in current clinical practices focusing on the treatment of depression.

This systematic literature review aims to find and compare published studies using nonlinear (and spectral) methods of analysis in combination with various machine learning methods for the detection of depression. We therefore established inclusion criteria, as we were aware that many studies had been published over the past decade. Having followed the literature for a considerable time, we set a start date of 2008 and an end date of May 2019.

Given the rapid development in this research area because of faster computers, cloud utilization, and improved internet performance, we believe that this is a sufficient inclusion period. We systematically searched the Web of Science and PubMed databases on May 24, 2019, using the following combination of keywords: (“Data mining” OR “machine learning”) AND (“EEG” OR “Electroencephalography”) AND (“Depression” OR “MDD”).

In addition, databases indexing both fields, such as Springer, Scopus, and ScienceDirect, were searched for relevant literature, including the Cornell repository.

The initial search yielded 197 papers; we reviewed all the titles and abstracts to determine which were in line with our search criteria.

Our eligibility criteria (eligibility testing) consisted of the following requirements: a study published between 2008 and 2019, detection of depression or prediction of the outcome of treatment for depression, a sample consisting of patients diagnosed with depression (major depressive disorder [MDD]) and healthy controls (HCs), an EEG data set (preferably resting-state EEG), use of fractal and nonlinear measures as features for machine learning, and use of machine learning for the detection of depression. After a primary selection phase (in which we read all the publications independently), our sample consisted of 32 publications, which was reduced to 26 based on internal discussion and comparative analysis. After reading the entire text of each publication, we decided to include 14 detection studies and 12 interventional studies. In short, we only included studies published over the past 12 years that used electroencephalographic signals recorded from human participants and carried out a machine learning classification task aimed at detecting depression; we excluded studies limited to power analyses, nonhuman studies, and studies without a final classification step. Many studies described mobile phone apps and web-based data collection (web-based psychiatry) using machine learning, but this has already been reviewed in another work [

Before conducting this systematic search, we created a list of study characteristics for comparison and to discuss the best practices and results. First, we compared the sample sizes, with only 1 intervention study being sufficiently large to analyze a sample of over 100 participants (and only 1 study consisted of only female subjects [

In our view, from a nonlinear analysis perspective, it is useful to analyze resting-state recordings, as previous research has shown that they are the most information-rich [

The next stage of comparison considered the method used for data preprocessing, some of which used standard subbands (although there is yet to be any published data or evidence that dividing EEG into subbands has any physiological significance [

Studies also differed in how they chose to extract or select the features.

We also noted whether internal and external cross-validation was performed (and reported) and whether the study could potentially be replicated. Finally, we compared the methods of machine learning used in each work as well as their accuracy after the testing phase and their sensitivity and specificity. Another question considered was whether the studies used receiver operating characteristic (ROC) curves to verify their accuracy. We attempted to carry out an exhaustive analysis of those publications that complied with our eligibility criteria.
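As an illustration of the measures compared here, the following minimal Python sketch computes accuracy, sensitivity, specificity, and the area under the ROC curve from a set of test-phase predictions. The function names, the label coding (1 for MDD, 0 for HC), the 0.5 decision threshold, and the example scores are assumptions made for illustration; they are not taken from any of the reviewed studies.

```python
def summary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = MDD, 0 = HC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of patients correctly detected
        "specificity": tn / (tn + fp),  # share of controls correctly rejected
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen patient receives a higher score than a randomly chosen control
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for 4 patients and 4 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(summary_metrics(y_true, y_pred))  # accuracy, sensitivity, specificity all 0.75
print(roc_auc(y_true, scores))          # 0.9375
```

In this toy example, the thresholded accuracy is 75%, whereas the threshold-independent area under the curve is 0.94, which is one reason why reporting only a single accuracy figure can hide how a classifier behaves.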

We reviewed 14 studies (classified as

Flow diagram showing common stages in the analysis of resting-state EEG in all studies with varying approaches to classification. EEG: electroencephalogram.

After recording the EEG (for which we noted whether resting-state EEG was recorded with open or closed eyes, the method used to confirm the depression status, whether patients were medicated, what EEG recording standard was used, and how many electrode positions were involved), the preprocessing phase followed. Apart from standard filtering and the selection of sampling frequency, the most important part in physiological terms was artifact removal (manual, automatic, or none at all). After the definition of exact epochs (or, more precisely, time series) for further analysis, the following steps were compared: feature extraction, feature selection (or dimensionality reduction), classification, validation, and the accuracy achieved in the machine learning testing phase. We also compared the conditions for study reproducibility.
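The epoch definition step can be sketched as follows. This is a minimal illustration that assumes a single-channel recording stored as a Python list; the function name, the 256 Hz sampling rate, the 2-second epoch length, and the optional overlap are hypothetical choices and do not reproduce the settings of any particular study.

```python
def split_into_epochs(signal, sampling_rate_hz, epoch_seconds, overlap=0.0):
    """Cut a 1-D signal into fixed-length epochs.

    `overlap` is the fraction (0 <= overlap < 1) shared by consecutive epochs.
    Trailing samples that do not fill a complete epoch are dropped.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    epoch_len = int(round(sampling_rate_hz * epoch_seconds))
    if epoch_len < 1:
        raise ValueError("epoch must contain at least one sample")
    step = max(1, int(round(epoch_len * (1.0 - overlap))))
    last_start = len(signal) - epoch_len
    return [signal[start:start + epoch_len] for start in range(0, last_start + 1, step)]


# Hypothetical 10-second recording sampled at 256 Hz, cut into 2-second epochs.
recording = [0.0] * (256 * 10)
epochs = split_into_epochs(recording, sampling_rate_hz=256, epoch_seconds=2)
print(len(epochs), len(epochs[0]))  # 5 512
```

Each epoch returned here is one time series on which features would subsequently be computed.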

One of the first studies using resting-state EEG to classify individuals with depression and HCs was carried out by Ahmadlou et al [

The differences between samples vary in terms of the tests used to confirm the status of the patients with MDD (DSM-IV, International Classification of Diseases, Beck Depression Scale, and Montgomery-Asberg Scale) as well as the state when recording (open or closed eyes or both). Studies also differed in terms of medication status of patients with MDD, with some being all unmedicated participants [

Another important aspect when comparing the selected studies was how the researchers recorded the resting-state EEG, under what conditions, and using how many electrodes (concerning the standard used in the

Ahmadlou et al [

Of the numerous potential options for preprocessing the signal, some common practices may be found in all the papers that were reviewed. For example, artifact removal may be performed either automatically or manually [

We are aware that a mathematical connection exists between the Fourier transform and, for example, the Higuchi fractal dimension [

Feature extraction refers to the creation of features, such as calculating various fractal and nonlinear measures from chosen epochs (time series) of raw signal traces. On the other hand, feature selection (or reduction of the problem dimensionality) helps to remove those features that are redundant or irrelevant. In this group of publications, different authors used different combinations of the two: Ahmadlou et al [

The results of our cross-section analysis are summarized in chronological order in

The extracted subbands were input to calculate several entropy measures: bispectral entropy (Ph, including higher order spectra technique, from Fourier analysis), Renyi entropy, approximate entropy, and sample entropy (SampEn).
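To make one of these measures concrete, the following is a minimal sketch of sample entropy in plain Python. It assumes the commonly used defaults of embedding dimension m = 2 and tolerance r = 0.2 times the standard deviation of the series, the Chebyshev distance between templates, and a match criterion of distance less than or equal to r; published implementations differ in these details, so this should not be read as the exact procedure used in the reviewed studies.

```python
import math


def sample_entropy(x, m=2, r=None):
    """Sample entropy of a 1-D series: -ln(A / B), where B counts pairs of
    matching templates of length m and A counts pairs of length m + 1
    (self-matches excluded)."""
    n = len(x)
    if n <= m + 1:
        raise ValueError("series too short for the chosen m")
    if r is None:
        mean = sum(x) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
        r = 0.2 * sd  # assumed default tolerance

    def count_matches(length):
        # The same n - m starting points are used for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined in theory; reported as infinity here
    return -math.log(a / b)


regular = [0, 1] * 5                        # perfectly periodic series
irregular = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0]  # one break in the pattern

print(sample_entropy(regular, m=2, r=0.1))    # zero: fully predictable
print(sample_entropy(irregular, m=2, r=0.1))  # ln(1.5), approximately 0.405
```

Lower values indicate a more regular, predictable signal; higher values indicate greater irregularity.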

The subband extraction process consisted of passing the original data through a sequence of down-sampling steps and low-pass filters that defined the transfer function (similar to classical spectral analysis, which distorts the information content of the signal, according to Klonowski [

Comparison of the previously mentioned studies across several characteristics, including their accuracy on the classification task.

Study | Sample (MDD^{a}+HC^{b}) | Electrodes, frequency (Hz) | Preprocessing | Features | ML^{c} models | Accuracy (%) |

Ahmadlou et al, 2012 [ | 12+12 | 7, 256 | Wavelets and spectral bands (Fourier), bootstrap | Higuchi and Katz FD^{d} | Enhanced probabilistic neural networks | 91.30 |

Puthankattil and Joseph, 2012 [ | 30 (16 M^{e}+14 F^{f})+30 | 4, 256 | Wavelet, total variation filtering, multiresolution decomposition | Wavelet entropy | RWE^{g}, artificial feed forward networks | 98.11 |

Hosseinifard et al, 2014 [ | 45+45 | 19, 1 kHz | Standard spectral bands | Power, DFA^{h}, Higuchi, correlation dimension, Lyapunov exponent | KNN^{i}, LR^{j}, linear discriminant | 90 |

Faust et al, 2014 [ | 30+30 | 4 (2 left, 2 right), 256 | Wavelet packet decomposition | ApEn^{k}, SampEn^{l}, REN^{m}, bispectral phase entropy | PNN^{n}, SVM^{o}, DT^{p}, KNN, NB^{q}, GMM^{r}, Fuzzy Sugeno Classifier | 99.50 |

Bairy et al, 2015 [ | 30+30 (left brain only) | N/A^{s} | Discrete cosine transform | SampEn, FD, CD^{t}, Hurst exponent, LLE^{u}, DFA | DT, KNN, NB, SVM | 93.80 |

Acharya et al, 2015 [ | 15+15 | 4 (2 left, 2 right), 256 | Broadband | FD, LLE, SampEn, DFA, H^{v}, W_Bx^{w}, W_By^{x}, EntPh^{y}, Ent1^{z}, DET^{aa}, ENTR^{ab}, LAM^{ac}, T2 (DDI)^{ad} | SVM, KNN, NB, PNN, DT | 98 |

Mohammadi et al, 2015 [ | 53+43 | 28 (10/10), 500 | Standard bands/FFT^{ae}, LDA^{af}, genetic algorithm | Spectral only | DT | 80 |

Puthankattil and Joseph, 2014 [ | 30+30 | 4, 256 | Wavelet packet decomposition | Wavelet entropy, approximate entropy | NN^{ag} | 98 |

Liao et al, 2017 [ | 12+12 | 30, 500 | Common spatial pattern | Spectral (common spatial pattern) | KEFB-CSP^{ah} | 80 |

Mumtaz et al, 2018 [ | 34/18 F+30/9 F^{ai} | 19, 256 | REST^{aj} | Synchronization likelihood | SVM, LR, NB | 87.50 |

Mumtaz et al, 2017 [ | 33+30 | 19 (EO^{ak}, EC^{al}), 256 | Fourier | Alpha interhemispheric asymmetry | LR, SVM, NB | 98.40 |

Mumtaz et al, 2018 [ | 34+30 | 19, 256 | 10-fold cross-validation | Power, asymmetry, wavelet coefficients, Z-score | LR | 94 |

Bachmann et al, 2018 [ | 13+13 | 1, 1 kHz | Fourier | HFD^{am}, DFA, Lempel-Ziv complexity, and SASI^{an} | Logistic regression | 88 |

Čukić et al, 2018/2020 [ | 26+20 | 19, 1 kHz | Broadband EEG^{ao}, 10-fold cross-validation, PCA^{ap} | HFD+SampEn | MP^{aq}, LR, SVM (with linear and polynomial kernel), DT, RF^{ar}, NB | 97.50 |

^{a}MDD: major depressive disorder.

^{b}HC: healthy control.

^{c}ML: machine learning.

^{d}FD: fractal dimension.

^{e}M: male.

^{f}F: female.

^{g}RWE: relative wavelet energy.

^{h}DFA: detrended fluctuation analysis.

^{i}KNN: K-nearest neighbor.

^{j}LR: logistic regression.

^{k}ApEn: approximate entropy.

^{l}SampEn: sample entropy.

^{m}REN: Renyi entropy.

^{n}PNN: probabilistic neural network.

^{o}SVM: support vector machine.

^{p}DT: decision tree.

^{q}NB: naïve Bayes.

^{r}GMM: Gaussian mixture model.

^{s}N/A: not applicable.

^{t}CD: correlation dimension.

^{u}LLE: largest Lyapunov exponent.

^{v}H: Hurst exponent.

^{w}W_Bx: higher order spectra features (weighted center of bispectrum [W_Bx]; Acharya et al [

^{x}W_By: higher order spectra features (weighted center of bispectrum [W_By]; Acharya et al [

^{y}EntPh: bispectrum phase entropy.

^{z}Ent1: normalized bispectral entropy.

^{aa}DET: determinism.

^{ab}ENTR: entropy.

^{ac}LAM: laminarity.

^{ad}T2 (DDI): recurrent times.

^{ae}FFT: fast Fourier transform.

^{af}LDA: linear discriminant analysis.

^{ag}NN: neural network.

^{ah}KEFB-CSP: kernel eigen-filter-bank common spatial pattern.

^{ai}34 depression patients (among them 18 females) and 30 healthy controls (of those 9 were female).

^{aj}REST: reference electrode standardization technique.

^{ak}EO: eyes opened.

^{al}EC: eyes closed.

^{am}HFD: Higuchi fractal dimension.

^{an}SASI: spectral asymmetry index.

^{ao}EEG: electroencephalogram.

^{ap}PCA: principal component analysis.

^{aq}MP: multilayer perceptron.

^{ar}RF: random forest.
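Several of the studies in the table above use the Higuchi fractal dimension (HFD) as a feature. As a rough illustration of what this computation involves, the sketch below implements Higuchi's curve-length procedure in plain Python: for each delay k, the mean normalized curve length L(k) is computed, and the fractal dimension is estimated as the least-squares slope of ln L(k) against ln(1/k). The function name and the choice of k_max = 8 are assumptions for illustration; the reviewed studies use different values of k_max, and this code is not taken from any of them.

```python
import math


def higuchi_fd(x, k_max=8):
    """Estimate the Higuchi fractal dimension of a 1-D series."""
    n = len(x)
    if k_max < 2 or n < 2 * k_max:
        raise ValueError("need k_max >= 2 and at least 2 * k_max samples")

    log_inv_k, log_length = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # start offset of each subsampled curve (0-based)
            n_steps = (n - 1 - m) // k
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            norm = (n - 1) / (n_steps * k)  # Higuchi's normalization factor
            lengths.append(dist * norm / k)
        mean_length = sum(lengths) / len(lengths)
        if mean_length == 0:
            raise ValueError("constant signal: curve length is zero")
        log_inv_k.append(math.log(1.0 / k))
        log_length.append(math.log(mean_length))

    # Least-squares slope of ln L(k) versus ln(1/k).
    mean_x = sum(log_inv_k) / k_max
    mean_y = sum(log_length) / k_max
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_k, log_length))
    den = sum((a - mean_x) ** 2 for a in log_inv_k)
    return num / den


# A straight line is a smooth curve, so its estimated dimension is 1
# (up to floating-point rounding).
print(higuchi_fd([float(i) for i in range(200)]))
```

For a straight line, every L(k) equals (N-1)/k, so the slope is 1; more irregular signals yield values closer to 2.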

Ahmadlou et al [

Ahmadlou et al [

In conclusion, we cannot state that all the mentioned studies provide sufficient information for replication, as it is clearly not the case with Bairy et al [

Summary of the abovementioned comparisons of analysis of signals in the literature.

Analysis of signal | Number of electrodes | Subbands | Filtering | Method of analysis | Feature extraction |

Common | 1, 3, or 7 (prefrontal) | Standard subbands | Preprocessing on site | Fourier and its derivatives | ANOVA^{a} |

Recommended | 19+ (all electrodes) | Broadband | Minimal preprocessing | Fractal and nonlinear | PCA^{b} or GA^{c} |

^{a}ANOVA: analysis of variance.

^{b}PCA: principal component analysis.

^{c}GA: genetic algorithm.

Summary of the abovementioned comparisons with regard to the classifications applied.

Classification | Sample size | Data collection | Feature selection | Validation | Model | Accuracy |

Common | 12-40 | 1 site | Spectral analysis | Often missing | SVM^{a} | Typically >95% or 99% |

Recommended | >50-100 | Multiple sites/collaborative (possible extraction from MRI^{b} sets) | Nonlinear analysis | Internal plus external validation on unseen data | LASSO^{c}, embedded regularization | ROC^{d} curve application/more realistic results |

^{a}SVM: support vector machine.

^{b}MRI: magnetic resonance imaging.

^{c}LASSO: least absolute shrinkage and selection operator; a type of linear regression that uses shrinkage.

^{d}ROC: receiver operating characteristic.
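The recommendation in the table above to validate on unseen data has a practical consequence when each participant contributes many epochs: if epochs from the same person appear in both the training and the test subset, the test data are not truly unseen. The following minimal sketch shows one way of building cross-validation folds at the participant level. The function name, the round-robin assignment, and the example identifiers are hypothetical; stratification by diagnosis is omitted for brevity, and the reviewed studies did not necessarily split their data this way.

```python
def subject_wise_folds(subject_ids, n_folds):
    """Yield (train_indices, test_indices) pairs such that all epochs from a
    given subject fall into exactly one test fold and never into the matching
    training set."""
    unique_subjects = sorted(set(subject_ids))
    if n_folds < 2 or n_folds > len(unique_subjects):
        raise ValueError("n_folds must be between 2 and the number of subjects")
    # Round-robin assignment of subjects (not epochs) to folds.
    fold_of = {s: i % n_folds for i, s in enumerate(unique_subjects)}
    for fold in range(n_folds):
        test_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train_idx, test_idx


# Hypothetical data set: 4 participants with 2 epochs each.
epoch_subjects = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]
for train_idx, test_idx in subject_wise_folds(epoch_subjects, n_folds=2):
    print("train:", train_idx, "test:", test_idx)
# train: [2, 3, 6, 7] test: [0, 1, 4, 5]
# train: [0, 1, 4, 5] test: [2, 3, 6, 7]
```

Any feature selection or dimensionality reduction step would likewise have to be fitted on the training indices only, so that no information from the test participants reaches the model.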

Other studies published during the same interval (2008-2019) were also based on EEG recordings but, unlike the previously mentioned work, used a stimulus (ie, not resting-state EEG), such as sound stimulation, or evoked response potentials (ERPs). We therefore briefly discuss their results. Kalatzis et al [

A study from 2014 attempted to predict the depression treatment response [

Similar to Shahaf et al [

Erguzel et al [

Most of the publications included in our review presented high accuracy in classifying individuals with depression and healthy participants based on their resting-state EEG, although they utilized various combinations of features and machine learning models. Although direct comparison is challenging, the common denominator for all presented studies can be summarized as a comparison of the methodological steps that are inevitable in this kind of research, in which certain features, previously found to be characteristic for depression, were used to feed classifiers of their choosing.

Several approaches may be used to examine the changes in EEG complexity that are characteristic of depression. Researchers have reached a consensus that depression is characterized by higher EEG complexity than that of healthy peers [

The classification of patients diagnosed with depression and HCs can be considered as a first step in exploring the potential for prediction. Differentiation between episode and remission is also possible [

Whelan and Garavan [

Although it has been shown that small sample sizes and a lack of external validation lead to unwarranted optimism, most published research does not embrace these principles as standard practice [

A minimum rate of 10 cases per predictor is common [

In conclusion, when discussing the importance of maintaining completely separate training and test subsets, Whelan and Garavan [

detrended fluctuation analysis

Diagnostic and Statistical Manual of Mental Disorders, fourth edition

decision tree

discrete wavelet transform

electroencephalogram

evoked response potential

functional magnetic resonance imaging

genetic algorithm

healthy control

Higuchi fractal dimension

independent component analysis

kernel eigen-filter-bank common spatial pattern

Katz fractal dimension

K-nearest neighbors

laminarity

linear discriminant analysis

largest Lyapunov exponent

leave-one-out cross-validation

logistic regression

Lempel-Ziv complexity

major depressive disorder

magnetic resonance imaging

naïve Bayes

naïve Bayes classifier

principal component analysis

poststroke patients with depression

patients with ischemic stroke but no depression

receiver operating characteristic

repetitive transcranial magnetic stimulation

relative wavelet energy

sample entropy

spectral asymmetry index

support vector machine

transcranial direct current stimulation

wavelet packet decomposition

Part of this work has been supported by the projects Social Big Data – CM (S2015/HUM-3427) and RISEWISE (H2020-MSCA-RISE-2015-690874). Since VL and JP are co-authors of this work, they can be contacted at vlopezlo@ucm.es and jpavon@ucm.es.

None declared.