This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Facial expressions require the complex coordination of 43 different facial muscles. Parkinson disease (PD) affects facial musculature leading to “hypomimia” or “masked facies.”
We aimed to determine whether modern computer vision techniques can be applied to detect masked facies and quantify drug states in PD.
We trained a convolutional neural network on images extracted from videos of 107 self-identified people with PD, along with 1595 videos of controls, in order to detect PD hypomimia cues. This trained model was applied to clinical interviews of 35 PD patients in their on and off drug motor states, and seven journalist interviews of the actor Alan Alda obtained before and after he was diagnosed with PD.
The algorithm achieved a test set area under the receiver operating characteristic curve of 0.71 on 54 subjects to detect PD hypomimia, compared to a value of 0.75 for trained neurologists using the Unified Parkinson Disease Rating Scale-III Facial Expression score. Additionally, the model accuracy to classify the on and off drug states in the clinical samples was 63% (22/35), in contrast to an accuracy of 46% (16/35) when using clinical rater scores. Finally, each of Alan Alda’s seven interviews was successfully classified as occurring before (versus after) his diagnosis, with 100% accuracy (7/7).
This proof-of-principle pilot study demonstrated that computer vision holds promise as a valuable tool for detecting PD hypomimia and for monitoring a patient’s motor state in an objective and noninvasive way, particularly given the increasing importance of telemedicine.
Facial expressions are an essential component of interpersonal communication [
Parkinson disease (PD) is a neurodegenerative disease that produces a gradual and generalized loss of motor functions, including the ability to contract facial muscles during spontaneous and voluntary emotional expressions [
Disease progression does not seem to produce uniform facial masking across people. Studies of differential deficits in specific muscles [
In experimental settings, the quantification of masked facies in patients with PD has been traditionally performed with manual scoring [
Computational methods based on known face components (eyes, mouth/lips, action units, skin, and shape) have been proposed [
Although it is well accepted that PD produces a generalized loss of the ability to produce facial expressions, it is unclear how this deficit evolves with disease progression and what effects dopamine replacement therapy has on masked facies [
The neural network model was trained using two data sets of faces, comprising people with PD and controls. The first was the YouTube Faces Database [
To preprocess the videos, faces were extracted from each frame of each video. Thereafter, each image was converted to grayscale, the intensity was normalized (mean=0.51, standard deviation=0.25), and the image was resized to a standardized 224 × 224 pixels. The neural network was trained using stochastic gradient descent.
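The preprocessing steps described above can be illustrated with a minimal sketch in plain Python. It assumes that a face crop is given as rows of (r, g, b) tuples in the range 0 to 255, that grayscale conversion uses the common BT.601 luma weights, that resizing uses nearest-neighbor sampling, and that "normalized" means subtracting the reported mean and dividing by the reported standard deviation. None of these choices is stated in the text, and the function names are hypothetical; face detection itself is not shown.

```python
MEAN, STD, SIZE = 0.51, 0.25, 224  # values reported in the text


def to_grayscale(rgb_frame):
    """Convert rows of (r, g, b) tuples in 0..255 to floats in 0..1.

    The BT.601 luma weights are an assumption, not taken from the paper.
    """
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
            for row in rgb_frame]


def resize_nearest(img, size=SIZE):
    """Resize to size x size by nearest-neighbor sampling (assumed method)."""
    h, w = len(img), len(img[0])
    return [[img[(y * h) // size][(x * w) // size] for x in range(size)]
            for y in range(size)]


def normalize(img, mean=MEAN, std=STD):
    """Shift and scale intensities: (pixel - mean) / std (assumed meaning)."""
    return [[(p - mean) / std for p in row] for row in img]


def preprocess(rgb_face):
    """Grayscale, resize to 224 x 224, then normalize one face crop."""
    return normalize(resize_nearest(to_grayscale(rgb_face)))


# Toy 2 x 2 "face crop": white and black pixels in a checkerboard.
face = [[(255, 255, 255), (0, 0, 0)],
        [(0, 0, 0), (255, 255, 255)]]
out = preprocess(face)
```

In practice, an image library would handle resizing with proper interpolation; the sketch only makes the order of operations and the normalization constants concrete.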
After training, for each new video in the test set, the algorithm assigned each frame a score between 0 and 1, based on the degree of hypomimia that was detected by the algorithm in that frame. The scores of all of the frames of a video together formed a density distribution for that video (
To classify each video, we needed to summarize its frame score density distribution as a single number. To do so, we took the fifth quantile (Q) of that distribution, that is, its fifth percentile: the score below which 5% of the video’s frames fall (other quantiles can also be used, as discussed in the Results section). A video that exhibits low hypomimia should have a positively skewed distribution, with the bulk of the probability mass closer to 0, and therefore a lower value of Q. In contrast, a video that exhibits high hypomimia should have a negatively skewed distribution, with the bulk of the probability mass closer to 1, and therefore a higher value of Q. Thus, Q characterizes how strongly hypomimia is detected in a given video: it indicates how far along the 0 to 1 continuum one must go to cover 5% of the video’s frames. Using this metric, we hypothesized that control videos will have a relatively lower Q (more frames concentrated toward 0) and PD videos will have a relatively higher Q (more frames concentrated toward 1;
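As an illustration of how a video score Q could be derived from per-frame scores, the following minimal sketch computes the fifth percentile of a list of frame scores. The linear-interpolation rule and the function names are assumptions made for this example, and the frame scores are invented toy values, not study data.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between order statistics.

    The interpolation rule is an assumption; the text does not specify one.
    """
    if not values:
        raise ValueError("need at least one frame score")
    s = sorted(values)
    pos = (len(s) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def video_score(frame_scores, pct=5.0):
    """Summarize a video by the chosen percentile of its frame scores (Q)."""
    return percentile(frame_scores, pct)


# Toy frame scores: mass near 0 (control-like) versus mass near 1 (PD-like).
control_like = [0.02] * 80 + [0.50] * 20
pd_like = [0.50] * 20 + [0.95] * 80

q_control = video_score(control_like)
q_pd = video_score(pd_like)
print(q_control, q_pd)
```

With these toy inputs, the PD-like video receives the higher Q, which matches the direction of the hypothesis stated above.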
The preprocessing pipeline for the input videos. Faces are extracted, converted to grayscale, and normalized. Then, each frame in the video is assigned a probabilistic score from 0 to 1 representing the degree of hypomimia. Thus, each video is represented by a probability distribution of frame scores. SGD: stochastic gradient descent.
Video scoring. To classify a video, a probability distribution is created for all of a video’s frames, and the fifth percentile of the distribution is defined as Q. A video that has a Q value above T (ie, closer to 1 or more evidence of hypomimia) is categorized as PD hypomimia; a video with a Q value below T (ie, closer to 0 or less evidence of hypomimia) is categorized as Control. PD: Parkinson disease.
Finally, a classification threshold T was selected. Any video that had a Q value lower than the threshold T was classified as not PD (ie, 0). Any video that had a Q value higher than the threshold T was classified as PD (ie, 1). T was selected such that it maximized classification accuracy in the testing set and was validated using the separate held-out validation set consisting of the Alan Alda videos.
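The threshold selection step can be sketched as a simple search over candidate values of T that maximizes accuracy on labeled video scores. The candidate grid (midpoints between consecutive distinct scores, plus one value below the minimum and one above the maximum) and the function names are assumptions for illustration, and the scores and labels are invented toy values.

```python
def accuracy(q_scores, labels, threshold):
    """Fraction of videos whose thresholded Q matches the label (1 = PD)."""
    preds = [1 if q > threshold else 0 for q in q_scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def select_threshold(q_scores, labels):
    """Return the candidate T with the highest accuracy.

    Candidates are midpoints between consecutive distinct scores, plus one
    value below the minimum and one above the maximum (assumed grid).
    Ties are broken in favor of the lowest candidate.
    """
    uniq = sorted(set(q_scores))
    mids = [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
    candidates = [uniq[0] - 1.0] + mids + [uniq[-1] + 1.0]
    return max(candidates, key=lambda t: accuracy(q_scores, labels, t))


# Toy video scores (Q) and labels: 0 = control, 1 = PD.
q_scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.8]
labels = [0, 0, 0, 1, 1, 1]

T = select_threshold(q_scores, labels)
best_accuracy = accuracy(q_scores, labels, T)
```

Because T is tuned on the same set on which accuracy is measured, the resulting accuracy is optimistic, which is why evaluation on a separate held-out set is needed.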
The difference in video scores between the PD and control groups was tested on a set of 54 videos (middle-aged and older patients, 37 males). Of these, half (n=27) featured people with self-identified PD, and the other half (n=27) featured people without PD (controls). The control videos were selected to include people who self-reported having other neurological or psychiatric disorders, with the following breakdown: 18 healthy people, four people with depression, one person with posttraumatic stress disorder, one person with traumatic brain injury, one person with bipolar disorder, one person with schizophrenia, and one person with chronic back pain. For the videos that were categorized as PD or other disorders, identification was based on the uploader’s self-report (ie, the title of the video), not a clinical evaluation. However, many of the videos were created by disease associations, clinicians, academics, documentaries, or celebrities who publicly revealed their diagnoses, providing some degree of confidence in the reliability of the self-report.
The Tufts Clinical data set consists of 35 participants (mean age 68 years, SD 8 years; 23 males and 12 females; mean total UPDRS-III score 25, SD 13) with idiopathic PD. The protocol was run at Tufts Medical Center in Boston, Massachusetts and was approved by the Tufts Health Sciences Campus Institutional Review Board (IRB #12371) (the complete study design [
Only 33 patients participated in a clinical interview in both their
All 35 patients performed the UPDRS-III scripted tasks (including pronation-supination, finger tapping, and walking) and simulated activities of daily living [
PD medication state (
The UPDRS-III Facial Expression item, which rates the impairment of facial expressions, was used as the reference outcome variable in the present analyses. We characterized a strictly positive UPDRS-III Facial Expression score (ie, a rating greater than 0) as a positive PD classification by the examining neurologist. Additionally, we characterized a strictly positive difference of the UPDRS-III Facial Expression score between
Participants should have less dysfunction (and a lower UPDRS score) when they are in the
The longitudinal data set consisted of seven videos of public appearances of Alan Alda from 1974 to 2019 (age 38-83 years), in which he was engaged in public speaking. Alan Alda is an actor, director, and screenwriter who was diagnosed with PD in 2014. This data set consists of four videos before diagnosis and three videos after diagnosis, and is used to evaluate the present algorithm’s ability to quantify hypomimia. In this data set, a mean of 9642 frames per video (5.3 minutes) was extracted and analyzed by the algorithm. In these interviews, Mr Alda is recorded in diverse poses and lighting conditions, making the longitudinal data set qualitatively similar to the training data set and Tufts Clinical data set.
As expected for the PD videos, a greater proportion of frames were classified as “PD hypomimia” than were for the control videos. The skewness of the PD subject video distributions was significantly smaller than that of the control videos (one-tailed Mann-Whitney
Example test set of PD and control distributions for four videos from the test data set. Two control videos (in blue) and two PD videos (in red). The distributions of the PD videos have higher weight on score values closer to 1 (negatively skewed) compared to the control videos, thereby demonstrating greater incidence of hypomimia. PD: Parkinson disease.
We experimentally quantify this difference in skewness by selecting the fifth quantile Q, which becomes the video score. The greater the incidence of hypomimia in the frames of a given video, the higher the quantile Q.
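To make the notion of skewness concrete, the following minimal sketch computes a moment-based sample skewness for two invented frame score distributions. The choice of estimator (third central moment divided by the second central moment raised to the power 1.5) is an assumption, as the text does not name one, and the scores are toy values rather than study data.

```python
def skewness(xs):
    """Moment-based sample skewness: m3 / m2 ** 1.5 (assumed estimator)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Toy frame scores: control-like mass near 0 with a tail toward 1,
# and the mirror image for a PD-like video.
control_like = [0.05] * 70 + [0.30] * 20 + [0.90] * 10
pd_like = [1.0 - s for s in control_like]

skew_control = skewness(control_like)
skew_pd = skewness(pd_like)
```

Here the control-like distribution is positively skewed and the PD-like distribution is negatively skewed, so the PD-like skewness is the smaller of the two.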
To provide a baseline accuracy measure, two professional neurologists rated each video in the test data set on the UPDRS-III Facial Expression score (score between 0 and 4). The neurologists performed the evaluation on the video, not an in-person clinical examination, and were told just to focus on the Facial Expression score and attempt to avoid being influenced by other cues present in the subject’s behavior, to the extent possible. Using this scoring system, one neurologist’s ratings produced an AUROC of 0.64 and the other neurologist’s ratings produced an AUROC of 0.79. Averaging both neurologists’ UPDRS-III Facial Expression scores produced an AUROC of 0.75. These scores were taken as an approximation of baseline classification accuracy that could be achieved using expert human raters. It is important to note, however, that this accuracy is an approximation and a true in-person clinical rating would incorporate substantially more information than just the UPDRS-III Facial Expression score.
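For readers less familiar with the metric, the AUROC of a set of ratings can be computed as the probability that a randomly chosen PD video receives a higher rating than a randomly chosen control video, with ties counted as one half. The sketch below applies this pairwise formulation to invented ratings from two hypothetical raters and to their average; the ratings, labels, and function name are illustrative assumptions, not study data.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy Facial Expression ratings (0 to 4) from two hypothetical raters.
labels = [0, 0, 0, 0, 1, 1, 1, 1]            # 0 = control, 1 = PD
rater_a = [0, 0, 1, 2, 1, 2, 3, 0]
rater_b = [0, 1, 0, 1, 2, 2, 4, 1]
averaged = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]

auroc_a = auroc(rater_a, labels)
auroc_avg = auroc(averaged, labels)
```

The pairwise form is quadratic in the number of videos, which is acceptable for a test set of this size; rank-based implementations give the same value more efficiently.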
Test set AUROC (PD vs Control) as a function of the chosen video distribution quantile to use as the “Q” threshold. A wide range of quantiles achieves AUROC > 0.7 (15th quantile and below). The fifth quantile, selected as our video score threshold, is shown in green. AUROC: area under the receiver operating characteristic curve; PD: Parkinson disease.
Finally, we assessed the performance of our algorithm on the held-out validation sets, after being trained on the training set and accuracy maximized on the testing set. The first validation set was the Tufts Clinical data set used to assess hypomimia changes associated with the drug state. For each patient, we extracted and analyzed all frames in the video. Our goal was to quantify PD hypomimia for each patient’s visit and see if
To test if the algorithm was able to correctly classify PD hypomimia, we computed the score of each clinical interview video to see if it exceeded the decision threshold T. If the score was above the threshold, the video was categorized as PD. The algorithm detected PD in 76% (25/33) of the
To quantitatively evaluate the difference between the
To quantify the sensitivity of our analysis, we provided a plot highlighting the differences in detection as given by different thresholds (the x-axis is scaled by T). Thresholds to separate the
On-off sensitivity of the Tufts data set. On-off classification for a clinical visit is displayed as a function of threshold. On-off differences are far more subtle than PD versus Control differences (on which the model was originally trained). The red line shows the percentage of participants for which the neurologists rated the UPDRS-III Facial Expression score higher in the off state than in the on state. PD: Parkinson disease; UPDRS: Unified Parkinson Disease Rating Scale.
We sought to retrospectively validate the algorithm’s ability to characterize PD symptomatology in an individual longitudinally. The algorithm was applied to seven interview videos featuring Alan Alda (officially diagnosed with PD in 2014) from 1974 to 2019. The algorithm’s video scores increased from before to after PD diagnosis. Indeed, all videos before PD diagnosis were below the optimal threshold T for positive classification, and all videos after diagnosis were well above the threshold, indicating that the algorithm was able to capture hypomimia cues. Finally, we included a confidence interval (as given by the third and seventh video distribution quantiles) associated with the video scores (
Validation with Alan Alda interviews. All videos after Mr Alda’s PD diagnosis are above the threshold T, whereas videos before his PD diagnosis are below T (horizontal red line), indicating that the algorithm is sensitive to PD hypomimia symptomatology. Dots show the confidence interval (third quantile and seventh quantile of the video density distribution). PD: Parkinson disease.
In this proof-of-principle pilot study, we used deep learning to detect PD hypomimia from videos of people with and without PD. Our method was also able to detect the effect of dopamine replacement medication in participants during their clinical visit and to analyze the progression of symptoms in the actor Alan Alda before and after his diagnosis of PD.
A well-established method to identify facial expressions was proposed by Ekman and Friesen [
An alternative approach to investigate the progression of PD as a function of the ability to move specific facial muscles is to use electromyography [
There are noteworthy limitations to our work. The training data set is limited; in particular, it did not include people across the full range of ages affected by PD (patients with early onset PD were not represented), which constrains generalizability. The effect of dopaminergic medication was not taken into consideration when training the model, as all videos in the training data sets were classified as either PD or control, with no consideration of
Our algorithm may serve as a nonclinical marker for PD hypomimia and
With a shift toward a greater role of telemedicine, an automated assessment of hypomimia could serve as a screening tool for parkinsonism and as a nonobtrusive objective score to assess
AUROC: area under the receiver operating characteristic curve
levodopa
PD: Parkinson disease
UPDRS: Unified Parkinson Disease Rating Scale
The authors acknowledge the help provided by Dr John Gunstad in the initial phase of the project (encouraging our efforts and connecting us with clinical specialists in Parkinson disease) and the feedback provided by Dr Ajay Royyuru over several iterations of the results. The authors would also like to thank Alan Alda for allowing us to display the outcome of our algorithm on some of his publicly available videos.
None declared.