This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
A set of face stimuli, called the Umeå University Database of Facial Expressions, is described. The set consists of 30 female and 30 male models aged 17–67 years (M = 30.19, SD = 10.66). Each model shows seven different facial expressions (angry, surprised, happy, sad, neutral, afraid, and disgusted). Most models are ethnic Swedes but models of Central European, Arabic, and Asian origin are also included.
To create and validate a new database of facial expressions that can be used for scientific experiments.
The images, presented in random order one at a time, were validated by 526 volunteers rating on average 125 images on seven 10-point Likert-type scales ranging from “completely disagree” to “completely agree” for each emotion.
The proportion of the aggregated results that were correctly classified was considered to be high (M = 88%).
The results lend empirical support for the validity of this set of facial expressions. The set can be used freely by the scientific community.
There is a wealth of published research into face perception, face processing, and facial expressions using images of facial expressions of emotions [
The human face is an integral part of daily life. Facial muscles allow a wide range of expressions and functions [
Ekman and Friesen published their pioneering Pictures of Facial Affect (PFA) in 1976, which became the most frequently used set in research [
The understanding and perception of emotions has been shown to be more accurate when those who evaluate emotional expressions share the ethnicity and the national and regional background of the expressers. This may be because people of different ethnicities develop different nuances in their expressions. However, when different cultural groups spend more time together, the in-group advantage decreases [
The NimStim Set of Facial Expressions (NimStim) [
This project has attempted to address problems identified in previous sets of facial expressions and validation studies. The aim of producing the Umeå University Database of Facial Expressions was to create a database for Internet-based research, containing a large number of images across a spectrum of age, ethnicity, and gender.
This database has several advantages. First, it contains a large number of color images: a total of 424 (2720x4080 pixels), posed by 60 models. The models express the most consistently recognized facial expressions of emotions, which are anger, surprise, happiness, sadness, fear, and disgust [
The aim of this validation study was to examine the extent to which facial expressions as depicted in the images were correctly interpreted as the intended emotion. It was done over the Internet in order to recruit participants with as great a range as possible of age, gender, and ethnicity. Swedish law does not, however, permit the registration of individual ethnicity. Researchers based in a country without this restriction are free to classify the ethnicity of individual models after inspecting the photographs.
The genders of both model and rater may influence evaluation of facial expressions [
Each image was evaluated by participants. Nuanced answer options were used in the validation study in order to reduce the risk of influencing responses to a specific expression. Response scales with fixed response options can be problematic, as different response scale formats may influence the results obtained [
We hypothesized that the Internet-based validation study would provide sufficient data to support the validity of the Umeå University Database of Facial Expressions.
Data were collected from 526 participants. The mean age was 37.7 years (range 18–73, SD = 13.0). Of the participants, 70% (369/526) were female and 30% (157/526) were male. Participants were recruited by disseminating information about the study via the local Swedish newspaper. All those who volunteered were allowed to take part in the study, and no financial compensation or remuneration was given.
The stimuli were 424 facial images from the Umeå University Database of Facial Expressions. A total of 60 subjects participated as amateur models (30 female, 30 male; 17-67 years old; M=30.19, SD=10.66). Most of these models were ethnic Swedes, but models of Central European, Arabic, and Asian origin were also included. During the photography session, models were instructed to display seven different facial expressions (angry, surprised, happy, sad, neutral, afraid, and disgusted). Instructions on how to make the facial expressions were based on the work of Ekman [
The photo shoots produced over 8,000 images. The best image of each expression from each model was chosen to be validated empirically (60 models × 7 expressions = 420 images). However, a clear decision could not be made in four instances, and the four additional images were therefore included in the validation phase, making a total of 424.
The validation procedure took place on the Internet. Before obtaining access to the images, the potential participant had to register his/her age, gender, and email address. A confirmatory email, including a unique login link, was sent to the registered email address, ensuring that all participants had registered a valid email address. Participants were instructed to sit alone in a quiet, private setting and to base the evaluations on their own opinion. Participants evaluated the images at their own pace and were free to evaluate as many images as they wished. They were allowed to discontinue the evaluation at any time and were free to return and continue at another time during a two-week period in October 2011. Images were presented to each participant in random order, and each of the 424 images was presented only once. A total of 526 participants started the validation process, rating an average of 125.5 out of 424 faces (SD = 137.4).
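The presentation scheme described above (random order, no image repeated, and the option to stop and resume later) can be sketched as follows. This is only an illustration: the function names, the image numbering, and the seed are hypothetical, and the source does not describe how the web application was actually implemented.

```python
import random

def presentation_order(image_ids, rng):
    """Return the images in random order, each appearing exactly once."""
    order = list(image_ids)
    rng.shuffle(order)  # in-place shuffle of the participant's own copy
    return order

def next_image(order, n_already_rated):
    """Return the next unrated image, or None when every image has been rated."""
    if n_already_rated < len(order):
        return order[n_already_rated]
    return None

# Hypothetical image numbering 1..424 and a hypothetical per-participant seed.
order = presentation_order(range(1, 425), random.Random(2011))
first = next_image(order, 0)        # first image shown
resumed = next_image(order, 125)    # image shown when resuming after 125 ratings
finished = next_image(order, 424)   # None: nothing left to rate
```

Storing one shuffled order per participant, together with a count of completed ratings, is enough to let a participant return later without seeing any image twice.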
Each of the 424 images (320x480 pixels, color) was presented on its own with the text “This person seems to be…” above each image. As shown in
We used a binary logistic model (specified through generalized linear equations). The variance–covariance matrix for all models was assumed to be block diagonal, with blocks defined by individual rater and independence assumed within each block; that is, we assumed that the score an individual gave to one image did not affect the score that individual gave to the next randomized image.
The seven outcome variables were defined as 1/0 for each “true” emotion. The independent factors were gender and age of the rater and model and the rating score for the seven emotions. We studied the adjusted association between each outcome and the 11 independent factors. We present the estimated odds ratios and their 95% Wald confidence intervals (CIs) and their significance (see Supplemental Tables 1–7 in
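As a reminder of how an odds ratio and its 95% Wald confidence interval follow from a fitted logistic coefficient, the sketch below exponentiates the coefficient and the endpoints of its interval on the log-odds scale. The function name and the example coefficient and standard error are hypothetical and are not taken from the supplemental tables.

```python
import math

def odds_ratio_wald_ci(beta, se, z=1.959964):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error."""
    # The Wald interval is built on the log-odds scale and then exponentiated,
    # so it is asymmetric around the odds ratio.
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return math.exp(beta), lower, upper

# Hypothetical values, for illustration only.
or_est, ci_low, ci_high = odds_ratio_wald_ci(beta=0.18, se=0.05)
print(f"OR={or_est:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from 0 at the 5% level.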
We considered an image to be correctly classified if the highest score was given to the emotion corresponding to the true emotion. For example, if the emotion “sad” was scored 7 and the other emotions between 0 and 6 points, then sad would be counted as the response. That response would then be compared to the intended emotion when calculating the hit rate. In addition, in order to obtain a measure of the reliability of the interpretation, we also calculated the sum of the scores given to emotions that did not correspond to the true emotion, and the number of emotions rated.
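The classification rule and the two reliability measures described above can be sketched as follows. The function and key names are hypothetical, and the treatment of ties (a tie for the highest score is not counted as correct) is an assumption, since the text does not say how ties were handled.

```python
EMOTIONS = ("anger", "surprise", "happiness", "sadness", "neutral", "fear", "disgust")

def score_rating(scores, intended):
    """Summarize one rater's seven 0-9 scores for one image."""
    top = max(scores[e] for e in EMOTIONS)
    winners = [e for e in EMOTIONS if scores[e] == top]
    # Assumption: a tie for the highest score is not counted as correct.
    correct = winners == [intended]
    unintended = [scores[e] for e in EMOTIONS if e != intended]
    return {
        "correct": correct,
        "unintended_sum": sum(unintended),                     # total score on other emotions
        "unintended_count": sum(1 for s in unintended if s > 0),  # other emotions scored above 0
    }

def hit_rate(results):
    """Proportion of ratings classified as correct."""
    return sum(r["correct"] for r in results) / len(results)

# Example in the spirit of the text: "sad" scored 7, all other emotions lower.
example = {"anger": 0, "surprise": 0, "happiness": 0, "sadness": 7,
           "neutral": 3, "fear": 2, "disgust": 0}
summary = score_rating(example, intended="sadness")
```

Here `summary` marks the rating as correct, with a total of 5 points spread over 2 unintended emotions.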
The validity measure (proportion interpreted correctly) was calculated for every image. The data for these 424 individual images are presented separately in the online database. The proportion correctly interpreted for each portrayed emotion is, however, shown in
As shown in
A screen shot of the web-based validation.
Summary of the proportion of images correctly perceived, number of unintended emotions scored and the total scores given to unintended emotions.
Emotion expressed | Number of images | Proportion correctly perceived (%) | | | Number of unintended emotions scored | | | Total score given to unintended emotions (0–9) | | |
 | | Mean^a | Min^b | Max^c | Mean^a | Min^b | Max^c | Mean^a | Min^b | Max^c |
Anger (n=9581) | 61 | 94 | 72 | 100 | 0.25 | 0.07 | 0.73 | 0.87 | 0.12 | 2.57 |
Surprise (n=9357) | 60 | 94 | 76 | 99 | 0.33 | 0.14 | 0.66 | 1.25 | 0.42 | 3.26 |
Happiness (n=9721) | 62 | 98 | 85 | 100 | 0.13 | 0.05 | 0.44 | 0.38 | 0.08 | 1.65 |
Sadness (n=9393) | 61 | 78 | 25 | 98 | 0.54 | 0.14 | 1.23 | 2.41 | 0.45 | 6.55 |
Neutral (n=9406) | 60 | 91 | 56 | 99 | 0.38 | 0.14 | 0.94 | 1.21 | 0.36 | 4.21 |
Fear (n=9211) | 60 | 73 | 39 | 95 | 0.65 | 0.33 | 1.08 | 3.15 | 1.43 | 6.32 |
Disgust (n=9325) | 60 | 90 | 60 | 100 | 0.36 | 0.10 | 0.86 | 1.42 | 0.22 | 4.23 |
Total (n=65994) | 424 | 88 | 25 | 100 | 0.38 | 0.05 | 1.23 | 1.52 | 0.08 | 6.55 |
Note: An image was considered to be correctly classified if the highest score was given to the emotion corresponding to the true emotion.
a Mean proportion of correct perception (n=9211–9721).
b The value for the image with the lowest proportion of correct perception.
c The value for the image with the highest proportion of correct perception.
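The totals in the table can be checked against the per-emotion rows with a few lines of arithmetic. The sketch below assumes that the Total row pools all ratings, so the overall proportion is approximated by a rating-weighted mean of the per-emotion proportions; because the published percentages are rounded, the result is only approximate.

```python
# (emotion, number of ratings n, number of images, mean % correctly perceived),
# copied from the table above.
ROWS = [
    ("Anger", 9581, 61, 94),
    ("Surprise", 9357, 60, 94),
    ("Happiness", 9721, 62, 98),
    ("Sadness", 9393, 61, 78),
    ("Neutral", 9406, 60, 91),
    ("Fear", 9211, 60, 73),
    ("Disgust", 9325, 60, 90),
]

total_ratings = sum(n for _, n, _, _ in ROWS)
total_images = sum(k for _, _, k, _ in ROWS)
# Weight each emotion's percentage by the number of ratings it received.
pooled_percent = sum(n * p for _, n, _, p in ROWS) / total_ratings
ratings_per_image = total_ratings / total_images

print(total_ratings, total_images, round(pooled_percent), round(ratings_per_image))
```

The row sums reproduce the 65,994 ratings and 424 images reported in the Total row, and the weighted mean rounds to the reported 88%.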
Confusion matrix for images of expressed emotion and rater response (only scores 7–9 shown).
Expressed emotion | Rater response (7–9) (%) | | | | | | |
 | Anger | Surprise | Happiness | Sadness | Neutral | Fear | Disgust |
Anger | 74.6^a | 0.4 | 0.2 | 0.9 | 0.5 | 0.8 | 0.9 |
Surprise | 0.2 | 81.7^a | 1.0 | 0.2 | 0.5 | 3.7 | 0.3 |
Happiness | 0.2 | 0.3 | 92.5^a | 0.3 | 0.5 | 0.2 | 0.2 |
Sadness | 1.1 | 1.0 | 0.4 | 55.6^a | 5.9 | 3.9 | 2.3 |
Neutral | 1.0 | 0.8 | 0.4 | 1.7 | 81.6^a | 0.6 | 0.1 |
Fear | 2.6 | 14.2 | 0.6 | 0.9 | 0.5 | 55.5^a | 1.7 |
Disgust | 2.2 | 0.9 | 0.3 | 2.5 | 0.2 | 0.8 | 71.8^a |
a Intended emotion.
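To read the confusion matrix programmatically, one can look up, for each expressed emotion, the unintended emotion that most often received a score of 7–9. The sketch below copies the percentages from the table; the function name is hypothetical, and ties between unintended emotions are resolved arbitrarily by emotion name.

```python
EMOTIONS = ["Anger", "Surprise", "Happiness", "Sadness", "Neutral", "Fear", "Disgust"]

# Rows: expressed emotion; columns: rater response scored 7-9 (%), in EMOTIONS order.
MATRIX = {
    "Anger":     [74.6, 0.4, 0.2, 0.9, 0.5, 0.8, 0.9],
    "Surprise":  [0.2, 81.7, 1.0, 0.2, 0.5, 3.7, 0.3],
    "Happiness": [0.2, 0.3, 92.5, 0.3, 0.5, 0.2, 0.2],
    "Sadness":   [1.1, 1.0, 0.4, 55.6, 5.9, 3.9, 2.3],
    "Neutral":   [1.0, 0.8, 0.4, 1.7, 81.6, 0.6, 0.1],
    "Fear":      [2.6, 14.2, 0.6, 0.9, 0.5, 55.5, 1.7],
    "Disgust":   [2.2, 0.9, 0.3, 2.5, 0.2, 0.8, 71.8],
}

def top_confusion(expressed):
    """Return (emotion, percent) for the most frequent unintended high rating."""
    off_diagonal = [
        (pct, emotion)
        for emotion, pct in zip(EMOTIONS, MATRIX[expressed])
        if emotion != expressed
    ]
    pct, emotion = max(off_diagonal)  # ties resolved by emotion name
    return emotion, pct

for expressed in EMOTIONS:
    print(expressed, "->", top_confusion(expressed))
```

In these data, fear is most often rated highly as surprise (14.2%) and sadness as neutral (5.9%).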
The odds ratios (presented in the supplemental tables in
The most noteworthy results relating to the four background variables (model age, model gender, rater age, and rater gender), presented in the supplemental tables in
Female facial models were more frequently significantly associated with three of the intended expressions in comparison with male facial models. Those were the expressions anger (OR=1.2,
The purpose of this study was to present a database of facial expressions and the results of an Internet-based validation study. The database contains 424 color images of models across a spectrum of age, ethnicity, and gender expressing a variety of different emotions. The database is freely available for scientific experiments both online and offline.
The validity of the database was based on how accurate the raters were in identifying the expressions in the presented images. Scores were generally high. The overall mean proportion of this database that was correctly interpreted was 88%. The corresponding values are 79% for NimStim [
The results did not show any consistent advantage related to age or gender in either the models or the validating participants. There were significant differences when the seven expressions were studied individually, but which associations were stronger and which were weaker varied across the four background variables. Hall and Matsumoto found that women made more correct interpretations than men when multiple scales were used [
The results of previous studies [
The results of our study show that facial expressions of people ≥46 years showing anger, fear, and sadness were less reliably identified than those posed by younger faces aged ≤25 years. Faces of participants aged 26-45 years portraying anger, neutral, and disgust were also less reliably identified than the same expressions in younger people aged ≤25 years. This is consistent with the findings of Ebner et al [
The validation study was Internet-based. A large number of participants from different age groups evaluated the images, which provides this study with a more heterogeneous population of raters than previous studies [
As response scales with fixed response options can be problematic, Russell [
Some facial expressions of emotion are easier to identify correctly than others. In validation studies, happy facial expressions are usually recognized more reliably than negative facial expressions [
Sadness and fear had the lowest proportion of correct identification, also consistent with previous research [
The method of creating facial expressions can affect their interpretation. Currently existing databases have been produced by instructing the photo shoot models in two different ways. One is to instruct the models to move particular muscle groups while making the facial expressions [
One advantage of asking models to move particular muscle groups is that it creates uniform expressions. The disadvantage is that the ecological validity may be affected [
As we wanted models to make authentic expressions and still maintain uniformity within the same emotional expressions, the instructions given to models were a combination of the instructions used in previous studies. The models in the Umeå University Database of Facial Expressions were instructed to make the expressions as they saw fit, to look at pictures of facial expressions, and to move certain muscle groups.
The database has, however, a number of shortcomings. First, as the validation study was Internet-based, it was difficult to control for the authenticity of participant responses and other contextual variables, eg, how closely participants followed the instructions. However, the requirement for personal information such as name, age, gender, and email address should have decreased the risk of non-valid answers. In addition, the relatively large number of participants (n=526) would have reduced the impact of deliberately false responses. The lack of remuneration also meant there was no financial reward in providing false responses.
Second, models may have validated their own images, which may have inflated the proportion of correct identification in the database. However, the number of models who may have validated their own images was small in relation to the large number of ratings made for each image.
Third, there may have been a subjective interpretation of the meaning of the response scales. The scale steps between 0 and 9 could have been interpreted as a measure of intensity, authenticity, or purity. However, giving the participants the opportunity to rate every image on a continuum and to rate for several expressions, provided important information about each image. Valuable information about the extent to which each image was rated for expressions other than the one intended is available online, as well as the proportion correctly identified for each image.
A fourth limitation is that a forced-choice approach was used to calculate the proportion of correct identification. The response scale that received the highest score was regarded as the respondent’s answer. Since no “none of the above” option was included, this has probably resulted in a higher proportion of correct identification than if this option had been included.
A fifth weakness is that no member of the research team who instructed the models during the photo shoots and selected images for validation was certified according to the Facial Action Coding System (FACS) [
Finally, the instruction not to wear make-up was not followed by all models, which may bias the interpretation of the images. However, the resulting images may more closely resemble the facial expressions seen in real life.
The goal of creating the Umeå University Database of Facial Expressions was to provide the scientific community with an online database for scientific experiments. The database consists of a large and contemporary set of images showing models across a spectrum of age, ethnicity, and gender. The Internet-based validation study obtained a larger number of ratings for each image than previous validation studies, and the database has a higher proportion of correct identification than many existing databases. However, the validity of the Umeå University Database of Facial Expressions needs to be tested by further validation studies of similar or different design. Finally, we invite the scientific community to help expand the database by contributing additional models to provide a more representative sample of populations. Any added faces would first need to be validated to ensure high standards.
Supplemental Tables 1-7. Factors associated with images.
The study was funded by a grant from the Swedish Council for Social Research and the Swedish Council for Work Life Research (2009-0222). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Alexander Alasjö for excellent web programming and Hans Pettersson for statistical help.
None declared.