This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
High-resolution medical images that include facial regions can be used to recognize the subject’s face when reconstructing 3-dimensional (3D)-rendered images from 2-dimensional (2D) sequential images, which might constitute a risk of infringement of personal information when sharing data. According to the Health Insurance Portability and Accountability Act (HIPAA) privacy rules, full-face photographic images and any comparable image are direct identifiers and considered as protected health information. Moreover, the General Data Protection Regulation (GDPR) categorizes facial images as biometric data and stipulates that special restrictions should be placed on the processing of biometric data.
This study aimed to develop software that can remove the header information from Digital Imaging and Communications in Medicine (DICOM) format files and facial features (eyes, nose, and ears) at the 2D sliced-image level to anonymize personal information in medical images.
A total of 240 cranial magnetic resonance (MR) images were used to train the deep learning model (144, 48, and 48 for the training, validation, and test sets, respectively, from the Alzheimer's Disease Neuroimaging Initiative [ADNI] database). To overcome the small sample size problem, we used a data augmentation technique to create 576 images per epoch. We used attention-gated U-net for the basic structure of our deep learning model. To validate the performance of the software, we adapted an external test set comprising 100 cranial MR images from the Open Access Series of Imaging Studies (OASIS) database.
The facial features (eyes, nose, and ears) were successfully detected and anonymized in both test sets (48 from ADNI and 100 from OASIS). Each result was manually validated in both the 2D image plane and the 3D-rendered images. Furthermore, the ADNI test set was verified using Microsoft Azure's face recognition artificial intelligence service. By adding a user interface, we developed and distributed (via GitHub) software named “Deface program” for medical images as an open-source project.
We developed deep learning–based software for the anonymization of MR images that distorts the eyes, nose, and ears to prevent facial identification of the subject in reconstructed 3D images. It could be used to share medical big data for secondary research while making both data providers and recipients compliant with the relevant privacy regulations.
It is becoming important to handle and share big data in the health care field, and accordingly, there is a big trend to share and protect individual patient data for secondary research [
High-resolution magnetic resonance (MR) images of the head risk exposing a subject’s face, which can be regarded at the level of photography by facial reconstruction [
Facial image anonymization is not fully conducted in public medical image repositories, while some public databases even provide the original images. For example, the Alzheimer's Disease Neuroimaging Initiative (ADNI) [
Previous approaches to anonymizing faces in medical images usually remove the entire facial region using a voxel classifier and mask the brain to preserve the brain image using a skull stripping technique or a convex hull [
The aim of this study was to develop software that can selectively distort the eyes, nose, and ears, which are the main factors for identifying a face, and make a robust anonymization algorithm that can be used on various MR images.
Process of (A) developing the facial feature detector, which is a deep learning model that can detect the eyes, nose, and ears in 3-dimensional (3D) magnetic resonance (MR) images, and (B) distorting the facial features in nonanonymized cranial MR images.
The Neuroimaging Informatics Technology Initiative (NIFTI) and Digital Imaging and Communications in Medicine (DICOM) formats of MR imaging (MRI) files were collected from the ADNI database (Magnetization Prepared RApid Gradient Echo [MPRAGE] scans; voxel size: 1.0 x 1.0 x 1.2 mm; inplane resolution: 1.0 x 1.0 mm2; interslice spacing: 1.2 mm; field of view [FOV]: 240 x 256 x 160 mm). A total of 240 NIFTI format files were used in the creation of the deep learning model: 144, 48, and 48 for the training, validation, and test sets, respectively.
Other NIFTI-format MRI files were collected from the OASIS-3 database for use as the external test set. The 100 MR images differed in orientation, resolution, and intensity from those in the ADNI data (MPRAGE scans; voxel size: 1.0 x 1.0 x 1.0 mm; FOV: 176 x 256 x 256 mm).
In general, supervised learning requires pairs consisting of the input object and the desired output value. In this study, the input object is a 3D cranial MR image, and the output values are regions containing the eyes, nose, or ears (the facial features). We manually drew labels that were the same as the desired output values in all of the ADNI and 20 OASIS-3 images using the AFNI program [
Three image augmentations were performed per 1 image in the training set. The augmented images were randomly transformed and then used for model training. As a result, 576 images per epoch were trained. Data augmentation was performed by filtering Gaussian noise, rotating from –15° to +15° around each axis in the image, randomly flipping each axis, randomly transposing between the axes, shifting each axis from 0 to 0.10, shearing each axis from 0 to 0.20, and resizing the image from 0.90 to 1.10 times the original size. After executing 1 image augmentation per original image, the validation set was validated for a total of 96 images per epoch.
The deep learning model was trained with the manually labeled data. We created a deep learning model that can generate labels similar to manually drawn labels on the regions of the eyes, nose, and ears from cranial MR image input. The basic structure of our deep learning model is attention-gated U-net [
In machine learning, the “loss” or “error” variable is set to achieve the goals through the training of the model. In addition, the “metric” variable indicates how much we have achieved the goals through the model training. A machine learning model has metrics to indicate the achievement rate and is trained to reduce loss.
In this study, the metric to determine whether the model can make labels similar to the manually drawn labels is the Dice coefficient, which is double the area of overlap divided by the total number of pixels or voxels in both images: It returns 1 if the predicted regions of the model exactly match the correct answers from the labels and 0 if the regions do not overlap. When the region of the label is
This can also be expressed as:
where
The loss function in our model was: 1 – the Dice coefficient + 0.1 × categorical cross-entropy. Categorical cross-entropy is the loss function mainly used in multiclass classification, and it induces our model to learn to distinguish whether a specific pixel is from the eye, nose, ear, or another area. This model computes the loss function between the correct answer labels and the predictive labels and is trained in the direction of loss reduction (toward zero).
The model calculated Dice coefficients for 96 images in the validation set for each epoch. After 5 epochs at the highest metric score, learning was stopped when there was no further improvement.
Here, we describe the process of image anonymization based on the output of the facial feature detector. The deep learning model was trained by identifying the eyes, nose, and ears (5 regions), after which the program proceeded with the image anonymization process.
Identification of the eyes, nose, and ears was automatically conducted on different images according to each feature by the deep learning algorithm. The detection region for the eyes is a spherical area covering the eyeball and the skin around the eye. The process of anonymizing the surface of the eye consists of 2 steps. First, based on the detection regions for the eyes, 2 boxes capable of covering the periocular area (the skin around the eyes) are formed. Second, the contour of the face surface was obtained within the range of the boxes, and a range of ±2 voxels along each axis from that surface was modified to the same intensity value. The nose was processed by removing the image and setting the intensity of the voxels to 0 in the area where the binding box for the detected region was doubled to each side. The detection region for the ears is the protruding part called the auricle. For the anonymization of the ears, random values were assigned to each voxel of the detection regions of the ears, and those values were generated in the noise range of the air in the MR image.
In the case of the medical images in DICOM format, it is necessary to anonymize the personal information in the header, and so we carried this out on the 20 DICOM headers using the Deface program (the DICOM headers are listed in
In the 23rd epoch, the average Dice coefficient of the validation set was the highest at 0.821. In the 28th epoch, the training of the model was stopped because the Dice score of the validation set did not improve. The average Dice score of 576 images trained over 23 epochs was 0.801. The average Dice scores using the test sets comprising 48 ADNI and 20 OASIS-3 images were 0.859 and 0.794, respectively. The Deface program was applied to the ADNI data, but anonymization was performed on the OASIS-3 data without any additional manipulation.
We applied the Deface program to 48 ADNI images and 100 OASIS-3 images as the test sets and then confirmed the accuracy of distorting the facial features in the 3D reconstructions of the face (
3-dimensional (3D) volume rendering of magnetic resonance images (MRI), showing the raw and distorted images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) .
The Deface program was used to validate the de-identification performance by Microsoft Azure’s facial recognition artificial intelligence service (Face detection_01 model) [
In this study, we developed a program that can recognize the eyes, nose, and ears in MR images by applying artificial intelligence, after which they were blurred. We implemented the facial feature detector based on the 3D U-net deep learning model to automatically detect the eyes, nose, and ears. The reason for the development of this anonymization program is that 3D facial reconstruction of high-resolution MRI can show an individual’s similarity to a facial photograph [
Although the training data for the deep learning model comprised 144 images from ADNI, we introduced data augmentation to achieve robust performance in other MRI standards (
We applied different processes to blur each facial feature location. The eyes are close to the frontal lobe, so they were distorted only along the surface. The intensity of the pixels was converted to a value similar to the surface of the skin to make it appear on the surface when 3D rendering. Since the nose is usually the most protuberant part of the face, the area that covers the entire range of the nose was deleted to make it impossible to infer the original shape of the nose. The 3D box space containing the entire volume of the nose was removed to prevent recognition via the nose shape. The ears were only segmented by the facial feature detector, so only the corresponding regions were distorted to preserve the brain image. If regions such as the shape of the ears are simply removed, the shape of the ears may be revealed by the noise from air in the MR image. We reduced the possibility of recognition by replacing the ear regions with generated random values within the air noise range of the input MR image.
3D facial reconstruction of high-resolution MRI can be generated by a freeware MRI viewer [
Previous studies have applied techniques to remove the entire face regions, and the evaluation of anonymization was via direct human observation of face landmarks [
Among the facial features, wrinkles or the mouth can be identifiers but were not considered in this study. To train the deep learning model, we needed to manually draw labels that mark facial features. We are planning to construct a training dataset that takes into account additional facial features for further study. Once labeled training data comprising any desired facial feature have been constructed, our facial feature detector can evolve through deep learning.
Patients’ faces can be reconstructed from high-resolution cranial MR images at the photograph level, so there is a risk of infringing the personal information rules prescribed by HIPAA and GDPR when sharing data. Hence, we suggested a method to perceive the facial features in MR images via deep learning technology to specifically blur certain facial features. Users can create anonymization regions that blur the desired parts of the patient’s face (eyes, nose, or ears), which helps provide data for secondary research without violating relevant personal information regulations.
Information Protection Regulations.
Deep learning model structure.
DICOM header with personal information.
Face recognition test.
two-dimensional
three-dimensional
Alzheimer's Disease Neuroimaging Initiative
Digital Imaging and Communications in Medicine
field of view
General Data Protection Regulation
Health Insurance Portability and Accountability Act
Magnetization Prepared RApid Gradient Echo
magnetic resonance
magnetic resonance imaging
Neuroimaging Informatics Technology Initiative
Open Access Series of Imaging Studies
This research was supported by an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korean government (MSIT) (2018-0-00861), Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2017R1D1A1B03030713), and a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C2383). We thank the Biomedical Computing core facility at the ConveRgence mEDIcine research cenTer (CREDIT), Asan Medical Center for their technical support and instrumentation funded by Asan Institute for Life Sciences (2018-776).
None declared.