This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Sleep apnea is a respiratory disorder characterized by frequent breathing cessations during sleep. Sleep apnea severity is quantified by the apnea-hypopnea index (AHI), the hourly rate of respiratory events. In positional sleep apnea, the AHI is higher in the supine sleeping position than in other sleeping positions. Positional therapy is a behavioral strategy (eg, wearing an item that encourages sleeping in the lateral position) to treat positional apnea. The gold standard for diagnosing sleep apnea and determining whether it is positional is polysomnography; however, this test is inconvenient and expensive, and waiting lists are long.
The objective of this study was to develop and evaluate a noncontact method to estimate sleep apnea severity and to distinguish positional versus nonpositional sleep apnea.
A noncontact deep-learning algorithm was developed to analyze infrared video of sleep for estimating AHI and to distinguish patients with positional vs nonpositional sleep apnea. Specifically, a 3D convolutional neural network (CNN) architecture was used to process movements extracted by optical flow to detect respiratory events. Positional sleep apnea patients were subsequently identified by combining the AHI information provided by the 3D-CNN model with the sleeping position (supine vs lateral) detected via a previously developed CNN model.
The algorithm was validated on data from 41 participants, including 26 men and 15 women with a mean age of 53 (SD 13) years, BMI of 30 (SD 7) kg/m2, AHI of 27 (SD 31) events/hour, and sleep duration of 5 (SD 1) hours; 20 participants had positional sleep apnea, 15 had nonpositional sleep apnea, and the positional status could not be determined for the remaining 6. AHI values estimated by the 3D-CNN model correlated strongly and significantly with the gold standard (Spearman correlation coefficient 0.79).
This study demonstrates the possibility of using a camera-based method for developing an accessible and easy-to-use device for screening sleep apnea at home, which can be provided in the form of a tablet or smartphone app.
Sleep apnea is a chronic respiratory disorder caused by frequent reductions in respiratory airflow during sleep. A cessation of airflow lasting more than 10 seconds is called an apnea, whereas a partial reduction in airflow of more than 30% for at least 10 seconds, in association with a drop of more than 3% in blood oxygen saturation or an arousal, is called a hypopnea. Sample images indicating the chest movements during normal breathing, hypopnea, and apnea are shown in
Sample sum of chest and abdomen movements in (A) apnea, (B) hypopnea, and (C) normal breathing.
Positional sleep apnea refers to sleep apnea patients for whom the AHI in the supine sleeping position is at least 50% higher than that in the nonsupine sleeping positions [
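This 50% criterion can be expressed as a simple check. The sketch below is a hypothetical helper (not code from the study) that classifies a patient as positional when the supine AHI is at least 1.5 times the nonsupine AHI:

```python
def is_positional(supine_ahi: float, nonsupine_ahi: float) -> bool:
    """Positional sleep apnea: supine AHI at least 50% higher than
    the nonsupine AHI (ie, supine >= 1.5 x nonsupine)."""
    return supine_ahi >= 1.5 * nonsupine_ahi
```

For example, a patient with a supine AHI of 41 events/hour and a lateral AHI of 21 events/hour would be classified as positional (41 ≥ 31.5).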
The current clinical approach to diagnose sleep apnea and to determine whether or not it is positional is based on polysomnography (PSG). However, PSG requires connecting more than 20 sensors to a user, which is inconvenient. A trained sleep technician manually analyzes recorded PSG signals and annotates the sleep position overnight. Moreover, PSG is expensive (>US $400) and has a long waiting time in some areas (4-36 months in Canada [
Researchers have developed several easy-to-use, convenient, and accessible methods for sleep apnea monitoring. Merchant et al [
Although these methods are more convenient than PSG, sensors attached to the body could potentially disrupt the user’s regular sleep pattern. Therefore, researchers have continued to develop noncontact methods to screen individuals at risk of sleep apnea. For example, we previously developed a deep-learning model to distinguish between different types of apnea. However, as the model was not capable of detecting events, we used ground truth labels for this purpose [
To identify patients at risk of sleep apnea and to distinguish those with positional sleep apnea, an alternative is to use computer vision and machine-learning techniques. Here, we propose a noncontact algorithm that analyzes infrared videos of a participant during sleep to estimate the AHI and to distinguish patients with positional vs nonpositional sleep apnea. Specifically, we used a 3D convolutional neural network (CNN) to analyze movements in infrared videos, detect respiratory events, and estimate the AHI. In an experimental evaluation, this model outperformed a baseline model that previously reported state-of-the-art results in noncontact AHI estimation [
The University Health Network Research Ethics Board approved this study (approval number 13-7210-DE). Participants aged 18 to 85 years and without a history of cardiovascular or renal diseases were recruited for this study. Participants were recruited among patients referred for sleep diagnosis at the sleep laboratory of the Toronto Rehabilitation Institute, University Health Network. All participants signed a written consent form before taking part in the study. There were no limitations on blanket usage, movement, or clothing worn during sleep.
Simultaneously with the overnight PSG (Embla s4500) used for clinical sleep diagnosis, infrared video of each participant was recorded at a resolution of 640×480 pixels at 30 frames per second. Video data were collected in a single overnight session and synchronized with the PSG signals throughout the night, covering 5 (SD 1) hours of sleep.
The infrared camera (Point Grey Firefly MV, 0.3 MP, FMVU-03MTM) was mounted approximately 1.5 meters above the bed. For illumination, a separate infrared light source (Raytec RM25-F-50) was mounted on the ceiling. A schematic of the camera setup and sample frame is shown in
Data collection setup and a sample anonymized image frame on the right. IR: infrared.
Respiratory events (apneas and hypopneas) and sleep positions (supine vs lateral) throughout the night were annotated by a trained sleep technician who was blinded to the study objectives. Since the video data were synchronized with the PSG data, once the technician annotated the PSG data, all video frames were automatically labeled.
The video frames were first downsampled from 30 Hz to 2 Hz to reduce the computational cost. As the breathing frequency during sleep is approximately 0.5 Hz, the reduced sampling rate of 2 Hz still exceeds the 1 Hz Nyquist rate by a factor of 2. To track respiratory movements in the infrared video frames, a CNN-based dense optical flow model (FlowNet 2.0 [
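The downsampling step amounts to simple frame decimation. The sketch below is a minimal illustration, not the study's implementation; the optical flow step itself relies on FlowNet 2.0 and is omitted here:

```python
import numpy as np

def downsample_frames(frames: np.ndarray, fps_in: int = 30, fps_out: int = 2) -> np.ndarray:
    """Keep every (fps_in // fps_out)-th frame, reducing 30 Hz video to 2 Hz.
    With breathing at ~0.5 Hz, 2 Hz sampling is still twice the 1 Hz Nyquist rate."""
    step = fps_in // fps_out
    return frames[::step]
```

Ten minutes of 30 Hz video (18,000 frames) reduces to 1200 frames at 2 Hz, a 15-fold saving before optical flow is computed.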
Sample input and dense optical flow images.
The 3D-CNN was trained with a class-weighted cross-entropy loss (weight 5 for events and 1 for normal breathing) and the Adam optimizer, with an initial learning rate of 0.001 and a batch size of 25 for 25,000 epochs. The network had 8,284,265 parameters in total, of which 8,281,829 were trainable and 2436 were nontrainable. Because respiratory events are less frequent than normal breathing (depending on sleep apnea severity), the data sets were highly imbalanced. At training time, to balance the data set, stride lengths of 0.5 and 15 seconds were used for events and normal breathing, respectively. At test time, a stride length of 0.5 seconds was used to predict the binary label of normal breathing versus event. The decision threshold of the trained binary classifier (event vs normal) was set to 0.1 to maximize the area under the curve on the training data.
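The stride-based balancing can be illustrated as follows. The 10-second window length is an assumption for illustration (the text does not state the window size), and `is_event` is a hypothetical per-frame label array:

```python
FS = 2          # frames per second after downsampling (from the paper)
WINDOW_S = 10   # assumed analysis window length in seconds (illustrative)

def make_windows(n_frames, is_event, fs=FS, window_s=WINDOW_S):
    """Generate training-window start indices: a dense 0.5 s stride inside
    respiratory events and a sparse 15 s stride during normal breathing,
    so that the resulting training set is roughly class-balanced."""
    window = int(window_s * fs)
    starts = []
    i = 0
    while i + window <= n_frames:
        stride_s = 0.5 if is_event[i] else 15.0
        starts.append(i)
        i += int(stride_s * fs)
    return starts
```

Over one minute of all-normal breathing (120 frames at 2 Hz) this yields only 4 windows, whereas a minute inside an event yields 101 overlapping windows.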
To estimate the AHI, a linear regression model was trained on the following three features: (1) the number of detected events, (2) the total duration of detected events longer than 9 seconds divided by sleep duration, and (3) sleep duration.
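A minimal sketch of such a three-feature linear model, using ordinary least squares in place of whatever fitting routine the study used (names are illustrative):

```python
import numpy as np

def fit_ahi_model(n_events, long_event_frac, sleep_hours, true_ahi):
    """Fit AHI ~ w0 + w1*n_events + w2*long_event_frac + w3*sleep_hours
    by ordinary least squares, mirroring the three-feature linear model."""
    X = np.column_stack([np.ones_like(sleep_hours, dtype=float),
                         n_events, long_event_frac, sleep_hours])
    w, *_ = np.linalg.lstsq(X, true_ahi, rcond=None)
    return w

def predict_ahi(w, n_events, long_event_frac, sleep_hours):
    """Apply the fitted weights to new feature values."""
    return w[0] + w[1] * n_events + w[2] * long_event_frac + w[3] * sleep_hours
```

In leave-one-person-out cross-validation, the regression would be refit on 40 participants and applied to the held-out participant's features.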
The performance of the 3D-CNN was compared against another approach developed by our group, which previously demonstrated state-of-the-art performance in noncontact vision-based estimation of the AHI [
For sleep position detection, a previously developed algorithm [
Sample supine (left) and lateral (right) frames.
Leave-one-person-out cross-validation was used to evaluate the performance of AHI estimation as well as the performance of positional vs nonpositional sleep apnea detection algorithms. Bland-Altman plots and Spearman correlation coefficients were used to evaluate the performance of AHI estimation. Since an AHI of 15 is commonly used as a threshold for screening sleep apnea [
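The Bland-Altman summary statistics can be computed as below (a generic sketch of the standard definition; the study's plotting code is not shown):

```python
import numpy as np

def bland_altman_stats(est, ref):
    """Bias (mean difference) and 95% limits of agreement between
    estimated AHI values and the PSG reference."""
    diff = np.asarray(est, float) - np.asarray(ref, float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)   # 95% limits assume ~normal differences
    return bias, (bias - half_width, bias + half_width)
```

A bias near zero with narrow limits of agreement indicates that the estimated AHI tracks the PSG AHI without systematic over- or underestimation.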
Demographic information of the 41 individuals (26 men and 15 women) recruited for this study is shown in
Participants’ demographic features for apnea-hypopnea index (AHI) estimation (N=41).a
Characteristics | Value, mean (SD) |
Age (years) | 53 (13) |
BMI (kg/m2) | 30 (7) |
Sleep duration (hours) | 5 (1) |
Number of changes in body position | 9 (6) |
Sleep efficiency (%) | 75 (18) |
REMb sleep percentage (%) | 15 (7) |
Mean wake heart rate (bpmc) | 68 (16) |
Mean REM heart rate (bpm) | 67 (16) |
Minimum SaO2d (%) | 82 (9) |
Mean SaO2 (%) | 94 (3) |
AHI (events/hour) | 27 (31) |
Supine AHI (events/hour) | 41 (39) |
Lateral AHI (events/hour) | 21 (34) |
aParticipants’ information was obtained from the sleep reports of the overnight sleep study annotated by sleep technicians.
bREM: rapid eye movement.
cbpm: beats per minute.
dSaO2: arterial oxygen saturation.
The threshold used in this study for detecting position changes while ignoring small movements (eg, breathing or pulse) was empirically set to 20,000 pixels. The total displacement was calculated by summing the displacements of all optical flow feature points [
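The displacement-based position-change rule can be sketched as follows (function and array names are illustrative; `flow_dx`/`flow_dy` stand for the per-point optical flow components):

```python
import numpy as np

MOVEMENT_THRESHOLD = 20_000  # total pixel displacement (empirical value from the study)

def is_position_change(flow_dx, flow_dy, threshold=MOVEMENT_THRESHOLD):
    """Flag a body-position change when the summed displacement magnitude of
    all optical-flow feature points exceeds the threshold; smaller motion
    such as breathing or pulse stays well below it."""
    total = np.sum(np.hypot(flow_dx, flow_dy))
    return total > threshold
```

Because breathing displaces only the chest region by a few pixels per frame, its summed displacement is orders of magnitude below the threshold, while a roll from supine to lateral moves most tracked points at once.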
To evaluate the performance of AHI detection,
Scatterplots of polysomnography (PSG) apnea-hypopnea index (AHI) vs estimated AHI values. The blue and red lines indicate fitted and unity lines, respectively. CNN: convolutional neural network.
Bland-Altman plots of apnea-hypopnea index (AHI) estimation algorithms. PSG: polysomnography; Est: estimated; CNN: convolutional neural network.
The Spearman correlation coefficients (ρ) for AHI estimation were 0.55 and 0.79 for the baseline and 3D-CNN approach, respectively (
Confusion matrices for screening patients with sleep apnea based on the apnea-hypopnea index threshold of 15. CNN: convolutional neural network.
Performance of models on screening patients with sleep apnea.
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
3D-CNNa | 82.93 | 77.78 | 95.45 | 85.71 |
Baseline (Zhu et al [ | 73.17 | 76.19 | 72.73 | 74.42 |
aCNN: convolutional neural network.
The position detection algorithm estimated the body position with 83% accuracy, 77% precision, 91% recall, and an F1-score of 83%. The performance of combining the position detection algorithm with AHI estimation to identify patients with positional sleep apnea is shown in
Confusion matrix for identifying positional sleep apnea. CNN: convolutional neural network.
Performance of the models in detecting positional vs nonpositional sleep apnea.
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
3D-CNNa | 65.71 | 72.22 | 65.00 | 68.42 |
Baseline (Zhu et al [ | 34.29 | 42.11 | 40.00 | 41.03 |
aCNN: convolutional neural network.
The main contributions of this study are (1) the development and experimental validation of a new noncontact approach to estimate the AHI and (2) the application of this method to automatically identify individuals with positional sleep apnea. The newly developed 3D-CNN-based method outperformed the baseline model in estimating the AHI from infrared video data. However, it was approximately 4 times slower than the baseline algorithm; nevertheless, it could still process 5 hours of sleep data in approximately 20 hours. By combining the estimated sleeping position with the estimated AHI, the proposed approach is the first noncontact method that can identify patients with positional sleep apnea.
The developed algorithm achieved comparable performance to existing contact methods (eg, those using a single wearable sensor or a sensor placed under the mattress). For example, Hafezi et al [
Our study has some limitations. One limitation is the failure of the event detection algorithm when the participant moved out of the field of view of the camera or when the room lighting condition suddenly changed. Another limitation is the small number of participants (N=41). The algorithm was validated via leave-one-person-out cross-validation. Future work should examine the generalizability of these models to data collected in new environments.
This study applied machine learning and computer vision approaches to develop a CNN-based method to detect respiratory events in different sleeping positions from data collected via an infrared camera. This method was validated on data from 41 participants to estimate AHI and to identify patients with positional sleep apnea.
This model could support the development of affordable and easy-to-use technologies for screening sleep apnea at home (eg, in the form of a tablet or smartphone app). Such a system could help physicians choose suitable treatments for patients with sleep apnea. Ultimately, improved treatment would reduce the consequences of untreated sleep apnea, such as motor vehicle accidents, heart disease, diabetes, and high blood pressure.
Architecture of a 3D convolutional neural network used to detect apneas.
AHI: apnea-hypopnea index
CNN: convolutional neural network
PCA: principal component analysis
PSG: polysomnography
This work was supported in part by FedDev Ontario; BresoTec Inc; Natural Sciences and Engineering Research Council of Canada (NSERC) through the Discovery Grant (RGPIN-2020-04184); AMS Healthcare Fellowship in Compassion and Artificial Intelligence; and the Toronto Rehabilitation Institute, University Health Network.
None declared.