This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Asthma is one of the most prevalent chronic respiratory diseases. Despite increased investment in treatment, little progress has been made in the early recognition and treatment of asthma exacerbations over the last decade. Nocturnal cough monitoring may provide an opportunity to identify patients at risk for imminent exacerbations. Recently developed approaches enable smartphone-based cough monitoring. These approaches, however, have not undergone longitudinal overnight testing, nor have they been specifically evaluated in the context of asthma. Moreover, the problem of distinguishing partner coughs from patient coughs in contact-free audio recordings when two or more people sleep in the same room remains unsolved.
The objective of this study was to evaluate the automatic recognition and segmentation of nocturnal asthmatic coughs and cough epochs in smartphone-based audio recordings that were collected in the field. We also aimed to distinguish partner coughs from patient coughs in contact-free audio recordings by classifying coughs based on sex.
We used a convolutional neural network model that we had developed in previous work for automated cough recognition. We further used techniques (such as ensemble learning, minibatch balancing, and thresholding) to address the imbalance in the data set. We evaluated the classifier in a classification task and a segmentation task. The cough-recognition classifier served as the basis for the cough-segmentation classifier from continuous audio recordings. We compared automated cough and cough-epoch counts to human-annotated cough and cough-epoch counts. We employed Gaussian mixture models to build a classifier for cough and cough-epoch signals based on sex.
We recorded audio data from 94 adults with asthma (overall: mean 43 years, SD 16 years; female: 54/94, 57%; male: 40/94, 43%). Audio data were recorded by each participant in their everyday environment using a smartphone placed next to their bed; recordings were made over a period of 28 nights. Out of 704,697 sounds, we identified 30,304 sounds as coughs. A total of 26,166 coughs occurred without a 2-second pause between coughs, yielding 8238 cough epochs. The ensemble classifier performed well, with a Matthews correlation coefficient of 92% in a pure classification task, and achieved cough counts comparable to those of human annotators in the segmentation of coughing. The count difference between automated and human-annotated coughs was a mean of –0.1 (95% CI –12.11, 11.91) coughs. The count difference between automated and human-annotated cough epochs was a mean of 0.24 (95% CI –3.67, 4.15) cough epochs. The Gaussian mixture model cough epoch–based sex classification performed best, yielding an accuracy of 83%.
Our study showed longitudinal nocturnal cough and cough-epoch recognition from nightly recorded smartphone-based audio from adults with asthma. The model distinguishes partner cough from patient cough in contact-free recordings by identifying cough and cough-epoch signals that correspond to the sex of the patient. This research represents a step towards enabling passive and scalable cough monitoring for adults with asthma.
Asthma is one of the most prevalent chronic respiratory diseases [
Disease control (also referred to as asthma control) is defined as the degree to which symptoms are controlled by treatment [
The monitoring of coughing by quantifying the number of coughs per unit of time has been attempted by researchers since the 1950s [
Smartphones are now ubiquitous [
Developing an automatic cough monitoring system is challenging because coughs occur rarely in comparison with other sounds. This natural imbalance of cough and noncough sounds poses two problems. First, it demands high specificity from the cough monitoring system to avoid false alarms from other similar and more frequently occurring sounds. Second, existing classification methods tend to perform poorly on minority-class examples when the data set is highly imbalanced [
The study was designed to mimic a real-world use case; data were collected from a smartphone placed on the bedside table in the participant's bedroom. The aim was to build a cough classifier and to evaluate its performance on unseen data. Further, we aimed to use the classifier to segment and count cough events over the course of the night. Building upon previous work [
Though a smartphone-based nocturnal cough monitoring service may enable passive monitoring in theory, its utility for application in practice depends on whether coughs can be correctly assigned to individuals. Prior research has shown that humans are able to determine whether the source of a cough is male or female based on sound alone [
This study involved the collection of smartphone-recorded audio and daily questionnaire data, the definition and quantification of coughs within that data (data annotation and automated cough recognition and segmentation), sex-based classification, and model performance evaluation.
We used data collected in a multicenter, longitudinal observational study over a 29-day period (28 nights) [
All participant data were collected by the physician (asthma evaluation data) or the nurse (lung function evaluations) in the study centers and were transferred to an electronic format and stored online on the study server. Nightly sensor data were stored locally on the smartphone. Data were backed up to external hard drives and secure online storage once the participant had completed the study and had returned the smartphone.
The study protocol was reviewed and approved by the
With respect to cough monitoring systems, the definition of cough depends on the modality used for monitoring [
Before annotation, silence was marked in the recordings by applying a decibel filter in the Audacity software. The Sound Finder filter marked sounds below –26 dB as silence, with the constraint that the minimum duration of silence between sounds was 1 second. The periods marked as silence served as visual aids for the remainder of the annotation process. Human annotators listened to the smartphone recordings and labeled the periods that were not marked as silence as a cough if an explosive cough sound was identified [
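The Sound Finder filter is a built-in Audacity tool. Purely as a hedged illustration of the underlying idea (the frame length, hop size, and RMS level computation below are our assumptions, not the tool's internals), a comparable decibel-based silence marker could look as follows:

```python
import numpy as np

def mark_silence(signal, sr, threshold_db=-26.0, min_silence_s=1.0,
                 frame_len=1024, hop=512):
    """Mark frames whose RMS level falls below threshold_db as silence,
    keeping only silent runs of at least min_silence_s seconds."""
    n_frames = max(0, 1 + (len(signal) - frame_len) // hop)
    rms = np.array([np.sqrt(np.mean(signal[i * hop:i * hop + frame_len] ** 2))
                    for i in range(n_frames)])
    db = 20.0 * np.log10(rms + 1e-12)          # avoid log(0)
    silent = db < threshold_db

    min_frames = int(min_silence_s * sr / hop)
    segments, start = [], None
    for i, s in enumerate(np.append(silent, False)):  # sentinel closes last run
        if s and start is None:
            start = i
        elif not s and start is not None:
            if i - start >= min_frames:
                segments.append((start * hop / sr, i * hop / sr))
            start = None
    return segments  # list of (start_s, end_s) intervals marked as silence
```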
We used two approaches to verify the quality of the labeling. First, we instructed human annotators to flag an acoustic event if they were unsure whether it was a cough; flagged events were discarded and were not considered in the analysis. The remainder of acoustic events were classified as noncoughs. Second, the interrater reliability for the annotators was determined using intraclass correlation. A zero-inflated generalized mixed-effects model with a Poisson response was used [
When developing neural networks, splitting the data into sets for training, validation, and testing is favored over approaches that involve cross-validation because of the long training phases of the models; however, this comes at the risk of overfitting the model to the specific data set and a lack of generalizability to unseen samples. To mitigate these effects, we split our data into disjoint sets that contained different participants for training, validation, and testing. Furthermore, the large number of cough samples and participants in comparison to former studies [
Training, validation, and test sets were created in the following way. From participants with complete data sets, roughly 20% were drawn at random; these nocturnal audio recordings constituted the testing corpus for our evaluations. From the remaining participants, roughly 15% were drawn at random to be included in the validation set. The remaining participants then comprised the training set. Thus, data were roughly split into a ratio of 65:15:20. For model evaluation, the neural network was first trained on the training set. Hyperparameter tuning and model selection were performed on the validation set. Once the best performing parameters and model were selected, the final training was completed on the unified training and validation set. Model results were derived from the test set. The data split proportions were motivated by the fact that larger amounts of training data improve the performance of the classifier [
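As a minimal sketch of this participant-level split (the random seed and the exact rounding are illustrative assumptions, not the study's procedure), the partitioning could be implemented as follows:

```python
import random

def split_participants(ids, test_frac=0.20, val_frac=0.15, seed=0):
    """Participant-level split into disjoint training, validation, and
    test sets in a roughly 65:15:20 ratio (fractions of all participants)."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)           # draw participants at random
    n_test = round(test_frac * len(ids))
    n_val = round(val_frac * len(ids))
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test

# Example: 79 participants yield sets of roughly 52, 12, and 15 participants.
train, val, test = split_participants(range(79))
```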
This work was built upon a convolutional neural network architecture for cough recognition that we introduced in previous work [
The architecture consists of 5 convolutional layers with alternating max-pooling layers followed by a global max-pooling layer. The annotation "16@80x122" refers to a feature map with 16 channels and spatial dimensions of 80×122 (height × width). The annotation "1x7 Convolution" refers to a convolutional filter with spatial dimensions of 1×7 (height × width).
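For illustration only, a PyTorch sketch in the spirit of this architecture is shown below; the kernel sizes, channel progression, and pooling placement beyond what the figure annotations state are assumptions, not the study's exact model:

```python
import torch
import torch.nn as nn

class CoughCNN(nn.Module):
    """Sketch: 5 convolutional layers with alternating max pooling,
    followed by global max pooling and a binary cough/noncough output."""

    def __init__(self, channels=(16, 32, 64, 128, 256)):
        super().__init__()
        layers, in_ch = [], 1                       # 1-channel Mel spectrogram
        for i, out_ch in enumerate(channels):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU()]
            if i % 2 == 0:                          # alternating pooling layers
                layers.append(nn.MaxPool2d(2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.global_pool = nn.AdaptiveMaxPool2d(1)  # global max pooling
        self.fc = nn.Linear(channels[-1], 2)

    def forward(self, x):                           # x: (batch, 1, 80, 122)
        h = self.global_pool(self.features(x)).flatten(1)
        return self.fc(h)                           # logits: (noncough, cough)
```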
A common problem in real data sets is that some classes have more samples than others. This class imbalance can have a considerable detrimental effect on convergence during the training phase and generalization of a model on the test set [
To train the classifier, nonoverlapping 650 ms windows were extracted from the noncough acoustic events. The duration of the window was based upon previous work; the same approximate duration has performed best in other cough monitoring approaches [
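One of the techniques we used against class imbalance was minibatch balancing. As a hedged sketch (PyTorch and inverse-frequency weighting are our illustrative choices, not the study's stated implementation), roughly class-balanced minibatches can be drawn as follows:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def balanced_loader(dataset, labels, batch_size=64):
    """Oversample the minority (cough) class so each minibatch is
    roughly class-balanced in expectation."""
    labels = torch.as_tensor(labels)
    class_counts = torch.bincount(labels)           # [n_noncough, n_cough]
    weights = 1.0 / class_counts[labels].float()    # inverse-frequency weights
    sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                    replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```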
In ensemble learning, combining individual models may help to improve generalization performance, if the individual models are dissimilar [
Although the multilayer architecture of neural networks is essential, it is the back-propagation algorithm [
This process was done iteratively using the learning rule defined as follows:

$$ w^{(t+1)} = w^{(t)} - \eta \, \frac{\partial E}{\partial w^{(t)}} $$

where $w^{(t)}$ denotes a network weight at iteration $t$, $\eta$ denotes the learning rate, and $E$ denotes the error function minimized during training.
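As a minimal worked example of this update rule (the learning rate and toy objective are arbitrary):

```python
def sgd_step(w, grad_E, eta=0.001):
    """One iteration of the learning rule: w <- w - eta * dE/dw."""
    return w - eta * grad_E

# Example: minimizing E(w) = w^2, where dE/dw = 2w.
w = 1.0
for _ in range(100):
    w = sgd_step(w, 2 * w, eta=0.1)   # w converges towards 0
```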
Thresholding adjusts the decision threshold of a classifier. It is typically applied in the test phase and involves changing the output class probabilities [
Formally, the prediction $\hat{y}(x)$ for a window $x$ can be written as

$$ \hat{y}(x) = \begin{cases} \text{cough}, & \text{if } p(\text{cough} \mid x) \geq \theta \\ \text{noncough}, & \text{otherwise} \end{cases} $$

where $p(\text{cough} \mid x)$ denotes the predicted cough probability and $\theta$ denotes the decision threshold tuned on the validation set.
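As a sketch combining the ensemble with this decision rule (averaging member probabilities is one common combination rule and is our assumption here; the 0.95 threshold is the ensemble value reported in the Results):

```python
import numpy as np

def ensemble_decision(member_probs, threshold=0.95):
    """Average per-window cough probabilities across ensemble members
    and apply the decision threshold."""
    p_cough = np.mean(np.stack(member_probs), axis=0)   # (n_windows,)
    return p_cough >= threshold                         # True = cough
```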
In cough segmentation, the objective is to segment coughs from continuous audio recordings by employing a trained convolutional neural network ensemble classifier. We extracted 650 ms windows with an overlap of 65 ms. To discard silent windows, each window passed through a decibel filter that removed windows with sounds below –26 dB, as in the annotation process. Subsequently, a Mel spectrogram was computed for each window (as described in
The steps for the segmentation of coughs from continuous audio recordings (from top to bottom): First, the continuous extraction of overlapping windows from continuous audio recordings; second, the discarding of silent windows by applying a dB filter; third, the computation of Mel spectrograms; fourth, the computation of the prediction probability of cough by the convolutional neural network ensemble; last, the recognition of cough by applying the postprocessing rules. CNN: convolutional neural network.
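A condensed, hedged sketch of this pipeline is given below; the Mel parameters are illustrative, the `ensemble` callable is assumed to map one Mel spectrogram to a cough probability, and the postprocessing rules are omitted:

```python
import numpy as np
import librosa

def segment_coughs(signal, sr, ensemble, threshold=0.95,
                   win_s=0.650, overlap_s=0.065, silence_db=-26.0):
    """Overlapping 650 ms windows (65 ms overlap), a -26 dB silence
    filter, Mel spectrograms, ensemble probability, and thresholding.
    `signal` is a float waveform."""
    win = int(win_s * sr)
    hop = win - int(overlap_s * sr)       # consecutive windows share 65 ms
    onsets = []
    for start in range(0, len(signal) - win + 1, hop):
        w = signal[start:start + win]
        rms_db = 20 * np.log10(np.sqrt(np.mean(w ** 2)) + 1e-12)
        if rms_db < silence_db:           # discard silent windows
            continue
        spec = librosa.power_to_db(
            librosa.feature.melspectrogram(y=w, sr=sr, n_mels=80))
        if ensemble(spec) >= threshold:   # ensemble cough probability
            onsets.append(start / sr)     # window onset time in seconds
    return onsets
```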
For the data set partitioning used to determine the source of each cough by sex, we used the complete data set and did not consider the partitioning used for cough recognition. The reason for this lay in the limited amount of data that fulfilled the filtering requirements for the analysis. Since the data collection study included couples or multiple people in one room, we filtered the annotated cough data based on the information collected daily regarding whether the participant slept alone or not. We then filtered the data of the corresponding nights to create a balanced data set of male and female coughs that included 19 female and 19 male participants. We conducted our analysis on both extracted cough and cough-epoch signals. In both cases, we partitioned the data set into a disjoint training set of 10 female and 9 male participants and a test set of 9 female and 10 male participants.
Gaussian mixture models in combination with Mel-frequency cepstral coefficients [
A Gaussian mixture model represents the feature distribution of a class as a weighted sum of Gaussian densities:

$$ p(\mathbf{x} \mid \lambda) = \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) $$

where $\mathbf{x}$ denotes a feature vector, $w_k$ denotes the $k$-th mixture weight (with $\sum_k w_k = 1$), $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ denotes a Gaussian density with mean $\boldsymbol{\mu}_k$ and covariance $\boldsymbol{\Sigma}_k$, and $\lambda = \{w_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}_{k=1}^{K}$ denotes the model parameters.
Considering the equal distribution of both sexes in the partitioned data set, a feature vector was assigned to the sex whose Gaussian mixture model yielded the higher likelihood.
Mel-frequency cepstral coefficients (n=20) were computed with 256 samples between successive frames and a 4096-point fast Fourier transform. Analogously, the zero-crossing rate was computed over frames of 4096 samples with 256 samples between successive frames. These features were then vertically concatenated, which resulted in a matrix where the first dimension contained 41 entries. Feature selection and specific parameters were determined by employing 5-fold cross-validation on the training set. Hyperparameters of the Gaussian mixture models were also determined by employing 5-fold cross-validation on the training set which resulted in 30 Gaussian distributions each for female and male classes. Further hyperparameters were the number of initializations (n=3), number of expectation-maximization iterations (n=200), and the use of diagonal-type covariance.
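The following is a hedged sketch of this pipeline using librosa and scikit-learn. The MFCC, zero-crossing rate, and Gaussian mixture model parameters follow the text; the delta-MFCC rows are an assumption made here to reach the 41 feature rows mentioned above, and the function names are illustrative:

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def features(signal, sr):
    """MFCCs (n=20, 4096-point FFT, 256-sample hop) and zero-crossing
    rate, vertically concatenated; delta-MFCCs are an assumption."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=20,
                                n_fft=4096, hop_length=256)
    delta = librosa.feature.delta(mfcc)            # assumption, see above
    zcr = librosa.feature.zero_crossing_rate(signal, frame_length=4096,
                                             hop_length=256)
    return np.vstack([mfcc, delta, zcr])           # shape: (41, n_frames)

def fit_sex_models(female_signals, male_signals, sr):
    """One GMM per sex: 30 components, diagonal covariance, 3
    initializations, 200 expectation-maximization iterations."""
    def fit(signals):
        X = np.hstack([features(s, sr) for s in signals]).T
        return GaussianMixture(n_components=30, covariance_type="diag",
                               n_init=3, max_iter=200).fit(X)
    return fit(female_signals), fit(male_signals)

def predict_sex(signal, sr, gmm_f, gmm_m):
    """Assign the sex whose GMM gives the higher average log-likelihood
    (equal priors reduce the Bayes decision to maximum likelihood)."""
    X = features(signal, sr).T
    return "female" if gmm_f.score(X) > gmm_m.score(X) else "male"
```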
For the evaluation of the performance of the different models, we reported several metrics: sensitivity (true positive rate), specificity (true negative rate), accuracy, Matthews correlation coefficient, positive predictive value, negative predictive value, receiver operating characteristic curve, precision-recall curve, and Bland-Altman plot. These metrics are commonly used in machine learning and in research on clinical decision-support systems. For the segmentation of coughs and cough epochs, we reported the numbers of false positives, true positives, and false negatives per night. These metrics are defined in
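As a compact sketch of the scalar metrics (using scikit-learn; the dictionary layout is illustrative):

```python
from sklearn.metrics import confusion_matrix, matthews_corrcoef

def report(y_true, y_pred):
    """Scalar evaluation metrics derived from the confusion matrix
    of binary (0 = noncough, 1 = cough) labels."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity (TPR)": tp / (tp + fn),
        "specificity (TNR)": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "MCC": matthews_corrcoef(y_true, y_pred),
    }
```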
A total of 94 participants (female: 54/94, 57%; male: 40/94, 43%) were recruited for the study. Ages of the participants ranged from 18 to 89 years with a mean of 43 (SD 16) years. Fifteen of the 94 participants were excluded from the analysis: 2 participants withdrew, 3 participants were not involved in the study procedures for more than 5 days, and 10 participants had more than 5 nights of missed audio recordings. Some of the missed audio recordings were due to technical difficulties (such as the app crashing), while others were participant-related (such as the participant's smartphone having been turned off).
Of the 79 participants whose data were included for analysis, 15 participants were initially drawn at random to be included in the test set. From the remaining 64 participants, 12 additional participants were drawn at random and included in the validation set. Data from the remaining 52 participants comprised the training set.
Of a total of 704,697 acoustic events, 30,304 were clearly classified as coughing and 0.11% (767/704,697) were discarded. A total of 2,547,187 noncough and 30,304 cough Mel spectrograms were computed yielding a 0.015 class ratio.
On the validation set, thresholds of 0.98 and 0.95 yielded the best results in terms of Matthews correlation coefficient for the single and ensemble convolutional neural networks (84% and 91%, respectively).
Two of the annotators together accounted for 90.23% of all nights. These two annotators had an intraclass correlation of 95.79% (mean absolute error: 0.44 coughs per night). We calculated the intraclass correlation based on 65 nights. The intraclass correlation was interpreted as excellent.
We evaluated the classifiers on the testing set, which consisted of 5489 cough and 541,972 noncough events. The test set represented the soundscape encountered in the bedrooms of 15 participants over the course of 28 nights. As shown in
Results of the convolutional neural network classifier for cough recognition.
Model type | True positive rate, % | True negative rate, % | Accuracy, % | Matthews correlation coefficient, % | Positive predictive value, % | Negative predictive value, % |
Single | 99.9 | 87.5 | 99.7 | 87.2 | 99.9 | 87.1 |
Ensemble | 99.9 | 91.5 | 99.8 | 92.0 | 99.9 | 92.6 |
Precision-recall curves with the corresponding area-under-the-curve values, for the single and ensemble convolutional neural network models for the recognition of coughing. The dashed line represents the curve for a random classifier showing the proportion of cough-class instances to the total amount of instances. AUC: area under the curve; CNN: convolutional neural network.
The test set included 15 participants and 421 nights; human annotators counted as few as zero and as many as 368 coughs in one night. The mean count difference between automated and annotator coughs was –0.1 (95% CI –12.11, 11.91) coughs per night (
Bland-Altman plot of the automated and annotator cough counts per night.
Histogram of the differences between automated and annotator cough counts per night.
Bland-Altman plot of the automated and annotator cough-epoch counts per night.
Histogram of the differences between automated and annotator cough-epoch counts per night.
Using the data set of 19 female and 19 male participants, which had been selected to obtain a balanced set of male and female coughs, we analyzed both extracted cough and cough-epoch signals. The partitioning resulted in 1532 female and 1527 male coughs for training and 500 female and 498 male coughs for testing. In the case of cough epochs, this partitioning led to 366 female and 351 male cough epochs for training, and 194 male and 134 female cough epochs for testing.
As shown in
Gaussian mixture model results of sex recognition for coughs and cough epochs.
Model for | True positive rate, % | True negative rate, % | Accuracy, % | Matthews correlation coefficient, % | Positive predictive value, % | Negative predictive value, % |
Cough | 81.0 | 71.8 | 74.8 | 49.6 | 57.8 | 88.8 |
Cough epochs | 95.0 | 74.9 | 83.2 | 69.1 | 72.8 | 95.5 |
Receiver operating characteristic curves with corresponding area-under-the-curve values for cough and cough epoch–based sex assignment. The dashed line represents the curve for a random classifier. AUC: area under the curve; ROC: receiver operating characteristic.
To the best of the authors' knowledge, the data set in this paper is the largest real-life cough data set with published recognition and segmentation results, not only for adults with asthma but across all respiratory conditions. Given the data set of continuous overnight recordings of 79 adults with asthma in different soundscapes (excluding dropouts), our results demonstrate that cough recognition from smartphone-based audio recordings is feasible. The ensemble classifier performed well, with values greater than 90% across different metrics for the pure classification task, and achieved cough counts comparable to those of human annotators in the segmentation of coughing from continuous overnight audio recordings. Specific cases (for example, the 6 nights with cough-count differences of 20 or more) demonstrated a need for further development. We listened to the original recordings of these cases and believe these failures were caused by strong background noise, peculiar chuckle and laughter sounds, and a specific type of music, among others. These sounds, however, strongly suggest that the participant was not asleep.
We also provided a first step towards distinguishing partner cough from patient cough by determining the source of cough signals and classifying those that corresponded to the sex of the patient as patient coughs. This can be applied to cough recordings from the bedrooms of opposite-sex couples, even when both are coughing. Our results further indicate that cough epoch–based sex classification has greater potential than cough-based sex classification. This may be explained by the fact that cough epochs are longer and may contain more periodic information than the short bursts of the explosive cough sound. Speech signals of a typical adult male contain a fundamental frequency from 85 to 180 Hz and those of a typical adult female from 165 to 255 Hz [
Cough-monitoring systems that are capable of detecting reflex coughs in audio recordings have been proposed in previous work [
There were several limitations in our study regarding the generalization of our results. First, we only used data collected by one specific model of smartphone. It has previously been demonstrated [
Our study proposed a combined approach to counter the detrimental effects of learning from highly imbalanced data sets, comprising ensemble learning, balanced minibatch training, and decision thresholding. We showed that automated methods can recognize nocturnal coughs and cough epochs in smartphone-based audio recordings. The model addressed the problem of distinguishing patient coughs from those of a bed partner in contact-free recordings by classifying cough and cough-epoch signals according to the sex of the participant. This research enables smartphone-based cough monitoring of individuals and of opposite-sex couples in their bedrooms. It represents a step towards passive, scalable cough monitoring for people with asthma and thus contributes to the development of a scalable diagnostic tool for the early recognition of exacerbations.
Additional information.
MB, MAP, PT, FB, CS-S, FR, and TK contributed to the study design. FR and CS-S were responsible for the study execution. PT and FB provided the technological support for the study execution. FB developed the study app and embedded the content created by PT and TK. The data annotation process was designed by PT and FB and evaluated by PT. Data preprocessing, analysis, and machine learning were conducted by FB. Writing and editing of the manuscript were done by FB. Critical review and revision of the manuscript were done by DK, MB, MAP, TK, PT, CS-S, FR, and EF. FR was the principal investigator of the clinical study. This study was funded by CSS Insurance, Switzerland. CSS Insurance supported the recruitment of participants but had no role in study design, app design, data management plans, or in reviewing and approving the manuscript for publication. DK's participation in this research was funded by Dartmouth College and by the US National Institute on Drug Abuse through the Center for Technology and Behavioral Health at Dartmouth College. The views and conclusions contained in this document are those of the authors and do not necessarily represent the official policies, either expressed or implied, of the sponsors.
FB, PT, EF, and TK are affiliated with the Center for Digital Health Interventions, a joint initiative of the Department of Management, Technology, and Economics at Eidgenössische Technische Hochschule Zurich and the Institute of Technology Management at the University of St. Gallen, which is funded in part by the Swiss health insurer CSS. EF and TK are also cofounders of Pathmate Technologies, a university spin-off company that creates and delivers digital clinical pathways and has used the open-source MobileCoach platform for that purpose; however, Pathmate Technologies is not involved in the study app described in this paper.