Background: Sleep disorders are a global challenge, affecting a quarter of the world’s population. Mobile health (mHealth) sleep apps are a potential solution, but 25% of users stop using them after a single use. User satisfaction has a significant impact on continued use intention.
Objective: This China-US comparison study aimed to mine the topics discussed in user-generated reviews of mHealth sleep apps, assess the effects of the topics on user satisfaction and dissatisfaction with these apps, and provide suggestions for improving users’ intentions to continue using mHealth sleep apps.
Methods: An unsupervised clustering technique was used to identify the topics discussed in user reviews of mHealth sleep apps. On the basis of the two-factor theory, the Tobit model was used to explore the effect of each topic on user satisfaction and dissatisfaction, and differences in the effects were analyzed using the Wald test.
Results: A total of 488,071 user reviews of 10 mainstream sleep apps were collected, including 267,589 (54.8%) American user reviews and 220,482 (45.2%) Chinese user reviews. The user satisfaction rates of sleep apps were poor (China: 56.58% vs the United States: 45.87%). We identified 14 topics in the user-generated reviews for each country. In the Chinese data, 13 topics had a significant effect on the positive deviation (PD) and negative deviation (ND) of user satisfaction. The 2 variables (PD and ND) were defined by the difference between the user rating and the overall rating of the app in the app store. Among these topics, the app’s sound recording function (β=1.026; P=.004) had the largest positive effect on the PD of user satisfaction, and the topic with the largest positive effect on the ND of user satisfaction was the sleep improvement effect of the app (β=1.185; P<.001). In the American data, all 14 topics had a significant effect on the PD and ND of user satisfaction. Among these, the topic with the largest positive effect on the ND of user satisfaction was the app’s sleep promotion effect (β=1.389; P<.001), whereas the app’s sleep improvement effect (β=1.168; P<.001) had the largest positive effect on the PD of user satisfaction. The Wald test showed that there were significant differences in the PD and ND models of user satisfaction in both countries (all P<.05), indicating that the influencing factors of user satisfaction with mHealth sleep apps were asymmetrical. Using the China-US comparison, hygiene factors (ie, stability, compatibility, cost, and sleep monitoring function) and 2 motivation factors (ie, sleep suggestion function and sleep promotion effects) of sleep apps were identified.
Conclusions: By distinguishing between the hygiene and motivation factors, the use of sleep apps in the real world can be effectively promoted.
Approximately a quarter of the global population has sleep problems [- ], such as sleep deprivation, excessive sleep, and reduced sleep quality, resulting in health, safety, and economic issues. These sleep problems not only reduce the quality of life but also increase the risk of various diseases, such as high blood pressure [ ], diabetes [ ], and digestive diseases [ ]. Sleep problems can also cause economic losses [ ] and lead to traffic accidents [ ]. However, in the context of the COVID-19 epidemic, traditional offline sleep interventions are difficult to perform. Therefore, a convenient, remote, and effective tool is necessary to solve or alleviate the global challenge of sleep problems.
Mobile health (mHealth) apps have the potential to provide this tool. The popularity of mobile phones and the immediacy and convenience of the mobile internet give mHealth apps a unique advantage in providing sleep management services, and their effectiveness has been recognized in research. For example, Hasan et al collected data from 54 randomized controlled trials, including 11,815 participants. They found that compared with conventional sleep interventions, cognitive behavioral therapy for insomnia–based apps significantly increased total sleep time, reduced sleep latency time, and improved sleep efficiency and quality. However, the real-world use of mHealth sleep apps is poor; nearly 25% of users stop using them after one use [ ]. The existing literature indicates that user satisfaction is a major factor influencing continued use intention [ , ]. Therefore, exploring the factors that influence user satisfaction is vital for improving the real-world use of mHealth sleep apps.
Existing research on user satisfaction with mHealth sleep apps is limited by small sample sizes, high costs, and difficulty in generalizing the findings. For example, Buman et al used structured interviews to explore the acceptability and satisfaction of an mHealth sleep app (BeWell24) among 26 veterans. Similarly, Huberty et al [ ] used interviews to explore the satisfaction and acceptability of an mHealth sleep app (Calm) among 88 college students. Philip et al [ ] developed an mHealth app (KANOPEE) that supports the diagnosis of sleep disorders and interventions. Then, they explored the acceptability of the app by distributing questionnaires to 2000 users via the internet.
In the field of mHealth sleep apps, research that uses real-world big data to explore in depth the factors influencing user satisfaction is lacking. Existing studies have shown that user satisfaction factors are asymmetrical. For example, the two-factor theory suggests that there is an asymmetrical effect between the factors influencing user satisfaction and dissatisfaction. Influencing factors can be divided into motivation and hygiene factors. Motivation factors refer to the value-added attributes that users usually do not expect (ie, when some features of the app meet user expectations, users are satisfied, and when they do not meet expectations, users are not disappointed). Hygiene factors are basic attributes (ie, users will not feel satisfied when a feature of the app meets their expectations but are disappointed when the feature does not meet their expectations) [ ]. Failure to consider the asymmetry of factors influencing user satisfaction may lead to poor construction and predictive power of the explanatory models [ ].
Using the two-factor theory as a guide and the internet as the data source to explore the factors influencing user satisfaction and dissatisfaction with mHealth sleep apps can alleviate the above problems and fill the gaps in the research. Moreover, the number of user reviews is large and can reflect how the apps are used in the real world. Thus, the conclusions obtained may be more objective with less bias. Studies have analyzed user reviews of weight loss and precision nutrition apps [, ]; however, research on mHealth sleep apps is lacking.
Therefore, in this study, we used machine learning–based topic modeling to analyze user reviews of 10 mainstream mHealth sleep apps in Chinese and American mobile app stores. This study had the following research objectives:
- Mine and compare Chinese and American user viewpoints of mHealth sleep apps and explore the factors influencing user satisfaction and dissatisfaction with mHealth sleep apps.
- Validate the asymmetry of the factors influencing user satisfaction and dissatisfaction with mHealth sleep apps and identify the motivation and hygiene factors of these apps.
- Provide suggestions for improving users’ intentions to continue using mHealth sleep apps.
Identifying 10 Mainstream Apps
In March 2022, we conducted a comprehensive search of sleep-related mHealth apps across leading mobile app stores in China (Chinese Apple App Store, Huawei App Store, Xiaomi App Store, and VIVO App Store) and the United States (US Apple App Store and US Android Google Play Store). The following search terms were used for each app store: “sleep,” “sleep management,” “sleep monitoring,” and “sleep tracking.” After deduplication, we identified 131 unique apps in 4 Chinese mobile app stores and 193 unique apps in 2 US mobile app stores.
The inclusion criteria for the apps were as follows: (1) focuses on sleep self-management based on user-generated data, (2) can be used without the assistance of health care givers, and (3) is available in English or Chinese. We excluded apps that met any of the following exclusion criteria: (1) could not work properly; (2) was intended for special groups only, such as infants or older adults; and (3) did not have sleep management as its main function or purpose (eg, comprehensive health management apps).
Finally, the 5 unique apps with the highest number of downloads were selected in each country (Sleep Cycle, Pillow, Sleep++, AutoSleep, and Calm in the 2 US app stores and Little Sleep, Snail Sleep, Tidal Sleep, Sleep-White Noise, and Sleep Cycle in the 4 Chinese app stores). Detailed information on the 10 selected apps is shown in the table below.
|Appa||Mean app ratingb, mean (SD)||Number of ratings or reviewsc||Number of downloadsd||Developer||Category|
|Sleep Cycle (United States)||4.4 (0.3)||>370,000||>50 million||Sleep Cycle AB||Health and fitness|
|Pillow||4.2 (0.2)||>60,000||>60 million||Neybox Digital Ltd||Health and fitness|
|Sleep++||3.6 (1.3)||>20,000||>40 million||Cross Forward Consulting, LLC||Health and fitness|
|AutoSleep||4.6 (0.2)||>170,000||>50 million||Tantsussa Holdings Pty Ltd||Health and fitness|
|Calm||4.3 (0.3)||>350,000||>90 million||Calm.com, Inc||Health and fitness|
|Little Sleep||4.7 (0.2)||>50,000||>50 million||XinChao Technology Co, Ltd||Health and fitness|
|Snail Sleep||4.0 (0.2)||>80,000||>90 million||Seblong Technology (Beijing) Co, Ltd||Health and fitness|
|Tidal Sleep||4.4 (0.1)||>100,000||>40 million||Guangzhou Moreless Network Technology Co, Ltd||Health and fitness|
|Sleep-White Noise||3.8 (1.0)||>20,000||>40 million||SeekerTech Co, Ltd||Health and fitness|
|Sleep Cycle (China)||3.5 (0.9)||>90,000||>70 million||Sleep Cycle AB||Health and fitness|
aEnd date: March 31, 2022.
bMean of the app’s mean user ratings across all selected mobile app stores.
cTotal number of user ratings or reviews of the app across all selected mobile app stores.
dTotal number of downloads of the app across all selected mobile app stores.
Crawling User-Generated Reviews From App Stores in China and the United States
The app stores provide user ratings and comment functions that allow users to rate and textually describe their experiences with the app. We used the Crawler and Qimai app data analysis platform to obtain all user reviews published between March 31, 2018, and March 31, 2022, which contained user comments and ratings of the selected apps. A total of 488,071 user reviews were collected, including 267,589 (54.8%) American user reviews and 220,482 (45.2%) Chinese user reviews.
Pipelines of Processes to Clean Review Texts Using Natural Language Processing Techniques
In the data preprocessing stage, we applied the Python Natural Language Toolkit (NLTK) and the Sentiment Knowledge Enhanced Pretraining (SKEP) algorithm to preprocess the user review data of the 2 countries in the following 7 steps:
- User reviews in languages other than English and Chinese were eliminated.
- Contradictory data with inconsistent user ratings and reviews were removed. We used the SKEP algorithm (its accuracy in the sentence-level sentiment classification task was 97.6%) [ ] to judge the sentiment polarity of each user review and compared it with the rating. A rating of 4 or 5 points was taken to imply positive sentiment, and a rating of 1 or 2 points was taken to imply negative sentiment. On the basis of this assumption, we excluded reviews whose sentiment polarity did not match their rating. Consequently, 87,587 mismatched user reviews were excluded.
- The review text was tokenized, and numbers and punctuation marks were removed.
- Language stop words (for English reviews, the System for the Mechanical Analysis and Retrieval of Text list; for Chinese reviews, the Harbin Institute of Technology list) and context-specific stop words, such as the names of the mHealth sleep apps, were excluded.
- The remaining words were filtered, retaining only the adverbs, adjectives, and nouns. Studies have shown that these words contain information about the product and product quality [ ].
- For English reviews, each word was stemmed and lemmatized to derive groups with the same root form.
- Blank records were eliminated.
Finally, 372,730 user reviews were included after data preprocessing, including 202,963 (54.45%) American user reviews and 169,767 (45.55%) Chinese user reviews.
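The rating and sentiment consistency check and the token cleaning described in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `toy_polarity` is a hypothetical keyword stand-in for the SKEP classifier, the small `STOP_WORDS` and `APP_NAMES` sets stand in for the SMART and Harbin Institute of Technology lists, 3-star reviews are assumed to be kept, the regular expression handles English text only, and part-of-speech filtering and lemmatization are omitted.

```python
import re
from typing import Callable, List, Optional, Tuple

Review = Tuple[int, str]  # (star rating 1-5, review text)

# Hypothetical stand-ins for the real stop-word lists and app names.
STOP_WORDS = {"the", "a", "is", "it", "this", "and", "to", "i", "my"}
APP_NAMES = {"calm", "pillow", "autosleep"}


def rating_polarity(rating: int) -> Optional[str]:
    """Map a star rating to the sentiment polarity assumed in the text."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return None  # assumption: 3-star ratings imply no polarity and are kept


def drop_mismatched(reviews: List[Review],
                    predict_polarity: Callable[[str], str]) -> List[Review]:
    """Keep reviews whose predicted sentiment agrees with the star rating."""
    kept = []
    for rating, text in reviews:
        expected = rating_polarity(rating)
        if expected is None or predict_polarity(text) == expected:
            kept.append((rating, text))
    return kept


def clean_tokens(text: str) -> List[str]:
    """Lowercase, keep alphabetic tokens only, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and t not in APP_NAMES]


def toy_polarity(text: str) -> str:
    """Toy keyword classifier used only to make the example runnable."""
    negative = re.search(r"\b(bad|crash|trash|worst)\b", text.lower())
    return "negative" if negative else "positive"


sample = [
    (5, "Love this app, it tracks my sleep well!"),
    (5, "Worst update ever, crash every night"),  # rating and text disagree
    (1, "The alarm is bad and it costs 30 dollars"),
]
consistent = drop_mismatched(sample, toy_polarity)
cleaned = [clean_tokens(text) for _, text in consistent]
cleaned = [tokens for tokens in cleaned if tokens]  # drop blank records
```

In practice, the polarity function would wrap a trained sentiment model, and Chinese reviews would need a word segmenter in place of the regular expression.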
Latent Dirichlet Allocation to Estimate the Number of Topics
We used latent Dirichlet allocation (LDA) for topic modeling. LDA is a 3-level hierarchical Bayesian model for collections of discrete data such as text corpora, in which each item of a collection is modeled as a finite mixture over an underlying set of topics; it is easy to operate and performs well [, ]. We used the Python Gensim library to implement the LDA modeling. On the basis of the perplexity curves, coherence scores, classification results, and realistic considerations (Figures S1-S6), we identified 14 topics in each of the Chinese and American user review sets.
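For reference, the perplexity that is typically plotted when comparing candidate numbers of topics can be written as below. This is the standard definition from the LDA literature rather than a formula stated in this paper; the symbols are introduced here for illustration, with $M$ the number of held-out reviews, $\mathbf{w}_d$ the words of review $d$, and $N_d$ its length in tokens. Lower perplexity indicates better generalization to unseen reviews.

```latex
\[
\operatorname{perplexity}(D_{\text{test}})
  = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}
                        {\sum_{d=1}^{M} N_d} \right)
\]
```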
Tobit Model and Wald Test for Statistical Analyses Based on the Two-Factor Theory
The two-factor theory is the most widely replicated study on user satisfaction. We set 2 dependent variables, namely, positive deviation (PD) and negative deviation (ND) [ ], for data analysis. The 2 variables were defined by the difference between the user rating and the overall rating of the app in the app store. The range of values for PD was (0,4) and that for ND was (−4,0). The independent variable in this study was the probability distribution of each topic returned by the LDA model for each user review. In our study, the Tobit model, which is designed to estimate linear relationships between variables when there is either left-censoring or right-censoring in the dependent variable, was chosen to explore the factors influencing user satisfaction and dissatisfaction as follows:

$$PD_i = \sum_{k} \beta_k X_{ki} + \delta_i, \qquad ND_i = \sum_{k} \beta_k X_{ki} + \delta_i$$

where $\beta_k$ is the regression coefficient relating the $k$th topic $X_{ki}$ for comment $i$ to user satisfaction ($PD_i$ and $ND_i$), $k$ runs over the topics included in the model, and $\delta_i$ is the error term. The coefficients are estimated separately for the PD and ND models.
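One way to construct the 2 dependent variables from the description above is sketched below in Python. The split rule is an assumption: the paper states only that PD and ND are defined by the difference between the user rating and the app's overall rating, so here positive differences are assigned to PD, negative differences to ND, and the other variable is set to the censoring point 0. The function and variable names are illustrative.

```python
from typing import Tuple


def rating_deviations(user_rating: float, app_rating: float) -> Tuple[float, float]:
    """Return (PD, ND) for one review.

    PD is the amount by which the user's rating exceeds the app's overall
    rating (0 otherwise). ND is the amount by which it falls short, kept
    negative (0 otherwise). Assumed construction, not the authors' code.
    """
    deviation = user_rating - app_rating
    pd_value = max(deviation, 0.0)
    nd_value = min(deviation, 0.0)
    return pd_value, nd_value


# A 5-star and a 1-star review of an app whose overall rating is 4.4.
pd_high, nd_high = rating_deviations(5, 4.4)
pd_low, nd_low = rating_deviations(1, 4.4)
```

Under this construction, each variable has a pile of observations at 0, which is the kind of censoring the Tobit model is designed to handle.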
Finally, we applied the Wald test to test for differences between the parameters of the PD and ND models and thereby analyze the asymmetrical effects of the factors influencing user satisfaction and dissatisfaction.
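A generic Wald test for the equality of 2 coefficients can be sketched as follows. This is a textbook form that assumes the 2 estimates are independent; the paper does not give its exact formulation, so this sketch is not expected to reproduce the statistics reported in the Results. The function name and the example inputs are hypothetical.

```python
import math
from typing import Tuple


def wald_difference(b1: float, se1: float, b2: float, se2: float) -> Tuple[float, float]:
    """Wald chi-square statistic (1 df) and P value for H0: b1 == b2.

    Assumes the two estimates are independent, so the variance of their
    difference is the sum of the squared standard errors.
    """
    statistic = (b1 - b2) ** 2 / (se1 ** 2 + se2 ** 2)
    # Upper tail of chi-square with 1 df: P(Z^2 > W) = erfc(sqrt(W / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Illustrative inputs only (not taken from the study).
stat, p = wald_difference(1.0, 0.5, 0.0, 0.5)
```

A small P value indicates that the topic's effect differs between the PD and ND models, which is the asymmetry the two-factor theory predicts.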
Numbers of Topics and Impact Factors
A total of 488,071 user reviews of 10 mainstream mHealth sleep apps were obtained, and after data preprocessing, 372,730 (76.4%) user reviews were included (202,963, 54.5% from the United States and 169,767, 45.5% from China). After LDA topic modeling, we identified 14 topics in each of the Chinese and American user review sets (the topics identified in the 2 sets were not identical). The topic modeling results are shown in the 2 tables below, first for Chinese and then for American user reviews.
|Topics||Keywords||Reviews, n (%)|
|Topic 1: cost||Free, price, sleep, pay, money, worth, and purchase||28,691 (16.9)|
|Topic 2: reliability||Update, issue, problem, cannot open, trash, fix, and disappoint||28,012 (16.5)|
|Topic 3: usability||Use, easy, help, daily, love, focus, and operate||22,748 (13.4)|
|Topic 4: sleep tracking function||Sleep, night, wake, track, pattern, asleep, and awake||18,165 (10.7)|
|Topic 5: sleep improvement effect||Sleep, feature, well, helpful, health, user, and improve||15,279 (9.0)|
|Topic 6: sleep advice function||Guide, voice, download, advice, read, review, and result||10,186 (6.0)|
|Topic 7: alarm function||Alarm, wake, morning, time, clock, start, and in bed||7130 (4.2)|
|Topic 8: attitude (positive)||Love, great, good, amaze, perfect, excellent, and tool||6960 (4.1)|
|Topic 9: compatibility||Watch, bracelet, Pad, support, not synchronize, sport, and trash||6621 (3.9)|
|Topic 10: sleep evaluation function||Analysis, evaluate, chart, average, record, sleep, hour, and accurate||5941 (3.5)|
|Topic 11: user interface||Awesome, interface, reminder, easy, find, complex, and share||5602 (3.3)|
|Topic 12: sound record function||Sound, snore, noise, sleep, record, interesting, and amazing||5093 (3.0)|
|Topic 13: advertisement distribution||Advertisement, game, videos, compulsion, annoying, and funny||4753 (2.8)|
|Topic 14: reminder function||Recommend, cycle, highly, world, anxious, person, and reminder||4586 (2.7)|
|Topics||Keywords||Reviews, n (%)|
|Topic 1: sleep tracking function||Sleep, night, wake, cycle, pattern, track, and graph||50,334 (24.8)|
|Topic 2: sound record function||Sound, record, support, hear, snore, noise, and issue||21,717 (10.7)|
|Topic 3: sleep improvement effect||Sleep, great, helpful, asleep, feeling, super, and everyday||20,702 (10.2)|
|Topic 4: user interface||Friendly, interface, user, pleasant, forward, improve, and share||18,064 (8.9)|
|Topic 5: usability||Easy, make, feel, nice, use, simple, and difference||14,004 (6.9)|
|Topic 6: alarm function||Wake, alarm, morning, clock, refresh, snooze, sound, and awake||12,786 (6.3)|
|Topic 7: sleep evaluation function||Sleep, review, give, star, change, download, and start||12,583 (6.2)|
|Topic 8: meditation function||Calm, sound, meditation, help, music, use, and daily||9539 (4.7)|
|Topic 9: reliability||Update, fix, version, effective, issue, open, and problem||9133 (4.5)|
|Topic 10: sharing function||Recommend, share, amaze, highly, friend, interest, and pretty||8727 (4.3)|
|Topic 11: sleep advice function||Love, advice, song, amaze, favorite, guide, and stay||7306 (3.6)|
|Topic 12: activity tracking function||Tone, track, perfectly, activity, move, body, and sleep||6900 (3.4)|
|Topic 13: cost||Worth, free, pay, money, price, subscription, and purchase||6697 (3.3)|
|Topic 14: compatibility||iPod, phase, watch, touch, science, trend, and nightly||4971 (2.4)|
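In the table for Chinese reviews, the per-topic counts sum to the 169,767 included reviews, which suggests that each review was assigned to a single topic. Assuming the assignment was by highest topic probability (the paper does not state the rule), the tally could be sketched in Python as follows. The topic distributions are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def dominant_topic(distribution: List[float]) -> int:
    """Return the 1-based index of the most probable topic for one review."""
    best = max(range(len(distribution)), key=distribution.__getitem__)
    return best + 1


def topic_shares(distributions: List[List[float]]) -> Dict[int, float]:
    """Percentage of reviews whose dominant topic is each topic."""
    counts = Counter(dominant_topic(d) for d in distributions)
    total = len(distributions)
    return {topic: 100.0 * n / total for topic, n in sorted(counts.items())}


# Made-up topic distributions for 4 reviews over 3 topics.
toy = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
shares = topic_shares(toy)
```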
Motivation and Hygiene Factors via Tobit Analysis
The 2 tables below show the results of the Tobit analysis for the Chinese and American data, respectively. Topic 8 in the Chinese data was user attitude (positive), and it was excluded from the Tobit model because it did not include user opinions on app features and effectiveness.
|Variable||Model 1 (positive rating deviations)a: coefficient||Model 1: SE||Model 1: P value||Model 2 (negative rating deviations)b: coefficient||Model 2: SE||Model 2: P value|
|Topic 1: cost||−1.232||0.025||<.001||−2.136||0.089||<.001|
|Topic 2: reliability||−0.909||0.022||.03||−3.268||0.096||.02|
|Topic 3: usability||−0.415||0.021||.007||−0.719||0.099||.002|
|Topic 4: sleep tracking function||−0.053||0.023||<.001||−0.226||0.098||<.001|
|Topic 5: sleep improvement effect||0.924||0.026||.005||1.185||0.131||<.001|
|Topic 6: sleep advice function||0.219||0.024||<.001||0.840||0.095||<.001|
|Topic 7: alarm function||0.541||0.026||.004||0.253||0.086||.003|
|Topic 9: compatibility||−1.125||0.024||<.001||−1.196||0.093||<.001|
|Topic 10: sleep evaluation function||0.094||0.025||<.001||0.612||0.096||<.001|
|Topic 11: user interface||−0.862||0.030||<.001||−0.429||0.092||<.001|
|Topic 12: sound record function||1.026||0.025||.004||0.098||0.108||<.001|
|Topic 13: advertisement distribution||−0.862||0.031||.004||−1.028||0.095||<.001|
|Topic 14: reminder function||0.862||0.030||.002||0.198||0.089||.004|
aThe maximized log-likelihood of model 1 was −46,902.561.
bThe maximized log-likelihood of model 2 was −27,209.943.
|Variable||Model 3 (positive rating deviations)a: coefficient||Model 3: SE||Model 3: P value||Model 4 (negative rating deviations)b: coefficient||Model 4: SE||Model 4: P value|
|Topic 1: sleep tracking function||−0.128||0.014||.002||−1.446||0.040||<.001|
|Topic 2: sound record function||−0.243||0.016||.003||−1.025||0.040||.002|
|Topic 3: sleep improvement effect||1.168||0.020||<.001||1.389||0.053||<.001|
|Topic 4: user interface||0.011||0.017||<.001||0.149||0.047||<.001|
|Topic 5: usability||0.861||0.018||<.001||0.346||0.050||<.001|
|Topic 6: alarm function||−0.049||0.015||.007||−0.305||0.046||.004|
|Topic 7: sleep evaluation function||−0.233||0.018||.004||−1.105||0.041||.006|
|Topic 8: meditation function||0.297||0.014||<.001||0.390||0.043||<.001|
|Topic 9: reliability||−0.826||0.018||<.001||−0.222||0.040||<.001|
|Topic 10: sharing function||0.054||0.016||.004||0.277||0.052||<.001|
|Topic 11: sleep advice function||0.197||0.016||<.001||0.444||0.059||<.001|
|Topic 12: activity tracking function||−0.296||0.018||.006||−0.518||0.048||<.001|
|Topic 13: cost||−0.426||0.016||<.001||−0.983||0.039||<.001|
|Topic 14: compatibility||−0.168||0.015||.006||−0.276||0.040||<.001|
aThe maximized log-likelihood of model 3 was −70,456.368.
bThe maximized log-likelihood of model 4 was −29,095.359.
In the Tobit results for Chinese users, model 1 shows the results of the PD. We found that all 13 user-discussed topics had a significant effect on the PD of user satisfaction. The variable with the largest positive effect was the app’s sound recording function (β=1.026; P=.004), and the variable with the largest negative effect was cost, that is, value for money (β=−1.232; P<.001). Model 2 provides the results of the ND for Chinese user data, and all 13 topics were statistically significant. The variable with the largest positive effect was the sleep improvement effect of the app (β=1.185; P<.001), and that with the largest negative effect was the reliability of the app (β=−3.268; P=.02).
In the Tobit results for American users, model 3 shows the PD results. All 14 topics had a significant effect on the PD of user satisfaction; the variable with the largest positive effect was the app’s sleep improvement effect (β=1.168; P<.001), and that with the largest negative effect was the app’s reliability (β=−0.826; P<.001). Model 4 presents the results of the ND for American user data. All 14 topics included were statistically significant; the variable with the largest positive effect was again the app’s sleep improvement effect (β=1.389; P<.001), and that with the largest negative effect was the sleep tracking function (β=−1.446; P<.001).
Effects of Asymmetrical Attributes by Wald Testing
To explore the asymmetry of the influencing factors, we used the Wald test to examine the differences in the parameters between models 1 and 2 and between models 3 and 4. The results of the Wald test are presented in the 2 tables below. We found that between models 1 and 2 and between models 3 and 4, all effect parameters were significantly different.
For Chinese users, 6 topics, namely, the sleep improvement effect, sleep advice, alarm, sleep evaluation, sound record, and reminder functions of the app, had a significant positive impact in both models, and the impact differed significantly between the models. In contrast, 7 topics, namely, cost (value for money), reliability, usability, sleep tracking, compatibility (other device support), user interface, and advertisement, had a significant negative impact in both models.
For American users, 6 topics, namely, sleep improvement effect, user interface, usability (ease of use), meditation, sharing function, and sleep advice, had significant positive effects in both models, and the effects were significantly different. In contrast, 8 topics, namely, sleep tracking, sound recording, alarm, sleep evaluation, activity tracking, cost (value for money), compatibility, and reliability, had a significant negative impact in both models, and the impact was significantly different.
|Variablea||Wald test||P value||Reviews in PDb, n (%)||Reviews in NDc, n (%)|
|Topic 1: cost||3181.81||.004||4137 (14.42)||24,554 (85.58)|
|Topic 2: reliability||563.20||.03||9854 (35.18)||18,158 (64.82)|
|Topic 3: usability||1652.26||<.001||9251 (40.67)||13,497 (59.33)|
|Topic 4: sleep tracking function||8926.87||<.001||7956 (43.80)||10,209 (56.20)|
|Topic 5: sleep improvement effect||550.71||.008||9609 (62.89)||5670 (37.11)|
|Topic 6: sleep advice function||571.94||<.001||5727 (56.23)||4459 (43.77)|
|Topic 7: alarm function||148.67||.02||4333 (60.78)||2797 (39.22)|
|Topic 9: compatibility||594.64||<.001||1383 (20.89)||5238 (79.11)|
|Topic 10: sleep evaluation function||186.15||<.001||3143 (52.90)||2798 (47.10)|
|Topic 11: user interface||244.65||.04||1611 (28.76)||3991 (71.24)|
|Topic 12: sound record function||2937.53||<.001||3889 (76.37)||1204 (23.63)|
|Topic 13: advertisement distribution||783.92||.04||1436 (30.22)||3317 (69.78)|
|Topic 14: reminder function||284.35||<.001||2839 (61.92)||1747 (38.08)|
aThe total number of reviews for each topic: topic 1: 28,691, topic 2: 28,012, topic 3: 22,748, topic 4: 18,165, topic 5: 15,279, topic 6: 10,186, topic 7: 7130, topic 8: 6960, topic 9: 6621, topic 10: 5941, topic 11: 5602, topic 12: 5093, topic 13: 4753, and topic 14: 4586.
bPD: positive deviation.
cND: negative deviation.
|Variablea||Wald test||P value||Reviews in PDb, n (%)||Reviews in NDc, n (%)|
|Topic 1: sleep tracking function||2738.40||<.001||10,967 (21.79)||39,367 (78.21)|
|Topic 2: sound record function||826.85||.005||8072 (37.17)||13,645 (62.83)|
|Topic 3: sleep improvement effect||26.17||.03||19,244 (92.96)||1458 (7.04)|
|Topic 4: user interface||550.71||<.001||9868 (54.63)||8196 (45.37)|
|Topic 5: usability||2532.80||.002||11,004 (78.58)||3000 (21.42)|
|Topic 6: alarm function||102.38||<.001||4298 (33.62)||8488 (66.38)|
|Topic 7: sleep evaluation function||729.84||<.001||3698 (29.39)||8885 (70.61)|
|Topic 8: meditation function||129.39||.04||5742 (60.20)||3797 (39.80)|
|Topic 9: reliability||134.82||<.001||3918 (42.90)||5215 (57.10)|
|Topic 10: sharing function||2738.98||<.001||4801 (55.02)||3926 (44.98)|
|Topic 11: sleep advice function||7389.03||.01||4653 (63.70)||2653 (36.30)|
|Topic 12: activity tracking function||182.04||<.001||2887 (41.85)||4013 (58.15)|
|Topic 13: cost||249.94||.004||1869 (27.91)||4828 (72.09)|
|Topic 14: compatibility||1049.89||<.001||1532 (30.82)||3439 (69.18)|
aThe total number of reviews for each topic: topic 1: 50,334, topic 2: 21,717, topic 3: 20,702, topic 4: 18,064, topic 5: 14,004, topic 6: 12,786, topic 7: 12,583, topic 8: 9539, topic 9: 9133, topic 10: 8727, topic 11: 7306, topic 12: 6900, topic 13: 6697, and topic 14: 4971.
bPD: positive deviation.
cND: negative deviation.
Satisfaction of Chinese and American Users With mHealth Sleep Apps
The overall user satisfaction rate of mHealth sleep apps was poor, and compared with US users, Chinese users were more satisfied with such apps. Among the Chinese user review data, 96,056 user reviews had ratings higher than 3, indicating a user satisfaction rate of 56.58%. Among the US user review data, 93,094 reviews had ratings higher than 3, indicating a user satisfaction rate of 45.87%. The low user satisfaction rates may help explain the poor user engagement and low willingness to use mHealth sleep apps. However, this result is inconsistent with the findings of some user satisfaction studies using small samples [ - ], such as that by Philip et al [ ], in which 91.6% (395/431) of users rated the system as “satisfactory” or above and 61.7% (266/431) rated it as “very satisfactory.”
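The satisfaction rates quoted above follow directly from the review counts. A short Python check, with a hypothetical helper name and the included review totals of 169,767 (China) and 202,963 (United States) as denominators:

```python
def satisfaction_rate(satisfied: int, total: int) -> float:
    """Share of reviews rated above 3 stars, as a percentage with 2 decimals."""
    return round(100.0 * satisfied / total, 2)


china = satisfaction_rate(96_056, 169_767)
usa = satisfaction_rate(93_094, 202_963)
print(f"China: {china}%  United States: {usa}%")
```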
Users in both countries commented on the app’s functionality, usability, reliability, compatibility, user interface, value for money, and sleep improvement effects. First, users discussed the functions of the apps, mainly focusing on sleep monitoring, sound and sleep movement recording, and the user experience related to these features. Second, users discussed the usability and compatibility of the apps, focusing on the various failures encountered during use, such as crashes and lagging. They also provided feedback on app compatibility with other devices (sports bracelets, sports watches, etc), such as information that was not synchronized or incompatible. Third, feedback on the user interface of the app mainly focused on the design, user experience after interacting with the interface, and distribution of advertisements. The use of too many advertisements was a common source of dissatisfaction. Moreover, users heavily discussed the apps’ value for money, mainly concerning the price of the app and charging issues, which led to user dissatisfaction. Finally, the effect of the app’s intervention on sleep was another aspect of concern.
According to the probability of topic distribution, we found that the content of discussions about mHealth sleep apps differed significantly between the Chinese and American users. American users cared more about the functions of the apps, discussing sleep tracking (50,334/202,963, 24.8%) and sound recording functions (21,717/202,963, 10.7%) significantly more than the value for money (6697/202,963, 3.3%) and compatibility (4971/202,963, 2.4%). American users also mentioned the user interface design of the app (18,064/202,963, 8.9%). In contrast, Chinese users were price sensitive and discussed the app’s cost (28,691/169,767, 16.9%), reliability (28,012/169,767, 16.5%), and usability (22,748/169,767, 13.4%).
Hygiene and Motivation Factors of mHealth Sleep Apps
The results of the Wald test demonstrated that all effect parameters in the 4 models were significantly different, indicating that the factors influencing Chinese and American user satisfaction with mHealth sleep apps were asymmetrical. Therefore, we obtained the hygiene and motivation factors of mHealth sleep apps in both countries.
We found that the cost, reliability, usability, compatibility, user interface, advertisements, and sleep monitoring functions of the app were the hygiene factors of Chinese mHealth sleep apps. The hygiene factors for American mHealth sleep apps included reliability, cost, and compatibility of the app; sleep monitoring; sound and sleep movement recording; smart alarm; and sleep assessment functions. The probability of the above topics appearing in user reviews increased with the degree of user dissatisfaction, indicating that the above factors were the basic user-expected attributes of sleep apps. The sleep promotion effect, sleep suggestion, smart alarm, sleep assessment, sleep sound recording, and reminder functions of the app were the motivation factors for Chinese mHealth sleep apps. The motivation factors of American sleep apps included the sleep promoting effect, user interface, usability, meditation, sleep suggestion, and sharing functions of the app. The probability of the above features appearing in user reviews positively influenced user satisfaction, indicating that the above factors were the value-added attributes of sleep apps.
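The grouping described above can be expressed as a simple sign rule on the Tobit coefficients: topics with positive coefficients in both the PD and ND models are treated as motivation factors, and topics with negative coefficients in both are treated as hygiene factors. The Python sketch below applies that rule to a few rows of the Chinese Tobit table. The function name and the "mixed" label are illustrative additions; no topic in the reported tables has coefficients of opposite sign.

```python
def classify_factor(pd_coef: float, nd_coef: float) -> str:
    """Label a topic from the signs of its PD and ND model coefficients."""
    if pd_coef > 0 and nd_coef > 0:
        return "motivation"
    if pd_coef < 0 and nd_coef < 0:
        return "hygiene"
    return "mixed"  # hypothetical fallback, not observed in the study


# (PD coefficient, ND coefficient) for selected topics, Chinese data.
chinese_topics = {
    "cost": (-1.232, -2.136),
    "reliability": (-0.909, -3.268),
    "sleep improvement effect": (0.924, 1.185),
    "sound record function": (1.026, 0.098),
}
labels = {name: classify_factor(*coefs) for name, coefs in chinese_topics.items()}
```

The rule relies on every included coefficient being statistically significant, as reported for both countries; with nonsignificant coefficients, the sign alone would not be enough to classify a topic.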
Using the China-US comparison, we found that user satisfaction was mostly related to the app’s sleep promotion effects and sleep advice function, whereas user dissatisfaction was mostly related to the app’s stability, compatibility, value of money, and sleep tracking function.
Suggestions for Improving Users’ Intentions to Continue the Use of mHealth Sleep Apps
An in-depth analysis of the asymmetry of factors influencing user satisfaction and dissatisfaction with mHealth sleep apps is of great value in providing a reference for improving users’ intentions to continue using mHealth sleep apps.
As a strategy for promoting the continued use intentions of sleep app users, hygiene factors should be improved first. First, the reliability and usability of the app should be improved, as studies have shown that poor usability is one of the most common reasons for users to abandon mHealth apps. Second, experts need to enhance the compatibility of apps with wearable devices and solve problems such as unsynchronized terminal data and app pricing. Finally, sleep apps should provide more accurate sleep tracking services. Although sleep monitoring is the core function of mHealth sleep apps, users are generally dissatisfied with the accuracy of this function. Under the premise of improving hygiene factors, experts should focus on the sleep promotion effect and sleep advice function of apps to further improve user satisfaction. Scientific and effective sleep advice is the basis of sleep promotion, which is the core goal of mHealth sleep apps, but studies have shown that existing apps lack clinical validation of sleep promotion effects [ ]. Therefore, experts should conduct clinical trials of different scales to fully validate the effectiveness of apps in sleep intervention while continuously improving the internal app algorithms to provide more effective sleep recommendation services based on scientific evidence.
This study had several limitations. First, because not all users provide reviews, we could not evaluate the satisfaction of nonreviewing users with mHealth sleep apps, which could lead to selection bias. Nevertheless, 372,730 user reviews were analyzed, which provided sufficient data to explore user satisfaction with mHealth sleep apps and its influencing factors. Second, our analysis did not consider the characteristics of app users or the metadata of user reviews; including them might have improved the accuracy of the model and significantly affected the results. However, the user review data in mobile app stores usually contain only the user’s name, release date, user score, and user text reviews, and we cannot obtain further user characteristics and review metadata without the assistance of app developers. This is a common limitation in user-generated content studies. Third, only user reviews of 10 mainstream mHealth sleep apps from mobile app stores in China and the United States were included, leading to possible bias. However, China and the United States have the highest number of internet users in the world, and the number of active users of mainstream apps far exceeds that of other apps. Thus, we believe that the results of this study are representative and adequate to reflect the overall satisfaction with mHealth sleep apps.
To improve users’ intention to continue using mHealth sleep apps, we identified motivation and hygiene factors for such app use by processing and comparing >480,000 user reviews of 10 mainstream sleep apps in China and the United States. We found that the user satisfaction rates of mHealth sleep apps were poor, resulting in users’ low continued use intentions. Moreover, the effects of the influencing factors of Chinese and American user satisfaction and dissatisfaction with mHealth sleep apps were asymmetrical. Overall, the motivation factors for mHealth sleep apps were the sleep suggestion function and sleep promotion effects of the apps. The hygiene factors included the reliability, compatibility, value for money, and sleep monitoring function of the apps. To promote the use of mHealth sleep apps in the real world, we should first improve the hygiene factors and then strengthen the motivation factors. In future research, we should focus on the characteristics of the user and the metadata of user reviews and then introduce them into the equation as covariates to further improve the accuracy of the model.
This work was funded by the Natural Science Foundation of Beijing Municipality (grant 7222306), National Traditional Chinese Medicine (TCM) Innovation Team and Talent Support Projects (grant ZYYCXTD-C-202210), Hainan Province Science and Technology Special Fund (grant ZDYF2020132), Hainan Province Clinical Medical Center (grant QWYH202175), and Specific Research Fund of The Innovation Platform for Academicians of Hainan Province (grant YSPTZX202208).
The user review data were extracted between March 31, 2018, and March 31, 2022, through the Crawler and the Qimai app data analysis platform from 4 Chinese mobile app stores (China Apple App Store, Huawei App Store, Xiaomi App Store, and VIVO App Store) and 2 mobile app stores in the United States (US Apple App Store and US Android Google Play Store). The source code for our analysis is publicly available on GitHub.
MN, SZ, QW, HH, and J Lei participated in the study concept and design. MN, SZ, and TW collected and preprocessed the data. MN, SZ, TW, J Liang, HJF, and J Lei participated in the acquisition, analysis, and interpretation of the data. HH and J Lei received funding for this study. MN and SZ drafted the manuscript. All authors critically revised, read, and approved the final manuscript. MN, SZ, HH, and J Lei had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Conflicts of Interest
Supplementary figures (DOCX file, 736 KB).
- Cao XL, Wang SB, Zhong BL, Zhang L, Ungvari GS, Ng CH, et al. The prevalence of insomnia in the general population in China: a meta-analysis. PLoS One 2017 Feb 24;12(2):e0170772.
- Buysse DJ. Insomnia. JAMA 2013 Feb 20;309(7):706-716.
- Senaratna CV, Perret JL, Lodge CJ, Lowe AJ, Campbell BE, Matheson MC, et al. Prevalence of obstructive sleep apnea in the general population: a systematic review. Sleep Med Rev 2017 Aug;34:70-81.
- Ram S, Seirawan H, Kumar SK, Clark GT. Prevalence and impact of sleep disorders and sleep habits in the United States. Sleep Breath 2010 Feb;14(1):63-70.
- Ren R, Li Y, Zhang J, Zhou J, Sun Y, Tan L, et al. Obstructive sleep apnea with objective daytime sleepiness is associated with hypertension. Hypertension 2016 Nov;68(5):1264-1270.
- Anothaisintawee T, Reutrakul S, Van Cauter E, Thakkinstian A. Sleep disturbances compared to traditional risk factors for diabetes development: systematic review and meta-analysis. Sleep Med Rev 2016 Dec;30:11-24.
- Orr WC, Fass R, Sundaram SS, Scheimann AO. The effect of sleep on gastrointestinal functioning in common digestive diseases. Lancet Gastroenterol Hepatol 2020 Jun;5(6):616-624.
- Kessler RC, Berglund PA, Coulouvrat C, Hajak G, Roth T, Shahly V, et al. Insomnia and the performance of US workers: results from the America insomnia survey. Sleep 2011 Sep 01;34(9):1161-1171.
- Léger D, Bayon V, Ohayon MM, Philip P, Ement P, Metlaine A, et al. Insomnia and accidents: cross-sectional study (EQUINOX) on sleep-related home, work and car accidents in 5293 subjects with insomnia from 10 countries. J Sleep Res 2014 Apr;23(2):143-152.
- Hasan F, Tu YK, Yang C, James Gordon C, Wu D, Lee H, et al. Comparative efficacy of digital cognitive behavioral therapy for insomnia: a systematic review and network meta-analysis. Sleep Med Rev 2022 Feb;61:101567.
- Perez C. Nearly 1 in 4 people abandon mobile apps after only one use. TechCrunch+. 2016 May 31. URL: https://techcrunch.com/2016/05/31/nearly-1-in-4-people-abandon-mobile-apps-after-only-one-use/ [accessed 2022-04-20]
- Wang T, Fan L, Zheng X, Wang W, Liang J, An K, et al. The impact of gamification-induced users' feelings on the continued use of mHealth apps: a structural equation model with the self-determination theory approach. J Med Internet Res 2021 Aug 12;23(8):e24546.
- Lin Z, Filieri R. Airline passengers’ continuance intention towards online check-in services: the role of personal innovativeness and subjective knowledge. Transp Res E Logist Transp Rev 2015 Sep;81:158-168.
- Buman MP, Epstein DR, Gutierrez M, Herb C, Hollingshead K, Huberty JL, et al. BeWell24: development and process evaluation of a smartphone "app" to improve sleep, sedentary, and active behaviors in US Veterans with increased metabolic risk. Transl Behav Med 2016 Sep;6(3):438-448.
- Huberty J, Green J, Glissmann C, Larkey L, Puzia M, Lee C. Efficacy of the mindfulness meditation mobile app "Calm" to reduce stress among college students: randomized controlled trial. JMIR Mhealth Uhealth 2019 Jun 25;7(6):e14273.
- Philip P, Dupuy L, Morin CM, de Sevin E, Bioulac S, Taillard J, et al. Smartphone-based virtual agents to help individuals with sleep concerns during COVID-19 confinement: feasibility study. J Med Internet Res 2020 Dec 18;22(12):e24268.
- Herzberg F. Motivation to Work. Milton Park, UK: Routledge; 1993.
- Streukens S, de Ruyter K. Reconsidering nonlinearity and asymmetry in customer satisfaction and loyalty models: an empirical study in three retail service settings. Mark Lett 2004 Jul;15(2/3):99-111.
- Frie K, Hartmann-Boyce J, Jebb S, Albury C, Nourse R, Aveyard P. Insights from Google Play Store user reviews for the development of weight loss apps: mixed-method analysis. JMIR Mhealth Uhealth 2017 Dec 22;5(12):e203.
- Zečević M, Mijatović D, Kos Koklič M, Žabkar V, Gidaković P. User perspectives of diet-tracking apps: reviews content analysis and topic modeling. J Med Internet Res 2021 Apr 22;23(4):e25160.
- Qimai Data (formerly ASO100) - professional mobile product business analysis platform - ASO-ASM optimization. Qimai Data. URL: https://www.qimai.cn/ [accessed 2022-04-21]
- Tian H, Gao C, Xiao X, Liu H, He B, Wu H, et al. SKEP: sentiment knowledge enhanced pre-training for sentiment analysis. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020 Presented at: ACL '20; July 5-10, 2020; Virtual p. 4067-4076.
- Guo Y, Barnes SJ, Jia Q. Mining meaning from online ratings and reviews: tourist satisfaction analysis using latent dirichlet allocation. Tour Manag 2017 Apr;59:467-483.
- Jelodar H, Wang Y, Yuan C, Feng X, Jiang X, Li Y, et al. Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimed Tools Appl 2018 Nov 28;78(11):15169-15211.
- Min KB, Song SH, Min JY. Topic modeling of social networking service data on occupational accidents in Korea: latent Dirichlet allocation analysis. J Med Internet Res 2020 Aug 13;22(8):e19222.
- Park S, Lee JS, Nicolau JL. Understanding the dynamics of the quality of airline service attributes: satisfiers and dissatisfiers. Tour Manag 2020 Dec;81:104163.
- Alqahtani F, Orji R. Insights from user reviews to improve mental health apps. Health Informatics J 2020 Sep;26(3):2042-2066.
- Ananth S. Sleep apps: current limitations and challenges. Sleep Sci 2021;14(1):83-86.
- The source code for the analysis. URL: https://github.com/MingfuNUO/Source-Code [accessed 2022-06-07]
LDA: latent Dirichlet allocation
mHealth: mobile health
ND: negative deviation
PD: positive deviation
SKEP: Sentiment Knowledge Enhanced Pretraining
Edited by G Eysenbach; submitted 21.09.22; peer-reviewed by F Wang, J Roland, Y Jeem, SM Ayyoubzadeh, W Ceron, F Wang, X Wang; comments to author 07.11.22; revised version received 14.11.22; accepted 28.11.22; published 31.01.23
Copyright
©Mingfu Nuo, Shaojiang Zheng, Qinglian Wen, Hongjuan Fang, Tong Wang, Jun Liang, Hongbin Han, Jianbo Lei. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 31.01.2023.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.