This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Artificial intelligence (AI)–based chatbots can offer personalized, engaging, and on-demand health promotion interventions.
The aim of this systematic review was to evaluate the feasibility, efficacy, and intervention characteristics of AI chatbots for promoting health behavior change.
A comprehensive search was conducted in 7 bibliographic databases (PubMed, IEEE Xplore, ACM Digital Library, PsycINFO, Web of Science, Embase, and JMIR publications) for empirical articles published from 1980 to 2022 that evaluated the feasibility or efficacy of AI chatbots for behavior change. Screening, data extraction, and analysis of the identified articles followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.
Of the 15 included studies, several demonstrated high efficacy of AI chatbots in promoting healthy lifestyles (n=6, 40%), smoking cessation (n=4, 27%), treatment or medication adherence (n=2, 13%), and reduction in substance misuse (n=1, 7%). However, results regarding feasibility, acceptability, and usability were mixed. Selected behavior change theories and expert consultation were used to develop the chatbots' behavior change strategies, including goal setting, monitoring, real-time reinforcement or feedback, and on-demand support. Real-time user-chatbot interaction data, such as user preferences and behavioral performance, were collected on the chatbot platform to identify ways of providing personalized services. The AI chatbots demonstrated potential for scalability through deployment on accessible devices and platforms (eg, smartphones and Facebook Messenger). Participants also reported that AI chatbots offered a nonjudgmental space for communicating sensitive information. However, these results need to be interpreted with caution because of the moderate to high risk of bias to internal validity, insufficient description of AI techniques, and limited generalizability.
AI chatbots have demonstrated efficacy in delivering health behavior change interventions to large and diverse populations; however, future studies need to adopt robust randomized controlled trials to establish definitive conclusions.
Artificial intelligence (AI)–driven chatbots (AI chatbots) are conversational agents that mimic human interaction through written, oral, and visual forms of communication with a user [
AI chatbots demonstrate their potential for effective behavior change through key steps of data processing in health-related conversations: data input, data analysis, and data output. First, AI chatbots can collect data sets from diverse sources: electronic health records, unstructured clinical notes, real-time physiological data points using additional sensors (eye-movement tracking, facial recognition, movement tracking, and heartbeat), and user interactions [
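The three processing steps can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the data fields, the 7000-step goal, and the message wording are not taken from any reviewed chatbot, and the keyword rule in the analysis step merely stands in for the ML or NLP models a real AI chatbot would use.

```python
from dataclasses import dataclass, field


@dataclass
class UserData:
    """Data input: values gathered from several hypothetical sources."""
    ehr: dict = field(default_factory=dict)           # eg, fields from electronic health records
    sensor_steps: list = field(default_factory=list)  # eg, daily step counts from a wearable
    messages: list = field(default_factory=list)      # free-text user interactions


def analyze(data: UserData, step_goal: int = 7000) -> dict:
    """Data analysis: derive simple features (a rule standing in for an ML/NLP model).

    The EHR fields are carried along but not used by this toy rule.
    """
    steps = data.sensor_steps
    avg_steps = sum(steps) / len(steps) if steps else 0.0
    asked_for_help = any("help" in m.lower() for m in data.messages)
    return {"avg_steps": avg_steps,
            "below_goal": avg_steps < step_goal,
            "asked_for_help": asked_for_help}


def respond(features: dict) -> str:
    """Data output: turn the analysis into a tailored chatbot message."""
    if features["asked_for_help"]:
        return "I'm here to help. What would you like to talk about?"
    if features["below_goal"]:
        return (f"You averaged {features['avg_steps']:.0f} steps a day. "
                "Shall we set a small goal for tomorrow?")
    return "Great work meeting your step goal this week!"


user = UserData(ehr={"age": 45}, sensor_steps=[5200, 6100, 4800], messages=["Hi there"])
print(respond(analyze(user)))
```

Keeping the three steps as separate functions mirrors the input, analysis, and output stages, so the analysis rule could be swapped for a trained model without touching the other two.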
In the past decade, evidence regarding the feasibility and efficacy of AI chatbots in delivering health care services has focused on different health contexts and technological perspectives, and most of these chatbots aim to improve mental health outcomes. Of the extant systematic reviews on AI chatbots, 6 articles targeted at
Given the merits of AI chatbots in health promotion, recent literature has paid increasing attention to the use of AI chatbots for health behavior change. Oh et al [
The study protocol of this systematic literature review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [
The search was conducted using a combination of keywords from 3 categories. The first category comprised keywords related to AI-based chatbots, including
Keywords were organized using the following approaches: (1) keywords within one category were combined using the OR operator (eg,
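The OR-within-category rule can be sketched in Python. The keyword lists are hypothetical placeholders rather than the review's actual search string, and joining the categories with AND is an assumption based on common search practice, because the description of the second approach is cut off here.

```python
def or_group(terms: list[str]) -> str:
    """Join the keywords of one category with OR, quoting multiword phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(categories: list[list[str]]) -> str:
    """Combine the per-category OR groups with AND (assumed cross-category rule)."""
    return " AND ".join(or_group(c) for c in categories)


# Hypothetical keywords for the 3 categories; not the review's actual search string
categories = [
    ["chatbot", "conversational agent"],
    ["artificial intelligence", "machine learning"],
    ["behavior change", "intervention"],
]
print(build_query(categories))
# (chatbot OR "conversational agent") AND ("artificial intelligence" OR "machine learning") AND ("behavior change" OR intervention)
```

Quoting multiword terms keeps most database engines from splitting a phrase into independent words, which would otherwise broaden the search.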
This review selected empirical studies on health behavior interventions applying AI-based chatbot techniques according to the following inclusion criteria: (1) intervention research focusing on health behaviors; (2) empirical studies using chatbots; (3) chatbots developed upon existing AI platforms (eg, IBM Watson Assistant [IBM Corp]) or AI algorithms, such as ML, deep learning, natural language understanding, and NLP; (4) studies reporting qualitative or quantitative results on interventions; and (5) English articles published from 1980 to 2022 (as of June 2, 2022). Articles were excluded if they were (1) not full-text empirical studies (eg, conference abstracts or proposals); (2) intervention studies with chatbots based on non-AI methods, such as the rule-based approach; (3) studies that did not clarify their AI algorithms; or (4) studies that focused only on mental health and not on health behaviors.
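The inclusion and exclusion criteria amount to a single screening predicate. The Python sketch below encodes them; the `Record` fields are hypothetical rather than the authors' extraction form, and the year check only approximates the June 2, 2022, cutoff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A screened article; field names are hypothetical, not the authors' form."""
    year: int
    language: str
    full_text_empirical: bool       # False for conference abstracts or proposals
    uses_chatbot: bool
    ai_method: Optional[str]        # eg, "NLP" or "IBM Watson Assistant"; None if not stated
    rule_based: bool
    targets_health_behavior: bool   # False if the study addresses only mental health
    reports_outcomes: bool          # qualitative or quantitative intervention results


def is_eligible(r: Record) -> bool:
    """Return True only if every inclusion criterion holds and no exclusion applies."""
    return (
        r.targets_health_behavior
        and r.uses_chatbot
        and r.ai_method is not None   # AI platform or algorithm must be specified
        and not r.rule_based          # rule-based chatbots are excluded
        and r.full_text_empirical
        and r.reports_outcomes
        and r.language == "English"
        and 1980 <= r.year <= 2022    # the review's cutoff was June 2, 2022
    )


included = Record(2021, "English", True, True, "NLP", False, True, True)
rule_based = Record(2020, "English", True, True, None, True, True, True)
print(is_eligible(included), is_eligible(rule_based))  # True False
```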
A total of 1961 articles were initially retrieved and screened based on these criteria. Finally, 15 articles met the inclusion criteria and were selected for this review (
Eligibility screening process. AI: artificial intelligence.
Several summary tables were used to extract information from the selected articles, including study characteristics (ie, author, publication year, study design, participants, age of the sample, sample size, country, and target health behaviors), chatbot-based intervention features (ie, chatbot types, chatbot components or functionality, settings, existing AI technology, input data sources, platform, theoretical foundation, and AI algorithms), and intervention outcomes (ie, health behavioral outcomes or primary outcomes, feasibility, usability, acceptability, and engagement).
Feasibility, acceptability, and usability were not defined consistently across the studies. Therefore, for ease of comprehension and systematic representation, the authors categorized the data on feasibility, acceptability, and usability according to the definitions adopted for this review. Feasibility was defined as the
Quality assessment of selected studies was performed in accordance with the National Institutes of Health’s quality assessment tool for controlled intervention studies [
AI techniques specific to AI chatbot interventions were also appraised using the CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) extension guidance for AI studies [
The characteristics of the reviewed studies are summarized in
Of the 14 studies that reported the mean or median age of the participants, most enrolled adults, with ages of 18 to 30 years (n=2, 14%), 30 to 40 years (n=3, 21%), 40 to 50 years (n=5, 36%), 50 to 60 years (n=1, 7%), and >60 years (n=1, 7%); only 2 (14%) studies had participants aged <18 years. The selected studies included participants with diverse preexisting conditions: individuals with low levels of physical exercise and healthy eating (4/15, 27%), smokers (4/15, 27%), patients with obesity (2/15, 13%), patients with breast cancer (1/15, 7%), patients with substance use disorder (1/15, 7%), the general population (2/15, 13%), and Medicare recipients (1/15, 7%). The target health behaviors of the reviewed studies included promotion of a healthy lifestyle (physical exercise and diet; 5/15, 33%), smoking cessation (4/15, 27%), treatment or medication adherence (3/15, 20%), and reducing problematic substance use (1/15, 7%). Only 27% (4/15) of studies used randomized controlled trials (RCTs); most studies (9/15, 60%) adopted a quasiexperimental design (ie, pre- and posttests) with no control group, and the remainder used a cross-sectional design (1/15, 7%) or a postexperimental research method (1/15, 7%).
Characteristics of the reviewed studies (N=15).
Study | Study design | Participants | Average (SD) or median age (years) | Sample size | Country | Target health behaviors or purposes |
Piao et al [ | RCTa | Office workers | 35 | N=106 (n=57 intervention group; n=49 control group) | South Korea | Healthy lifestyle (physical activity) |
Maher et al [ | Pre-post studyb | Australians who did not meet Australia’s physical activity guidelines and did not follow a Mediterranean dietary pattern | 56.2 (SD 8) | N=31 | Australia | Healthy lifestyle (physical activity and healthy diet) |
Carrasco-Hernandez et al [ | RCT | Smokers at an outpatient clinic | 49.655 | N=240 (n=120 intervention: chatbot + pharmaceutical treatment; n=120 control: pharmaceutical treatment) | Spain | Smoking cessation |
Stephens et al [ | Pre-post studyb | Youths with obesity symptoms at a children’s health care system | 15.20 | N=23 | The United States | Treatment adherence (obesity) |
Perski et al [ | RCT | Smokers who purchased the Smoke Free app | N/Ac | N=6111 (n=1061 intervention: chatbot + Smoke Free app; n=5050 control: Smoke Free app) | The United Kingdom | Smoking cessation |
Masaki et al [ | Pre-post studyb | Adult smokers with nicotine dependence | 43.5 (SD 10.5) | N=55 | Japan | Smoking cessation |
Chaix et al [ | Pre-post studyb | Patients with breast cancer | 48 | N=958 | France | Medication adherence |
Calvaresi et al [ | Pre-post studyb | Smokers from Facebook communities | N/A | N=270 | Switzerland | Smoking cessation |
Galvão Gomes da Silva et al [ | Qualitative study | Volunteers from the School of Psychology’s participant pool | 23 | N=20 | The United Kingdom | Healthy lifestyle (physical activity) |
Stein and Brooks [ | Pre-post studyb | Adults with overweight and obesity (BMI ≥25) | 46.9 (SD 1.89) | N=70 | The United States | Healthy lifestyle (weight loss, healthy diet, physical activity, and healthy sleep duration) |
Crutzen et al [ | Pre-post studyb | Adolescents interested in the intervention | 15 | N=920 | The Netherlands | Healthy lifestyle |
Brar Prayaga et al [ | Cross-sectional study (poststudy) | Medicare recipients | Median 71 | N=99,217 | The United States | Medication adherence |
Prochaska et al [ | Pre-post studyb | American adults who screened positive for substance misuse | 36.8 (SD 10) | N=101 | The United States | Reducing problematic substance use |
To et al [ | Quasiexperimental design without a control group | Individuals who were inactive (<20 min per day of moderate-to-vigorous physical activity) | 49.1 (SD 9.3) | N=116 | Australia | Healthy lifestyle (physical activity) |
Bickmore et al [ | RCT (4-arm) | Individuals in the precontemplation or contemplation stage of change with respect to moderate-or-greater intensity physical activity or fruit and vegetable consumption | 33 (SD 12.6) | N=122 | NRd | Healthy lifestyle (physical activity and healthy diet) |
aRCT: randomized controlled trials.
bPre-post studies had no control group.
cN/A: not applicable.
dNR: not reported.
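Because every percentage in this section is a simple proportion of the 15 studies (or of the 14 reporting age), the design counts can be rechecked against the table with a short Python sketch. The design labels are shortened from the table, and rounding half up is an assumption about how the reported percentages were rounded.

```python
from collections import Counter

# Study design of each of the 15 studies, as listed in the table above
designs = {
    "Piao et al": "RCT", "Maher et al": "Pre-post",
    "Carrasco-Hernandez et al": "RCT", "Stephens et al": "Pre-post",
    "Perski et al": "RCT", "Masaki et al": "Pre-post",
    "Chaix et al": "Pre-post", "Calvaresi et al": "Pre-post",
    "Galvão Gomes da Silva et al": "Qualitative", "Stein and Brooks": "Pre-post",
    "Crutzen et al": "Pre-post", "Brar Prayaga et al": "Cross-sectional",
    "Prochaska et al": "Pre-post", "To et al": "Quasiexperimental, no control",
    "Bickmore et al": "RCT",
}


def share(n: int, total: int) -> str:
    """Format a count as 'n/N, p%', rounding the percentage half up."""
    return f"{n}/{total}, {int(100 * n / total + 0.5)}%"


counts = Counter(designs.values())
total = len(designs)
uncontrolled = counts["Pre-post"] + counts["Quasiexperimental, no control"]
print("RCT:", share(counts["RCT"], total))              # RCT: 4/15, 27%
print("No control group:", share(uncontrolled, total))  # No control group: 9/15, 60%
```

The 8 pre-post studies and the single quasiexperimental study together make up the 9 uncontrolled designs reported in the text.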
The results of the quality assessment are presented in
Among the 4 RCTs, the study by Carrasco-Hernandez et al [
The AI component of the chatbots was evaluated to demonstrate AI’s impact on health outcomes (
Out of the 15 studies, 7 (47%) studies [
Second, 20% (3/15) of studies [
Out of the 15 studies, 4 (27%) studies [
Out of the 15 studies, only 1 (7%) study [
Out of the 15 studies, 3 (20%) studies [
Only 1 study conducted a qualitative analysis: the study by Galvão Gomes da Silva et al [
The outcomes of the selected studies are reported in
Out of the 15 studies, 7 (47%) reported acceptability and engagement of AI chatbots in terms of (1) satisfaction and (2) provision of a nonjudgmental safe space. In the case of satisfaction, 7% (1/15) of studies reported that approximately one-quarter of the participants liked the messages [
Out of the 15 studies, 11 (73%) reported the usability of AI chatbots in terms of (1) ease of use, (2) outside-office support, (3) usability of the content, and (4) technical difficulties. Overall, ease of use was low to moderate and depended on the participants’ smartphone skills, the platform’s user interface, and the cultural sensitivity of the chatbot’s design. One study reported that chatbots were used to offer outside-office support to the participants, demonstrating the potential of AI chatbots to offer sustainable and continuous support [
The chatbot intervention characteristics are summarized in
The habit formation model, which explains the relationship among cues, behaviors, and rewards, was used to develop the reminder system in the Healthy Lifestyle Coaching Chatbot (HLCC). Mohr’s
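A reminder system built on the habit formation model pairs a recurring cue with the target behavior and follows the behavior with an immediate reward. The Python sketch below shows that loop under simple assumptions (a fixed daily reminder time as the cue and a praise message with a streak count as the reward); it is a hypothetical illustration, not the HLCC implementation.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class HabitLoop:
    """Hypothetical cue-behavior-reward loop; not the HLCC implementation."""
    behavior: str
    cue_time: time      # cue: a fixed daily reminder time chosen by the user
    streak: int = 0     # consecutive days on which the behavior was reported

    def cue_due(self, now: datetime, sent_today: bool) -> bool:
        """Send the cue at most once per day, at or after the chosen time."""
        return not sent_today and now.time() >= self.cue_time

    def cue_message(self) -> str:
        return f"Reminder: time to {self.behavior}."

    def record(self, completed: bool) -> str:
        """Reward a reported behavior immediately; reset the streak otherwise."""
        if completed:
            self.streak += 1
            return f"Well done! That makes {self.streak} day(s) in a row."
        self.streak = 0
        return "No problem. Let's try again tomorrow."


loop = HabitLoop("take a 10-minute walk", time(12, 30))
if loop.cue_due(datetime(2022, 6, 2, 13, 0), sent_today=False):
    print(loop.cue_message())   # Reminder: time to take a 10-minute walk.
print(loop.record(True))        # Well done! That makes 1 day(s) in a row.
```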
On the basis of the behavior change theories, the AI chatbots had multiple functionalities that contributed to efficacious outcomes. First, 53% (8/15) of studies targeted
Third, 53% (8/15) of studies offered
Fourth, 53% (8/15) of studies reported
Fifth, 27% (4/15) of studies provided
Sixth, 7% (1/15) of studies (CASC [
Most of the studies (10/15, 67%) deployed different AI techniques to deliver personalized interventions: NLP, ML, hybrid techniques (ML and NLP), Hybrid Health Recommender System, face-tracking technology, and procedural and epistemological knowledge–based algorithm. ML-driven emotional algorithms were used in Tess [
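The reviewed studies do not describe their algorithms in enough detail to reproduce them. As a generic illustration of how an NLP component can map a user's message to an intent, the sketch below swaps in a multinomial naive Bayes classifier with add-one smoothing. The intents and training utterances are hypothetical, and no reviewed chatbot is claimed to use this method.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayesIntent:
    """Multinomial naive Bayes intent classifier with add-one (Laplace) smoothing."""

    def fit(self, examples: list[tuple[str, str]]) -> "NaiveBayesIntent":
        self.word_counts = defaultdict(Counter)   # intent -> word frequencies
        self.intent_counts = Counter()            # intent -> number of examples
        for text, intent in examples:
            self.intent_counts[intent] += 1
            self.word_counts[intent].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.intent_counts.values())
        best, best_score = None, -math.inf
        for intent, n in self.intent_counts.items():
            words = self.word_counts[intent]
            denom = sum(words.values()) + len(self.vocab)
            score = math.log(n / total)               # log prior
            for w in tokenize(text):
                if w in self.vocab:                   # ignore words never seen in training
                    score += math.log((words[w] + 1) / denom)
            if score > best_score:
                best, best_score = intent, score
        return best


# Hypothetical training utterances
model = NaiveBayesIntent().fit([
    ("I want to quit smoking", "smoking"),
    ("I had a cigarette craving today", "smoking"),
    ("How many steps should I walk", "activity"),
    ("I went for a run this morning", "activity"),
    ("I forgot to take my pills", "medication"),
    ("When should I take my medication", "medication"),
])
print(model.predict("I really want a cigarette"))  # smoking
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on longer messages.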
The chatbots used multimodal channels of communication with the users. All chatbots except NAO [
To deliver personalized services, most chatbots or studies (9/15, 60%) required input data on the users’ background, goals, and behavioral performance, as well as on the chatbots’ usability and evidence-based content. The users’ background information or baseline characteristics were collected by 4 AI chatbots. Paola [
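Personalization from goal and performance data can be sketched as a rule that compares a user's logged behavior against that user's own goal and tailors the feedback accordingly. The profile fields, thresholds, and messages below are hypothetical and only illustrate goal setting, monitoring, and real-time feedback working together.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    """Hypothetical input data: a background field, a goal, and behavioral performance."""
    name: str
    weekly_goal_minutes: int        # goal set by the user (eg, minutes of activity per week)
    logged_minutes: list[int]       # activity minutes the user has reported so far this week


def progress(p: UserProfile) -> float:
    """Fraction of the weekly goal achieved so far, capped at 1.0."""
    if p.weekly_goal_minutes <= 0:
        raise ValueError("weekly goal must be positive")
    return min(sum(p.logged_minutes) / p.weekly_goal_minutes, 1.0)


def feedback(p: UserProfile) -> str:
    """Real-time feedback tailored to the user's progress toward the goal."""
    frac = progress(p)
    remaining = p.weekly_goal_minutes - sum(p.logged_minutes)
    if frac >= 1.0:
        return f"{p.name}, you reached your goal of {p.weekly_goal_minutes} minutes!"
    if frac >= 0.5:
        return f"{p.name}, you are {frac:.0%} of the way there; {remaining} minutes to go."
    return f"{p.name}, a short walk today would help: {remaining} minutes remain."


print(feedback(UserProfile("Alex", 150, [30, 20, 40])))
# Alex, you are 60% of the way there; 60 minutes to go.
```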
The results of this review demonstrate the potential of AI chatbots to deliver efficacious, effective, and feasible health behavior interventions. However, the high risk of bias to internal validity, insufficient description of AI techniques, and limited generalizability of the selected studies indicate the need for further research with robust methodologies before definitive conclusions can be drawn. Nevertheless, the review identified the strengths and limitations of existing interventions and studies, their practical and research implications, and potential future directions.
This review found that AI chatbots were efficacious in promoting healthy lifestyles, including physical exercise and diet (6/15, 40%), smoking cessation (4/15, 27%), treatment or medication adherence (2/15, 13%), and reduction in substance misuse (1/15, 7%). These findings are consistent with previous systematic reviews that reported improvements in physical activity levels and medication adherence with the use of AI chatbots [
The reviewed studies reported mixed results for AI chatbots in terms of feasibility, acceptability, and usability. In the case of feasibility, evidence on the safety of chatbots was scarce, because only 7% (1/15) of studies reported on safety [
The fundamental characteristics of the AI chatbots played a critical role in determining efficacious outcomes. First, the majority of the studies (9/15, 60%) used carefully selected behavior change theories in the design and delivery of the AI chatbots. Our findings suggest that integrating behavior change theories such as CBT, TTM, motivational interviewing, emotionally focused therapy, the habit formation model, and Mohr’s Model of Supportive Accountability enabled the delivery of consistent motivational support to users through goal setting, monitoring or tracking behaviors, and reinforcement. These strategies not only contributed to better primary and secondary outcomes but also addressed several challenges of traditional face-to-face intervention models from the users’ standpoint, such as limited connectivity with experts, lack of consistent motivation, and lack of access to diverse information over time. Previous systematic reviews also reported that the use of CBT [
Second, in all studies,
The need for greater interactivity can also be associated with the fluctuations in user engagement found in 13% (2/15) of studies [
Third, in 20% (3/15) of studies, the humanlike yet nonhuman nature of AI chatbots provided a safe space for the users to discuss, share, and ask for information on sensitive issues [
Fourth, most studies (8/15, 53%) reported that AI chatbots have a low threshold for integration into existing services yet offer a high reward. Most traditional behavioral interventions require in-person service delivery; however, from the implementor’s standpoint, this approach has several limitations, such as a lack of consistent data collection and continuous monitoring and limited scalability and sustainability. AI chatbots have a low threshold for integration into these traditional services because they do not strain existing resources such as experts, time, money, and effort. Chatbots can be deployed freely through everyday platforms and accessed by users at any time. The use of chatbots can help integrate behavioral interventions into daily clinical settings without placing additional pressure on health care providers. For example, chatbots can independently offer low-intensity services such as information delivery to users. Furthermore, chatbots can offer provider-recommendation services, in which the chatbot analyzes real-time user data and suggests to health care providers how to deliver more effective services [
Most of the studies (10/15, 67%) had a large and diverse sample population, demonstrating the potential for scaling up chatbot-based interventions. Almost half of the studies had >200 participants, and 27% (4/15) of studies had sample sizes ranging from approximately 920 to 99,217 participants. Similarly, the selected studies included not only samples with diverse health and behavioral conditions (13/15, 87%), such as breast cancer, smoking, obesity, unhealthy eating patterns, lack of physical exercise, conditions requiring medication, and substance misuse, but also samples with no preexisting conditions (2/15, 13%). This demonstrates the potential of AI chatbots to reach large and diverse populations in different settings, because AI chatbots can be integrated into widely used existing platforms such as SMS text messaging, Facebook Messenger, and WhatsApp and deployed through commonly used devices such as smartphones, computers, and Alexa. This finding is consistent with previous systematic reviews that reported the integration of AI chatbots into diverse platforms, such as Slack (Slack Technologies, LLC), Messenger, WhatsApp, and Telegram [
Almost 75% (11/15) of the articles were published between 2019 and 2021, indicating that the use of AI-driven chatbot interventions for behavior change is at a nascent stage. Most studies (9/15, 60%) adopted a pre-post study design with no control group, and only 27% (4/15) used an RCT design, reiterating that causal connections between AI-based conversational agents and health behavior outcomes have not yet been firmly established. This finding is aligned with previous systematic reviews; for example, one reported that only 4 of 9 studies were RCTs and the remainder were quasiexperimental, feasibility, or pilot RCT studies [
The outcome of this review should be interpreted with caution because of the moderate to high risk of bias to internal validity within the selected studies. In the included studies, the risk of outcomes from unintended sources was high owing to a lack of information on the measures taken to avoid the influence of other interventions and on the level of adherence to the intervention protocol. The risk of bias in outcome measurement was moderate to high owing to the lack of concealment of the assigned intervention from the evaluators and the lack of validated and reliable outcome measures. The risk of bias in the analysis was moderate to high owing to high dropout rates, the absence of power calculations to estimate sample size, and the lack of information on the use of intent-to-treat analysis. These findings are consistent with many previous systematic reviews that reported moderate risk of outcomes from unintended sources owing to confounding in all quasiexperimental studies [
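The sample-size claims can be rechecked against the characteristics table with a few lines of Python; the sizes below are copied from the table.

```python
# Sample size (N) of each study, as listed in the characteristics table
sample_sizes = {
    "Piao et al": 106, "Maher et al": 31, "Carrasco-Hernandez et al": 240,
    "Stephens et al": 23, "Perski et al": 6111, "Masaki et al": 55,
    "Chaix et al": 958, "Calvaresi et al": 270, "Galvão Gomes da Silva et al": 20,
    "Stein and Brooks": 70, "Crutzen et al": 920, "Brar Prayaga et al": 99_217,
    "Prochaska et al": 101, "To et al": 116, "Bickmore et al": 122,
}

over_200 = sorted(n for n in sample_sizes.values() if n > 200)
large = [n for n in over_200 if n >= 920]   # the 4 largest studies
print(f"{len(over_200)}/{len(sample_sizes)} studies with >200 participants")
print(f"{len(large)} studies ranging from {large[0]:,} to {large[-1]:,} participants")
```

With 6 of 15 studies (40%) above 200 participants, the tally supports the "almost half" wording, and the largest sample (Brar Prayaga et al, N=99,217) sets the upper end of the range.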
There was also inconsistency across studies in the measures of secondary outcomes, that is, feasibility, usability, acceptability, and engagement. This finding is consistent with most of the previous systematic reviews that reported mixed findings on secondary outcome measures [
In this review, most studies (14/15, 93%) did not describe the characteristics and handling of the input data, along with other processes related to the AI algorithm. This finding is consistent with the previous systematic literature review that reported inconsistent use of AI-software taxonomy and lack of depth of reported AI techniques and systems [
The selected studies were not representative of diverse geographies, cultures, and age groups, which strongly limits the generalizability of their findings. Of the 13 studies that reported geographical locations, all (100%) were conducted in high-income countries; the majority (80%) were embedded in Western cultures, apart from the studies in South Korea and Japan; and most (>80%) were implemented with adults (≥18 years). These findings are consistent with previous systematic literature reviews that reported that all chatbot intervention studies were conducted in high-income countries [
To increase the generalizability of the efficacy and feasibility of AI chatbots, future studies need to test their use in low-income countries or low-resource settings and with children and adolescents. The increased mobile connectivity and internet use in low-income countries [
In this review, evidence on patient safety was limited; however, the available evidence indicated that chatbots were safe for behavioral and mental health interventions. Only 7% (1/15) of studies, namely the study by Maher et al [
This systematic literature review has several limitations. First, no meta-analysis was conducted: owing to heterogeneity in research designs, reported outcomes, and outcome measures, the authors did not consider a meta-analysis feasible. Second, this review did not cover a comprehensive set of behavioral outcomes. The selected studies focused on only 3 behavioral outcomes: healthy lifestyle (physical activity and diet), smoking cessation, and treatment or medication adherence. This was partly because the authors adopted strict inclusion criteria for AI chatbots and excluded studies with rule-based chatbots, restricting the range of behavioral outcomes covered. Third, agreement in the data matching for the tables was not quantified; therefore, intercoder reliability was not reported. However, data extraction and quality assessment were conducted independently by 2 authors, followed by a discussion among the authors to finalize the tables. Fourth, articles from outside the selected databases (eg, Google Scholar), unpublished work and conference articles, gray literature (eg, government reports), and articles in other languages were not included. Fifth, intervention studies that did not provide a clear description of AI chatbots or did not label AI chatbots as a keyword were excluded.
This review provides an evaluation of AI chatbots as a medium for behavior change interventions. On the basis of the outcomes of the selected studies (N=15), AI chatbots were efficacious in promoting healthy lifestyles (physical activity and diet), smoking cessation, and treatment or medication adherence. However, the studies reported mixed results for the feasibility, acceptability, and usability of AI chatbots in diverse settings with diverse populations. The efficacious outcomes of AI-driven chatbot interventions can be attributed to the fundamental characteristics of an AI chatbot: (1) personalized services, (2) a nonjudgmental safe space to converse, (3) easy integration into existing services, (4) an engaging experience, and (5) scalability to large and diverse populations. However, the outcomes of this review need to be interpreted with caution because most of the included studies had a moderate risk of bias to internal validity, reflecting the nascent stage of the AI chatbot intervention domain. Future studies need to adopt robust RCTs and provide detailed descriptions of AI-related processes. Overall, AI chatbots have immense potential to be integrated into existing behavior change services owing to (1) their ease of integration; (2) their potential for affordability, accessibility, scalability, and sustainability; (3) their delivery of services to vulnerable populations on sensitive issues in a nonstigmatizing and engaging manner; and (4) their potential for consistent data collection to support health care providers’ decisions.
Search string.
Methodology assessment based on the National Institutes of Health’s quality assessment tool for controlled intervention studies.
Quality assessment of chatbot interventions based on CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence extension).
Outcomes of the reviewed articles.
Features of the chatbots in the reviewed studies.
AI: artificial intelligence
CASC: CureApp Smoking Cessation
CBT: cognitive behavioral therapy
CONSORT-AI: Consolidated Standards of Reporting Trials–Artificial Intelligence
Lark Health Coach
HLCC: Healthy Lifestyle Coaching Chatbot
ML: machine learning
NLP: natural language processing
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RCT: randomized controlled trial
Smoke Free app
TTM: transtheoretical model
XL and SQ conceived the research topics and questions. CCT and AA performed the literature search and screening. AA, CCT, and SQ performed the data extraction and analysis. AA and SQ developed the first draft. DW reviewed the paper and provided key feedback and edits. All the authors reviewed the final manuscript.
The research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases under award R01AI127203-5S1. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The authors would also like to acknowledge the generous funding support from the University of South Carolina Big Data Health Science Center, a University of South Carolina excellence initiative program (grants BDHSC-2021-14 and BDHSC-2021-11).
None declared.