This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
The personalization of conversational agents with natural language user interfaces is seeing increasing use in health care applications, shaping the content, structure, or purpose of the dialogue between humans and conversational agents.
The goal of this systematic review was to understand the ways in which personalization has been used with conversational agents in health care and characterize the methods of its implementation.
We searched PubMed, Embase, CINAHL, PsycInfo, and ACM Digital Library using a predefined search strategy. Studies were included if they: (1) were primary research studies that focused on consumers, caregivers, or health care professionals; (2) involved a conversational agent with an unconstrained natural language interface; (3) tested the system with human subjects; and (4) implemented personalization features.
The search found 1958 publications. After abstract and full-text screening, 13 studies were included in the review. Common examples of personalized content included feedback, daily health reports, alerts, warnings, and recommendations. The personalization features were implemented without a theoretical framework of customization and with limited evaluation of its impact. While conversational agents with personalization features were reported to improve user satisfaction, user engagement and dialogue quality, the role of personalization in improving health outcomes was not assessed directly.
Most of the studies in our review implemented personalization features without theoretical or evidence-based support and did not leverage recent developments in other domains of personalization. Future research could incorporate personalization as a distinct design factor, with more careful consideration of its impact on health outcomes and its implications for patient safety, privacy, and decision-making.
Recent advancements in natural language recognition and synthesis have resulted in the adoption of conversational agents (CAs) in many fields. CAs can be defined as systems that support conversational interaction with users by means of speech or other modalities [
One emerging area in which conversational technologies have been increasingly used is health care. A recent systematic review in this area examined technical performance, user experience, and health-related outcomes and found that most studies had not employed standardized evaluation methods or had failed to address aspects of patient safety [
Personalization is:
the process of making something suitable for the needs of a particular person [
When applied specifically to digital technologies, personalization can be defined as:
a process that changes the functionality, interface, information access and content, or distinctiveness of a system to increase its personal relevance to an individual or a category of individuals [
A recent interdisciplinary review study proposed a framework to characterize personalization along three dimensions: (1) what is personalized (ie, content, user interface, delivery channel, and functionality); (2) for whom is it personalized (either a specific individual or a user group, eg, elderly women); and (3) how automated is the personalization (how the information needed for user modelling is collected) [
One of the earliest applications of personalization in a conversational system was Grundy, a virtual librarian that delivered book recommendations [
Personalization in CAs can be achieved implicitly by processing past interactions with users [
Studies of personalization in health care and medicine have been increasing in number since the early 2000s [
A review of behavior change interventions characterized four intervention groups according to their degree of personalization in the messages delivered to individuals: generic (one-size-fits-all messages), personalized (messages with the person’s name), targeted (messages specific to a subgroup of the general population), or tailored (messages specific to an individual’s characteristics) [
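The four degrees of message personalization described above can be made concrete with a small sketch. Everything in it is hypothetical and for illustration only: the `Level` and `User` names, the step-count fields, and the message templates are assumptions, not taken from the reviewed interventions.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    GENERIC = "generic"            # one-size-fits-all message
    PERSONALIZED = "personalized"  # message with the person's name
    TARGETED = "targeted"          # message specific to a subgroup
    TAILORED = "tailored"          # message specific to an individual's characteristics


@dataclass
class User:
    # Hypothetical user profile; the fields are illustrative assumptions.
    name: str
    subgroup: str
    daily_step_goal: int
    steps_yesterday: int


def build_message(level: Level, user: User) -> str:
    """Return an example physical-activity message at the given level."""
    if level is Level.GENERIC:
        return "Try to be physically active every day."
    if level is Level.PERSONALIZED:
        return f"{user.name}, try to be physically active every day."
    if level is Level.TARGETED:
        return f"Tip for {user.subgroup}: short walking breaks help you stay active."
    # TAILORED: draws on the individual's own goal and observed activity.
    gap = user.daily_step_goal - user.steps_yesterday
    if gap <= 0:
        return f"{user.name}, you reached your goal of {user.daily_step_goal} steps yesterday."
    return (f"{user.name}, you walked {user.steps_yesterday} steps yesterday, "
            f"{gap} short of your goal of {user.daily_step_goal}.")


example_user = User(name="Alex", subgroup="office workers",
                    daily_step_goal=8000, steps_yesterday=6000)
for level in Level:
    print(f"{level.value}: {build_message(level, example_user)}")
```

Only the tailored branch reads individual-level data, which is the distinction the classification draws between tailoring and the other three groups.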
Dialogue systems can offer fine-grained possibilities to personalize the information to be delivered:
on the basis of the inferred goals and beliefs of the user at a particular moment in time, and incorporating everything that has previously been said in the conversation [
Learning from a history of previous conversations plays a key role in ensuring the continuity of health communications that take place over multiple interactions over time [
Informed by the recent theoretical developments in personalization [
This review uses the search protocol of an earlier systematic review that was performed between April 2017 and February 2018, with a focus on technical performance, user experience, and the health-related outcomes of CAs in health care [
We searched the PubMed, Embase, CINAHL, PsycInfo, and ACM Digital Library databases, with no restriction on publication year or language. The search terms included “conversational agents”, “dialogue systems”, “relational agents” and “chatbots”. The complete search strategy is available in
The identified publications were included if they: (1) were primary research studies that focused on consumers, caregivers, or health care professionals; (2) involved a conversational agent; and (3) tested the system with human users. The studies were excluded if they involved: (1) user input by clicking or tapping an answer from a set of predefined choices, or by using the telephone keypad (eg, interactive voice response systems with dual-tone multifrequency signaling); (2) output not generated in response to what it received from the human user (eg, predefined and preprogrammed messages that are not dependent on the information obtained from or about the user); (3) question-answer type interactions; (4) asynchronous communication technology such as email; or (5) no personalization features. Furthermore, studies that evaluated only individual components of a conversational agent, such as automatic speech recognition, or that used Wizard of Oz methods were excluded.
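As a minimal sketch, the eligibility criteria above can be expressed as a screening predicate. The `StudyRecord` fields and the reason strings are hypothetical names introduced here for illustration; in the review itself, screening was a manual judgment by researchers, not an automated filter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudyRecord:
    """Hypothetical screening record; every field name is an illustrative assumption."""
    title: str
    primary_research: bool             # primary study on consumers, caregivers, or professionals
    involves_conversational_agent: bool
    tested_with_humans: bool
    constrained_input: bool            # predefined choices or telephone keypad input
    output_independent_of_user: bool   # preprogrammed messages not dependent on user input
    question_answer_only: bool
    asynchronous: bool                 # eg, email
    has_personalization: bool
    component_only_or_wizard_of_oz: bool


def exclusion_reasons(study: StudyRecord) -> List[str]:
    """List every criterion the study fails; an empty list means it is eligible."""
    checks = [
        (not study.primary_research, "not primary research"),
        (not study.involves_conversational_agent, "no conversational agent"),
        (not study.tested_with_humans, "not tested with human users"),
        (study.constrained_input, "constrained user input"),
        (study.output_independent_of_user, "output not generated from user input"),
        (study.question_answer_only, "question-answer interaction only"),
        (study.asynchronous, "asynchronous communication"),
        (not study.has_personalization, "no personalization features"),
        (study.component_only_or_wizard_of_oz, "component-only or Wizard of Oz evaluation"),
    ]
    return [reason for failed, reason in checks if failed]


def is_eligible(study: StudyRecord) -> bool:
    return not exclusion_reasons(study)


eligible = StudyRecord("Example A", True, True, True, False, False, False, False, True, False)
excluded = StudyRecord("Example B", True, True, True, True, False, False, False, False, False)
print(is_eligible(eligible))        # True
print(exclusion_reasons(excluded))  # ['constrained user input', 'no personalization features']
```

Returning the list of failed criteria, rather than a bare boolean, mirrors how exclusion reasons are typically tallied for a flow diagram.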
Two researchers independently screened the studies and extracted data from each included study. Cohen kappa was used to measure intercoder agreement between the researchers, and any disagreements between their assessments were resolved by consensus. To identify the relevant information, the researchers used the definition of personalization presented in the Introduction. In addition, the following keywords guided the identification of personalization-related information within the studies: personalizing, adapting, customizing, tailoring, configuring, individualizing, modifying, changing, altering, transforming, modelling, tuning, setting, preference, and profile. The data extraction process was guided by an assessment scheme based on the personalization framework offered by Fan and Poole [
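For readers unfamiliar with the agreement statistic named above, the following sketch computes Cohen kappa for two raters from first principles: observed agreement corrected for the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The screening decisions in the example are invented for illustration and are not the review's data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa for two raters who labeled the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label proportions, summed over labels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for 10 abstracts (not data from the review).
reviewer_1 = ["include"] * 3 + ["exclude"] * 7
reviewer_2 = ["include"] * 2 + ["exclude"] * 8
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")  # kappa = 0.74
```

In this invented example the raters agree on 9 of 10 items (p_o = 0.90), while chance agreement is p_e = (3 × 2 + 7 × 8) / 100 = 0.62, giving kappa = 0.28 / 0.38 ≈ 0.74.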
An assessment scheme for personalization.
Assessment categories | Description
Automation: how the user models needed by personalization are constructed.
Implicit | Information needed for user models is obtained automatically through the analysis of observed user activities and interactions with the system (eg, analyzing users’ conversational history to determine the suitable times to send a reminder).
Explicit | Information needed for user models is obtained through users’ active participation (eg, selecting the preferred times to receive a reminder).
Target: for whom to personalize.
Individuated | Personalization is targeted at a specific individual (eg, sending a reminder based on the unique profile of a single user).
Categorical | Personalization is targeted at a group of people (eg, sending a reminder based on a shared profile of a group of users).
What to personalize.
Content | The information itself (eg, alerts or reminders).
User interface | How the information is presented (eg, using larger font sizes for elderly users or shortening prompts for experienced users).
Delivery channel | The media through which information is delivered (eg, sending a reminder as a text message instead of a voice message).
Functionality | What users can do with the system (eg, making different system functionalities available for patients and carers).
Purpose | The purpose of personalization (eg, increasing user engagement or motivation).
Evaluation | The methods to evaluate personalization (eg, using interview questions or standardized questionnaires).
Outcomes | The outcomes in relation to personalization (eg, increased user engagement or motivation).
Adapted from Fan and Poole [
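One way to picture the assessment scheme is as a small data structure that records, for each conversational agent, which options apply on each dimension. The sketch below is an illustration only: the class names (including `Aspect` for the "what to personalize" dimension) are assumptions, and the example entry encodes the Pain Monitoring Voice Diary as it is characterized in this review.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Automation(Enum):
    IMPLICIT = "implicit"    # user model built from observed activity
    EXPLICIT = "explicit"    # user model built from information users provide


class Target(Enum):
    INDIVIDUATED = "individuated"  # a specific individual
    CATEGORICAL = "categorical"    # a group of users


class Aspect(Enum):  # hypothetical name for the "what to personalize" dimension
    CONTENT = "content"
    USER_INTERFACE = "user interface"
    DELIVERY_CHANNEL = "delivery channel"
    FUNCTIONALITY = "functionality"


@dataclass(frozen=True)
class Assessment:
    agent: str
    automation: FrozenSet[Automation]
    target: FrozenSet[Target]
    aspects: FrozenSet[Aspect]

    def summary(self) -> str:
        def join(values) -> str:
            return ", ".join(sorted(v.value for v in values))
        return (f"{self.agent}: automation={join(self.automation)}; "
                f"target={join(self.target)}; aspects={join(self.aspects)}")


pain_diary = Assessment(
    agent="Pain Monitoring Voice Diary",
    automation=frozenset({Automation.EXPLICIT, Automation.IMPLICIT}),
    target=frozenset({Target.INDIVIDUATED, Target.CATEGORICAL}),
    aspects=frozenset({Aspect.CONTENT, Aspect.USER_INTERFACE}),
)
print(pain_diary.summary())
```

Sets are used because a single system can combine options within a dimension, for example both explicit and implicit automation. The purpose, evaluation, and outcomes categories are free text in the scheme and could be added as plain string fields.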
The first search found 1513 papers, and the updated search found an additional 445 papers (
Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram.
Information needed for personalization was provided explicitly by the users in seven studies [
Personalization features of conversational agents in the included studies.
Conversational agent (author, year) | CA purpose | Automation (the basis for personalization) | Target (for whom to personalize) | What to personalize: content | What to personalize: user interface
Tess (Fulmer et al, 2018) [ | Delivery of cognitive behavioral therapy to reduce symptoms of depression and anxiety in college students | Explicit: Expressed emotions and mental health concerns of participants to provide personalized responses. Users' feedback and reported mood used to tailor interventions | Individuated | Personalized conversations based on emotions and mental health concerns. Personalized therapeutic choices based on user feedback | NR
Wysa (Inkster et al, 2018) [ | Wellbeing support app for users with symptoms of depression, aiming to build mental resilience and promote mental wellbeing | Explicit: User responses to built-in assessment questionnaire and emotions expressed in a written conversation | Individuated | Personalized conversational pathways based on a user’s interaction, messages, and context | NR
Reflection Companion (Kocielnik et al, 2018) [ | Support reflection on personal physical activity data from fitness trackers | Explicit: Users enter their behavior change goals and demographic data. Implicit: Observed physical activity of the user | Individuated | Dialogues to encourage reflection. Incorporating user goals into adaptive mini-dialogues. Follow-up questions based on users’ earlier responses. Visualization of past physical activity | NR
Relational Agent (Sillice et al, 2018) [ | Promote regular exercising and sun protection | Explicit: Users provide their demographic information, exercising habits, sun protection behaviors and lifestyle goals. Implicit: CA tracks user progress to send reminders if needed | Individuated | Acknowledgement of difficulties and tailored strategies to overcome these. Feedback on progress and encouragement for achieving goals. A weekly tracking chart to help participants monitor their exercise and sun protection behaviors. Email reminders to support retention | NR
Woebot (Fitzpatrick et al, 2017) [ | Deliver cognitive-behavioral therapy for anxiety and depression to college students | Explicit: Users enter their mood and goals | Individuated | Empathic responses tailored to the reported mood. Tailoring of support content depending on the reported mood. Daily prompting messages to initiate a conversation. Weekly charts depicting the reported mood and textual summary | NR
Social Skills Trainer (Tanaka et al, 2017) [ | Social skills training for people with autism spectrum disorders | Implicit: CA analyzes the user's audio-visual features, facial expression (smile), and head position to determine its feedback and then performs feature selection | Individuated | Personalized score showing similarity to a role model with respect to 10 features. Encouraging comments to reinforce motivation, based on features closest to the model. Comments on the points that need improvement, based on features dissimilar to the model. Homework challenges for participants to complete on their own time throughout the week | NR
mASMAA (Rhee et al, 2014) [ | Facilitate asthma symptom monitoring, treatment adherence, and adolescent-parent partnership | Explicit: Users enter symptoms, activity level, and use of rescue and control medications | Individuated | Automated inquiries and reminders sent according to user-defined preferences on monitoring symptoms and managing medications and activity. Processing of and responses to user-initiated messages at any time. Daily report summarizing symptoms, activity, and use emailed to parents | NR
Chris (Hudlicka, 2013) [ | Embodied CA that provides mindfulness training and coaching | Explicit: Users answer questions asked by the CA and set preferences via multiple-choice questions | Individuated | CA’s facial expressions and its responses adapting to the users’ learning needs and motivational state. CA's affective reaction adapting to the users' utterances. Conversational expressions communicating mental state. Customized advice about meditation practice, based on the expressed concerns | Using didactic, relational, or motivational conversational styles according to the user models
DI@l-log (Harper et al, 2008; Black et al, 2005) [ | Voice logbook to document home monitored data by diabetes patients | Explicit: Users provide weight, blood sugar and blood pressure values | Individuated | An alert feature generating a verbal warning if readings are too high. Personalized feedback to patients on their current progress | NR
Pain Monitoring Voice Diary (Levin and Levin, 2006) [ | Real-time collection of information from patients for health, behavioral, and lifestyle studies and monitoring | Explicit: Users answer a series of questions about their pain (location, type, intensity, etc). Implicit: CA utilizes previous sessions to provide personalized content and conversational style | Individuated. Categorical (novice and experienced users) | Content (what data is collected) and style (how it is collected) of the reporting session. Adaptive question-asking (additional questions for follow-ups to sessions with high levels of pain). Adaptive interruptions to better support experienced users | Adaptive conversational style (eg, shorter question formats for follow-up sessions)
Intelligent dialogue system (Giorgino et al, 2004; Azzini et al, 2003) [ | Home care and data acquisition from hypertension patients | Explicit: Users answer questions about heart rate, pressure, weight, compliance, and more. Implicit: CA changes its behavior depending on the progress of the current call and the clinical history of the caller | Individuated | The questions to be asked were determined by user profiles. Gives advice on recommended health behavior and next visits. Issues alerts and prompts | NR
CA: conversational agent.
NR: not reported.
mASMAA: mobile phone-based asthma self-management aid.
Personalization purpose, evaluation, and outcomes in the included studies.
Conversational agent (author, year) | Personalization purpose | Personalization evaluation | Personalization outcomes
Tess (Fulmer et al, 2018) [ | To improve depression and anxiety symptoms. To provide more engaging and convenient user experience. To provide appropriate response and strategies based on the users’ reported emotion and health concerns | Questionnaires to measure depression (PHQ-9) [. Custom-built user satisfaction questionnaire. Number of messages to measure user engagement | Significantly lower depression (. 86% (43/50) of participants satisfied with CA (sm). Comparable levels of daily engagement (bm)
Wysa (Inkster et al, 2018) [ | To develop positive self-expression and create a responsive self-reflection environment. To encourage users to build emotional resilience skills | Questionnaire to measure depression (PHQ-9). Thematic analysis of the responses to the in-app feedback questions. User engagement through analysis of raised objections and thematic analysis of in-app feedback | Significant reduction in depression scores in both high (. 67% (191/282) of users reporting on positive app experience (sm). More than 99% (6555/6611) of detected objections were correct (bm)
Reflection Companion (Kocielnik et al, 2018) [ | To trigger deeper reflection, which would increase motivation, empowerment, and adoption of new behavior. To provide engaging, novel, and diverse conversations around reflection | Questionnaires to measure health awareness [. Willingness to use the system, number, and length of responses as measures of engagement. Responses to mini-dialogues. Semi-structured post-study interviews | Significant increases in habitual action (. Prolonged use of CA (additional two weeks) by half of the participants (16/33) with an avg of 98.4-character response length in this period (bm). High response rates: 96% (443/462) of initial and 90% (386/429) of follow-up questions (bm). Mini-dialogues successfully supporting discussions on awareness related to goal accomplishment, self-tracking data, and trends in behaviour (nq, sm). Interviews indicating an increase in awareness, mindfulness, and motivation; understanding of alternatives and actions; and newly discovered insights (sm)
Relational Agent (Sillice et al, 2018) [ | To increase user engagement and promote more effective behavior change. To monitor exercise and sun protection behavior. To provide strategies to overcome the reported barriers | Interviews to assess user experience and a 10-point Likert scale to measure satisfaction with interventions | The levels of satisfaction ranged between 7 and 10 on a scale of 1 to 10 (sm). Most participants reporting on: (1) positive interactions with the CA (32/34; 94%); (2) tailored feedback supporting regular exercising and sun protection behaviors (29/34; 85%); and (3) email reminders helping to remain on track with the program (23/34; 68%; sm)
Woebot (Fitzpatrick et al, 2017) [ | To engage individuals with CA through managing conversation tailored to the reported mood | Questionnaires to measure depression (PHQ-9), anxiety (GAD-7), and affect (PANAS). Custom-built questionnaire to measure user satisfaction, emotional awareness, learning, and relevancy of content | Significant reduction in depression symptoms (. Significantly high level of overall satisfaction (
Social Skills Trainer (Tanaka et al, 2017) [ | To provide personalized feedback aimed at improving narrative social skills | Experienced human social skills trainer assessed the participants' narrative skills | Improvements in the overall narrative and social skills (Study 1,
mASMAA (Rhee et al, 2014) [ | To make the system more appealing and elicit greater and longer interest in and use of the system | Six routine asthma-diary questions. Focus group interviews to evaluate user experience with CA | Improved self-management, treatment adherence, accessibility of advice, awareness of symptoms, and sense of control (nq, sm). CA was found to be easy-to-use, convenient, and appealing (nq, sm)
Chris (Hudlicka, 2013) [ | To deepen the relationship with the user. To support pedagogical strategies necessary for effective training of mindfulness meditation. To provide the coaching required to initiate and maintain regular practice. To provide interactions for maintaining motivation via empathic dialogue and customized advice | Custom-built questionnaires to assess the overall experience, meditation frequency, knowledge of mindfulness, sense of self-efficacy, and stages of change within the transtheoretical model of change | Improved outcomes with CA group compared to a self-administered program: (1) more frequent and longer mindfulness training sessions (. Neutral to mildly positive feedback on CA's ability to provide customized feedback (0.3 on a –2 to +2 Likert scale; sm)
DI@l-log (Harper et al, 2008; Black et al, 2005) [ | To provide personalized feedback on the patient's health status and increase their engagement | Task completion rate and time. Number of personalized alerts. Qualitative interviews | 92.2% (190/206) successfully completed calls, shortening calls over time, and effective alerts leading to 12 therapeutic interventions (bm) [. 90.4% (38/42) successfully completed calls, users’ appreciation of the personalization and reports on empowerment, peace-of-mind, and sense of care (bm, sm) [
Pain Monitoring Voice Diary (Levin and Levin, 2006) [ | To shorten the dialogue sessions. To provide the users a feeling of continuity. To have flexible and adaptive support for different types of users | Session length, completion rate, and turn duration. Ratio of prompt interruptions by users | 97% (171/177) of sessions completed with 98% (849/859) input accuracy (bm). Shortening dialogues over time (avg 1.2 seconds over 7 sessions; bm). More prompt-interruptions by the experienced users (73% of the prompts) compared to the novice users (59% of the prompts; bm)
Intelligent dialogue system (Giorgino et al, 2004; Azzini et al, 2003) [ | To improve the quality of system dialogues. To increase patient compliance with guidelines | Reliability and recognition error rate. Time spent in learning to use the system | Recognition rate up to 41%-81% (bm). Dialogue time of 3.3-5.9 minutes, with 80% (74/93) of the expert users’ dialogues achieving conclusion (bm)
PHQ-9: Patient Health Questionnaire 9-item scale.
GAD-7: Generalized Anxiety Disorder 7-item scale.
PANAS: Positive and Negative Affect Schedule 20-item scale.
sm: self-reported measure.
CA: conversational agent.
bm: behavioral measure.
FMI: Freiburg Mindfulness Inventory.
RQ: Reflection Questionnaire.
nq: not quantified.
mASMAA: mobile phone-based asthma self-management aid.
Personalization was primarily used for tailoring the content to be delivered. Personalized content included: (1) feedback on mood states [
Two studies personalized the user interface through changing conversational styles according to users’ motivation state, users’ level of expertise with the system, and dialogue history [
The purposes of providing personalized content and conversations were to: (1) improve user engagement [
Only two studies directly assessed users’ perceptions of personalization via custom-built questionnaires with questions on adaptive features [
The use of CAs with unconstrained language input in health care is still limited, but there has been a notable increase in the number of studies in recent years. Almost half of the papers included in this study were published in the last two years. While most studies used quasi-experimental study designs, only two used randomized controlled trials [
While personalization of content to be delivered was common across all the studies, personalization of conversational style was implemented by only two studies [
Only two studies evaluated personalization as a distinct factor [
The implications of different implementations of personalization were not addressed by any studies. For example, a recent research paper drew attention to the limitations of implicit and explicit personalization [
Using CAs with unconstrained natural language input can be risky [
Overall, most of the reviewed papers did not focus explicitly on personalization. Little attention was generally paid to the complexities associated with implementing personalization features and measuring their effects.
In line with our study, a recent scoping review of psychology-focused embodied conversational agents reported that only a few studies employed user models to personalize user-system interactions [
Our results are based on the personalization features of health care CAs as reported in studies that did not necessarily focus explicitly on personalization. Therefore, the results are limited by the extent to which the included studies reported on their personalization features. In addition, our review focused on CAs using unconstrained natural language input, so the results may not generalize to agents using constrained natural language input (eg, multiple-choice utterance options). Because the conversational systems in the reviewed studies involved multiple components, the reported outcomes are attributable to the systems as a whole rather than to the personalization features alone. We recommend using a theoretical framework of personalization to support a more systematic treatment of personalization features. However, it may be possible to implement personalization features effectively without theoretical support. Moreover, other theories not specific to personalization may prove useful for personalization purposes, such as the Theory of Planned Behavior [
Future research can focus on incorporating a theoretical framework [
The use of personalization in health care CAs with unconstrained natural language interfaces has been limited and is not evidence based. While CAs with personalization features were reported to improve user satisfaction, user engagement, and dialogue quality, little evaluation was performed to measure the extent of personalization and its role in improving health outcomes. Future research in health care CAs could evaluate the impact of personalization on health outcomes and its potential implications for privacy, safety, and decision-making.
The search strategy.
The list of excluded articles.
CA: conversational agent
FMI: Freiburg Mindfulness Inventory
GAD: Generalized Anxiety Disorder
NR: not reported
PANAS: Positive and Negative Affect Schedule
PHQ: Patient Health Questionnaire
RQ: Reflection Questionnaire
This research was supported by the National Health and Medical Research Council (NHMRC) grant APP1134919 (Centre for Research Excellence in Digital Health). We would like to thank Catalin Tufanaru for his comments on the earlier drafts of this paper.
This study was designed by ABK, JCQ, LL, DR, and EC. Search strategy was employed by ABK, LL, and HLT. Screening was performed by ABK, LL, and HLT. Data extraction was performed by ABK, SB, JCQ, LL, DR, HLT, and AB. First draft was written by ABK. Revisions and subsequent drafts were completed by ABK, SB, JCQ, LL, HLT, DR, AB, and EC.
None declared.