Methodologies Used to Study the Feasibility, Usability, Efficacy, and Effectiveness of Social Robots For Elderly Adults: Scoping Review

Background New research fields to design social robots for older people are emerging. By providing support with communication and social interaction, these robots aim to increase quality of life. Because of the decline in functioning due to cognitive impairment in older people, social robots are regarded as promising, especially for people with dementia. Although study outcomes are hopeful, the quality of studies on the effectiveness of social robots for the elderly is still low due to many methodological limitations.
Objective We aimed to review the methodologies used thus far in studies evaluating the feasibility, usability, efficacy, and effectiveness of social robots in clinical and social settings for elderly people, including persons with dementia.
Methods Dedicated search strings were developed. Searches in MEDLINE (PubMed), Web of Science, PsycInfo, and CINAHL were performed on August 13, 2020.
Results In the 33 included papers, 23 different social robots were investigated for their feasibility, usability, efficacy, and effectiveness. A total of 8 (24.2%) studies included elderly persons in the community, 9 (27.3%) included long-term care facility residents, and 16 (48.5%) included people with dementia. Most of the studies had a single aim, of which 7 (21.2%) focused on efficacy and 7 (21.2%) focused on effectiveness. Moreover, forms of randomized controlled trials were the most applied designs. Feasibility and usability were often studied together in mixed methods or experimental designs and were most often studied in individual interventions. Feasibility was often assessed with the Unified Theory of the Acceptance and Use of Technology model. Efficacy and effectiveness studies used a range of psychosocial and cognitive outcome measures. However, the included studies failed to find significant improvements in quality of life, depression, and cognition.
Conclusions This study identified several shortcomings in methodologies used to evaluate social robots, resulting in ambivalent study findings. To improve the quality of these types of studies, efficacy/effectiveness studies will benefit from appropriate randomized controlled trial designs with large sample sizes and individual intervention sessions. Experimental designs might work best for feasibility and usability studies. For each of the 3 goals (efficacy/effectiveness, feasibility, and usability) we also recommend a mixed method of data collection. Multiple interaction sessions running for at least 1 month might aid researchers in drawing significant results and prove the real long-term impact of social robots.


Introduction
In the next few decades, we expect the global population to age due to a combination of high life expectancy, low birth rates, and the baby boomer generation entering their senior years. By 2030, 33% of the population in Western Europe will be over 60 years of age [1]. Dementia, one of the most common neurodegenerative conditions, affects 50 million older people around the world, and this number is projected to reach 155 million in 2050 [2].
Dementia is characterized by deterioration in memory, cognition, behavior, and ability to perform everyday activities [3]. It is estimated that approximately one-third of people with dementia live alone [4]. They experience unmet needs because of living alone and are at a higher risk of faster deterioration. In addition, people with dementia who live alone are considered at a higher risk of medication use problems, falls and injuries, inadequate self-care, trouble with activities of daily living, and reduced social networks [5-8].
In the past decades, technological advances coincided with the great health challenge of aging societies [9]. New research fields in assistive technology are dedicated to designing social robots for older adults with or without cognitive impairment to promote their quality of life (QoL) through communication and social interactions [10]. Social robots are intended to provide and facilitate social contact, psychosocial and cognitive stimulation, and have the potential to support elderly people to maintain their autonomy and independence and enhance their well-being [11].
Socially assistive robots (SARs) can be grouped into 2 main categories based on their features and functions: (1) service robots, and (2) companion robots [12]. The main task of these robots is to establish some form of interaction and communication. SARs can perform this function in multiple ways, such as through touch sensors, cameras, (robotic) body movements, tablet interfaces, and sound and speech systems. Within the subgroup of companion robots, humanoid robots like Pepper and Nao offer users advanced applications that provide leisure activities (music, photos, and games), cognitive and physical stimulation activities, and assistance with mental or physical tasks. Pet robots, such as PARO, AIBO, and NeCoro, serve as substitutes for pets and companion animals and are intended to provide emotional and physiological stimulation, have calming effects, and lead to mood improvements [13].
For the successful implementation and large-scale uptake of social robots or any other psychosocial intervention, their feasibility, usability, and cost-effectiveness should be perceived as good by the end users (people with dementia and healthy older adults), clinicians, and other stakeholders (eg, health care insurers and policy makers). The Monitoring and Evaluating Digital Health Interventions framework recommends evaluating 4 factors to integrate and implement a digital health intervention: (1) feasibility, to assess whether the digital health system works as intended in a given context; (2) usability, to assess whether the digital health system can be used as intended by users; (3) efficacy, to assess whether the digital health intervention can achieve the intended results in a research (controlled) setting; and (4) effectiveness, to assess whether the digital health intervention can achieve the intended results in a nonresearch (uncontrolled) setting [14].
Despite the rising interest in social robots after the COVID-19 pandemic, there is limited evidence on the effectiveness of social robots in elderly care. The methodological quality of studies on the effectiveness of social robots in elderly adults is still low, and inappropriate study designs, samples, forms and durations of interventions, and data collection methods have affected the strength of study outcomes [12].
Currently, there is no state-of-the-art proof of concept for study designs to evaluate the use of social robots for elderly people. Since the degenerative nature of dementia can cause methodological challenges, specific attention should be paid to studies that include people with dementia. To determine what the appropriate research methods are to study feasibility, usability, efficacy, and effectiveness, this article aims to review the methodologies used thus far in studies with social robots in clinical and social settings with elderly people to pave the way for future researchers in this field.

Protocol Registration
The protocol of this scoping review was registered in Open Science Framework (OSF) [15].

Selection Criteria
Publications potentially eligible for inclusion had to study a social robot that was physically embedded in an experimental or clinical study in people aged 65 or above. Studies were excluded if they were (1) conducted in an acute care setting; (2) conference abstracts, case studies, dissertations, books, or review papers; (3) published in a language other than English or Spanish; (4) solely reporting a robot design, development process, or theoretical models; (5) reporting stakeholder opinions on robots without any interaction; (6) involved in the implementation of new hardware or software or an assessment tool on a robot (such as assessing a fall detection sensor); or (7) involving telepresence robots with only video call features.

Selection Procedure
After the removal of duplicates, 2 reviewers (authors AM and MM) independently applied the inclusion and exclusion criteria in 3 steps, starting with screening titles, abstracts, and then full texts. The selections were compared, and in case of disagreement, discussed by the 2 reviewers. In cases where no consensus could be reached, a third reviewer was consulted (author HR).

Data Extraction
The literature was mapped according to the following areas of interest: author and country, robot name, the aim of the robot, aim of the study, type of outcome measure, study design, study sample, study setting, methodology of data collection, interaction scenario, relevant outcome measures, measurement instruments, results, and reported limitations of the study. Information was synthesized descriptively, and findings were narratively summarized according to the areas of interest.
The quality of the included articles was appraised independently by 2 authors (AM and MM), through the quality assessment of digital health interventions within the Monitoring and Evaluating Digital Health Interventions framework established by the World Health Organization (WHO) [14]. The tool includes a list of methodological study criteria comprising (1) 23 essential criteria for all types of studies and (2) essential criteria for qualitative and quantitative studies (3 criteria each). The extent to which criteria were met by studies was rated by the 2 independent reviewers on a 3-point scale (0=poor, 1=fair, 2=good). We calculated the percentage of agreement between the ratings (Multimedia Appendix 1). We also applied this framework to categorize the studies according to their aims in 4 categories: (1) feasibility, (2) usability, (3) efficacy, and (4) effectiveness.
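As an illustration of the percentage of agreement mentioned above, the following minimal sketch computes simple percent agreement for a pair of raters who score the same criteria on the 3-point scale. The function name and the example ratings are hypothetical and are not taken from the appraisal data of this review; the sketch also assumes plain percent agreement without any chance correction.

```python
def percent_agreement(ratings_a, ratings_b):
    """Return the share (in %) of items on which two raters gave the same score."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both raters must score the same number of items.")
    if not ratings_a:
        raise ValueError("At least one rated item is required.")
    # Count the items where both scores are identical.
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical scores (0=poor, 1=fair, 2=good) for 8 criteria of one study.
rater_1 = [2, 2, 1, 0, 2, 1, 2, 2]
rater_2 = [2, 2, 1, 1, 2, 1, 2, 2]

agreement = percent_agreement(rater_1, rater_2)
print(f"Agreement: {agreement:.1f}%")
```

Percent agreement is easy to interpret but does not correct for agreement expected by chance, which is worth keeping in mind when comparing values across reviews.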

General Findings
The search resulted in a total of 723 individual publications. After the screening process, 33 articles met the selection criteria (Figure 1). In the 33 papers [11,…], 23 different social robots were evaluated among elderly adults and people with dementia in 13 different countries. Moreover, 19 studies specifically evaluated either feasibility, usability, efficacy, or effectiveness and were considered as single aim studies. The remaining studies (n=14) had multiple aims, evaluating 2 or 3 of the aforementioned study aims. Overall, feasibility was studied in 17 (51.5%) studies, usability in 13 (39.4%), effectiveness in 12 (36.4%), and efficacy in 10 (30.3%).
The quality appraisal identified that primary and secondary outcomes were clearly defined in all studies. Additionally, the methods of data collection were described well, but the eligibility of the participants was not reported in 12 (36.4%) papers. Moreover, 12 out of 33 (36.4%) papers did not present a clear description of the study design. Multimedia Appendices 2 and 3 show a summary of the characteristics, methodologies, and outcomes of the included studies.

Study Aims and Settings
Only 5 (27.8%) of the 18 studies aiming to evaluate feasibility and/or usability were performed in nursing home settings; 5 (27.8%) were performed in laboratory settings, and the remaining 8 (44.4%) were performed in private households. In 7 (38.9%) of the 18 studies, people with dementia and those with cognitive impairment were included. In the remaining 11 (61.1%) studies, the cognitive status of the participants was not clearly indicated.
Studies exploring usability applied the System Usability Scale [22,25,28], a modification of the Usefulness, Satisfaction, and Ease of Use scale [26], and the Technology Usage Inventory [39]. Two (6.1%) qualitative studies performed conversation and video analysis [27,28] to extract statements on acceptability and usability.
Efficacy and effectiveness outcomes were evaluated by a wide range of neuropsychosocial measurement instruments: (1) [40], and Assessments of Communication and Interaction Skills [64].
Among the studies applying questionnaires, 5 (19.2%) indirectly collected data via care staff and informal caregivers, and 13 (50%) directly collected data via the researchers.
The reported usability (n=12, 36.4%) was overall positive, except in 2 (16.7%) studies in which the usability was negatively affected by technical issues or lack of robustness of the robots [28,37]. Only 3 (9.1%) studies assessed affordability for Hobbit and Nao, in which the participants did not consider the social robots affordable and were skeptical of buying them [25,28,37].

Reported Study Limitations
Of the 33 studies, 7 (21.2%) did not report any study limitations [16,17,23,25,30,42,46]. A wide range of limitations was reported. The most common limitation, reported in 17 (51.5%) studies, was the small sample size, which was mostly reported for efficacy and effectiveness studies. In the feasibility and/or usability studies, the limitations were mainly attributed to technical problems or interaction difficulties. Other reported limitations were the use of unvalidated questionnaires, homogeneity in the sex of the study sample, inadequate observation, and short duration of interventions.

Quality Appraisal
The inter-rater agreement for the quality appraisal was 86.1%. Reports of the description of study design, bias, and enrollment procedure were mostly rated as "fair." In most of the articles, the sampling methods, confounding factors, missing data in quantitative studies, and reflexivity of data interpretation in qualitative studies were poorly reported. Other criteria were mostly rated as "good" (Multimedia Appendix 1). The quality appraisal revealed unclear descriptions or insufficient details in 5 (15.2%) studies, especially those in disciplines other than health research [25,30,42,44,45].

Principal Findings
The results of this scoping review revealed a variety of applied study methods in studies with social robots concerning study design, sample size, study setting, method of data collection, interaction scenario (the sequence, duration, and setting of the intervention), outcome measures, measurement instruments, study results, and reported limitations. Feasibility and usability were mainly studied on preprototype social robots in laboratory settings. Considering the relatively short history of the use of social robots in psychosocial interventions, it is crucial to determine the main features and functions of the robots to be considered in the design and development phase. Hence, usability, feasibility, and implementation should be strategic research aims. Fully developed robots such as PARO were evaluated in terms of effectiveness in real-world settings. Most of the identified studies aimed to determine the neuropsychosocial impact of social robots on older adults.
For the studies that explicitly fall within a feasibility and/or usability evaluation, researchers applied experimental, mixed method, and field trial designs, mostly outside nursing home care settings. This might imply that feasibility and/or usability for persons who are more severely cognitively impaired are currently understudied. Most of the studies verified the acceptability and usability of the robots within single or multiple interactive sessions in individual or group settings, and all these studies reported positive outcomes in varying degrees on the feasibility and/or usability of the social robots. The quantitative and qualitative data were collected mostly through questionnaires and interviews, and in a few studies by direct observation. Researchers should therefore consider using direct observation to capture the main factors of the interaction and the emotional relationships fostered by robot use. Within the questionnaires and interview questions based on the Unified Theory of the Acceptance and Use of Technology (UTAUT) model, some concepts such as trust, anxiety, perceived enjoyment, and social support can change over time [37,46,47]. Therefore, longer use of the robots might reveal these changes and reduce the novelty effect over time [46].
Overall, efficacy and effectiveness studies were conducted on study populations either with cognitive impairment or residing in long-term care facilities. The studies with significant results [17,24,29-31,34-36,45] mostly employed experimental designs, including randomized controlled trials (RCTs) and quasi-experimental designs, with larger sample sizes and longer intervention periods compared to studies showing slight or no improvements. RCTs are likely to be the most appropriate design and a gold standard to confidently demonstrate that a specific intervention has resulted in a change in a process or a health outcome [14]. Biased assessment of outcomes and confounding effects can be avoided with large-scale RCTs. However, due to the degenerative character of dementia and personal differences in capacities of people with dementia, difficulties in randomizing subjects often arise [14]. Additionally, when using long study periods, the dropout rate might be high, as participants' cognitive deterioration can hinder their continued participation in the study. On the other hand, when it is not feasible or ethical to conduct an RCT, a quasi-experimental design may serve best for collecting quantitative data. We also recommend randomized controlled block designs in the case of heterogeneous study samples. For instance, when people with dementia are included in studies with long intervention periods, the severity of dementia changes over the course of the study. With a randomized controlled block design, some variables in different blocks can be controlled for, or comparable approaches can be applied within the blocks.
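The randomized controlled block design recommended above can be illustrated with a minimal sketch. It assumes, purely for illustration, that dementia severity is the blocking variable and that there are 2 arms; the function name, the arm labels, and the participant records are hypothetical and do not come from any of the included studies.

```python
import random
from collections import defaultdict


def blocked_randomization(participants, block_key, arms=("robot", "control"), seed=0):
    """Assign participants to arms separately within each block.

    Within a block, the participant order is shuffled and arms are then
    assigned in rotation, so arm sizes differ by at most 1 per block.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    blocks = defaultdict(list)
    for person in participants:
        blocks[person[block_key]].append(person["id"])

    allocation = {}
    for level in sorted(blocks):
        ids = blocks[level]
        rng.shuffle(ids)
        for position, participant_id in enumerate(ids):
            allocation[participant_id] = arms[position % len(arms)]
    return allocation


# Hypothetical sample: 8 participants, blocked by dementia severity.
participants = [
    {"id": 1, "severity": "mild"}, {"id": 2, "severity": "mild"},
    {"id": 3, "severity": "mild"}, {"id": 4, "severity": "mild"},
    {"id": 5, "severity": "moderate"}, {"id": 6, "severity": "moderate"},
    {"id": 7, "severity": "moderate"}, {"id": 8, "severity": "moderate"},
]

allocation = blocked_randomization(participants, block_key="severity")
for person in participants:
    print(person["id"], person["severity"], allocation[person["id"]])
```

Because allocation happens within each severity level, both arms end up with a comparable mix of mild and moderate cases, which is the property that makes blocking attractive for heterogeneous samples.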
The interventions delivered in studies targeting the efficacy and effectiveness of the robots diverged in format, duration, dosage, and location. Two (6.1%) studies [32,38] highlighted a need for individual intervention sessions tailored to users' preferences and capacities, and the findings of another study confirmed this approach [65]. Additionally, individualized sessions could limit confounding factors by reducing the chance of interactions with the facilitator or other participants [66]. Group interventions may lower the odds of interaction between potential users and the robot, potentially lowering the effect of the intervention, especially when the intervention is delivered in a noisy setting with many participants [18]. The issues of small sample sizes and short interactions were considered major limitations in studies that failed to find significant results, and they are considered the toughest challenges for researchers in this field [66,67]. These limitations are often linked to the study setting. In nursing homes, a larger number of residents are often enrolled in a clinical trial, and the robots are not personalized but must be shared by the entire group, whereas in private homes, studies are conducted with individuals or dyads, which creates better possibilities for personalization of the robot. Overall, the personalization of the intervention and alleviation of loneliness are 2 advantages of home-based clinical trials. These types of studies pose some challenges, such as the need for several robots, implementation difficulty, and personalization, but they are nevertheless a step in the right direction. We observe a paradox, in that new or experimental robots are employed in research with low numbers of participants, whereas commercially available robots are tested on large study samples. Commercialization allows for better testing and evaluation, which is logical from an economic perspective.
However, we urge that before robots are marketed, developers should study the feasibility and usability appropriately in the target group, as well as the effectiveness in a substantial study sample with sufficient power. After bringing the robot to the market, producers should continue to invest in studies to improve their product and tailor it optimally to their users. This should be a joint mission of producers and policy makers to improve the sustainability of health care systems.
Apart from the aforementioned limitations of the studies, some weak aspects of the study designs led to a failure to demonstrate the impact of the social robots. For instance, a mismatch between the studied construct and the main aim of the robot may lead to the unwarranted conclusion that the robot is not effective. An example of this is the studies on PARO that failed to demonstrate significant results for cognition, as PARO was not developed to stimulate cognitive functioning [31,33,41,43]. Additionally, to capture significant results in constructs such as cognition, QoL, and depression, a long intervention period is necessary because these constructs do not change very quickly. In studies with people with dementia, it might also be useful to study the stability of these constructs instead of improvement, since it is inherent to the disease that these constructs deteriorate over time.
Regarding the broad concept of QoL ranging from physical health to psychological state and social relationships, the application of a suitable QoL measurement instrument that corresponds to the robot's aim should be taken into account.

Implications For Efficacy and Effectiveness Studies
Appropriate RCTs with large sample sizes and individual interaction sessions running for longer than 1 month would serve best for such studies to draw relatively robust and reliable results.
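To give a rough sense of what a large sample size means in practice, the sketch below applies the standard normal-approximation formula for a 2-arm parallel trial with a continuous outcome, n per group = 2(z_{1-α/2} + z_{1-β})² / d². The standardized effect size d, the significance level, and the power are illustrative assumptions, not values derived from the included studies, and the function name is hypothetical. The calculation ignores dropout, which can be substantial in studies with people with dementia.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per arm for a two-sided, 2-arm comparison.

    Uses the normal approximation; effect_size is a standardized
    mean difference (Cohen's d) and must be positive.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical scenario: a medium standardized effect (d = 0.5).
required = n_per_group(0.5)
print(f"Participants per group: {required}")

# Inflate for an assumed 20% dropout over a long intervention period.
with_dropout = math.ceil(required / (1 - 0.20))
print(f"Per group allowing for 20% dropout: {with_dropout}")
```

Smaller expected effects drive the required sample up quickly, since the sample size scales with 1/d².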

Implications For Feasibility and Usability Studies
The study methods are similar for both aims, so researchers could apply the same design. Experimental designs with mixed methods of data collection are recommended for these studies. Multiple interaction sessions might reveal the changes in feasibility and usability.

Implication For Studies With Multiple Aims
We recommend performing separate studies for multiple aims since the study designs for each aim are not comparable.
We gathered further practical recommendations through which future work may address existing shortcomings (Table 1). Regarding the mixed methods of data collection, studies suggest a combination of qualitative and quantitative methods for data collection, which will enable the researcher to capture different details in users' responses and address different aspects of the research question. A mixed methods approach was helpful in studies that could not derive positive results from quantitative data but did from qualitative data [16]. Given the difficulty of recruiting many users when only a few robots are available, these mixed methods should be mandatory. Even though we did not find any negative results regarding the intervention dosage, there is some evidence that highly intensive interventions result in negative effects or exhaustion [18]. Hence, the dose response for specific measures remains an open question for future researchers.

Limitations and Strengths
Although the use of social robots is promising to support people with dementia, we did not include dementia specifically in the search strings, since this scoping review focused on elderly people in general. We believe that our search captured most of the studies executed in the field of dementia because many of the identified studies included people with dementia in either mixed or specific study samples. However, some relevant studies on elderly people with dementia may be missing in this review, as may studies that are only traceable in databases that were not taken into account in this review. The searches were conducted in scientific databases deemed the most viable to retrieve valid and reliable results for this scoping review. The exclusion of studies focusing only on the development phase of social robots can be considered a limitation of this study. Some information on the evaluation of the feasibility and usability executed in the development stage might have been missed. In addition, studies on telepresence robots were excluded due to their relatively simple features. Compared to pet robots and humanoid robots, telepresence robots are limited in interactions with users, which occur merely through a touch screen, making use of visual and audio stimuli but omitting other sensory stimulation. Although mainly used for medical visits, some telepresence robots might support social functioning. Information on studies performed on these robots might have been missed.
Our study is the first scoping review on the methodologies for studying social robots in elderly people and people with dementia. The existing reviews on this topic mostly focus on design, use, effectiveness, facilitators, and barriers to the implementation of social robots [12,66-73]. This study might support future researchers in designing a research study on social robots in elderly adults and answer some study design queries.

Conclusions
This review narratively synthesizes information on the methodology of studying social robots in elderly adults and people with dementia. Relevant recommendations were formulated for studies with specific aims; these may aid future researchers in developing adequate study designs to evaluate social robots, allowing for more reliable information on study outcomes. Our research leads us to the conclusion that more studies with large sample sizes are needed on the effectiveness of social robots in private households of elderly adults and people with dementia to demonstrate the actual usefulness of social robots in delaying institutionalization by improving QoL, cognition, and social contact, and counteracting loneliness. Most of the identified studies focused on usability, and the robots appeared to be favorably accepted in most cases. It is time to encourage investigations in private homes to supplement existing knowledge about the effectiveness of robots and personalization of their functions. We expect that additional research will corroborate the impact of social robots on loneliness, mood, QoL, and social health in people with dementia and the elderly.