App-Based Feedback for Rehabilitation Exercise Correction in Patients With Knee or Hip Osteoarthritis: Prospective Cohort Study

Background: The use of digital therapeutic solutions for rehabilitation of conditions such as osteoarthritis provides scalable access to rehabilitation. Few validated technological solutions exist to ensure supervision of users while they exercise at home. Motion Coach (Kaia Health GmbH) provides audiovisual feedback on exercise execution in real time on conventional smartphones. Objective: We hypothesized that the interrater agreement between physiotherapists and Motion Coach would be noninferior to physiotherapists’ interrater agreement for exercise evaluations in a cohort with osteoarthritis. Methods: Patients diagnosed with osteoarthritis of the knee or hip were recruited at a university hospital to perform a set of 6 exercises. Agreement between Motion Coach and 2 physiotherapists’ corrections for segments of the exercises was compared using Cohen κ and percent agreement. Results: Participants (n=24) were enrolled and evaluated. Agreement between the Motion Coach app and physiotherapists was not inferior to agreement between the 2 physiotherapists (Motion Coach app vs physiotherapists: percent agreement 0.828; physiotherapist 1 vs physiotherapist 2: percent agreement 0.833; P<.001). Age (70 years or under, older than 70 years), gender (male, female), or BMI (30 kg/m² or under, greater than 30 kg/m²) subgroup analysis revealed no detectable difference in interrater agreement. There was no detectable difference in levels of interrater agreement between Motion Coach vs physiotherapists and between physiotherapists in any of the 6 exercises. Conclusions: The results demonstrated that Motion Coach is noninferior to physiotherapist evaluations. Interrater agreement did not differ between 2 physiotherapists or between physiotherapists and the Motion Coach app. This finding was valid for all investigated exercises and subgroups. These results confirm the ability of Motion Coach to detect user form during exercise and provide valid feedback to users with musculoskeletal disorders.
(J Med Internet Res 2021;23(7):e26658) doi: 10.2196/26658


Introduction
Musculoskeletal conditions such as osteoarthritis and back pain result in a huge burden for patients and health care systems. Impaired mobility affects both the quality of life of the individual, for example, by increasing social isolation, and the health care system, by raising costs due to factors such as hospitalizations and secondary diseases [1][2][3]. Osteoarthritis can lead to pain-related fear of movement and an increased probability of further functional impairment [4]. In addition, osteoarthritis is a predictor for developing disabilities that affect activities of daily living, underlining the importance of effective interventions [5].
Current guidelines [6] recommend self-management programs and exercise as first-line therapies for managing osteoarthritis. The prevalence of osteoarthritis is increasing, yet cost and resource constraints limit in-person access to these therapies [7]. Digital therapeutics have emerged as an option to provide access to exercise therapy and multidisciplinary rehabilitation for patients with musculoskeletal pain conditions such as osteoarthritis and back pain [8][9][10]. Even though a recent survey among health professionals indicated widespread support of use of mobile health technologies in osteoarthritis treatment [11], a primary concern with using digital therapeutics for home-based exercise is the lack of supervision by health care professionals.
Several different digital solutions have been proposed to correct and optimize body pose during exercise execution to improve access to therapeutic exercises [12]. Many mobile health apps for musculoskeletal rehabilitation rely upon video instructions only and provide no means of detecting and correcting pose during exercise [9,13]. These systems, by default, leave users exposed to the risk of incorrectly performing exercise but allow for scalable access without requiring external hardware. To the best of our knowledge, there are no reports on the quality of exercise execution during the use of these systems. Other technologies, such as integrated devices containing inertial sensors, have also been validated to a limited extent, and whether they are suitable for detecting and correcting form during therapeutic exercises has not been evaluated [14,15]. Digital therapeutics that have been validated for this purpose require additional hardware such as a Microsoft Kinect device [16,17].
Motion Coach (Kaia Health GmbH) was recently introduced to address these issues (ie, requiring that equipment be worn on the body or additional hardware) by using only smartphone front camera data and machine learning algorithms to detect the position of body segments during exercise in real time in order to provide personalized feedback.
The aim of this study was to evaluate the ability of Motion Coach to detect and correct form during physiotherapeutic exercises in patients with osteoarthritis. We hypothesized that interrater agreement between physiotherapists and Motion Coach would be noninferior to that between 2 physiotherapists.

Participants
Participants with a confirmed prior diagnosis of osteoarthritis of the hip or knee were enrolled from the outpatient population of the Department of Orthopedics, Physical Medicine and Rehabilitation, University Hospital, Ludwig Maximilians University of Munich.

Ethics and Registration
The study was approved by the Ethics Committee of Ludwig Maximilians University of Munich (20-162) and all participants provided informed consent before study procedures were carried out. The study was registered with the German Study Registry (Deutsches Register Klinischer Studien; DRKS00021828) prior to beginning enrollment.

Procedure
To evaluate the correction of osteoarthritis-specific exercises, Motion Coach provided instructions to the participants visually through an iPad's screen and acoustically via headphones. While participants performed exercises using Motion Coach, 2 physiotherapists evaluated whether the exercises were being performed correctly. The physiotherapists were blinded to the audiovisual feedback of Motion Coach. Furthermore, the physiotherapists rated the execution of each exercise set as a whole (or, for static exercises, the performance over the predefined time) on a 6-point Likert scale (0=insufficient, 5=excellent execution of movement).

Exercises
For assessment, 6 exercises (Table 1 and Figure 1) that reflected several aspects of therapeutic exercises were chosen from the app to ensure detection by the algorithm was reliable in different circumstances. We included exercises that required a varying range of technical ability; exercises that had different modes of execution (4 dynamic and 2 static), to differentiate between exercises requiring rapid feedback in real time (due to continuous movement) and those that do not; and exercises with different levels of difficulty (low, medium, or high).

Overview
In order to give audiovisual feedback on exercise form in real time, Motion Coach uses the camera stream of a user's mobile device and artificial intelligence-based image processing. Users place their device on the ground approximately 2 meters away, tilted slightly so they can be seen in the frame of view of the camera. The app guides the user as to where to stand with a series of interactive setup screens (Figure 2). A 2-step process is applied to each new image frame as it is captured by the camera.

Step 1: Estimating Pose
First, a Pose Estimation Machine Learning Model is applied to infer the user's pose for each captured image frame in real time (Figure 3). This Pose Estimation Model is a convolutional neural network (typically used for image-based machine learning tasks [18]) with a proprietary architecture that runs entirely on the user's mobile device (therefore, no raw video data leave the user's device). The model was specifically optimized to run on a wide variety of iOS and Android devices, and the model achieves state-of-the-art performance on academic benchmarks such as the MPII Human Pose Data Set Benchmark [19]. Kaia Health trained this model using a proprietary image data set that consisted of data from people with a variety of characteristics (body shape, height, skin color, movement limitations, etc) exercising in front of their mobile device, with a wide variety of exercise movements and environmental conditions such as varying lighting and background to make the model robust. Each image in the data set had been manually labeled according to a taxonomy designed to best capture the human body in physiotherapeutic exercises.

Step 2: Evaluating Geometric Expert System
For audiovisual feedback, spatiotemporal constraints, which were configured in advance by medical, physiotherapeutic, or sport science-trained Kaia staff, were triggered based on movement; there was no need for reconfiguration on a per-user or per-session basis. While the system was in use, constraints were checked automatically in real time, and feedback was provided if any of the configured constraints were violated. If multiple constraints were violated, a prioritization mechanism selected the feedback based on risk of injury.
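The constraint-and-prioritization logic described above can be illustrated with a minimal sketch. Motion Coach's actual rules are proprietary and not described in this paper, so everything here is an assumption made for illustration: the keypoint names, the angle thresholds, the integer risk scale, and the feedback texts are hypothetical, and only per-frame (spatial) constraints are shown, not temporal ones.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (x, y) image coordinates of one keypoint
Pose = Dict[str, Point]       # keypoint name -> coordinates (hypothetical names)


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Return the angle ABC in degrees, measured at the middle point b."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("degenerate pose: coincident keypoints")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


@dataclass(frozen=True)
class Constraint:
    name: str
    injury_risk: int                     # hypothetical scale: higher = more urgent
    is_violated: Callable[[Pose], bool]  # geometric check on a single frame
    feedback: str


def select_feedback(pose: Pose, constraints: List[Constraint]) -> Optional[str]:
    """Check all constraints; if several are violated, return the riskiest one's feedback."""
    violated = [c for c in constraints if c.is_violated(pose)]
    if not violated:
        return None
    return max(violated, key=lambda c: c.injury_risk).feedback


# Illustrative rules for a squat-like exercise; thresholds are made up.
SQUAT_CONSTRAINTS = [
    Constraint(
        "knee_too_deep", 3,
        lambda p: joint_angle(p["hip"], p["knee"], p["ankle"]) < 70.0,
        "Do not bend your knees so deeply.",
    ),
    Constraint(
        "trunk_lean", 1,
        lambda p: joint_angle(p["shoulder"], p["hip"], p["knee"]) < 60.0,
        "Keep your upper body more upright.",
    ),
]

upright = {"shoulder": (0, 0), "hip": (0, 1), "knee": (0, 2), "ankle": (0, 3)}
folded = {"shoulder": (0, -0.5), "hip": (1, 0), "knee": (0, 0), "ankle": (1, 1)}

print(select_feedback(upright, SQUAT_CONSTRAINTS))  # no constraint violated
print(select_feedback(folded, SQUAT_CONSTRAINTS))   # both violated; knee rule wins
```

Configuring the rules as data, separate from the checking loop, mirrors the statement that constraints were set up once in advance and did not need reconfiguration per user or per session.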

Overview
Physiotherapists' evaluations were collected on a rating sheet for each participant. Data from the app were obtained by taking a screenshot of the report of corrections after the exercises had been executed. Baseline data were collected from participants using paper-based surveys or from participants' medical reports if they were available in the system. Data from all sources were entered into a metafile in a spreadsheet (Excel; Microsoft Inc).

Data Collection
Gender, age, diagnosis, location of osteoarthritis, height, weight, and the Western Ontario and McMaster Universities Arthritis Index (WOMAC) score were collected at baseline [20].
Each participant performed 6 exercises with a total of 23 rated segments (a set of 10 repetitions for each dynamic exercise or 30 seconds of stable posing for static exercises). For each segment, each physiotherapist's evaluation and Motion Coach's evaluation (ie, whether correction was required or not) were collected after the participants completed each exercise. Furthermore, the overall form rating by physiotherapists was recorded on a 6-point Likert scale. Data were pooled for the primary analysis.

Study Endpoints
The primary endpoint was overall agreement between physiotherapists' and Motion Coach's evaluations during exercise execution. For each segment, there was a dichotomous outcome (correction recommended or not).

Sample Size
We calculated the sample size required for a noninferiority trial with a dichotomous outcome (ie, agreement or disagreement, either between app and physiotherapists' ratings or between the 2 physiotherapists). We used pilot data (app-physiotherapists mean ratio 0.83; physiotherapist 1-physiotherapist 2 mean ratio 0.845) from the first 16 participants of the study. We determined that 552 exercise segments would be required; therefore, given an assumption of 23 segments per participant, the number of required participants was 24 (noninferiority margin 0.05; α=5%; power 90%). A noninferiority margin of 0.05 was recently used in a comparable study [16] for evaluation of exercise correction with a digital tool.

Statistical Analysis
Continuous data (age, weight, height, and BMI) are described using means and standard deviations; discrete data (gender, location of osteoarthritis, WOMAC score) are described using absolute and relative numbers. Motion Coach-physiotherapist 1, Motion Coach-physiotherapist 2, Motion Coach-both physiotherapists, and physiotherapist 1-physiotherapist 2 interrater reliabilities (Cohen κ and percent agreement) were compared using z scores (α=5%). To assess whether demographic variables had any significant effect on the interrater agreement between Motion Coach and physiotherapists, subgroups for age (70 years or under, older than 70 years), gender (male, female), and BMI (30 kg/m² or under, greater than 30 kg/m²) were formed and compared. We also assessed interrater agreement by exercise. Interrater agreement was categorized according to Cohen κ values as suggested by Landis and Koch [21]: κ<0.00, poor agreement; κ=0.00-0.20, slight agreement; κ=0.21-0.40, fair agreement; κ=0.41-0.60, moderate agreement; κ=0.61-0.80, substantial agreement; κ=0.81-1.00, almost perfect agreement. All analyses were conducted with R software (version 4.0.2; R Foundation for Statistical Computing).
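As an illustration of the two agreement measures named above, the following sketch computes percent agreement and Cohen κ for two raters with dichotomous ratings and maps κ to the Landis and Koch bands. It is written in Python for illustration only (the study's analyses were run in R), and the ratings are invented example data, not study data.

```python
from typing import Sequence


def percent_agreement(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Proportion of segments on which both raters gave the same rating."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Cohen kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    labels = set(r1) | set(r2)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum((r1.count(lab) / n) * (r2.count(lab) / n) for lab in labels)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch agreement category."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented ratings for 10 segments: 1 = correction required, 0 = no correction.
rater_a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
rater_b = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa(rater_a, rater_b)
print(percent_agreement(rater_a, rater_b), round(kappa, 3), landis_koch(kappa))
```

In this example the raters agree on 8 of 10 segments, yet κ is noticeably lower than the percent agreement because half of that agreement would be expected by chance given the raters' marginal frequencies; this is why both measures are reported.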

Participants
The study population's mean age was 67.6 years (SD 8.98), and 20 out of the 24 participants (83%) were female (Table 2).

Primary Analysis
Mean agreement between the app and physiotherapists (percent agreement 0.828) was not inferior (margin 0.05; P<.001) to that between physiotherapist 1 and physiotherapist 2 (percent agreement 0.833).
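A noninferiority comparison of two agreement proportions can be sketched as a one-sided z test, shown below. The paper does not state the exact test statistic or variance estimate used, so this is an assumption: an unpooled Wald test that treats segments as independent observations. The function name and the counts in the example are hypothetical, and the sketch is not expected to reproduce the reported P value.

```python
import math
from typing import Tuple


def normal_sf(z: float) -> float:
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def noninferiority_z(agree_new: int, n_new: int,
                     agree_ref: int, n_ref: int,
                     margin: float) -> Tuple[float, float]:
    """One-sided Wald test of H0: p_new - p_ref <= -margin.

    Returns (z, p_value); a small p_value supports noninferiority.
    Assumes independent observations (a simplification).
    """
    p_new = agree_new / n_new
    p_ref = agree_ref / n_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    if se == 0:
        raise ValueError("standard error is zero; test is undefined")
    z = (p_new - p_ref + margin) / se
    return z, normal_sf(z)


# Hypothetical counts: 90/100 agreements (tool vs rater), 88/100 (rater vs rater).
z, p = noninferiority_z(90, 100, 88, 100, margin=0.05)
print(round(z, 3), round(p, 4))
```

Because both agreement rates in a design like this one come from the same segments and segments are clustered within participants, an analysis that accounts for this pairing and clustering may give different results from this simplified sketch.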

Subgroup Analysis
No differences were found between app-physiotherapist interrater reliabilities and physiotherapist 1-physiotherapist 2 interrater reliability in any of the subgroups (Table 4 and Figure 5).

Interrater Agreement in Different Exercises
The analysis showed no detectable difference in the rates of interrater agreement in any of the exercises (Table 5 and Table 6).

Discussion
The purpose of this study was to compare interrater agreement of osteoarthritis knee and hip exercise assessments between Motion Coach (a novel digital tool) and trained physiotherapists; we hypothesized that assessment agreement for the Motion Coach app would not be inferior to that of physiotherapists. Our data support the hypothesis that Motion Coach is noninferior to physiotherapists in assessing whether exercise poses required correction. The interrater agreement between Motion Coach and physiotherapists did not differ from that between the physiotherapists. This finding was also true in analyses of subgroups that consisted of men, women, participants 70 years or younger, participants older than 70 years, participants with BMI greater than 30 kg/m², and participants with BMI of 30 kg/m² or under, and in analyses by exercise. To the best of our knowledge, this is the first report comparing a digital software-based exercise feedback tool with conventional smartphone technology and physiotherapeutic exercise feedback for musculoskeletal conditions.
Previous studies [16,17] have used 3D sensors such as the Microsoft Kinect system to assess pose during exercise and give feedback to users if correction was needed. However, 3D-sensor systems are expensive and require extensive external hardware and a stationary television set, and thus have limited scalability in providing access to digital rehabilitation. Komatireddy et al [16] found no detectable difference in agreement between a software solution for Microsoft Kinect and a panel of physiotherapists for repetition count and the number of acceptable exercises. Wochartz et al [17] evaluated agreement with regard to joint angles and positions of the lower limb between a Microsoft Kinect-based system and a 3D camera-based motion system but did not evaluate its capacity to trigger corrections during therapeutic exercises; they concluded that the validity of the Kinect system to detect pose without postprocessing was restricted.
Other digital rehabilitation tools for musculoskeletal pain use external inertial sensors attached to specific limbs or joints to detect exercise poses [22][23][24]. By nature, these systems are limited to detecting the poses of joints or body areas only where they are placed, and users must typically attach the hardware to their bodies themselves. Studies [14,15] have shown that these systems are generally capable of detecting exercise poses; however, these systems have not been systematically evaluated for their ability to provide feedback on pose during exercise execution.
Built-in smartphone inertial sensors are a viable option to deliver pose correction in rehabilitation without requiring specialized equipment or installations. Spina et al evaluated real-time smartphone motion sensor data processing as an option to assess pose in physical exercises by people with chronic obstructive pulmonary disease [25]. The system was able to provide feedback on pose and exercise execution similar to the feedback of a trained therapist. However, the system required a holster to hold the smartphone, which had to be repositioned on the body depending on the exercise performed. While previous reports have addressed the general feasibility of exercise-related feedback using 2D RGB camera streams, the percent agreement of those systems without postprocessing limited their use [26,27]. In contrast, Motion Coach relies upon postprocessing of the 2D camera stream using machine learning algorithms to provide valid real-time feedback for exercise correction.
To the best of our knowledge, this study is the first to evaluate the potential of a technology (Motion Coach) to trigger suitable corrections of therapeutic exercises in musculoskeletal pain rehabilitation, with the findings suggesting that Motion Coach technology triggers valid corrections as compared to trained physiotherapists. Motion Coach is a software-only solution operating on off-the-shelf smartphones, without any need for additional hardware, which makes this digital therapeutic solution accessible to a broad patient population.
The interrater reliability of trained physiotherapists' assessments of pose during lower extremity exercises has been investigated previously: Chmielewski et al [28] investigated interrater agreement during 2 lower extremity exercises performed by healthy volunteers with 2 distinct methods (overall rating and investigation of deviation from the neutral plane during exercise) in a panel of 3 physiotherapists and found agreement better than chance but no high levels of agreement between physiotherapists. Whatman et al [29] investigated interrater agreement for lower extremity exercises in a panel of physiotherapists (segment-specific and overall agreement) with ordinal and dichotomous outcomes; interrater agreement was generally fair to good and increased with the experience of the rater. The interrater agreement observed in our study, among the physiotherapists and also between the physiotherapists and Motion Coach, was high compared to those in previous studies [28,29]. This finding can be explained by the high level of experience of the physiotherapists and the training of the physiotherapists on evaluation criteria prior to patient enrollment. Compared to other approaches requiring specialized hardware, the degree of agreement between both physiotherapists and Motion Coach remains high; a similar study [30] using data from the Kinect version 2 Skeleton Tracking system to assess rehabilitation exercises in 19 people with musculoskeletal and neurological limitations showed a limited correlation (r=0.60, P<.01 for the clinical subgroup) between experts' clinical judgment and the results of various models based on sensor data.
The study had several limitations. First, the pool of raters was small (n=2), and no third rater was used to resolve cases of disagreement between the 2 raters. In addition, the sample was heterogeneous in terms of gender distribution and localization of osteoarthritis, limiting the generalizability of the results. Other limitations arise from the fact that the assessment of pose during therapeutic exercise execution is not standardized, and thus, in this study as in comparable previous studies [28,29], no well-established standard measurement could be used to quantify exercise execution. Furthermore, dichotomous assessment of acceptable exercise is only one of several measures used in prior studies to assess form during exercise. Future studies evaluating Motion Coach will need to use more diverse outcome measures of form during exercise, for example, calculations with a musculoskeletal human model.
The interrater agreement between the physiotherapists and Motion Coach for suggesting corrections during therapeutic exercises was moderate to substantial and did not differ from the agreement between the physiotherapists themselves. This finding was valid for all investigated exercises and subgroup analyses. These findings validate the ability of Motion Coach to detect form during exercise and provide audiovisual feedback to users with preexisting musculoskeletal conditions.