The Reliability of Remote Patient-Reported Outcome Measures via Mobile Apps to Replace Outpatient Visits After Rotator Cuff Repair Surgery: Repetitive Test-Retest Comparison Study for 1-Year Follow-up

Background: With the development of health care–related mobile apps, attempts have been made to implement remote patient-reported outcome measures (PROMs). For remote PROMs delivered via mobile apps to be widely used, their results must not depend on location; that is, PROMs completed outside the hospital must yield results as reliable as those completed in the hospital. However, to our knowledge, no studies have assessed the location-dependent reliability of mobile app–based PROMs by comparing results obtained remotely with those obtained at outpatient visits. Objective: The purpose of this study was to evaluate the reliability of remote PROMs using mobile apps compared to PROMs performed during outpatient follow-up visits after arthroscopic shoulder surgery. Methods: A total of 174 patients who underwent arthroscopic rotator cuff repair completed questionnaires 2 days before visiting the clinic for the 1-, 2-, 3-, 6-, and 12-month follow-ups (test A). The patients completed the same questionnaires at the clinic (test B) using the same mobile app and device at each of these follow-ups. Test-retest comparisons were performed to analyze the differences and reliability of the PROMs according to the postoperative period. Results: Tests A and B differed significantly at 1, 2, and 3 months (all Ps<.05 except for the ASES function subscale at 3 months) but not at 6 or 12 months after surgery (all Ps>.05). The intraclass correlation coefficients between the two tests were relatively low at the 1-, 2-, and 3-month follow-ups but were within the reliable range at 6 and 12 months after surgery. The rate of completion of tests A and B using the mobile app was significantly lower in the group older than 70 years than in the other groups for all postoperative periods (P<.001). 
Conclusions: PROMs completed using mobile apps in different locations differed soon after surgery but were reliably similar after 6 months. Remote PROMs using mobile apps could be used reliably for patients more than 6 months after surgery. However, mobile app–based questionnaires can be expected to be less useful in patients older than 70 years than in other age groups. (J Med Internet Res 2021;23(3):e20989) doi: 10.2196/20989


Background
Smartphone penetration rates have increased dramatically worldwide [1], reaching 81% in the United States and 95% in South Korea [2], and smartphones are becoming increasingly indispensable in everyday life [3]. A variety of mobile apps for information, communication, education, and entertainment have been developed for smartphones [3], including mobile health care systems. Seto et al [4] developed a mobile phone-based telemonitoring program for patients with heart failure following acute decompensation. Debono et al [5] suggested that postoperative mobile apps after ambulatory lumbar discectomy were effective tools for spine surgeons.
With the development of health care-related mobile apps, attempts have been made to implement remote patient-reported outcome measures (PROMs). Skrepnik et al [6] assessed the impact of a novel smartphone app compared with standard follow-up on mobility following treatment with intra-articular injection in patients with knee osteoarthritis. Armstrong et al [7] evaluated the effect of home monitoring via a mobile app on the number of in-person visits following ambulatory surgery. Most studies reported that patients found mobile apps for remote follow-ups to be convenient, safe, and highly satisfactory [4][5][6][7][8].
Reliable remote follow-ups by mobile health care systems have several advantages over face-to-face follow-ups. In general, follow-up durations of at least 12 months to several years are required for reliable clinical study findings after surgery [9,10]. However, maintaining high rates of long-term follow-up is challenging due to poor patient compliance [10,11]. Remote follow-ups using mobile PROMs are also more cost-efficient than outpatient visits [12]. Considering the difficulty of long-term follow-up [10], the potential to reduce outpatient visits, and the lower health care costs [12], PROMs completed via mobile apps outside the clinic may be a good alternative. For remote PROMs delivered via mobile apps to be widely used, their results must not depend on location; that is, PROMs completed outside the hospital must yield results as reliable as those completed in the hospital. However, to our knowledge, no studies have assessed the location-dependent reliability of mobile app-based PROMs by comparing results obtained remotely with those obtained at outpatient visits.

Goal of This Study
Therefore, this study evaluated the reliability of remote PROMs using mobile apps compared with PROMs completed using the same mobile app during outpatient follow-up visits after arthroscopic shoulder surgery. We also analyzed how the differences changed with increasing time after surgery and examined patients' PROM completion rates at each follow-up period through repeated test-retest comparisons. We hypothesized that the results of the PROMs would be similar between those measured at outpatient clinic visits and those measured remotely using mobile apps.

Patients and Study Design
A total of 205 consecutive patients who underwent arthroscopic rotator cuff repair by a single surgeon between April 2018 and April 2019 were initially considered for this study. Patients diagnosed with large or massive rotator cuff tears were excluded because of their different rehabilitation schedules. Patients with dementia, intellectual disability, illiteracy, or inability to use electronic devices were excluded because of the difficulty in completing questionnaires using electronic equipment. After exclusion, the remaining 174 patients (92 men and 82 women) were prospectively enrolled in test-retest comparisons, which were performed at 5 time points after surgery. The patients were instructed to complete questionnaires (visual analog scale [VAS], American Shoulder and Elbow Surgeons [ASES] scale [13], and Disabilities of the Arm, Shoulder, and Hand [DASH] scale [14]) at locations other than the clinic (test A) 2 days before visiting the clinic for the 1-, 2-, 3-, 6-, and 12-month postoperative follow-ups. Using the same app and electronic devices, namely, their own mobile phones, each patient completed the same questionnaires at the clinic (test B) at 1, 2, 3, 6, and 12 months after surgery (Figure 1). The patients received mobile messages linked to an app for an electronic PROM system (Proscore, Incheon, South Korea). All patients who visited our clinic answered the same questionnaires with the mobile app before treatment. The timing of the mobile messages, 48 hours before the clinic visit, was chosen based on a previous systematic review of test-retest reliability [15]. Of the 174 patients, test A (PROMs completed via the mobile app installed on each patient's mobile phone 2 days before the clinic visit) was completed by 148 at 1 month, 135 at 2 months, 106 at 3 months, 77 at 6 months, and 59 at 12 months. All 174 patients visited our clinic at 1 month after surgery. 
However, the number of outpatient visits at which patients completed test B using the same app and device (each patient's mobile phone) decreased over time: 170 at 2 months, 142 at 3 months, 112 at 6 months, and 95 at 12 months after surgery (Figure 1). All patients underwent the same course of rehabilitation. An abduction brace was applied for 4 weeks after surgery. Passive range of motion exercises were allowed from 4 to 8 weeks after surgery, and active range of motion exercises were started 8 weeks after surgery. This study, including the subject selection and data collection,
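The follow-up counts above can be expressed as completion percentages. The sketch below assumes that the denominator is all 174 enrolled patients and that all 174 patients seen at 1 month completed test B; the text states neither explicitly. The names `ENROLLED`, `TEST_A_COMPLETED`, `TEST_B_COMPLETED`, and `completion_rates` are hypothetical and serve only as illustration.

```python
# Counts reported in the text (variable names are hypothetical).
ENROLLED = 174
FOLLOW_UP_MONTHS = [1, 2, 3, 6, 12]
TEST_A_COMPLETED = [148, 135, 106, 77, 59]   # remote PROMs, 2 days before the visit
TEST_B_COMPLETED = [174, 170, 142, 112, 95]  # PROMs at the clinic visit (1-month value assumed)


def completion_rates(completed, enrolled=ENROLLED):
    """Percentage of enrolled patients completing a test, rounded to 0.1."""
    return [round(100 * n / enrolled, 1) for n in completed]


for month, a, b in zip(FOLLOW_UP_MONTHS,
                       completion_rates(TEST_A_COMPLETED),
                       completion_rates(TEST_B_COMPLETED)):
    print(f"{month:>2} months: test A {a:5.1f}%, test B {b:5.1f}%")
```

Under these assumptions, remote completion falls from about 85% at 1 month to about 34% at 12 months, and clinic completion falls from 100% to about 55%.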

Scale Definitions and Measures
The VAS score ranges from 0 to 10, with scores of 0 and 10 indicating "no pain" and "worst pain imaginable," respectively. The ASES scale [13] consists of two subscales, namely, pain (1 item) and function (10 items). Each subscale is transformed to a score ranging from 0 to 50 based on patient responses. The total ASES score is the sum of the two subscales, with 100 points indicating the best possible shoulder condition. This study analyzed the total ASES score as well as the scores for the two subscales. The DASH scale comprises 30 items (21 on daily activities, 5 on symptoms, 3 on participation, and 1 on confidence in ability) [14]; higher scores indicate worse upper limb function. We used an electronic PROM system (Proscore, Incheon, South Korea), available as an app for electronic devices, that measures VAS, ASES, and DASH scores, including at locations other than the clinic. In this system, patients touched their answers on the screen instead of marking responses on the original paper questionnaires with a writing instrument. According to Food and Drug Administration guidance [16], this change from paper-based to electronic measures is considered a minor modification.
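The subscale transformations are not spelled out above. The following minimal sketch assumes the standard published scoring rules. For the ASES, the pain subscale is (10 minus VAS) multiplied by 5, and the function subscale is the sum of ten items scored 0 to 3, multiplied by 5/3. For the DASH, the score is (mean of the answered items scored 1 to 5, minus 1) multiplied by 25, and it is computed only when at least 27 of 30 items are answered. The function names are hypothetical and are not part of the Proscore app.

```python
from typing import Dict, Optional, Sequence


def ases_scores(vas_pain: float, function_items: Sequence[int]) -> Dict[str, float]:
    """Return ASES pain, function, and total scores.

    Assumes the standard ASES rules: pain subscale = (10 - VAS) * 5 (0 to 50) and
    function subscale = (sum of 10 items scored 0 to 3) * 5 / 3 (0 to 50).
    """
    if not 0 <= vas_pain <= 10:
        raise ValueError("VAS pain must be between 0 and 10")
    if len(function_items) != 10 or any(i not in (0, 1, 2, 3) for i in function_items):
        raise ValueError("ASES function requires 10 items scored 0 to 3")
    pain = (10 - vas_pain) * 5
    function = sum(function_items) * 5 / 3
    return {"pain": pain, "function": function, "total": pain + function}


def dash_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Return the DASH score (0 = no disability, 100 = most severe disability).

    Assumes the standard DASH rule: ((sum of answered items / number answered) - 1) * 25,
    with each item scored 1 to 5 and no score when more than 3 of 30 items are missing.
    """
    if len(items) != 30:
        raise ValueError("DASH has 30 items")
    answered = [i for i in items if i is not None]
    if any(i not in (1, 2, 3, 4, 5) for i in answered):
        raise ValueError("DASH items are scored 1 to 5")
    if len(answered) < 27:
        return None  # too many missing items to compute a valid score
    return (sum(answered) / len(answered) - 1) * 25
```

The two instruments run in opposite directions: a higher ASES total means a better shoulder, whereas a higher DASH score means worse function. Any comparison of test A and test B must therefore interpret the sign of the difference scale by scale.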

Statistical Analyses
Data are expressed as means (standard deviations) or medians (ranges). Two-tailed paired t tests were used to compare the mean scale scores between tests A and B.
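Here is a minimal sketch of the paired comparison. It assumes that each patient contributes one test A score and one test B score on the same scale at a given follow-up. The function name `paired_t` is hypothetical. The study itself used SPSS, and the sketch stops at the t statistic because the Python standard library has no t distribution for the P value.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(test_a: Sequence[float], test_b: Sequence[float]) -> Tuple[float, float, int]:
    """Paired t test statistic for scores from the same patients.

    Returns (mean difference B - A, t statistic, degrees of freedom). The
    two-tailed P value is then read from a t distribution with n - 1 df.
    """
    if len(test_a) != len(test_b) or len(test_a) < 2:
        raise ValueError("need at least two matched pairs of equal length")
    diffs = [b - a for a, b in zip(test_a, test_b)]
    d_bar = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = d_bar / (sd / math.sqrt(len(diffs)))
    return d_bar, t, len(diffs) - 1
```

The differences are taken as B minus A. A positive mean difference on the ASES scale (higher is better) therefore corresponds to better scores at the clinic visit, as does a negative mean difference on the VAS or DASH (lower is better).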
We also calculated the average absolute value of the differences between tests A and B. Intraclass correlation coefficients (ICCs) were calculated to estimate the reproducibility and reliability of tests A and B. Statistical significance was set at P<.05. All statistical analyses were performed using IBM SPSS Statistics for Windows, version 19.0 (IBM Corp, Armonk, NY).
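The text does not state which ICC form was selected in SPSS. The sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement), computed from two-way ANOVA mean squares with patients as rows and the two administrations as columns. It also computes the average absolute difference. Both function names are hypothetical.

```python
from typing import Sequence


def mean_absolute_difference(test_a: Sequence[float], test_b: Sequence[float]) -> float:
    """Average of |test B - test A| over matched patients."""
    if len(test_a) != len(test_b) or not test_a:
        raise ValueError("need matched, non-empty score lists")
    return sum(abs(b - a) for a, b in zip(test_a, test_b)) / len(test_a)


def icc_2_1(test_a: Sequence[float], test_b: Sequence[float]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    Rows are patients (n); columns are the two administrations (k = 2).
    """
    if len(test_a) != len(test_b) or len(test_a) < 2:
        raise ValueError("need at least two matched patients")
    rows = list(zip(test_a, test_b))
    n, k = len(rows), 2
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Two-way ANOVA sums of squares: patients, administrations, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

A consistency ICC ignores a systematic shift between administrations, whereas this absolute-agreement form penalizes it. Identical scores give 1.0. Test A scores of 1, 2, 3, 4 against test B scores of 2, 3, 4, 5 give 10/13 (about 0.77), even though the rank order is unchanged.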

Results
The demographics of patients undergoing rotator cuff surgery are summarized in Table 1.
The average scores and absolute differences between tests A and B at 1, 2, 3, 6, and 12 months after surgery are shown in Table 2 and Figure 2. At 1, 2, and 3 months after surgery, test B showed significantly better outcomes than test A (Ps<.05), except for the ASES function subscale at 3 months (P=.06). No parameter differed significantly between tests A and B at 6 or 12 months after surgery (all Ps>.05). The average absolute differences in VAS, ASES total, and DASH scores between tests A and B were 1.68, 14.72, and 11.28 points, respectively, at 1 month after surgery. For most scales, both the average and absolute differences gradually decreased with time after surgery. At 12 months after surgery, the average absolute differences in VAS, ASES total, and DASH scores between tests A and B were greatly reduced (0.32, 5.48, and 4.46 points, respectively). To estimate the reproducibility and reliability of tests A and B, ICC values were calculated for each scale and subscale (Table 3).
The rates of outpatient visits and completion of tests A and B in each period for each age group are shown in Table 4. At 1 month, all 174 patients visited our clinic; thereafter, the number of outpatient visits gradually decreased. The number of outpatient visits did not differ significantly by age (P=.60, .54, .91, and .70 at 2, 3, 6, and 12 months after surgery, respectively). The rate of completion of tests A and B using the mobile app was significantly lower in the group older than 70 years than in the other groups for all postoperative periods (P<.001). In this study, 36 of 174 patients (20.7%) attended all follow-up visits (1, 2, 3, 6, and 12 months after surgery) and completed both tests A and B at each visit (completely implemented group). 
We also compared tests A and B in this group to determine the average difference for each scale (Table 5).
The ICC values between tests A and B in the completely implemented group (n=36) were similar to those for all 174 patients (Table 6). At 1, 2, and 3 months after surgery, test B showed significantly better outcomes than test A (Ps<.05), except for the ASES pain subscale and DASH scale.
No parameter differed significantly between tests A and B at 6 or 12 months after surgery. The average absolute differences in VAS, ASES total, and DASH scores between tests A and B were 1.50, 15.97, and 10.28 points, respectively, at 1 month after surgery. For most scales, both the average and absolute differences gradually decreased with time after surgery. All parameters showed poor or moderate ICCs at 1, 2, and 3 months after surgery, moderate or good ICCs at 6 months, and their highest ICCs at 12 months after surgery. The VAS score, ASES pain subscale, and ASES total scale showed good ICCs at 12 months after surgery (0.80, 0.82, and 0.88, respectively), while the ASES function subscale and the DASH scale showed excellent ICCs (0.92 and 0.90, respectively).
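The labels in the preceding paragraph map ICC values onto reliability categories. The cut-offs below (0.50, 0.75, and 0.90) are an assumption. They were chosen because they are widely used and reproduce the labels given in the text, including treating exactly 0.90 as excellent. The function name `classify_icc` is hypothetical.

```python
def classify_icc(icc: float) -> str:
    """Label an ICC using assumed cut-offs of 0.50, 0.75, and 0.90."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"  # 0.90 and above, matching the DASH value labeled excellent in the text


# 12-month ICCs reported in the text.
reported_12_month = {
    "VAS": 0.80,
    "ASES pain": 0.82,
    "ASES total": 0.88,
    "ASES function": 0.92,
    "DASH": 0.90,
}
for scale, value in reported_12_month.items():
    print(f"{scale}: {value:.2f} -> {classify_icc(value)}")
```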

Discussion
The results of this study revealed that the PROMs varied depending on location at the 1-, 2-, and 3-month follow-ups after arthroscopic shoulder surgery. However, from 6 months after surgery onward, PROMs completed using the mobile app showed similar results regardless of location. The ICC analysis likewise showed relatively low values between locations at 1, 2, and 3 months postoperatively but high values at the 6- and 12-month follow-ups. These findings indicate that PROMs completed using mobile apps 6 months or more after surgery were adequately reliable and reproducible regardless of location. Therefore, remote PROMs via mobile apps may be most valuable for follow-ups 6 months or more after surgery, when loss to follow-up increases.
Most scales showed different outcomes for test B compared with test A at 1, 2, and 3 months postsurgery. However, at 6 and 12 months after surgery, none of the scales differed significantly between tests A and B. The absolute differences also decreased greatly with time, and reliability as assessed by ICC was adequately high after 6 months. These outcomes are consistent with those of previous test-retest reliability studies of PROMs. Chahal et al [17] reported good reliability of knee-specific PROM questionnaires in a test-retest study conducted 6 months after multiligament knee injury. Bramming et al [18] reported that a PROM (Forgotten Joint Score-12) showed high relative reliability in a test-retest study conducted 6 months after hip arthroscopic surgery. The differences observed in the first 3 months after surgery might reflect variability in patient condition during the acute postoperative phase. Additionally, the differences may have decreased over time as patients became familiar with the test items through repeated completion of the PROMs. The absolute differences between the two tests are also noteworthy, given that the purpose of this study was to measure the difference between PROMs completed at outpatient visits and those completed remotely via the mobile app. The absolute differences for each scale were relatively large at the 1-, 2-, and 3-month follow-ups, whereas all parameters showed reduced absolute differences at the 6- and 12-month follow-ups.
These results also reinforce the reliability of the remote PROMs compared to outpatient PROMs for long-term follow-ups.
Clinical studies of patient outcomes after surgery generally require at least 12 months to several years of follow-up to be recognized as reliable [9,10]. To avoid bias in clinical studies using PROMs completed at the clinic, it is important to minimize loss to follow-up [19,20]. However, maintaining high rates of long-term follow-up is challenging due to poor patient compliance [10]. Cronin et al [21] showed that 40% of patients with orthopedic trauma did not complete 90 days of follow-up. Zelle et al [19] also reported high rates (>70%) of noncompliance in the first 6 months after surgery among patients with undifferentiated orthopedic trauma. Given that patient compliance with outpatient follow-up decreases over time after surgery [21], the location-independent reliability of mobile app-based PROMs during long-term follow-up is meaningful, because these PROMs offer a way to assess patient condition without requiring travel to the hospital.
Remote PROMs must also be reliable if they are to deliver cost benefits and efficient follow-up for patients [12]. Higgins et al [22] compared a conventional in-person follow-up group (conventional group) with a non-face-to-face follow-up group using a mobile app (mobile app group) for 6 weeks after anterior cruciate ligament reconstruction. The mobile app group had 0.36 clinic visits during the study period, compared with 2.44 visits in the conventional group, and spent Can $211 (US $166.16) less over 6 weeks than the conventional group. Thus, in terms of cost burden, remote PROMs may also have advantages over outpatient visits if the assessments are reliable.
Because of the COVID-19 pandemic [11], patients may be reluctant to attend outpatient follow-up in the absence of an emergency [23]. Remote PROMs are particularly valuable [11] when medical staff and national health care system resources are focused on a particular infectious disease [24][25][26]. Recent guidelines from the Journal of Bone and Joint Surgery [27] recommend assessing all planned elective or nonemergency surgical procedures and clinical visits to determine whether they can be postponed or canceled. If remote PROMs are reliable, they can be used effectively to reduce patient visits [11] and allow efficient distribution of national health care system capacity during infectious disease outbreaks.
This study used an electronic PROM system (Proscore, Incheon, South Korea) available for mobile phones. Electronic measurement systems have reportedly shown reliable agreement with conventional paper-and-pencil methods [28]. Patient compliance in completing scoring tools is generally better with electronic systems than with paper-and-pencil methods because electronic systems are more convenient and quicker [29]. However, older patients may be less comfortable completing PROMs on electronic devices because they have less exposure to and familiarity with such devices than younger patients [29]. In this study, the rate of outpatient visits did not differ significantly by age; however, the completion rates of both outpatient PROMs and remote mobile app-based PROMs were significantly lower in patients older than 70 years than in other age groups for all postoperative periods (P<.001). Older patients must therefore be given instructions on using smartphones and apps if PROMs via mobile apps are to be used outside the hospital.
Test-retest assessments of tool reliability are generally conducted once. In contrast, this study conducted test-retest comparisons at 5 time points to determine trends with increasing time after surgery, which is a strength of this study. A limitation of this study is that it included only patients who underwent arthroscopic rotator cuff repair. However, this could also be considered a strength because it reduces confounding from heterogeneous disease entities. Other shoulder conditions and treatments, including intra-articular injection for frozen shoulder, reverse total shoulder arthroplasty for rotator cuff arthropathy, and other disease categories, might be candidates for further study.
In conclusion, PROMs completed using mobile apps in different locations showed varied results soon after surgery but were similar after 6 months, with reliable ICC values. Remote PROMs using mobile apps could be used reliably for patients more than 6 months after surgery. However, mobile app-based questionnaires can be expected to be less useful in patients older than 70 years than in other age groups.