Electronic Data Capture Versus Conventional Data Collection Methods in Clinical Pain Studies: Systematic Review and Meta-Analysis

Background: The most commonly used means to assess pain is by patient self-reported questionnaires. These questionnaires have traditionally been completed using paper-and-pencil, telephone, or in-person methods, which may limit the validity of the collected data. Electronic data capture methods represent a potential way to validly, reliably, and feasibly collect pain-related data from patients in both clinical and research settings.

Objective: The aim of this study was to conduct a systematic review and meta-analysis to compare electronic and conventional pain-related data collection methods with respect to pain score equivalence, data completeness, ease of use, efficiency, and acceptability between methods.

Methods: We searched the Medical Literature Analysis and Retrieval System Online (MEDLINE), Excerpta Medica Database (EMBASE), and Cochrane Central Register of Controlled Trials (CENTRAL) from database inception until November 2019. We included all peer-reviewed studies that compared electronic (any modality) and conventional (paper-, telephone-, or in-person–based) data capture methods for patient-reported pain data on one of the following outcomes: pain score equivalence, data completeness, ease of use, efficiency, and acceptability. We used random effects models to combine score equivalence data across studies that reported correlations or measures of agreement between electronic and conventional pain assessment methods.

Results: A total of 53 unique studies were included in this systematic review, of which 21 were included in the meta-analysis. Overall, the pain scores reported electronically were congruent with those reported using conventional modalities, with the majority of studies (36/44, 82%) that reported on pain scores demonstrating this relationship. The weighted summary correlation coefficient of pain score equivalence from our meta-analysis was 0.92 (95% CI 0.88-0.95).
Studies on data completeness, patient- or provider-reported ease of use, and efficiency generally indicated that electronic data capture methods were equivalent or superior to conventional methods.

(Jibb et al. J Med Internet Res 2020;22(6):e16480. https://www.jmir.org/2020/6/e16480)


Background
Pain is an unpleasant sensory and emotional experience that is unique to the individual. It is also a dynamic process and fluctuates in a multidimensional manner across its sensory (eg, intensity, location, and duration), evaluative (ie, impact on functioning), and affective (ie, emotional effect) qualities within both the short and long term [1]. Pain is influenced by a variety of biopsychosocial factors, including genetics, mood, emotions, memory, and interpersonal relationships, as well as external stimuli such as physical movement [1][2][3]. The accurate measurement of pain is, therefore, of utmost importance to clinicians and researchers.
The most commonly used methods of measuring pain within a clinical and research context are self-reported questionnaires. Clinically, pain measurements are generally performed before and after an intervention to assess a patient's response to therapy. These assessments are typically performed using paper-based questionnaires or via face-to-face or telephone-based verbal surveys or interviews. Although widely used, these conventional data collection methods can introduce a number of biases in the collected pain data. In particular, these methods often rely heavily on a patient's recall of their pain symptoms (eg, pain intensity over the preceding week). Unfortunately, the recall of pain is problematic because memories of pain are vulnerable to distortion due to physical and psychological contextual factors and selective coding and retrieval of memories [4,5]. Additional issues with conventional data collection methods include limitations in conducting ecologically valid assessments of pain in the patient's natural environment and social context, logistical challenges for repeated measurements over time, potential burden to patients, clinicians, and researchers, and possibly reduced data quality due to incomplete or back-filled pain diaries [6][7][8].
The advent of mobile electronic devices has created novel opportunities to collect pain-related data in clinical and research settings. Electronic data collection methods have been used to assess variables related to a variety of conditions, including mood disorders, asthma, tobacco cessation, urinary incontinence, brain injury, diabetes, cancer, and pain [7,[9][10][11]. Specialists in pain medicine have widely advocated for the use of electronic data capture over the past two decades [12,13], and mounting evidence suggests that data collected via electronic methods may be more accurate and contain fewer errors than conventional methods [14,15]. Although randomized controlled trials and observational studies comparing electronic and conventional data collection methods suggest benefits to the use of electronic devices in pain clinical trials, no review providing an overview of these benefits currently exists. Furthermore, with the advent of smartphone-style mobile phones and their nearly ubiquitous use in developed countries [16], electronic data collection methods are becoming more widely available. As such, a review of the literature is needed to understand the potential advantages and disadvantages of collecting pain data using electronic methods.

Objective
We aimed to identify and synthesize data from studies comparing electronic and conventional pain-related data collection methods to describe similarities and differences in pain scores, data completeness, ease of use, efficiency, and acceptability between methods.
Eligibility Criteria
We included studies that compared patient-reported pain intensity or pain interference (including affect) scores assessed using an electronic and a conventional data capture method. Pain intensity and interference were the focus of the analysis as these constructs are commonly assessed, single-item aspects of both acute and chronic pain and are routinely used to determine treatment effectiveness and guide therapy [18,19]. As recalled pain reports may not be an accurate reflection of the momentary pain experience, we included only studies that compared momentary pain reports. No restrictions were placed on the type of data collection method (eg, mobile phone, computer-based, and tablet), pain assessment instrument (eg, numerical rating scale [NRS]), frequency of data collection, or other pain-related assessments (ie, studies that also assessed constructs such as quality of life or disease activity in addition to pain intensity or interference were included).

Study Selection
We developed a comprehensive search strategy in consultation with a tertiary hospital librarian with expertise in the scientific literature related to digital health. We customized the search strategy to conduct tailored searches of MEDLINE, EMBASE, and Cochrane Central Register of Controlled Trials (CENTRAL) from inception until November 19, 2019. Medical Subject Headings (MeSH) keywords in the search included: pain, pain measurement, pain threshold, pain perception, electronics, cellular phone, computers, handheld, wireless technology, internet, computer communication networks, mobile applications, randomized controlled trial, multicenter study, observational study, humans, and prospective studies. Additional keywords used in the search included: pain, pain reporting, personal digital assistant, smartphone, and prospective study. An example of the search strategy can be found in Multimedia Appendix 1. We supplemented our search with searches of the authors' own databases of electronic pain assessment studies.
Search results were initially electronically screened for intradatabase and interdatabase duplicates. After the electronic removal of duplicates, titles and abstracts were screened independently by 2 authors using piloted standardized screening forms (all authors involved). Subsequently, the full texts of the included citations were reviewed in duplicate to confirm study inclusion (all authors involved). The kappa statistic was calculated as a metric of screening agreement at the full-text stage. Following the literature-based precedent, we interpreted the kappa as follows: <0.00, poor; 0.00-0.20, slight; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, almost perfect [20]. Disagreements among reviewers about study eligibility were resolved by consensus through discussion by at least three authors.
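As a concrete illustration of the agreement metric described above, Cohen's kappa and the review's interpretation bands can be computed as follows. This is a minimal sketch, not the review's actual screening software; the include/exclude decisions shown are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if the raters decided independently at their own base rates
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    if expected == 1.0:  # both raters constant and identical
        return 1.0
    return (observed - expected) / (1 - expected)

def interpret_kappa(kappa):
    """Interpretation bands used in the review (poor to almost perfect)."""
    if kappa < 0.0:
        return "poor"
    for cutoff, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                          (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= cutoff:
            return label
```

For example, the full-text screening agreement of 0.69 reported in the Results falls in the "substantial" band under this scheme.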

Data Collection Process
A standard data collection form was created and piloted. Data abstraction occurred independently and in duplicate. Data extracted included study design, sample size, study population, electronic and conventional data collection method, duration of data collection, score equivalence between data capture methods (ie, correlations, score differences, and descriptive reports), data completeness, ease and efficiency of data collection, and patient or participant acceptability. An a priori decision was made to not formally assess study quality given the nature of the intervention (ie, data collection method) and the diverse study designs collected in the systematic search.

Data Synthesis
Descriptive statistics (ie, frequencies and percentages) were used to synthesize and present data across all included studies. Meta-analysis was performed to synthesize results related to score equivalence across data capture methods. For the analysis, reported correlation coefficients (or kappa in the case of 2 studies [21,22]) served as effect size indices. In all studies where more than one coefficient for a correlation or measure of agreement between electronic and conventional pain data collection methods was available, we used the average of the coefficients so that a single study did not disproportionately impact the summary effect size. Whenever available, the reported sample size used to produce the score equivalence coefficient was used in the model. In cases where the sample size for the score equivalence analysis was not explicitly mentioned, we used the sample size reported for the entire study. Random-effects models were used to combine data across studies, and the I² statistic was used to quantify heterogeneity. The criteria set out by Higgins et al [23] were used to interpret the I² statistic; namely, 25%, 50%, and 75% were considered low, moderate, and high heterogeneity, respectively. To further examine the impact of heterogeneity on the results, the standardized residual score (ie, the standardized difference between each study effect size and the weighted mean effect size) for each study was calculated and compared [9]. A conservative cutoff of ±2 was set to examine extreme effect sizes as determined by the standardized residuals. We performed a sensitivity analysis to evaluate any impact of the type of correlation or measure of agreement on the weighted summary correlation.
Specifically, following previously used methods, separate meta-analyses were conducted with studies reporting intraclass correlation coefficients (ICC) or weighted kappa, which account for covariance and score mean and variability, and studies reporting the more conventional Pearson or Spearman rho coefficients [9]. Possible publication bias was assessed by visual inspection of the funnel plot for asymmetry. To investigate the sources of heterogeneity, we conducted further subgroup analyses. Our subgroup analyses focused on elucidating the impact of (1) the similarity of the pain assessment measure between electronic and conventional modalities (ie, same measure or different) and (2) the duration of data collection (ie, once or multiple times). Subgroup analyses by study participant age and pain condition were precluded by the structure of data reported in our included studies. Meta-analysis procedures were conducted using Microsoft Excel (Microsoft Corporation) and Distiller SR Forest Plot Generator (Evidence Partners Inc).
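The pooling steps described in this section can be sketched as follows. This is a minimal illustration under common assumptions (Fisher z transformation of correlations with variance 1/(n − 3) and DerSimonian-Laird estimation of between-study variance); the review does not specify its exact computational procedure, and any input data would be hypothetical.

```python
import math

def fisher_z(r):
    # Fisher z transformation stabilizes the variance of correlation coefficients
    return 0.5 * math.log((1 + r) / (1 - r))

def random_effects_summary(correlations, sample_sizes):
    """DerSimonian-Laird random-effects pooling of correlation coefficients.

    Returns (pooled r, 95% CI, tau^2, I^2 as a percentage)."""
    zs = [fisher_z(r) for r in correlations]
    variances = [1.0 / (n - 3) for n in sample_sizes]  # var of Fisher z
    w = [1.0 / v for v in variances]                   # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    # Cochran's Q, I^2, and between-study variance tau^2
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(zs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    # Re-weight with tau^2 added, then back-transform the pooled estimate
    w_re = [1.0 / (v + tau2) for v in variances]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se))
    return math.tanh(z_re), ci, tau2, i2

def standardized_residuals(correlations, sample_sizes, tau2, summary_r):
    """Standardized difference of each study effect from the pooled effect;
    values beyond +/-2 flag extreme effect sizes, as in the review."""
    z_sum = fisher_z(summary_r)
    return [(fisher_z(r) - z_sum) / math.sqrt(1.0 / (n - 3) + tau2)
            for r, n in zip(correlations, sample_sizes)]
```

With hypothetical inputs such as `random_effects_summary([0.85, 0.92, 0.95], [40, 60, 120])`, the function returns a pooled correlation near the study values together with its 95% CI, tau², and I².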

Study Selection
The search strategy identified 4927 studies, of which 183 underwent full-text review and 129 were excluded (Figure 1). The kappa agreement score between appraisers at this stage was 0.69, which indicated substantial agreement. In all, 54 papers reporting on 53 unique studies were included in the qualitative synthesis. Stinson et al [5,24] reported different results from the same study and were therefore grouped together for analysis purposes. In all, 21 studies were included in the quantitative synthesis.
The number of published studies meeting our inclusion criteria increased steadily over time ( Figure 2).
In total, 36% (19/53) of studies used a randomized crossover design, 14 (26%) studies used a nonrandomized cohort design, 9 (17%) studies were randomized controlled trials, 5 (9%) studies used a nonrandomized crossover design, 5 (9%) studies used a crossover design with unclear randomization (no mention of whether a randomization procedure was employed), and 1 (2%) study did not specify the study design. The duration of data collection varied across studies, ranging from a single assessment to repeated assessments over the course of a year.
Pain assessment tools using electronic data capture were most often multidimensional in nature (35/53, 66%). Electronic data collection methods were used to capture multidimensional aspects of pain using validated questionnaires.

Qualitative Synthesis of Score Equivalence
In total, 83% (44/53) of studies reported pain score equivalence between electronic and conventional data capture methods (Table 2). Statistical methods used to compare scores differed between studies: 47% (21/44) of these studies used correlational analyses (ie, ICC, Pearson coefficient, Spearman coefficient, or weighted kappa) to examine the agreement between pain scores; 29% (13/44) statistically examined the differences between mean or median scores, SDs, or ranges between methods; 7% (3/44) used descriptive methods to examine agreement; and 16% (7/44) used a combination of these statistical methods. In one study that compared markings at the same anatomical locations on both body map versions [42], the markings were relatively similar in 20 cases and dissimilar in 7 cases. Across all methods used to compare scores, 82% (36/44) of studies demonstrated equivalence between scores reported electronically or using conventional methods. One of these 44 studies (2%) reported nonequivalent scores between data collection methods, and 16% (7/44) reported discrepant results. Among studies reporting nonequivalence or discrepancies, purported reasons were recall bias; differences in question layout, wherein paper assessments made all items visible to participants simultaneously, allowing item scoring in relation to other responses; the capacity to change item responses using paper methods; and differences in scale presentation (eg, numerical values for the NRS not shown using the electronic data capture method).
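Two of the simpler comparison approaches named above can be sketched as follows: a Pearson correlation between paired scores and a mean paired difference. The paired electronic and paper scores in any usage would be hypothetical, and this sketch does not cover ICC or weighted kappa, which require the fuller formulations used in the cited studies.

```python
import math

def pearson_r(electronic, paper):
    """Pearson correlation between paired electronic and paper pain scores."""
    n = len(electronic)
    mx, my = sum(electronic) / n, sum(paper) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(electronic, paper))
    var_x = sum((a - mx) ** 2 for a in electronic)
    var_y = sum((b - my) ** 2 for b in paper)
    return cov / math.sqrt(var_x * var_y)

def mean_paired_difference(electronic, paper):
    """Mean of (electronic - paper) score differences; a value near zero
    is consistent with score equivalence between the two modalities."""
    diffs = [a - b for a, b in zip(electronic, paper)]
    return sum(diffs) / len(diffs)
```

A high correlation with a near-zero mean difference is the pattern most included studies interpreted as equivalence; a high correlation alone does not rule out a systematic offset between modalities, which is why several studies also examined mean or median differences.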

Data Completeness
Overall, 45% (24/53) of studies reported the completeness of data collected via electronic or conventional methods (Table 3). All of these studies compared an electronic data capture modality to paper-based assessments, with 8% (2/24) of the paper-based assessments being mailed to participants. The assessment of data completeness differed across studies and was largely defined as either the percentage of study participants not completing pain assessments or the percentage of missing or incomplete pain assessments. In total, 37% (9/24) of studies reported superior data completeness in the electronic data capture group, 33% (8/24) reported superior data completeness in the conventional data capture group, 8% (2/24) reported mixed results, and 20% (5/24) did not conduct a direct comparison between data collection modalities but reported high data completeness using electronic data capture. (Table 3 details each study's definition of data completeness and the missing-data rates observed for the electronic and conventional methods; the table is not reproduced here.)

Ease of Use
The ease of use of electronic and/or conventional pain data capture methods was reported in 45% (24/53) of studies (Table 4). Ease was assessed subjectively in all studies using administered quantitative or qualitative surveys or verbal reports. Overall, electronic data collection modalities were considered easy to use by patients in pain or their care providers. In 91% (22/24) of the studies, the electronic modality was considered easy to use, easy to understand, or easy for reviewing or reporting pain. In all, 29% (7/24) of studies conducted inferential testing comparing ease between pain data capture modalities. Of these studies, 57% (4/7) showed that electronic versions were significantly easier to use, 14% (1/7) showed that the paper version was significantly easier to use, and 28% (2/7) showed no significant differences between groups.

Principal Findings
This is the first systematic review and meta-analysis to compare electronic and conventional data collection methods for pain-related outcomes. The results from our review suggest strong correspondence in pain scores collected across electronic and conventional modalities as well as ease of use and acceptability for electronic data capture methods. Comparisons of data completeness and efficiency showed mixed results in terms of the superiority of electronic modalities over conventional methods. Overall, these results indicate that electronic data capture is a viable means to assess pain and has the potential to overcome many of the known limitations associated with conventional methods.
The capacity to obtain equivalently scored data from patients across electronic and conventional data capture modalities is paramount to the use of more novel collection methods in clinical and research settings. Most studies included in this review (44/53, 83%) reported on the correspondence of pain scores between assessments. Regardless of whether the data analyses were qualitative or quantitative, the general consensus across studies was that pain was reported equivalently across assessment modalities. The meta-analysis of correlations between scores reported electronically and conventionally resulted in a summary coefficient of 0.92, indicating high correspondence. The summary coefficients produced by studies reporting ICC or weighted kappa and by studies reporting Pearson or Spearman rho coefficients did not differ from the overall summary score, suggesting negligible change in patient-reported scores across modalities. These findings agree with those of a meta-analysis published in 2008 that evaluated the equivalence of scores for patient-reported outcomes (not specifically pain) completed using personal digital assistant (PDA), computer, or tablet methods versus paper-based methods and that showed a summary correlation of 0.90 [9]. Together, these reviews suggest that score equivalence between electronic and conventional data capture methods is a robust finding across patient-reported outcomes.
Despite our use of random effects models, we observed substantial heterogeneity across studies included in the meta-analysis that was not accounted for by the single study that met our criterion for extreme effect size, sensitivity analyses by correlation type, the similarity of the pain assessment measure used in each modality, or the duration of data collection. Studies varied in terms of study design, participant group, type of electronic and conventional data collection method, and pain measurement instrument; these methodological differences may explain the heterogeneity. For instance, the type of electronic device used to collect pain data varied across studies, meaning that aspects of the device such as interface design, user familiarity, and screen size could each have contributed to our heterogeneous results [11]. The included studies also varied in terms of the type of pain intensity scale or pain interference instrument used (eg, NRS and visual analog scale [VAS]). Although good congruence in patient self-report across instruments has been shown [75], and the transfer of assessment instruments to the electronic format generally appeared faithful, differences in pain ratings across instruments are possible, as reported previously [76]. Irrespective of the observed heterogeneity, the correlation coefficients were strong across all studies, with no reported coefficient less than 0.64, suggesting that heterogeneity should not temper the meta-analysis conclusion.
The collection of high-quality and complete patient-reported data is of utmost importance to clinicians, researchers, and study sponsors [12]. Data completeness was a commonly reported comparison outcome across data collection methods in the included studies. The results regarding the superiority of data completeness were mixed. However, the electronic method was most often associated with more complete data being collected. Ultimately, methodological and logistical issues related to paper-based data collection methods may support the use of electronic data capture. For instance, research has shown that the completeness and accuracy of pain data collected via paper methods is adversely impacted by patients back-filling diaries and, therefore, introducing recall bias into datasets (a behavior that can be rendered impossible using electronic methods) [8]. In addition, the capacity to efficiently and cost-effectively develop large databases for clinical and research purposes may be improved with electronic data capture. For instance, one of the studies included in this review [47] showed that over 4-fold more research assistant time was required to manage postoperative pain data collected using conventional means compared with electronic means. This finding suggests that cost savings may result from the use of electronic pain assessments in research, and these savings might be pronounced at scale. Furthermore, the likelihood of inaccurate or missing data in these databases resulting from human input error is reduced in the case of electronic entry [77].
Almost all studies that assessed ease indicated, in some manner, that electronic methods were easy to use, easy to understand, or easy to review or report pain. The time difference required to complete pain assessments via each data collection method was minimal, and the majority of studies showed that the electronic method required equal or less time to complete than conventional methods. The methodological advantages of electronic data capture include high-density sampling in all environments. Evidence of ease of use and efficiency in electronic data capture is useful to researchers and clinicians considering leveraging these utilities to collect repeated ecologically relevant pain assessments [78].
Electronic data capture was also shown to be a highly acceptable method for pain assessment and was more likely to be the method of choice for reporting by patients. These findings agree with those of previous studies comparing electronic and conventional methods [10]. Given the heterogeneity of electronic pain data capture methods, participant populations, and sampling densities of included studies, our results suggest acceptability across a range of data collection contexts. This result is meaningful as the acceptability of an intervention has been linked to adoption, especially in relation to long-term sustainability [79].

Limitations
Some included studies did not administer the same pain measurement instrument or use the same sampling schedule via electronic and conventional methods, making it difficult to directly compare results across modalities. Owing to variations in study design and the fact that our outcomes of interest were often not the main objective of our included studies, we did not perform an assessment of quality for included studies; instead, we elected to include all identified studies in our review. Our results and conclusions are, therefore, the product of studies that may have included significant methodological weaknesses. In addition, as is an issue with all systematic reviews, we are constrained by possible publication bias, which was suggested by the funnel plot inspection of our quantitative synthesis data. However, given the objectives of the studies we included, we believe that the likelihood of a file-drawer effect is low. Finally, we included studies conducted in controlled (eg, research and health care institutions) and uncontrolled (eg, participant home) environments. We are, therefore, limited in our ability to make more definitive conclusions about our outcomes as they pertain to ecologically relevant data collection, which is considered a major methodological advantage of the electronic method.

Conclusions
Overall, this review demonstrates that electronic pain-related data capture methods are comparable with conventional methods in terms of score equivalence, data completeness, ease, efficiency, and acceptability. Specifically, pain-related outcome scores reported across methods were congruent in terms of score correlations and mean or median differences between scores. Data completeness, ease of use, efficiency, and acceptability outcomes were also comparable or superior using electronic data capture. Our results suggest that electronic methods are a feasible means to collect pain data, and the use of these methods is likely to increase with the ubiquitous use of mobile phones outside of the clinical or research setting. However, a critical caveat to this conclusion relates to the validation of pain instruments that are implemented electronically. To ensure the collection of accurate data, rigorous methods should be used to establish the sound psychometric properties of electronic pain measurement instruments. Validation of electronic methods will facilitate the capture of pain data in clinical settings but will also support its use in data collection for interventional research, an area that has largely not been explored to date [6].

Conflicts of Interest
PS works for and owns shares of a digital health company that makes electronic medical records. All other authors have no conflicts of interest to disclose.