This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Self-report measures can guide clinical decisions and are useful when evaluating treatment outcomes. However, many clinicians do not use self-report measures systematically in their clinical practice. Internet-based questionnaires could facilitate administration, but the psychometric properties of the online version of an instrument should be explored before implementation. The recommendation from the International Test Commission is to test the psychometric properties of each questionnaire separately.
Our objective was to compare the psychometric properties of paper-and-pencil versions and Internet versions of two questionnaires measuring depressive symptoms.
The 87 participating patients were recruited from primary care and psychiatric care within the public health care system in Sweden. Participants completed the Beck Depression Inventory (BDI-II) and the Montgomery-Åsberg Depression Rating Scale—Self-rated (MADRS-S), both on paper and on the Internet. The order was randomized to control for order effects. Symptom severity in the sample ranged from mild to severe depressive symptoms.
Psychometric properties of the two administration formats were mostly equivalent. The internal consistency was similar for the Internet and paper versions, and significant correlations were found between the formats for both MADRS-S (
The MADRS-S can be transferred to online use without affecting the psychometric properties in a clinically meaningful way. The full BDI-II also seems to retain its properties when transferred; however, the item measuring suicidality in the Internet version needs further investigation since it was associated with a lower score in this study. The use of online questionnaires offers clinicians a more practical way of measuring depressive symptoms and has the potential to save resources.
Routine use of self-report measures (of depressive symptoms, for example) can be useful in clinical settings. A recent study found that clinicians in psychiatric practice rated a self-report measure as helpful for making treatment decisions in 93% of patient visits [
One medium that has the potential to make self-report questionnaires easier to use is the Internet. Traditional paper-and-pencil questionnaires are now being complemented by Internet-based questionnaires that can be completed anywhere, reach their destination instantly, automatically calculate scores correctly, and be stored in a practical way. By employing Internet-based questionnaires, clinicians could easily access information about symptom levels and use this information to inform their decisions about treatment. Using the Internet to facilitate this kind of “reflective practice” has been suggested previously [
There is also an indication that patients may prefer Internet-based questionnaires. One study found that more people reported a preference for responding to mental health questionnaires on a computer compared with answering on paper [
Since the spread of the Internet, researchers and test developers have adapted several tests for Internet administration by simply moving the items from established paper questionnaires to websites. However, it has been argued that we cannot assume that the psychometric properties remain the same after such adaptation [
The International Test Commission (ITC) guidelines on good practice in Internet-based testing contain recommendations about the process of adapting an established paper-and-pencil questionnaire for online use. The equivalence of the psychometric properties of the two versions is a central issue: the ITC recommends presenting evidence that the two versions produce scores with comparable means and standard deviations, comparable reliabilities, and a correlation at the level expected from the reliability estimates. The guidelines also state that test takers should have the same level of control in both versions (for instance, the possibility to review or skip items in a similar fashion) [
In two previous randomized studies [
Previous research on these questionnaires indicates equivalence between Internet and paper versions. Participants in those trials, however, are probably not representative of patients seeking help at psychiatric or primary care clinics, making the results less relevant to this context. From a health care perspective, a study of clinic patients could increase the external validity of a psychometric study, not only because such a study population would consist of persons exhibiting higher levels of depressive symptoms, but also because it is reasonable to assume that clinic patients would have less experience with computers compared with study populations composed of students. Having higher levels of depressive symptoms, individuals in a clinical sample may have some concentration difficulties and therefore might also be more negatively affected by new technology.
The aim of this study was to compare the psychometric properties of two administration formats for the BDI-II and the MADRS-S using a sample of clinic patients. We contrasted paper-and-pencil versus Internet administration of the two questionnaires.
Participants completed the paper and Internet-based questionnaires as part of their registration for a clinical trial of Internet-based treatment for depression. Participants were recruited within public health care in Örebro County Council and Värmland County Council in Sweden. Patients and staff in primary care and psychiatric care were informed about the trial at staff meetings and via posters in waiting rooms. Both referrals and self-referrals were accepted. Participants were required to be at least 18 years of age, have access to a computer with an Internet connection, be fluent in the Swedish language, and be willing to attend two interviews with a psychologist.
All patients who expressed interest in the clinical trial received a letter with an informed consent form. When written consent had been received, the patient was randomized to complete the questionnaires either on paper first or on the Internet first, to control for order effects. A block randomization sequence was created by a statistician and concealed from study personnel by the use of sealed envelopes. After randomization, a letter containing instructions on how to proceed was sent to the patient’s home. Patients randomized to completing the paper version first received the questionnaires together with the instructions and a return envelope. Patients randomized to completing the Internet version first received a letter with instructions on how to fill out the Internet version as well as a user name and password. As soon as the trial staff received the responses from the first administration of the questionnaires, a new letter was sent out that contained instructions for the opposite administration format. In both formats, the MADRS-S was completed first and the BDI-II second. Group 1 (n = 43) completed the paper version first and then the Internet version. Group 2 (n = 44) answered the questionnaires in the opposite order. Ethical approval was obtained from the Regional Ethics Committee in Uppsala, Sweden.
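The block randomization described above can be sketched as follows. This is only an illustration: the block size, the seed, and the function name `block_randomize` are assumptions, since the article does not report how the statistician generated the sequence.

```python
import random


def block_randomize(n_participants, block_size=4, seed=2010):
    """Return a list of 'paper_first' / 'internet_first' labels.

    block_size and seed are hypothetical; the article does not report them.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        # Each block holds equal numbers of both orders, shuffled in place.
        block = ["paper_first", "internet_first"] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]


# 112 is the number of patients who gave written consent in this study.
allocation = block_randomize(112)
```

Within every complete block the two orders are balanced, so the group sizes can differ by at most half a block at any point in recruitment.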
A website was constructed for the study, and all Internet-based measurements were carried out by patients logging in and completing the questionnaires under their user name. All items had to be answered one-by-one, and only one item at a time appeared on the screen. After the answer to an item had been provided, the next item appeared on the screen. It was possible to change the answers of previous items until the last item was completed for each test (BDI-II and MADRS-S).
The Montgomery-Åsberg Depression Rating Scale—Self-rated (MADRS-S) is a 9-item self-report scale that measures depressive symptoms. The patients are asked to rate their symptom severity on a scale ranging from 0 to 6, resulting in a total score ranging from 0 to 54. A higher score indicates a higher level of depressive symptoms. Satisfactory internal consistency was found in a recent study, in which a Cronbach alpha of .84 was reported [
The Beck Depression Inventory—second edition (BDI-II) is a 21-item self-report scale of depressive symptoms. Each item yields a score ranging from 0 to 3 resulting in a total score ranging from 0 to 63, and a higher score indicates a higher level of depressive symptoms. The internal consistency of the BDI-II has been reported to be good in several studies, for example, a Cronbach alpha of .90 has been reported [
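As a minimal sketch of how an online version might compute the total scores described above, the following sums the item responses after checking the item count and the permitted range. The function names are hypothetical and do not describe the website used in the study.

```python
def total_score(responses, n_items, max_item_score):
    """Sum item responses after checking their count and range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(responses)}")
    for r in responses:
        if not 0 <= r <= max_item_score:
            raise ValueError(f"item score {r} outside 0..{max_item_score}")
    return sum(responses)


def score_madrs_s(responses):
    # 9 items rated 0 to 6, so the total ranges from 0 to 54.
    return total_score(responses, n_items=9, max_item_score=6)


def score_bdi_ii(responses):
    # 21 items rated 0 to 3, so the total ranges from 0 to 63.
    return total_score(responses, n_items=21, max_item_score=3)
```

Rejecting incomplete or out-of-range responses up front is one way to obtain the automatic, error-free scoring that online administration makes possible.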
Cronbach alpha coefficients were used to estimate internal consistency, and Pearson correlations were calculated between the two administration formats. To test for differences between the two orders of administration (paper first or Internet first) and the two formats (paper or Internet), two-way analyses of variance (ANOVAs) were calculated. Significant interactions were post tested with
Of the 119 patients who showed interest in the clinical trial, 112 gave written consent and were asked to fill out the questionnaires both on the Internet and on paper. The response rate was 77.7% (87/112): 25 patients did not complete the task (4 filled out the questionnaires only on the Internet, 8 only on paper, and 13 did not fill out any). A total of 87 patients filled out both questionnaires on paper and on the Internet and were included in the analyses. On average, 9.79 days (SD 9.83) passed between the first and second assessments. Of the 87 study participants, 57 (65.5%) were women and 30 (34.5%) were men; the mean age was 41.1 years (SD 13.0; range 20 to 72). The degree of depressive symptoms ranged from minimal to severe, with scores from 7 to 57 on the BDI-II and from 6 to 39 on the MADRS-S (paper versions). The mean score on the paper version of the MADRS-S indicated moderate depressive symptoms, and the mean score on the paper BDI-II indicated severe depressive symptoms.
Cronbach alpha levels were similar for the Internet and paper versions of the Montgomery-Åsberg Depression Rating Scale – Self-rated (MADRS-S). The alpha levels for the different orders and formats of administration are presented in
Internal consistency (Cronbach alpha) for the two groups and administration formats
Questionnaire | Paper-First Group | Internet-First Group | Paper-First Group | Internet-First Group |
MADRS-S | .81 | .73 | .81 | .81 |
BDI-II | .90 | .87 | .89 | .89 |
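For readers unfamiliar with the statistic in the table above, Cronbach alpha for a scale with k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements this standard formula; the demo responses are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from five people to a 3-item scale (not study data).
demo = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [3, 3, 4], [5, 4, 5]]
alpha = cronbach_alpha(demo)
```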
Pearson Correlations between scores from paper and the Internet
Questionnaire | Paper First | Internet First | Both Groups Together |
MADRS-S | .86 | .80 | .84 |
MADRS-S item 9 | .88 | .64 | .79 |
BDI-II | .91 | .85 | .89 |
BDI-II item 9 | .84 | .73 | .80 |
aAll correlations are significant at the
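The between-format correlations in this table are Pearson coefficients computed on each participant's paper and Internet scores. A minimal sketch of that calculation is shown below; the paired scores are hypothetical and are not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical paper and Internet total scores for six people (not study data).
paper = [24, 31, 18, 40, 27, 35]
internet = [22, 33, 17, 38, 29, 34]
r = pearson_r(paper, internet)
```

Note that a high correlation shows only that the two formats rank participants similarly; it does not by itself show that the mean levels are equal, which is why the analyses of variance are reported separately.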
MADRS-S item, mean score on paper and Internet and the correlation (Pearson) between them
Item | Paper Format, Mean (SD) | Internet Format, Mean (SD) | Correlation^a |
1 Mood | 2.38 (1.73) | 2.43 (1.34) | .57 |
2 Anxiety | 3.53 (1.36) | 3.55 (1.02) | .58 |
3 Sleep | 2.32 (1.45) | 2.32 (1.52) | .66 |
4 Appetite | 1.36 (1.36) | 1.48 (1.26) | .65 |
5 Ability to concentrate | 2.83 (1.38) | 2.90 (1.32) | .71 |
6 Initiative | 3.18 (1.48) | 3.22 (1.43) | .74 |
7 Emotional involvement | 2.66 (1.28) | 2.63 (1.21) | .65 |
8 Pessimism | 3.20 (1.38) | 3.48 (1.21) | .63 |
9 Zest for life | 2.34 (1.26) | 2.41 (1.10) | .79 |
Total | 24.43 (6.97) | 23.79 (7.98) | .84 |
aAll correlations are significant at the
For the MADRS-S there was no significant main effect for administration format (paper or Internet). There was, however, a significant main effect of administration order, indicating higher scores for the group that answered the questionnaire on paper first compared with the Internet-first group (means 26.2 vs 22.08), and the effect size was moderate (Cohen’s
For MADRS-S item 9 (suicidality) there was no significant main effect for format or order of administration. There was, however, a significant interaction effect between format and order of administration. The subsequent
Means (SD), main effects, and interaction effect
Measure | Group | Paper Format, Mean (SD) | Internet Format, Mean (SD) | Main Effect: Format | Main Effect: Order | Interaction |
MADRS-S | Paper first | 26.35 (7.93) | 26.02 (7.32) | | | |
 | Internet first | 21.30 (7.28) | 22.86 (6.31) | 1.88, | 7.68, | 4.36, |
MADRS-S item 9 | Paper first | 2.67 (1.39) | 2.56 (1.24) | | | |
 | Internet first | 2.02 (1.05) | 2.27 (0.92) | 0.68, | 3.95, | 5.1, |
BDI-II | Paper first | 34.21 (10.9) | 31.93 (10.54) | | | |
 | Internet first | 26.98 (9.34) | 27.48 (9.2) | 2.97, | 7.86, | 7.26, |
BDI-II item 9 | Paper first | 0.88 (0.66) | 0.72 (0.66) | | | |
 | Internet first | 0.52 (0.66) | 0.5 (0.55) | 4.28, | 5.08, | 2.44, |
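As a worked check of how the cell means above relate to the other reported figures, the sketch below combines the BDI-II group means using the group sizes given in the Methods (43 paper first, 44 Internet first). The helper name `pooled_mean` is an assumption made for illustration.

```python
def pooled_mean(means, sizes):
    """Sample-size-weighted mean of several group means."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


sizes = [43, 44]  # paper-first group, Internet-first group

# Full-sample BDI-II mean for each format (weighted across the two groups).
bdi_paper = pooled_mean([34.21, 26.98], sizes)
bdi_internet = pooled_mean([31.93, 27.48], sizes)

# Mean for each order group, averaged across the two formats.
bdi_paper_first = (34.21 + 31.93) / 2
bdi_internet_first = (26.98 + 27.48) / 2

print(round(bdi_paper, 2), round(bdi_internet, 2))  # 30.55 29.68
print(round(bdi_paper_first, 2), round(bdi_internet_first, 2))  # 33.07 27.23
```

These values agree with the full-sample BDI-II totals in the item-level table and with the order means quoted in the text for the BDI-II.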
BDI-II item, mean score on paper and Internet, and the correlation between them
Item | Paper Format, Mean (SD) | Internet Format, Mean (SD) | Correlation^a |
(1) Sadness | 1.26 (.58) | 1.29 (.61) | .70 |
(2) Pessimism | 1.38 (.72) | 1.39 (.62) | .66 |
(3) Feelings of failure | 1.55 (.92) | 1.53 (.86) | .68 |
(4) Loss of pleasure | 1.72 (.77) | 1.59 (.77) | .70 |
(5) Guilty feelings | 1.53 (1.0) | 1.53 (.94) | .69 |
(6) Punishment feelings | 0.68 (.97) | 0.80 (1.03) | .74 |
(7) Self-dislike | 1.72 (.98) | 1.78 (1.02) | .59 |
(8) Self-criticism | 1.43 (.90) | 1.43 (.86) | .59 |
(9) Suicidal thoughts or wishes | 0.70 (.68) | 0.61 (.62) | .80 |
(10) Crying | 1.68 (1.21) | 1.69 (1.19) | .80 |
(11) Agitation | 1.25 (.81) | 1.05 (.70) | .66 |
(12) Loss of interest | 1.49 (.83) | 1.48 (.85) | .63 |
(13) Indecisiveness | 1.53 (.91) | 1.53 (.91) | .71 |
(14) Worthlessness | 1.43 (.90) | 1.38 (.90) | .79 |
(15) Loss of energy | 1.84 (.64) | 1.68 (.69) | .61 |
(16) Change in sleeping patterns | 1.64 (.85) | 1.59 (.87) | .66 |
(17) Irritability | 1.47 (.89) | 1.40 (.90) | .68 |
(18) Changes in appetite | 1.39 (.98) | 1.22 (.99) | .64 |
(19) Concentration difficulty | 1.52 (.66) | 1.44 (.68) | .63 |
(20) Tiredness or fatigue | 1.85 (.77) | 1.79 (.88) | .71 |
(21) Loss of interest in sex | 1.48 (1.06) | 1.49 (1.04) | .88 |
Total | 30.55 (10.72) | 29.68 (10.07) | .89 |
aAll correlations are significant at the
For the BDI-II Cronbach alpha levels were similar in the Internet and paper versions. The alpha levels for the different orders and formats of administration are presented in
For the BDI-II, there was no significant main effect for administration format (paper or Internet). There was, however, a significant main effect for administration order, indicating higher scores for the paper-first group compared with the Internet-first group (means 33.07 vs 27.23), and the effect size was moderate (
For BDI-II item 9 (suicidality), there were significant main effects of format and order of administration, but no significant interaction between them. The mean score for the paper BDI-II item 9 (both groups) was higher than that for the Internet BDI-II item 9 (means 0.70 vs 0.61), and the effect size was small (Cohen’s
The internal consistency of both questionnaires was similar across administration formats, and medium to high correlations were found between paper and Internet total scores, and for each individual item. No significant main effect separated the paper total scores from the Internet total scores, but interaction effects were found as well as main effects for order of administration. Participants rated their suicidality on the same level on paper and Internet-based MADRS-S, but rated lower suicidality levels on the Internet BDI-II compared with the paper version.
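To give a sense of the size of the format difference on BDI-II item 9, the sketch below computes a standardized mean difference from the means and SDs in the item-level table (paper 0.70, SD 0.68; Internet 0.61, SD 0.62). It assumes one common convention, dividing by the root mean square of the two SDs. The formula the authors actually used is not stated in this text, and variants for paired data would give somewhat different values.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using the root mean square of two SDs.

    This convention is an assumption; it ignores the pairing of the scores.
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# BDI-II item 9: paper format versus Internet format, both groups combined.
d_item9 = cohens_d(0.70, 0.68, 0.61, 0.62)
```

Under these assumptions the value is roughly 0.14, which is in line with the description of the effect as small.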
These results do not indicate any clinically relevant differences between the total scores from the paper and Internet versions of the BDI-II and MADRS-S; rather, they suggest that people suffering from depression rate their overall depressive symptoms at the same level in both administration formats. An important clinical implication is that the questionnaires tested in this study can probably be used online with the same cutoff points and with unchanged internal consistency. Online versions should make it easier for clinicians to administer these questionnaires, which may in turn make their use more common in everyday practice.
If people tend to rate their suicidality lower on the Internet, this has to be taken into account in clinical use. In a previous study [
In an earlier study [
A case could be made for a possible difference between the two administration formats, mainly concerning computer anxiety and social disinhibition on the Internet, although this was not directly investigated in the current study. Since no clinically relevant differences were found, these arguments are probably less important in our study. In the case of computer anxiety, a recently published study [
The significant main effects for order of administration mean that the paper-first group had higher scores regardless of administration format. These results are difficult to interpret, but one possible explanation is a small difference in the procedures surrounding the paper and Internet versions. Before completing the Internet versions, patients had to identify themselves by means of a personal user name and password, after which they were asked some questions about sociodemographic characteristics. It is unclear whether this could affect the results of both administration formats. Another possible contributing factor is an actual difference in depressive symptoms between the two groups. The interaction effects found in this study indicate that, when different administration formats are used, the order of administration affects the difference between the first and the second measurement. In a clinical context, it is therefore important to use the same administration format for all measurements of the same individual.
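In a design with one between-subjects factor (order) and one two-level within-subjects factor (format), the interaction test is equivalent to comparing the paper-minus-Internet difference scores between the two order groups, and the order main effect is equivalent to comparing each person's average across formats. In both cases the F statistic, with 1 numerator degree of freedom, equals the squared pooled-variance t statistic. The sketch below illustrates this equivalence with hypothetical scores; it is not the authors' analysis code, and the data are not from the study.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical (paper, internet) total scores per person, not study data.
paper_first = [(30, 27), (25, 24), (38, 33), (28, 27)]
internet_first = [(22, 24), (27, 28), (19, 22), (24, 24)]

# Interaction: compare paper - internet differences between the groups.
diff_a = [p - i for p, i in paper_first]
diff_b = [p - i for p, i in internet_first]
f_interaction = pooled_t(diff_a, diff_b) ** 2

# Order main effect: compare each person's mean across the two formats.
avg_a = [(p + i) / 2 for p, i in paper_first]
avg_b = [(p + i) / 2 for p, i in internet_first]
f_order = pooled_t(avg_a, avg_b) ** 2
```

Viewed this way, a significant interaction means that the change from paper to Internet scores differs depending on which format was completed first, which is the pattern discussed above.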
Since all patients in this study showed an interest in an Internet-based treatment trial, it is possible that they are relatively positive toward using the Internet, which could limit the generalizability of the results. Another limitation of this study is that although the MADRS-S has a maximum score of 54, no subjects in the sample had a score higher than 39 (on the paper version). The full range of the scale was not used and thus the results should not be generalized outside the score range of the sample in the study. A third limitation is that the design did not address the question of test–retest reliability of the Internet versions of the tests. Future studies should address this question by using repeated measures with Internet-based tests. A fourth limitation is that computer anxiety and social disinhibition were not measured. A fifth possible limitation is that the items were presented one at a time on the Internet, which differs from the paper versions. However, earlier research shows that the two methods are psychometrically equivalent [
The results of this study, and of previous studies, suggest that the Internet-based BDI-II generates a total score that does not differ in a clinically meaningful way from the total score generated by the paper version. The suicidality rating in the BDI-II, however, needs further investigation: we found a small but significant difference in this study, whereas no difference was found in a previous study. Future research should be conducted with samples with higher levels of suicidality than those found in this study.
The psychometric properties of MADRS-S were not affected when the scale was transferred for use on the Internet in this study. Since this finding is consistent with two previous studies, it seems safe to transfer the MADRS-S to online use without affecting the psychometric properties in any clinically relevant way. Internet-based MADRS-S is, therefore, a clear candidate to complement traditional self-report measures in clinical work.
Besides the psychometric properties, however, there might also be other problems that have to be addressed before clinical implementation of Internet-based self-report measures, one of which is the security of information technology solutions. Another challenge may be test taker preferences. If patients, or subgroups of patients, find Internet-based questionnaires less attractive than traditional administration formats, it could lower response rates. Future research should investigate the possibilities and challenges associated with implementing online questionnaires in clinical practice. Patient acceptability, information security, and cost effectiveness are some important aspects.
An agreement was made with Harcourt Assessment for administration of the BDI-II on the Internet. The study was partly funded by the Research Committee of Örebro County Council.
None declared
BDI-II: Beck Depression Inventory, Second Edition
ITC: International Test Commission
MADRS-S: Montgomery-Åsberg Depression Rating Scale, Self-rated