This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Satisfactory psychometric properties in offline questionnaires do not guarantee the same outcome in Web-based versions. Any construct that is measured online should be compared to a paper-based assessment so that the appropriateness of online questionnaire data can be tested. Little research has been done in this area regarding Attention-Deficit/Hyperactivity Disorder (ADHD) in adults.
The objective was to simultaneously collect paper-based and Web-based ADHD questionnaire data in adults not diagnosed with ADHD in order to compare the two data sources regarding their equivalence in raw scores, in measures of reliability, and in factorial structures.
Data from the German versions of the Conners Adult ADHD Rating Scales (CAARS-S), the Wender Utah Rating Scale (WURS-k), and the ADHD Self Rating Scale (ADHS-SB) were collected via online and paper questionnaires in a cross-sectional study with convenience sampling. We performed confirmatory factor analyses to examine the postulated factor structures in both groups separately and multiple group confirmatory factor analyses to test whether the postulated factor structures of the questionnaires were equivalent across groups. With Cronbach alpha, we investigated the internal consistency of the postulated factors in the different questionnaires. Mann-Whitney U tests with the effect size “Probability of Superiority (PS)” were used to compare absolute values in the questionnaires between the two groups.
In the paper-based sample, there were 311 subjects (73.3% female); in the online sample, we reached 255 subjects (69% female). The paper-based sample had a mean age of 39.2 years (SD 18.6); the Web-based sample had a mean age of 30.4 years (SD 10.5) and had a higher educational background. The original four-factor structure of the CAARS-S could be replicated in both samples, but factor loadings were different. The Web-based sample had significantly higher total scores on three scales. The five-factor structure of the German short form of the WURS-k could be replicated only in the Web-based sample. The Web-based sample had substantially higher total scores, and nearly 40% of the Web-based sample scored above the clinically relevant cut-off value. The three-factor structure of the ADHS-SB could be replicated in both samples, but factor loadings were different. Women in the Web-based sample had substantially higher total scores, and 30% of the Web-based sample scored above the clinically relevant cut-off value. Internal consistencies in all questionnaires were acceptable to high in both groups.
Data from the Web-based administration of ADHD questionnaires for adults should not be used for the extraction of population norms. Separate norms should be established for ADHD online questionnaires. General psychometric properties of ADHD questionnaires (factor structure, internal consistency) were largely unaffected by sampling bias. Extended validity studies of existing ADHD questionnaires should be performed by including subjects with a diagnosis of ADHD and by randomizing them to Web- or paper-based administration.
Satisfactory psychometric properties in offline questionnaires do not guarantee the same outcome in Web-based versions. Any construct that is measured online should be compared to a paper-based assessment so that the appropriateness of online questionnaire data can be tested [
Several studies did not find substantial differences between Web-based and paper-based modes of administration [
Buchanan [
Attention-Deficit/Hyperactivity Disorder (ADHD), with its core symptoms of inattention, hyperactivity, and impulsivity, is listed under disorders usually first diagnosed in childhood or adolescence in DSM-IV and ICD-10. ADHD has been shown to persist often into adulthood, with prevalence rates between 4% and 5% [
The Conners Adult ADHD Rating Scales (CAARS) [
We conducted a cross-sectional study on German adults with no serious chronic disease, who were over 18 years of age and without a lifetime diagnosis of ADHD. Participants in the paper-based sample were recruited by convenience sampling (university students, people from apprentice institutions, local neighborhoods, waiting areas such as airports, hairdressers, primary care physicians, and colleagues). Subjects were provided with a short study description and asked to complete the CAARS self-report (CAARS-S) as well as the German version of the Wender Utah Rating Scale (WURS-k), the German ADHD Self Rating Scale (ADHS-SB), and questions on age, gender, and education level. We disseminated approximately 500 printed questionnaires.
The Web-based sample was also a cross-sectional convenience sample. We advertised the online study on the websites of the Departments of General Practice/Family Medicine and Clinical Psychology at Philipps University Marburg and on a Facebook page created exclusively for our study. Additionally, flyers with the online address of the Web-based questionnaire were distributed in the same recruitment areas as the paper-based questionnaires. For informed consent in the online study, the homepage prompted subjects to open a file with the study information and to check a box agreeing to participate in the study. Without checking this box, further pages of the questionnaire were not accessible. Since the survey was voluntary, all subjects could discontinue the questionnaire at any time. Subjects could see their progress in completing the questionnaire via a small progress bar on the upper right side of the screen. On average, subjects needed 15:34 minutes to complete the survey; the majority of participants completed the survey during the afternoon (hour 15, or 3 p.m.). At the end of the questionnaire, subjects had the opportunity to receive feedback on their responses. This indicated whether their scores were within the normal range or higher. In the latter case, no diagnosis was offered, but subjects were advised to seek professional assessment. Data protection was ensured in that only the principal investigator (HC) had access to the unipark page [
For development and testing, the paper versions of the questionnaires were entered into unipark. The research team and then students were asked to test this online version. The link was activated after testing for functionality and usability.
The survey was online from July 12, 2010, to August 30, 2011. On average, the page was accessed 26.16 times per week (view rate), though only 6.69 subjects per week (25%) completed the survey (completion rate). Cookies were used to assign a unique user identifier to each client computer and were set on the first page. A session was valid for a total of 120 minutes.
The items were presented in the same order as the paper-and-pencil questionnaires, but only an average of six items were displayed per page (see
For analyses, only questionnaires where subjects indicated they had not received a lifetime diagnosis of ADHD were analyzed. Apart from replacing missing items in paper versions with the expectation-maximization or the multiple imputation algorithms, no statistical corrections were performed.
Our study conforms to the Declaration of Helsinki and was approved by the local ethics committee of the Faculty of Medicine at the Philipps University in Marburg, Germany.
Example of survey items per page.
The German version of the CAARS-S assesses ADHD symptoms in adults aged 18 years or older. Symptoms are rated on a Likert-type scale (0 =
The German version of the Wender Utah Rating Scale (WURS-k) [
The German ADHD Self Rating Scale (ADHS-SB) consists of the 18 DSM-IV items that are broken down into the factors “inattention” (9 items), “hyperactivity”, and “impulsivity” (9 items together) [
We performed confirmatory factor analyses to examine the postulated factor structures in both groups separately and multiple group confirmatory factor analyses using AMOS 19 to test whether the postulated factor structures of the questionnaires were equivalent across the groups. The factors were allowed to correlate because this is theoretically plausible in all three questionnaires. We used unweighted least squares as this estimation method makes no distributional assumptions [
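The degrees of freedom reported for the three models in the Results can be reproduced with simple counting. The following is a minimal sketch, not the AMOS computation itself. It assumes that each item loads on exactly one factor, that error terms are uncorrelated, that all factors correlate freely, and that only covariances (not means) are modeled. The function name is hypothetical; the item counts (42, 21, and 18) are taken from the loading tables and the questionnaire descriptions.

```python
def cfa_degrees_of_freedom(n_items: int, n_factors: int) -> int:
    """Degrees of freedom of a simple-structure CFA with correlated factors."""
    # Distinct variances and covariances among the observed items.
    n_moments = n_items * (n_items + 1) // 2
    # Free parameters: one loading and one error variance per item,
    # plus one covariance per pair of factors. Factor variances are
    # assumed fixed to 1 for identification (fixing one loading per
    # factor instead gives the same count).
    n_free = 2 * n_items + n_factors * (n_factors - 1) // 2
    return n_moments - n_free


print(cfa_degrees_of_freedom(42, 4))  # CAARS-S: 42 items, 4 factors
print(cfa_degrees_of_freedom(21, 5))  # WURS-k: 21 items, 5 factors
print(cfa_degrees_of_freedom(18, 3))  # ADHS-SB: 18 items, 3 factors
```

Under these assumptions the three calls return 813, 179, and 132, which are the df values given for the CAARS-S, WURS-k, and ADHS-SB models.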
Using multiple group analysis, we examined several levels of invariance between the groups. Configural invariance as the lowest level of invariance exists when the structure of the factor loading matrices is identical in all groups. Metric invariance occurs when factor loadings are identical in all groups. Scale invariance means that the measurement intercepts are the same across groups. Invariance of measurement errors exists if the error variables of measurement models, factor covariances, and factor variances are identical across groups.
We calculated several model fit indices to evaluate the results of our analyses. The root mean square residual (RMR) measures the mean absolute value of the covariance residuals [
With Cronbach alpha, we investigated the internal consistency of the postulated factors in the different questionnaires. Values >.70 are considered to be acceptable [
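As an illustration of the internal consistency measure, Cronbach alpha for a subscale can be computed from the item variances and the variance of the sum score. This is a minimal sketch using only the Python standard library; the function name and the example responses are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                      # number of items in the subscale
    columns = list(zip(*rows))            # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of five subjects to a four-item subscale (0 to 3).
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(responses), 2))
```

The resulting value would then be compared against the .70 threshold mentioned above.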
Huber’s M estimators were calculated when standard deviation values were close to their respective means, signaling high variance [
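To show why a robust location estimate is useful for skewed scores, the sketch below computes a Huber-type M estimate by iteratively reweighting observations. It is an assumption-laden illustration rather than the routine used in the study: the tuning constant k = 1.339 and the scale estimate (median absolute deviation divided by 0.6745) are assumed defaults, and the function name and data are hypothetical.

```python
from statistics import median


def huber_m(values, k=1.339, tol=1e-8, max_iter=200):
    """Huber-type M estimate of location via iteratively reweighted means."""
    mu = median(values)
    mad = median([abs(x - mu) for x in values])
    scale = mad / 0.6745                  # assumed robust scale estimate
    if scale == 0:
        return mu                         # more than half the values are identical
    for _ in range(max_iter):
        weights = []
        for x in values:
            u = abs(x - mu) / scale       # standardized distance from the center
            weights.append(1.0 if u <= k else k / u)  # downweight outliers
        new_mu = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical right-skewed scores with one extreme value.
scores = [0, 1, 1, 2, 2, 3, 3, 4, 30]
print(sum(scores) / len(scores), median(scores), huber_m(scores))
```

For skewed data of this kind the estimate lies between the median and the arithmetic mean, because the extreme value receives only a small weight.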
Mann-Whitney U tests were used to compare absolute values in the questionnaires between the two groups. The effect size “Probability of Superiority (PS)”, PS=U/(n1*n2), indicates the probability that a randomly selected subject of group n1 has a higher score than a randomly selected subject of group n2. A PS of .50 means that both groups are equal regarding a specific variable, and that there is no effect. Consequently, the larger the effect, the more PS deviates from .50 [
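The effect size defined above can be illustrated by counting pairs directly. This is a minimal sketch: U is taken as the number of pairs in which the subject from the first group scores higher, with ties counted as one half (an assumption, as the text does not state how ties were handled). The function name and example scores are hypothetical.

```python
def probability_of_superiority(group1, group2):
    """PS = U / (n1 * n2), where U counts pairs in which group1 scores higher."""
    u = 0.0
    for a in group1:
        for b in group2:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5      # assumed convention: a tie counts as half a win
    return u / (len(group1) * len(group2))


# Hypothetical total scores for a few paper-based and Web-based subjects.
paper = [3, 5, 8, 9, 12]
web = [6, 9, 11, 14, 15, 20]
print(probability_of_superiority(paper, web))
```

A value below .50, as in this example, means that a randomly selected subject from the first group tends to score lower than a randomly selected subject from the second group.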
The alpha level for statistical significance was set at .05 (two-sided). Missing responses in the paper versions were replaced using the expectation-maximization or the multiple imputation algorithms [
In the paper-based sample, we received responses from 328 participants (65.6% of our approximately 500 printed questionnaires), of whom 6 indicated they had been diagnosed with ADHD and 11 did not answer this question, leaving a total sample of 311 subjects. The 65.6% cannot be regarded as a return rate because we did not record those subjects who were personally asked and refused to participate. In the Web-based sample, we received responses from 273 participants, of whom 18 indicated that they had been diagnosed with ADHD, leaving a total sample of 255 subjects. The flow of subjects in our study samples is depicted in
Demographic characteristics of the paper-based and Web-based samples.
Characteristic | Paper-based (n=311) | Web-based (n=255) |
Gender, n (%) | | |
Female | 228 (73.3%) | 176 (69.0%) |
Male | 83 (26.7%) | 79 (31.0%) |
Age in years, mean (SD) | 39.2 (18.6) | 30.4 (10.5) |
Education, n (%) | | |
University | 61 (19.7%) | 73 (28.6%) |
Apprenticeship | 86 (27.7%) | 35 (13.7%) |
High school | 66 (21.3%) | 130 (51.0%) |
Middle school | 61 (19.7%) | 15 (5.9%) |
Basic school | 36 (11.6%) | 2 (0.8%) |
Flow of subjects in the paper-based and Web-based samples.
The samples did not differ with respect to gender (χ2 test:
There was a maximum of 9% missing values on single variables in the paper sample; these were missing completely at random (Little’s MCAR test,
The four-factor model (df = 813) was supported in both groups. In the paper-based sample, the standardized RMR was .08, the RMR was .04, the GFI was .93, and the AGFI was .92. In the Web-based sample, the standardized RMR was .07, the RMR was .05, the GFI was .98, and the AGFI was .97. These fit indices signal a good model fit.
Except for Item 3 (“I don’t plan ahead”) of the factor “inattention/memory”, Items 1 (“I like to be doing active things”) and 5 (“I am a risk-taker or a daredevil”) of the hyperactivity factor, and Item 43 (“I step on people’s toes without meaning to”) on the impulsivity factor, all other items have loadings > .40 in both samples.
The intercorrelations between the factors are consistently higher in the Web-based sample. The largest differences between the two groups were found in correlations involving “self-concept” (see
Multiple group analysis revealed that the factor structures were the same in both samples, signaling configural invariance (SRMR=.04, RMR=.04, GFI=.99, AGFI=.99). However, factor loadings were different (SRMR=.06, RMR=.10, GFI=.97, AGFI=.97) because all model fit indices changed by more than .01 when testing metric invariance. Consequently, other invariance assumptions were also not supported.
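The comparison between the configural and the metric model can be sketched as a check of how much each fit index moves once factor loadings are constrained to be equal. The values are those reported above for the CAARS-S; the function name and the way the .01 criterion is coded are illustrative assumptions.

```python
def fit_index_changes(baseline, constrained, threshold=0.01):
    """Per-index change between two nested models and whether it exceeds the threshold."""
    report = {}
    for name, base_value in baseline.items():
        delta = round(constrained[name] - base_value, 4)  # rounding removes float noise
        report[name] = (delta, abs(delta) > threshold)
    return report


# Fit indices reported for the CAARS-S multiple group analysis.
configural = {"SRMR": 0.04, "RMR": 0.04, "GFI": 0.99, "AGFI": 0.99}
metric = {"SRMR": 0.06, "RMR": 0.10, "GFI": 0.97, "AGFI": 0.97}

changes = fit_index_changes(configural, metric)
for name, (delta, exceeds) in changes.items():
    print(f"{name}: change {delta:+.2f}, exceeds .01: {exceeds}")
```

With these values, SRMR and RMR rise while GFI and AGFI fall, and every change is larger than .01 in absolute terms, which is why metric invariance was not supported.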
Cronbach alpha of the subscales ranged from .81 to .85 in the paper-based sample and from .89 to .91 in the Web-based sample.
Absolute subscale differences between the two groups were all significant with the Web-based sample scoring substantially higher with pronounced effect sizes (
As there are no normative data for Germany to date, we applied strict cut-off values based on American normative data (T-value of 65, 94th percentile). The cut-off for “inattention/memory” was > 22 points; “hyperactivity” > 26 points; “impulsivity” > 22 points; and “self-concept” > 13 points. Regarding the total score of “inattention/memory”, 27 subjects (10.6%) in the Web-based sample scored above this value while 4 (1.3%) did so in the paper-based sample. This difference was significant with a moderate effect size (χ2 test:
There was a maximum of 2% missing values on single variables in the paper sample that were missing completely at random (Little’s MCAR test,
The model in the paper-based sample was not admissible because the covariance matrix between the postulated five factors was not positive definite. This leads to the conclusion that the model is wrong [
In the Web-based sample, the model (df = 179) was supported: SRMR = .07, RMR = .09, GFI = .98, AGFI = .97.
As shown in
Due to the rejected model in the paper-based group, no multiple group analysis could be calculated.
Cronbach alpha of the subscales ranged from .68 to .82 in the paper-based sample and from .79 to .89 in the Web-based sample. No coefficients were calculated for the subscale “social adaptation” as it consists of only two items.
Absolute values of the total score were significantly higher (Mann-Whitney U test:
After applying the recommended cut-off value for the total score (> 29 points) [
There was a maximum of 1.3% missing values on single variables in the paper sample, except for Item 4, which asks about difficulties in the field of work. Student participants in the paper version did not complete this item, so 28.5% of the values for this item were missing at random. These were replaced with the multiple imputation algorithm using five imputations. The following calculations were done separately for the five imputations, and the respective results were averaged. Enders [
The three-factor model (df = 132) was supported in both groups. In the paper-based sample, the standardized RMR was .06, the RMR was .06, the GFI was .97, and the AGFI was .96. In the Web-based sample, the standardized RMR was .06, the RMR was .04, the GFI was .98, and the AGFI was .98. These fit indices signal a good model fit.
As shown in
The correlation between the factors inattention and hyperactivity is significantly higher in the paper-based sample, while the intercorrelations between the other factors are higher in the Web-based sample (
Multiple group analysis revealed that the factor structures were the same in both samples, signaling configural invariance (SRMR = .06, RMR = .03, GFI = .98, AGFI = .98). However, factor loadings were different (SRMR = .15, RMR = .08, GFI = .83, AGFI = .79) because all model fit indices changed by more than .01 when testing metric invariance. Consequently, other invariance assumptions were also not supported.
Cronbach alpha of the subscales ranged from .60 to .83 in the paper-based sample and from .79 to .91 in the Web-based sample.
Absolute differences between the two groups were significant with the Web-based sample (mean 12.8, SD 9.1; Huber’s M estimator 11.1) scoring substantially higher than the paper-based sample (mean 2.2, SD 3.0; Huber’s M estimator 1.4) with a high effect size (Mann-Whitney U test:
After applying the recommended cut-off value for the total score (> 17 points) [
Correlations of CAARS items (loadings) with their postulated factors (latent constructs) in the paper-based and Web-based samples.
Factor / Item | Paper-based | Web-based |
Inattention/Memory | | |
ITEM 03 | .14 | .26 |
ITEM 07 | .53 | .74 |
ITEM 11 | .59 | .73 |
ITEM 16 | .47 | .67 |
ITEM 18 | .55 | .74 |
ITEM 32 | .57 | .61 |
ITEM 36 | .69 | .75 |
ITEM 40 | .52 | .73 |
ITEM 44 | .55 | .78 |
ITEM 49 | .55 | .73 |
ITEM 51 | .55 | .63 |
ITEM 66 | .57 | .74 |
Hyperactivity | | |
ITEM 01 | .31 | .06 |
ITEM 05 | .43 | .32 |
ITEM 10 | .46 | .50 |
ITEM 13 | .70 | .74 |
ITEM 20 | .65 | .69 |
ITEM 25 | .57 | .46 |
ITEM 27 | .57 | .82 |
ITEM 31 | .65 | .66 |
ITEM 38 | .54 | .76 |
ITEM 46 | .67 | .80 |
ITEM 54 | .61 | .71 |
ITEM 57 | .73 | .81 |
Impulsivity | | |
ITEM 04 | .56 | .62 |
ITEM 08 | .49 | .72 |
ITEM 12 | .56 | .69 |
ITEM 19 | .66 | .58 |
ITEM 23 | .58 | .61 |
ITEM 30 | .64 | .76 |
ITEM 35 | .47 | .54 |
ITEM 39 | .60 | .67 |
ITEM 43 | .37 | .59 |
ITEM 47 | .59 | .78 |
ITEM 52 | .53 | .62 |
ITEM 61 | .61 | .69 |
Self-concept | | |
ITEM 06 | .59 | .58 |
ITEM 15 | .60 | .75 |
ITEM 26 | .58 | .69 |
ITEM 37 | .81 | .86 |
ITEM 56 | .75 | .79 |
ITEM 63 | .81 | .84 |
Intercorrelations between the CAARS factors (latent constructs) in the paper-based and Web-based samples.
Factor | | Factor | Paper-based | Web-based |
Hyperactivity | <--> | Impulsivity | .73 | .81 |
Inattention/Memory | <--> | Hyperactivity | .54 | .74 |
Inattention/Memory | <--> | Impulsivity | .65 | .79 |
Inattention/Memory | <--> | Self-concept | .47 | .74 |
Hyperactivity | <--> | Self-concept | .24 | .57 |
Impulsivity | <--> | Self-concept | .45 | .71 |
Means, standard deviations, and Huber’s M estimators of the CAARS subscales in the paper-based and Web-based samples with their respective
Subscale | Paper-based | Web-based | Mann-Whitney U test (P value; PS) |
Inattention/Memory | 8.6 (SD 4.8) | 12.3 (SD 7.1) | <.001; PS=.35 |
Hyperactivity | 9.0 (SD 5.3) | 11.2 (SD 6.1) | <.001; PS=.38 |
Impulsivity | 9.4 (SD 5.2) | 12.3 (SD 6.6) | <.001; PS=.37 |
Self-concept | 5.6 (SD 3.6) | 7.5 (SD 4.3) | <.001; PS=.37 |
aHuber’s M estimator.
bPS = probability of superiority.
Correlations of WURS-k items (loadings) with their postulated factors (latent constructs) in the Web-based sample.
Item | Web-based |
ITEM 01 | .82 |
ITEM 02 | .76 |
ITEM 03 | .78 |
ITEM 06 | .77 |
ITEM 10 | .75 |
ITEM 15 | .59 |
ITEM 17 | .67 |
ITEM 24 | .51 |

ITEM 05 | .75 |
ITEM 11 | .83 |
ITEM 13 | .83 |
ITEM 16 | .89 |

ITEM 07 | .74 |
ITEM 09 | .62 |
ITEM 18 | .74 |
ITEM 19 | .80 |

ITEM 08 | .88 |
ITEM 21 | .54 |
ITEM 22 | .79 |

ITEM 20 | .64 |
ITEM 23 | .33 |
Intercorrelations between the WURS-k factors (latent constructs) in the Web-based sample.
Factor | | Factor | Web-based |
Inattention | <--> | Impulsivity | .79 |
Impulsivity | <--> | Anxiety/Depression | .69 |
Inattention | <--> | Anxiety/Depression | .73 |
Inattention | <--> | Social adaptation | .50 |
Impulsivity | <--> | Oppositional behavior | .72 |
Impulsivity | <--> | Social adaptation | .49 |
Anxiety/Depression | <--> | Oppositional behavior | .35 |
Anxiety/Depression | <--> | Social adaptation | .57 |
Oppositional behavior | <--> | Social adaptation | .58 |
Inattention | <--> | Oppositional behavior | .72 |
Correlations of ADHS-SB items (loadings) with their postulated factors (latent constructs) in the paper-based and Web-based samples.
Factor / Item | Paper-based | Web-based |
Inattention | | |
ITEM 01 | .47 | .72 |
ITEM 02 | .48 | .73 |
ITEM 03 | .56 | .68 |
ITEM 04 | .40 | .65 |
ITEM 05 | .45 | .63 |
ITEM 06 | .22 | .62 |
ITEM 07 | .48 | .54 |
ITEM 08 | .60 | .72 |
ITEM 09 | .45 | .64 |
Hyperactivity | | |
ITEM 10 | .71 | .74 |
ITEM 11 | .66 | .69 |
ITEM 12 | .58 | .82 |
ITEM 13 | .54 | .65 |
ITEM 14 | .31 | .59 |
Impulsivity | | |
ITEM 15 | .68 | .72 |
ITEM 16 | .49 | .72 |
ITEM 17 | .48 | .73 |
ITEM 18 | .58 | .58 |
Intercorrelations between the ADHS-SB factors (latent constructs) in the paper-based and Web-based samples.
Factor | | Factor | Paper-based | Web-based |
Inattention | <--> | Hyperactivity | .92 | .63 |
Hyperactivity | <--> | Impulsivity | .64 | .80 |
Inattention | <--> | Impulsivity | .66 | .72 |
We compared Web-based and paper-based administrations of three ADHD questionnaires for adults. Subjects in the online sample were younger and had a higher educational background. The original four-factor structure of the Conners Adult ADHD Rating Scales could be replicated in both samples, but factor loadings were different. Internal consistencies were high in both groups, but the Web-based sample had significantly higher total scores in three subscales with 7.8 to 11.8% above clinically relevant cut-off values, compared to 1.3 to 4.2% in the paper-based sample. The five-factor structure of the German short form of the Wender Utah Rating Scale could be replicated only in the Web-based sample. Internal consistencies were acceptable to high in both groups. The Web-based sample had substantially higher total scores, and nearly 40% of the Web-based sample scored above the clinically relevant cut-off value. The three-factor structure of the ADHD Self Rating Scale could be replicated in both samples, but factor loadings were different. Internal consistencies were acceptable to high in both groups. The Web-based sample had substantially higher total scores, and 30% of the Web-based sample scored above the clinically relevant cut-off value. Overall, psychometric properties were similar in both samples, but the Web-based sample had substantially higher scores on all three questionnaires.
The relatively high dropout rate in our Web-based sample is also reported in the literature. Additional informed consent procedures were shown to increase early dropout in Web-based studies [
Demographic differences (younger age, higher education) might have influenced the results [
Our results contradict the conclusion of Gosling et al [
On the other hand, our results corroborate the assumption of Rhodes et al [
Several limitations have to be mentioned. We did not randomize subjects to online and paper versions, so differences between the two groups might have arisen by sampling biases and should be replicated under randomized conditions. Different recruitment strategies for the paper and online samples might have influenced the results. Although relatively high discontinuation rates are common in online research, they might have caused bias in the results. In future online studies, leaving out questions should also be possible to create conditions similar to paper administration.
Data from the Web-based administration of ADHD questionnaires for adults should not be used for the extraction of population norms. Separate norms should be established for ADHD online questionnaires. General psychometric properties of ADHD questionnaires (factor structure, internal consistency) were largely unaffected by sampling bias. Extended validity studies of existing ADHD questionnaires should be performed by including subjects with a diagnosis of ADHD and by randomizing them to Web- or paper-based administration.
ADHD: Attention-Deficit/Hyperactivity Disorder
ADHS-SB: ADHD Self Rating Scale
AGFI: Adjusted Goodness-of-Fit Index
CAARS: Conners Adult ADHD Rating Scales
CAARS-S: CAARS self-report
DSM-IV: Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition
GFI: Goodness-of-Fit Index
ICD-10: International Classification of Diseases, Tenth Edition
MCAR: Missing completely at random
PS: Probability of Superiority
RMR: Root Mean Square Residual
SRMR: Standardized Root Mean Square Residual
WURS-k: Wender Utah Rating Scale-Short Form
This study received no financial grants.
None declared.