Journal of Medical Internet Research (J Med Internet Res; JMIR), ISSN 1438-8871

Editor: Gunther Eysenbach

Publisher: JMIR Publications Inc., Toronto, Canada

Article ID: v15i3e47; PMID: 23518816; DOI: 10.2196/jmir.2225

Original Paper

Comparison of Web-Based and Paper-Based Administration of ADHD Questionnaires for Adults

Edited by Gunther Eysenbach; reviewed by Lisa Whitehead and Tara Donker

Oliver Hirsch, PhD ¹; Franziska Hauschild, MD ¹; Martin H Schmidt, MSc ²; Erika Baum, MD ¹; Hanna Christiansen, PhD ²

¹ Faculty of Medicine, Department of General Practice/Family Medicine, Philipps University Marburg, Marburg, Germany

² Faculty of Psychology, Department of Clinical Psychology, Philipps University Marburg, Marburg, Germany

Corresponding Author: Oliver Hirsch, Faculty of Medicine, Department of General Practice/Family Medicine, Philipps University Marburg, Karl-von-Frisch-Str.4, Marburg, 35043, Germany. Phone: 49 64212826520. Fax: 49 64212865121. Email: oliver.hirsch@staff.uni-marburg.de

Published: 21.03.2013 (J Med Internet Res 2013;15(3):e47)

Article history dates: 17.06.2012, 26.09.2012, 15.10.2012, 17.01.2013

©Oliver Hirsch, Franziska Hauschild, Martin H. Schmidt, Erika Baum, Hanna Christiansen. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 21.03.2013.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.

Background

Objective

The objective was to simultaneously collect paper-based and Web-based ADHD questionnaire data in adults not diagnosed with ADHD in order to compare the two data sources regarding their equivalence in raw scores, in measures of reliability, and in factorial structures.

Methods

Data from the German versions of the Conners Adult ADHD Rating Scales (CAARS-S), the Wender Utah Rating Scale (WURS-k), and the ADHD Self Rating Scale (ADHS-SB) were collected via online and paper questionnaires in a cross-sectional study with convenience sampling. We performed confirmatory factor analyses to examine the postulated factor structures in both groups separately and multiple group confirmatory factor analyses to test whether the postulated factor structures of the questionnaires were equivalent across groups. With Cronbach alpha, we investigated the internal consistency of the postulated factors in the different questionnaires. Mann-Whitney U tests with the effect size “Probability of Superiority (PS)” were used to compare absolute values in the questionnaires between the two groups.

Results

In the paper-based sample, there were 311 subjects (73.3% female); in the online sample, we reached 255 subjects (69.0% female). The paper-based sample had a mean age of 39.2 years (SD 18.6); the Web-based sample had a mean age of 30.4 years (SD 10.5) and had a higher educational background. The original four-factor structure of the CAARS-S could be replicated in both samples, but factor loadings were different. The Web-based sample had significantly higher total scores on three scales. The five-factor structure of the German short form of the WURS-k could be replicated only in the Web-based sample. The Web-based sample had substantially higher total scores, and nearly 40% of the Web-based sample scored above the clinically relevant cut-off value. The three-factor structure of the ADHS-SB could be replicated in both samples, but factor loadings were different. Women in the Web-based sample had substantially higher total scores, and 30% of the Web-based sample scored above the clinically relevant cut-off value. Internal consistencies in all questionnaires were acceptable to high in both groups.

Conclusions

Data from the Web-based administration of ADHD questionnaires for adults should not be used for the extraction of population norms. Separate norms should be established for ADHD online questionnaires. General psychometric properties of ADHD questionnaires (factor structure, internal consistency) were largely unaffected by sampling bias. Extended validity studies of existing ADHD questionnaires should be performed by including subjects with a diagnosis of ADHD and by randomizing them to Web- or paper-based administration.

Keywords: computers; Attention-Deficit/Hyperactivity Disorder; questionnaires; Internet; psychometrics

Introduction

Satisfactory psychometric properties in offline questionnaires do not guarantee the same outcome in Web-based versions. Any construct that is measured online should be compared to a paper-based assessment so that the appropriateness of online questionnaire data can be tested [1]. After analyzing common preconceptions about Internet questionnaires, Gosling et al [2] conclude that the quality of online data is comparable to traditional paper-and-pencil methods. The authors acknowledge that Internet samples are not representative of the general population, but argue that traditional methods do not achieve this either. Web-based questionnaires are even considered more feasible for collecting data in large population-based epidemiological studies [3].

Several studies did not find substantial differences between Web-based and paper-based modes of administration [4-7]. They were able to show similar psychometric properties or identical factor structures [8-10]. Those latter studies did not use clinical questionnaires to assess psychopathology, but no significant differences were found when depression questionnaires were administered in paper and online versions among the same individuals [11]. Comparable results regarding psychometric properties and absolute differences in Internet versus paper-and-pencil administration of several panic and agoraphobia questionnaires were found by Carlbring et al [12]. Both studies included pre-selected or self-recruited patient groups applying for treatment. A Web Screening Questionnaire for mental disorders yielded a high number of false positives though [13], while others reported satisfactory diagnostic accuracy in the Web-based detection of depressive disorders [14].

Buchanan [15] is skeptical of using Web-based questionnaires for normative comparisons, especially in clinical psychology. Several studies showed that score distributions between paper and online administration differed, with higher scores in Internet samples [16]. He therefore argued against comparing online questionnaire data with established norms.

Attention-Deficit/Hyperactivity Disorder (ADHD), with its core symptoms of inattention, hyperactivity, and impulsivity, is listed under disorders usually first diagnosed in childhood or adolescence in DSM-IV and ICD-10. It was shown that ADHD often persists into adulthood with prevalence rates between 4% and 5% [17-19]. We found only two studies in which measurement of ADHD via Web-based versions was examined. Steenhuis et al [20] applied a within-subject design to administer the ADHD section of the Diagnostic Interview Schedule for Children (DISC-IV) to parents. Intraclass correlation coefficients ranged between .87 and .94. A qualitative study examined acceptability of the Web-based version of the ADHD rating scale T-SKAMP in 19 teachers [21]. A large majority of teachers preferred the Web-based version over a paper version. They perceived it to be easier, shorter, simpler, more informative, time-saving, and more flexible. Communication between teachers and physicians might be improved with this tool. No further ADHD diagnostic instruments, such as the SNAP and SWAN Scale for children [22] or common adult ADHD assessment instruments (see [23] for a review), were implemented as Web-based versions.

The Conners Adult ADHD Rating Scales (CAARS) [24] had satisfactory psychometric properties in their German translation [25-27] and were found to have the same factor structure as the American original, enabling them to be used for cross-cultural research. The aim of our study was to simultaneously collect paper-based and Web-based CAARS questionnaire data together with two other established ADHD questionnaires available in German and to compare the two data sources by different statistical measures regarding their equivalence in raw scores, measures of reliability, and factorial structures. To do so, we intended to collect normative data online and via paper questionnaires from subjects without a diagnosis of ADHD in order to examine whether online normative data can be merged with data from paper questionnaires.

Methods

Recruitment

We conducted a cross-sectional study on German adults with no serious chronic disease, who were over 18 years of age and without a lifetime diagnosis of ADHD. Participants in the paper-based sample were recruited by convenience sampling (university students, people from apprentice institutions, local neighborhoods, waiting areas such as airports, hairdressers, primary care physicians, and colleagues). Subjects were provided with a short study description and asked to complete the CAARS self-report (CAARS-S) as well as the German version of the Wender Utah Rating Scale (WURS-k), the German ADHD Self Rating Scale (ADHS-SB), and questions on age, gender, and education level. We disseminated approximately 500 printed questionnaires.

The Web-based sample was likewise a cross-sectional convenience sample. We advertised the online study on the websites of the Departments of General Practice/Family Medicine and Clinical Psychology at Philipps University Marburg and on a special Facebook page created exclusively for our study. Additionally, flyers with the online address of the Web-based questionnaire were distributed in the same recruitment areas as the paper-based questionnaires. For informed consent in the online study, the homepage prompted subjects to open a file with the study information and to check a box agreeing to participate in the study. Without checking this box, further pages of the questionnaire were not accessible. Since participation was voluntary, subjects could discontinue the questionnaire at any time. Subjects could see their progress in completing the questionnaire via a small progress bar on the upper right side of the screen. On average, subjects needed 15:34 minutes to complete the survey; the majority of participants completed the survey during the afternoon (hour 15, or 3 p.m.). At the end of the questionnaire, subjects had the opportunity to receive feedback on their responses. This indicated whether their scores were within the normal range or higher. In the latter case, no diagnosis was offered, but subjects were advised to seek professional assessment. Data protection was ensured in that only the principal investigator (HC) had access to the unipark page [28] that generated and stored the data. Additionally, no personal information was requested of the subjects.

For development and testing, the paper versions of the questionnaires were entered into unipark. The research team and then students were asked to test this online version. The link was activated after testing for functionality and usability.

The survey was online from July 12, 2010, to August 30, 2011. On average, the page was accessed 26.16 times per week (view rate), though only 6.69 (25%) subjects per week completed the survey (completion rate). Cookies were used to assign a unique user identifier to each client computer and were set on the first page. A session was valid for a total of 120 minutes.

The items were presented in the same order as in the paper-and-pencil questionnaires, but only six items on average were displayed per page (see Figure 1). When subjects did not complete all items on a page, they were asked to fill in the missing items in order to activate the “next” button. Therefore, no missing data could result in the online sample. Subjects were always able to review and change their answers with a “back” button. If subjects decided not to answer an item, they could stop the survey. The majority of participants (57.44%) discontinued filling out the survey on the first page.

For the analyses, we included only questionnaires from subjects who indicated they had not received a lifetime diagnosis of ADHD. Apart from replacing missing items in the paper versions with the expectation-maximization or multiple imputation algorithms, no statistical corrections were performed.

Our study conforms to the Declaration of Helsinki and was approved by the local ethics committee of the Faculty of Medicine at the Philipps University in Marburg, Germany.

Figure 1

Example of survey items per page.

Measurements

Conners Adult ADHD Rating Scales

The German version of the CAARS-S assesses ADHD symptoms in adults aged 18 years or older. Symptoms are rated on a Likert-type scale (0 = not at all/never to 3 = very much/very frequently). The long version consists of 66 items, but only 42 items were included in the original factor analysis by Conners et al [24] due to statistical restrictions made by the authors. Four factors emerged from their analyses: inattention/memory problems, hyperactivity/restlessness, impulsivity/emotional lability, and problems with self-concept. Confirmatory factor analyses of the German version in healthy adults and ADHD patients supported this factor analytic solution [25,27]. The four subscales were significantly influenced by age, gender, and the number of years in education. Symptom severity decreased with increasing age, males scored higher than females on hyperactivity and sensation-seeking behavior, and females scored higher than males on problems with self-concept. Overall symptom ratings were higher for individuals who had received less education. Test-retest reliability ranged between .85 and .92; sensitivity and specificity were high for all four subscales. The CAARS-S represents a reliable and cross-culturally valid measure of current ADHD symptoms in adults [26].

Wender Utah Rating Scale

The German version of the Wender Utah Rating Scale (WURS-k) [29,30] retrospectively assesses ADHD-relevant childhood behaviors and symptoms in adults. It consists of 25 items that distinguished patients with ADHD from a nonpatient comparison group. Subjects are instructed to rate 25 items that complete sentence stems such as “As a child I was or had...”. Ratings are to be completed on a 5-point Likert scale (0=not at all or very slightly to 4=very much). Test-retest reliability and Cronbach alpha were around .90. Factor analyses generated a 5-factor solution with the factors inattention/hyperactivity, impulsivity, anxiety/depression, oppositional behavior, and social adaptation by using 21 items. A total score >29 points hints at the possibility of ADHD during childhood.

ADHD Self Rating Scale

The German ADHD Self Rating Scale (ADHS-SB) consists of the 18 DSM-IV items that are broken down into the factors “inattention” (9 items), “hyperactivity”, and “impulsivity” (9 items together) [31]. The items are scored on a 4-point Likert scale (0=not at all to 3=very pronounced/almost always the case). Test-retest reliability coefficients were between .78 and .89. Correlations with subscales of the NEO Five Factor Inventory were in the expected directions. A total score >17 points hints at the possibility of adult ADHD.

Statistical Analysis

We performed confirmatory factor analyses to examine the postulated factor structures in both groups separately and multiple group confirmatory factor analyses using AMOS 19 to test whether the postulated factor structures of the questionnaires were equivalent across the groups. The factors were allowed to correlate because this is theoretically plausible in all three questionnaires. We used unweighted least squares as this estimation method makes no distributional assumptions [32].

Using multiple group analysis, we examined several levels of invariance between the groups. Configural invariance as the lowest level of invariance exists when the structure of the factor loading matrices is identical in all groups. Metric invariance occurs when factor loadings are identical in all groups. Scale invariance means that the measurement intercepts are the same across groups. Invariance of measurement errors exists if the error variables of measurement models, factor covariances, and factor variances are identical across groups.

We calculated several model fit indices to evaluate the results of our analyses. The root mean square residual (RMR) measures the mean absolute value of the covariance residuals [33]. Values less than .05 indicate a good model fit [32], but other authors state that a value of less than .10 signals an acceptable model fit [34,35]. The standardized root mean square residual (SRMR) eliminates scaling effects of the RMR. Values ≤ .10 indicate a good model fit [35]. The goodness-of-fit index (GFI) measures the proportion of variance and covariance that a given model is able to explain. A GFI equal to or higher than .90 can be considered as reflecting a good model fit [36]. The adjusted goodness-of-fit index (AGFI) takes the number of parameters used in computing the GFI into account. An AGFI equal to or higher than .90 can be considered as showing a good model fit [35]. These fit indices were calculated for each of the aforementioned invariance levels. Differences in fit indices between these invariance levels should not be larger than .01 [35]; otherwise, the criteria for a higher invariance level are not reached.

With Cronbach alpha, we investigated the internal consistency of the postulated factors in the different questionnaires. Values >.70 are considered to be acceptable [37].
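As an illustration of the internal consistency measure, the following minimal sketch computes Cronbach alpha from item-level scores using the standard formula (number of items, sum of item variances, variance of the total score). The function name and the small data set are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach alpha for a list of items, each a list of scores
    with one entry per respondent (same respondent order in every item)."""
    k = len(item_scores)
    # Total score per respondent across all items of the scale.
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings of six respondents on three items (0-3 scale).
items = [
    [0, 1, 2, 3, 1, 2],
    [1, 1, 2, 3, 0, 2],
    [0, 2, 2, 3, 1, 1],
]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}, acceptable: {alpha > 0.70}")
```

With real data, each inner list would hold the responses to one item of a postulated factor, and the result would be compared with the .70 threshold named above.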

Huber’s M estimators were calculated when standard deviation values were close to their respective means, signaling high variance [38].
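The paper does not describe how the Huber's M estimators were computed. The sketch below shows one common way to obtain a Huber M-estimate of location by iteratively reweighted means. The tuning constant of 1.339, the use of the rescaled median absolute deviation as a fixed scale, and the function name are assumptions made for illustration only.

```python
from statistics import mean, median


def huber_m_location(values, k=1.339, tol=1e-8, max_iter=200):
    """Huber M-estimate of location (assumed settings, see lead-in).

    Observations within k scale units of the current estimate get
    weight 1; more distant observations are down-weighted.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = mad / 0.6745  # MAD rescaled to match the SD under normality
    if scale == 0:
        return med
    mu = med
    for _ in range(max_iter):
        weights = []
        for v in values:
            u = abs(v - mu) / scale
            weights.append(1.0 if u <= k else k / u)
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical scores with one extreme value, not data from the study.
scores = [1, 2, 3, 4, 5, 100]
estimate = huber_m_location(scores)
print(median(scores), round(estimate, 2), round(mean(scores), 2))
```

In this toy example the estimate stays close to the median, whereas the mean is pulled strongly toward the extreme value, which is the reason for reporting a robust estimator alongside skewed score distributions.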

Mann-Whitney U tests were used to compare absolute values in the questionnaires between the two groups. The effect size “Probability of Superiority (PS)”, PS=U/(n1*n2), indicates the probability that a randomly selected subject of group n1 has a higher score than a randomly selected subject of group n2. A PS of .50 means that both groups are equal regarding a specific variable, and that there is no effect. Consequently, the larger the effect, the more PS deviates from .50 [39].
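The effect size PS=U/(n1*n2) can be sketched directly from its definition. In the example below, U is obtained by counting all pairs in which the score from the first group exceeds the score from the second group; counting ties as one half is a common convention and an assumption here, as are the function name and the small illustrative data.

```python
def probability_of_superiority(group1, group2):
    """Return PS = U / (n1 * n2), where U counts the pairs in which the
    group1 score exceeds the group2 score (ties count as one half)."""
    u = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u / (len(group1) * len(group2))


# Hypothetical total scores, not data from the study.
paper_scores = [2, 4, 5, 7]
web_scores = [6, 8, 9, 10]
ps = probability_of_superiority(paper_scores, web_scores)
print(f"PS = {ps:.2f}")
```

A value well below .50, as in this toy example, means that a randomly chosen member of the first group rarely scores higher than a randomly chosen member of the second group.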

The alpha level for statistical significance was set at .05 (two-sided). Missing responses in the paper versions were replaced using the expectation-maximization or the multiple imputation algorithms [40-42].

Results

Samples

In the paper-based sample, we received responses from 328 participants, corresponding to 65.6% of our approximately 500 printed questionnaires. This cannot be regarded as a return rate because we did not record those subjects who were personally asked and refused to participate. Of these 328 participants, 6 indicated they had been diagnosed with ADHD and 11 did not answer this question, resulting in a total sample of 311 subjects. In the Web-based sample, we received responses from 273 participants, of whom 18 indicated that they had been diagnosed with ADHD, resulting in a total sample of 255 subjects. The flow of subjects in our study samples is depicted in Figure 2. Table 1 shows the demographic characteristics of the two samples.

Table 1

Demographic characteristics of the paper-based and Web-based samples.

		Paper-based (n=311)	Web-based (n=255)
Gender
	Female	228 (73.3%)	176 (69.0%)
	Male	83 (26.7%)	79 (31.0%)
Age, mean years (SD)		39.2 (18.6)	30.4 (10.5)
Education
	University	61 (19.7%)	73 (28.6%)
	Apprenticeship	86 (27.7%)	35 (13.7%)
	High school	66 (21.3%)	130 (51.0%)
	Middle school	61 (19.7%)	15 (5.9%)
	Basic school	36 (11.6%)	2 (0.8%)

Figure 2

Flow of subjects in the paper-based and Web-based samples.

The samples did not differ with respect to gender (χ² test: P=.26, Cramer V=.05). The Web-based sample was, on average, younger than the paper-based sample. This difference was statistically significant with a rather moderate effect size (Mann-Whitney U test: P<.001, PS = .41). In the Web-based sample, there were more participants with a university degree and more subjects attending high school, while in the paper-based sample, there were more participants attending middle or basic school or with a completed apprenticeship. This difference was statistically significant with a high effect size (χ² test: P<.001, Cramer V=.42).

Conners Adult ADHD Rating Scales

There was a maximum of 9% missing values on single variables in the paper sample; these were missing completely at random (Little’s MCAR test, P=.27). They were replaced with the expectation maximization (EM) algorithm [40].

The four-factor model (df = 813) was supported in both groups. In the paper-based sample, the standardized RMR was .08, the RMR was .04, the GFI was .93, and the AGFI was .92. In the Web-based sample, the standardized RMR was .07, the RMR was .05, the GFI was .98, and the AGFI was .97. These fit indices signal a good model fit. Table 2 lists the correlations of CAARS items (loadings) with their postulated factors.

Except for Item 3 (“I don’t plan ahead”) of the factor “inattention/memory”, Items 1 (“I like to be doing active things”) and 5 (“I am a risk-taker or a daredevil”) of the hyperactivity factor, and Item 43 (“I step on people’s toes without meaning to”) on the impulsivity factor, all other items have loadings > .40 in both samples.

The intercorrelations between the factors are consistently higher in the Web-based sample. The largest differences between the two groups were found in correlations involving “self-concept” (see Table 3).

Multiple group analysis revealed that the factor structures were the same in both samples, signaling configural invariance (SRMR=.04, RMR=.04, GFI=.99, AGFI=.99). However, factor loadings were different (SRMR=.06, RMR=.10, GFI=.97, AGFI=.97) because all model fit indices changed by more than .01 when testing metric invariance (SRMR and RMR increased, GFI and AGFI decreased). Consequently, other invariance assumptions were also not supported.
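The decision rule from the Statistical Analysis section (fit indices should not differ by more than .01 between invariance levels) can be written out for the CAARS-S values reported above. This is only a sketch of the comparison step; the function names are hypothetical, and the fit indices themselves come from the model estimation, which is not reproduced here.

```python
# Fit indices reported for the CAARS-S multiple group analysis.
configural = {"SRMR": 0.04, "RMR": 0.04, "GFI": 0.99, "AGFI": 0.99}
metric = {"SRMR": 0.06, "RMR": 0.10, "GFI": 0.97, "AGFI": 0.97}


def fit_changes(lower_level, higher_level):
    """Change in each fit index when moving to the stricter invariance level,
    rounded to the two decimals in which the indices are reported."""
    return {
        name: round(higher_level[name] - lower_level[name], 2)
        for name in lower_level
    }


def invariance_supported(lower_level, higher_level, max_change=0.01):
    """True if no fit index changes by more than max_change."""
    changes = fit_changes(lower_level, higher_level)
    return all(abs(delta) <= max_change for delta in changes.values())


changes = fit_changes(configural, metric)
print(changes)
print("Metric invariance supported:", invariance_supported(configural, metric))
```

The residual-based indices (SRMR, RMR) rise and the goodness-of-fit indices (GFI, AGFI) fall, each by more than .01 in absolute value, so the check fails for metric invariance.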

Cronbach alpha of the subscales ranged from .81 to .85 in the paper-based sample and from .89 to .91 in the Web-based sample.

Absolute subscale differences between the two groups were all significant with the Web-based sample scoring substantially higher with pronounced effect sizes (Table 4). For example, the probability that a randomly chosen subject from the paper-based sample has a higher inattention/memory score than a randomly chosen subject from the Web-based sample is .35.

As there are no normative data for Germany to date, we applied strict cut-off values based on American normative data (T-value of 65, 94th percentile). The cut-off for “inattention/memory” was > 22 points; “hyperactivity” > 26 points; “impulsivity” > 22 points; and “self-concept” > 13 points. Regarding the total score of “inattention/memory”, 27 subjects (10.6%) in the Web-based sample scored above this value while 4 (1.3%) did so in the paper-based sample. This difference was significant with a moderate effect size (χ² test: P<.001, Cramer V=.20). Regarding the total score of “hyperactivity”, 5 subjects (2.0%) in the Web-based sample scored above this value while 2 (0.6%) did so in the paper-based sample. This difference was not significant (χ² test: P=.16, Cramer V=.06). Regarding the total score of “impulsivity”, 20 subjects (7.8%) in the Web-based sample scored above this value while 6 (1.9%) did so in the paper-based sample. This difference was significant with a small effect size (χ² test: P=.001, Cramer V=.14). Regarding the total score of “self-concept”, 30 subjects (11.8%) in the Web-based sample scored above this value while 13 (4.2%) did so in the paper-based sample. This difference was significant with a small effect size (χ² test: P=.001, Cramer V=.14).
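The Cramer V values above can be checked from the reported counts. The sketch below builds the 2x2 table (sample by above/not above the cut-off) for each subscale and computes V from the uncorrected Pearson χ² statistic; that the authors used the uncorrected statistic is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def cramers_v_2x2(a, b, c, d):
    """Cramer V for the 2x2 table [[a, b], [c, d]], based on the
    uncorrected Pearson chi-square statistic."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return sqrt(chi2 / n)


N_WEB, N_PAPER = 255, 311

# Subjects above the cut-off: (Web-based, paper-based), as reported.
above_cutoff = {
    "inattention/memory": (27, 4),
    "hyperactivity": (5, 2),
    "impulsivity": (20, 6),
    "self-concept": (30, 13),
}

effect_sizes = {
    scale: cramers_v_2x2(web, N_WEB - web, paper, N_PAPER - paper)
    for scale, (web, paper) in above_cutoff.items()
}
for scale, v in effect_sizes.items():
    print(f"{scale}: V = {v:.2f}")
```

Under this assumption, the computed values round to the effect sizes given in the text (.20, .06, .14, and .14).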

Wender Utah Rating Scale (Short Form)

There was a maximum of 2% missing values on single variables in the paper sample that were missing completely at random (Little’s MCAR test, P=.57). These were replaced with the EM algorithm [40]. Due to technical difficulties, WURS-k data of 11 participants in the Web-based sample were not available, resulting in n=244.

The model in the paper-based sample was not admissible because the covariance matrix between the postulated five factors was not positive definite. This leads to the conclusion that the model is wrong [34], and it was thus rejected.

In the Web-based sample, the model (df = 179) was supported: SRMR = .07, RMR = .09, GFI = .98, AGFI = .97. Table 5 depicts the loadings on the postulated factors.

As shown in Table 5, except for Item 23 (problems with police) on the factor “social adaptation”, all other items have high loadings > .4 on their postulated factors. The factors “inattention” and “impulsivity” correlated highest in the Web-based sample (r=.79), followed by “inattention” and “anxiety/depression” (r=.73), “impulsivity” and “oppositional behavior” (r=.72), and “inattention” and “oppositional behavior” (r=.72) (Table 6).

Due to the rejected model in the paper-based group, no multiple group analysis could be calculated.

Cronbach alpha of the subscales ranged from .68 to .82 in the paper-based sample and from .79 to .89 in the Web-based sample. No coefficients were calculated for the subscale “social adaptation” as it consists of only two items.

Absolute values of the total score were significantly higher (Mann-Whitney U test: P<.001; PS = .09) in the Web-based sample (mean 28.6, SD 14.0; Huber’s M estimator 26.2) than in the paper-based sample (mean 11.0, SD 6.8; Huber’s M estimator 9.6).

After applying the recommended cut-off value for the total score (> 29 points) [29,30], 38.1% in the Web-based sample scored above this value while merely 2.5% did so in the paper-based sample. This difference was significant with a high effect size (χ² test: P<.001, Cramer V=.46).

ADHD Self Rating Scale

There was a maximum of 1.3% missing values on single variables in the paper sample, except for Item 4, which asks about difficulties in the field of work. Student participants in the paper version did not complete this item, resulting in 28.5% missing-at-random data for this item. These values were replaced using multiple imputation with five imputations. The following calculations were done separately for the five imputations, and the respective results were averaged. Enders [40] recommends a larger number of imputations, but results showed only marginal differences between imputed datasets.

The three-factor model (df = 132) was supported in both groups. In the paper-based sample, the standardized RMR was .06, the RMR was .06, the GFI was .97, and the AGFI was .96. In the Web-based sample, the standardized RMR was .06, the RMR was .04, the GFI was .98, and the AGFI was .98. These fit indices signal a good model fit. Table 7 lists the loadings of ADHS-SB items on their postulated factors.
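The degrees of freedom reported for the three models (813, 179, and 132) follow from the number of items and factors. The sketch below assumes a covariance-structure model in which every item loads on exactly one factor, errors are uncorrelated, all factors are allowed to correlate, and each factor is identified by fixing its variance (or, equivalently for the count, one loading per factor). The function name is hypothetical; item and factor counts are those listed in Tables 2, 5, and 7.

```python
def cfa_degrees_of_freedom(n_items, n_factors):
    """Degrees of freedom of a simple-structure CFA with correlated factors."""
    # Distinct variances and covariances among the observed items.
    observed_moments = n_items * (n_items + 1) // 2
    # One loading and one error variance per item, plus all factor covariances.
    free_parameters = 2 * n_items + n_factors * (n_factors - 1) // 2
    return observed_moments - free_parameters


# (number of items, number of factors) per questionnaire.
models = {"CAARS-S": (42, 4), "WURS-k": (21, 5), "ADHS-SB": (18, 3)}
for name, (n_items, n_factors) in models.items():
    print(name, cfa_degrees_of_freedom(n_items, n_factors))
```

For the ADHS-SB, 18 items on three factors give 171 observed moments and 39 free parameters, hence 132 degrees of freedom.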

As shown in Table 7, except for Item 6 (avoidance of tasks with mental load) on the factor “inattention”, and Item 14 (feel like driven by a motor) on the factor “hyperactivity”, both in the paper-based sample, all other items have high loadings >.4 on their postulated factors.

The correlation between the factors inattention and hyperactivity is significantly higher in the paper-based sample, while the intercorrelations between the other factors are higher in the Web-based sample (Table 8).

Multiple group analysis revealed that the factor structures were the same in both samples, signaling configural invariance (SRMR = .06, RMR = .03, GFI = .98, AGFI = .98). However, factor loadings were different (SRMR = .15, RMR = .08, GFI = .83, AGFI = .79) because all model fit indices changed by more than .01 when testing metric invariance (SRMR and RMR increased, GFI and AGFI decreased). Consequently, other invariance assumptions were also not supported.

Cronbach alpha of the subscales ranged from .60 to .83 in the paper-based sample and from .79 to .91 in the Web-based sample.

Absolute differences between the two groups were significant with the Web-based sample (mean 12.8, SD 9.1; Huber’s M estimator 11.1) scoring substantially higher than the paper-based sample (mean 2.2, SD 3.0; Huber’s M estimator 1.4) with a high effect size (Mann-Whitney U test: P<.001, PS = .08).

After applying the recommended cut-off value for the total score (> 17 points) [31], 30% in the Web-based sample scored above this value, while only 0.6% did so in the paper-based sample. This difference was significant with a large effect size (χ² test: P<.001, Cramer V=.42).

Table 2

Correlations of CAARS items (loadings) with their postulated factors (latent constructs) in the paper-based and Web-based samples.

		Paper-based	Web-based
Inattention/Memory
	ITEM 03	.14	.26
	ITEM 07	.53	.74
	ITEM 11	.59	.73
	ITEM 16	.47	.67
	ITEM 18	.55	.74
	ITEM 32	.57	.61
	ITEM 36	.69	.75
	ITEM 40	.52	.73
	ITEM 44	.55	.78
	ITEM 49	.55	.73
	ITEM 51	.55	.63
	ITEM 66	.57	.74
Hyperactivity
	ITEM 01	.31	.06
	ITEM 05	.43	.32
	ITEM 10	.46	.50
	ITEM 13	.70	.74
	ITEM 20	.65	.69
	ITEM 25	.57	.46
	ITEM 27	.57	.82
	ITEM 31	.65	.66
	ITEM 38	.54	.76
	ITEM 46	.67	.80
	ITEM 54	.61	.71
	ITEM 57	.73	.81
Impulsivity
	ITEM 04	.56	.62
	ITEM 08	.49	.72
	ITEM 12	.56	.69
	ITEM 19	.66	.58
	ITEM 23	.58	.61
	ITEM 30	.64	.76
	ITEM 35	.47	.54
	ITEM 39	.60	.67
	ITEM 43	.37	.59
	ITEM 47	.59	.78
	ITEM 52	.53	.62
	ITEM 61	.61	.69
Self-concept
	ITEM 06	.59	.58
	ITEM 15	.60	.75
	ITEM 26	.58	.69
	ITEM 37	.81	.86
	ITEM 56	.75	.79
	ITEM 63	.81	.84

Table 3

Intercorrelations between the CAARS factors (latent constructs) in the paper-based and Web-based samples.

Factors			Paper-based	Web-based
Hyperactivity	<-->	Impulsivity	.73	.81
Inattention/Memory	<-->	Hyperactivity	.54	.74
Inattention/Memory	<-->	Impulsivity	.65	.79
Inattention/Memory	<-->	Self-concept	.47	.74
Hyperactivity	<-->	Self-concept	.24	.57
Impulsivity	<-->	Self-concept	.45	.71

Table 4

Means, standard deviations, and Huber’s M estimators of the CAARS subscales in the paper-based and Web-based samples with their respective P and effect size values.

	Paper-based	Web-based	Mann-Whitney U Test (P) & effect size (PS^b)
Inattention/Memory	8.6 (SD 4.8) Huber’s M^a 8.2	12.3 (SD 7.1) Huber’s M 11.2	<.001; PS=.35
Hyperactivity	9.0 (SD 5.3) Huber’s M 8.0	11.2 (SD 6.1) Huber’s M 10.3	<.001; PS=.38
Impulsivity	9.4 (SD 5.2) Huber’s M 8.8	12.3 (SD 6.6) Huber’s M 11.5	<.001; PS=.37
Self-concept	5.6 (SD 3.6) Huber’s M 5.1	7.5 (SD 4.3) Huber’s M 7.1	<.001; PS=.37

^aHuber’s M estimator.

^bPS = probability of superiority.

Table 5

Correlations of WURS-k items (loadings) with their postulated factors (latent constructs) in the Web-based sample.

		Web-based
Inattention
	ITEM 01	.82
	ITEM 02	.76
	ITEM 03	.78
	ITEM 06	.77
	ITEM 10	.75
	ITEM 15	.59
	ITEM 17	.67
	ITEM 24	.51
Impulsivity
	ITEM 05	.75
	ITEM 11	.83
	ITEM 13	.83
	ITEM 16	.89
Anxiety/Depression
	ITEM 07	.74
	ITEM 09	.62
	ITEM 18	.74
	ITEM 19	.80
Oppositional behavior
	ITEM 08	.88
	ITEM 21	.54
	ITEM 22	.79
Social adaptation
	ITEM 20	.64
	ITEM 23	.33

Table 6

Intercorrelations between the WURS-k factors (latent constructs) in the Web-based sample.

Factors			Web-based
Inattention	<-->	Impulsivity	.79
Impulsivity	<-->	Anxiety/Depression	.69
Inattention	<-->	Anxiety/Depression	.73
Inattention	<-->	Social adaptation	.50
Impulsivity	<-->	Oppositional behavior	.72
Impulsivity	<-->	Social adaptation	.49
Anxiety/Depression	<-->	Oppositional behavior	.35
Anxiety/Depression	<-->	Social adaptation	.57
Oppositional behavior	<-->	Social adaptation	.58
Inattention	<-->	Oppositional behavior	.72

Table 7

Correlations of ADHS-SB items (loadings) with their postulated factors (latent constructs) in the paper-based and Web-based samples.

		Paper-based	Web-based
Inattention
	ITEM 01	.47	.72
	ITEM 02	.48	.73
	ITEM 03	.56	.68
	ITEM 04	.40	.65
	ITEM 05	.45	.63
	ITEM 06	.22	.62
	ITEM 07	.48	.54
	ITEM 08	.60	.72
	ITEM 09	.45	.64
Hyperactivity
	ITEM 10	.71	.74
	ITEM 11	.66	.69
	ITEM 12	.58	.82
	ITEM 13	.54	.65
	ITEM 14	.31	.59
Impulsivity
	ITEM 15	.68	.72
	ITEM 16	.49	.72
	ITEM 17	.48	.73
	ITEM 18	.58	.58

Table 8

Intercorrelations between the ADHS-SB factors (latent constructs) in the paper-based and Web-based samples.

Factors			Paper-based	Web-based
Inattention	<-->	Hyperactivity	.92	.63
Hyperactivity	<-->	Impulsivity	.64	.80
Inattention	<-->	Impulsivity	.66	.72

Discussion

We compared Web-based and paper-based administrations of three ADHD questionnaires for adults. Subjects in the online sample were older and had a higher educational background. The original four-factor structure of the Conners Adult ADHD Rating Scales could be replicated in both samples, but factor loadings were different. Internal consistencies were high in both groups, but the Web-based sample had significantly higher total scores in three subscales, with 7.8% to 11.8% of participants scoring above clinically relevant cut-off values, compared with 1.3% to 4.2% in the paper-based sample. The five-factor structure of the German short form of the Wender Utah Rating Scale could be replicated only in the Web-based sample. Internal consistencies were acceptable to high in both groups. The Web-based sample had substantially higher total scores, and nearly 40% of the Web-based sample scored above the clinically relevant cut-off value. The three-factor structure of the ADHD Self Rating Scale could be replicated in both samples, but factor loadings were different. Internal consistencies were acceptable to high in both groups. The Web-based sample had substantially higher total scores, and 30% of the Web-based sample scored above the clinically relevant cut-off value. In summary, psychometric properties were similar in both samples, but the Web-based sample had substantially higher scores on all three questionnaires.

The relatively high dropout rate in our Web-based sample is consistent with the literature. Additional informed consent procedures were shown to increase early dropout in Web-based studies [43]. Kongsved et al [44] administered paper and online questionnaires to women referred for mammography; their questionnaires were comparable in length to ours. In their study, the Internet version yielded more complete data but a lower response rate. A lower response rate was also found among surgeons responding to an online questionnaire [45]. We cannot compare dropout rates between our samples because, in the paper-based sample, we did not record those who were personally asked and refused to participate.

Demographic differences (older age and higher education in the online sample) might have influenced the results [46]. ADHD is more prevalent in men. Our predominantly female samples in both versions are therefore not suitable for deriving normative data. Women tend to participate more in psychological studies. The gender distributions in our study are quite comparable to common sample characteristics regarding online study participation [2,47]. Subjects in the Internet sample might have experienced psychological distress regarding ADHD symptoms, so that they saw their participation as a way of assessing themselves for the disorder. One also has to consider that a paper-based survey confined to a certain geographic region is much less likely to reach individuals with pronounced ADHD symptoms than an Internet survey without any geographic barriers. On the other hand, increased self-disclosure might also be an important factor. Subjects might have considered the completion of online questionnaires to be more anonymous than handing over hand-written information; in one study, participants in the online group reported lower social anxiety and lower social desirability than those in the paper-based group [48].

Our results contradict the conclusion of Gosling et al [2] that subjects in Internet samples are not unusually maladjusted. Whether this holds appears to depend on the context of the study. When the intention is to collect normative data for clinical questionnaires, one has to consider that the scores of online samples can be inflated [49].

On the other hand, our results corroborate the assumption of Rhodes et al [47] that previously hidden subgroups can be reached by Internet research. This might also be true for other medical disorders [50]. Whether our subgroup of women with higher scores on ADHD questionnaires has clinical significance must be determined by future studies with more controlled recruitment strategies.

Several limitations have to be mentioned. We did not randomize subjects to online and paper versions, so differences between the two groups might have arisen from sampling biases, and the comparison should be repeated under randomized conditions. Different recruitment strategies for the paper and online samples might have influenced the results. Although relatively high discontinuation rates are common in online research, they might have biased the results. Future online studies should also allow participants to leave out questions, to create conditions similar to paper administration.
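
A randomized replication could allocate each participant to an administration mode by chance. The following is only a minimal sketch of one possible allocation scheme (permuted blocks of two, so that group sizes stay balanced); it is not a procedure used in this study, and the function name, participant identifiers, and seed are hypothetical.

```python
import random


def block_randomize(participant_ids, modes=("paper", "web"), seed=None):
    """Assign participants to modes in shuffled blocks of len(modes)."""
    rng = random.Random(seed)  # seed only makes the sketch reproducible
    assignment = {}
    block = []
    for pid in participant_ids:
        if not block:
            # Start a new block containing each mode once, in random order.
            block = list(modes)
            rng.shuffle(block)
        assignment[pid] = block.pop()
    return assignment


# Hypothetical participant identifiers.
ids = [f"P{i:03d}" for i in range(1, 11)]
assignment = block_randomize(ids, seed=42)
counts = {mode: list(assignment.values()).count(mode) for mode in ("paper", "web")}
print(counts)
```

Because every block contains each mode exactly once, the two groups never differ in size by more than one participant, regardless of where recruitment stops.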

Conclusions

Abbreviations

ADHD

Attention-Deficit/Hyperactivity Disorder

ADHS-SB

ADHD Self Rating Scale

AGFI

Adjusted Goodness-of-Fit Index

CAARS

Conners Adult ADHD Rating Scales

CAARS-S

CAARS self-report

DSM-IV

Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition

GFI

Goodness-of-Fit Index

ICD-10

International Classification of Diseases, Tenth Revision

MCAR

Missing completely at random

PS

Probability of Superiority

RMR

Root Mean Square Residual

SRMR

Standardized Root Mean Square Residual

WURS-k

Wender Utah Rating Scale-Short Form

This study received no financial grants.

Conflicts of interest: none declared.

References

1. Strickland, Moloney, Dietrich, Myerburg, Cotsonis, Johnson. Measurement issues related to data collection on the World Wide Web. ANS Adv Nurs Sci 2003;26(4):246-56. PMID: 14674574.

2. Gosling, Vazire, Srivastava, John. Should we trust Web-based studies? A comparative analysis of six preconceptions about internet questionnaires. Am Psychol 2004 Mar;59(2):93-104. doi: 10.1037/0003-066X.59.2.93. PMID: 14992636.

3. Ekman, Dickman, Klint, Weiderpass, Litton. Feasibility of using Web-based questionnaires in large population-based epidemiological studies. Eur J Epidemiol 2006;21(2):103-11. doi: 10.1007/s10654-005-6030-4. PMID: 16518678.

4. Ritter, Lorig, Laurent, Matthews. Internet versus mailed questionnaires: a randomized comparison. J Med Internet Res 2004 Sep 15;6(3):e29. doi: 10.2196/jmir.6.3.e29. PMID: 15471755. PMCID: PMC1550608.

5. Van De Looij-Jansen, De Wilde. Comparison of Web-based versus paper-and-pencil self-administered questionnaire: effects on health indicators in Dutch adolescents. Health Serv Res 2008 Oct;43(5 Pt 1):1708-21. doi: 10.1111/j.1475-6773.2008.00860.x. PMID: 18479404. PMCID: PMC2653887.

6. Denscombe. Web-based questionnaires and the mode effect. Social Science Computer Review 2006;24(2):246-54. doi: 10.1177/0894439305284522.

7. Hardré, Crowson, Xie. Testing differential effects of computer-based, Web-based and paper-based administration of questionnaire research instruments. Br J Educ Technol 2007 Jan;38(1):5-22. doi: 10.1111/j.1467-8535.2006.00591.x.

8. Vallejo, Jordán, Díaz, Comeche, Ortega. Psychological assessment via the internet: a reliability and validity study of online (vs paper-and-pencil) versions of the General Health Questionnaire-28 (GHQ-28) and the Symptoms Check-List-90-Revised (SCL-90-R). J Med Internet Res 2007;9(1):e2. doi: 10.2196/jmir.9.1.e2. PMID: 17478411. PMCID: PMC1794673.

9. Hardré, Crowson, Xie. Differential Effects of Web-Based and Paper-Based Administration of Questionnaire Research Instruments in Authentic Contexts-of-Use. Journal of Educational Computing Research 2010 Jan;42(1):103-133. doi: 10.2190/EC.42.1.e.

10. Riva, Teruzzi, Anolli. The use of the internet in psychological research: comparison of online and offline questionnaires. Cyberpsychol Behav 2003 Feb;6(1):73-80. doi: 10.1089/109493103321167983. PMID: 12650565.

11. Holländare, Andersson, Engström. A comparison of psychometric properties between internet and paper versions of two depression instruments (BDI-II and MADRS-S) administered to clinic patients. J Med Internet Res 2010;12(5):e49. doi: 10.2196/jmir.1392. PMID: 21169165. PMCID: PMC3057311.

12. Carlbring, Brunt, Bohman, Austin, Richards, Öst, Andersson. Internet vs. paper and pencil administration of questionnaires commonly used in panic/agoraphobia research. Computers in Human Behavior 2007 May;23(3):1421-1434. doi: 10.1016/j.chb.2005.05.002.

13. Donker, van Straten, Marks, Cuijpers. A brief Web-based screening questionnaire for common mental disorders: development and validation. J Med Internet Res 2009;11(3):e19. doi: 10.2196/jmir.1134. PMID: 19632977. PMCID: PMC2763401.

14. Lin, Bai, Liu, Hsiao, Chen, Tsai, Ouyang. Web-based tools can be used reliably to detect patients with major depressive disorder and subsyndromal depressive symptoms. BMC Psychiatry 2007;7:12. doi: 10.1186/1471-244X-7-12. PMID: 17425774. PMCID: PMC1855926.

15. Buchanan. Internet-based questionnaire assessment: appropriate use in clinical contexts. Cogn Behav Ther 2003;32(3):100-9. doi: 10.1080/16506070310000957. PMID: 16291542.

16. Whitehead. Methodological issues in Internet-mediated research: a randomized comparison of internet versus mailed questionnaires. J Med Internet Res 2011;13(4):e109. doi: 10.2196/jmir.1593. PMID: 22155721. PMCID: PMC3278095.

17. Davidson. ADHD in adults: a review of the literature. J Atten Disord 2008 May;11(6):628-41. doi: 10.1177/1087054707310878. PMID: 18094324.

18. Ramsay, Rostain. Adult ADHD research: current status and future directions. J Atten Disord 2008 May;11(6):624-7. doi: 10.1177/1087054708314590. PMID: 18417728.

19. Weiss, Murray. Assessment and management of attention-deficit hyperactivity disorder in adults. CMAJ 2003 Mar 18;168(6):715-22. PMID: 12642429. PMCID: PMC154919.

20. Steenhuis, Serra, Minderaa, Hartman. An Internet version of the Diagnostic Interview Schedule for Children (DISC-IV): correspondence of the ADHD section with the paper-and-pencil version. Psychol Assess 2009 Jun;21(2):231-4. doi: 10.1037/a0015925. PMID: 19485678.

21. Bhatara, Vogt, Patrick, Doniparthi, Ellis. Acceptability of a Web-based attention-deficit/hyperactivity disorder scale (T-SKAMP) by teachers: a pilot study. J Am Board Fam Med 2006;19(2):195-200. PMID: 16513909.

22. Swanson, Schuck, Mann-Porter, Carlson, Hartman, Sergeant, Clevenger, Wasdell, McCleary, Wigal. Categorical and Dimensional Definitions and Evaluations of Symptoms of ADHD: History of the SNAP and the SWAN Rating Scales. The International Journal of Educational and Psychological Assessment 2012;10:51-70.

23. Rösler, Retz, Stieglitz. Psychopathological rating scales as efficacy parameters in adult ADHD treatment investigations - benchmarking instruments for international multicentre trials. Pharmacopsychiatry 2010 May;43(3):92-8. doi: 10.1055/s-0029-1242819. PMID: 20127615.

24. Conners, Erhardt, Sparrow. Conners' Adult ADHD Rating Scales (CAARS). Technical manual. North Tonawanda, NY: Multi-Health Systems; 1999.

25. Christiansen, Hirsch, Philipsen, Oades, Matthies, Hebebrand, Uekermann, Abdel-Hamid, Kraemer, Wiltfang, Graf, Colla, Sobanski, Alm, Rösler, Jacob, Jans, Huss, Schimmelmann, Kis. German Validation of the Conners Adult ADHD Rating Scale-Self-Report: Confirmation of Factor Structure in a Large Sample of Participants With ADHD. J Atten Disord 2012 Mar 21. doi: 10.1177/1087054711435680. PMID: 22441889.

26. Christiansen, Kis, Hirsch, Matthies, Hebebrand, Uekermann, Abdel-Hamid, Kraemer, Wiltfang, Graf, Colla, Sobanski, Alm, Rösler, Jacob, Jans, Huss, Schimmelmann, Philipsen. German validation of the Conners Adult ADHD Rating Scales (CAARS) II: reliability, validity, diagnostic sensitivity and specificity. Eur Psychiatry 2012 Jul;27(5):321-8. doi: 10.1016/j.eurpsy.2010.12.010. PMID: 21392946.

27. Christiansen, Kis, Hirsch, Philipsen, Henneck, Panczuk, Pietrowsky, Hebebrand, Schimmelmann. German validation of the Conners Adult ADHD Rating Scales-self-report (CAARS-S) I: factor structure and normative data. Eur Psychiatry 2011 Mar;26(2):100-7. doi: 10.1016/j.eurpsy.2009.12.024. PMID: 20619613.

28. Unipark. URL: http://www.unipark.info/1-0-online-befragungssoftware-fuer-studenten-und-universitaeten-unipark-home.htm [accessed 2013-02-19] [WebCite Cache ID 6EY9AVRHg].

29. Retz-Junginger, Retz, Blocher, Weijers, Trott, Wender, Rössler. [Wender Utah rating scale. The short-version for the assessment of the attention-deficit hyperactivity disorder in adults]. Nervenarzt 2002 Sep;73(9):830-8. doi: 10.1007/s00115-001-1215-x. PMID: 12215873.

30. Retz-Junginger, Retz, Blocher, Stieglitz, Georg, Supprian, Wender, Rösler. [Reliability and validity of the Wender-Utah-Rating-Scale short form. Retrospective assessment of symptoms for attention deficit/hyperactivity disorder]. Nervenarzt 2003 Nov;74(11):987-93. doi: 10.1007/s00115-002-1447-4. PMID: 14598035.

31. Rösler, Retz, Retz-Junginger, Thome, Supprian, Nissen, Stieglitz, Blocher, Hengesch, Trott. [Tools for the diagnosis of attention-deficit/hyperactivity disorder in adults. Self-rating behaviour questionnaire and diagnostic checklist]. Nervenarzt 2004 Sep;75(9):888-95. doi: 10.1007/s00115-003-1622-2. PMID: 15378249.

32. Blunch. Introduction to structural equation modeling using SPSS and AMOS. London: Sage; 2008.

33. Kline. Principles and practice of structural equation modeling. New York: Guilford; 2005.

34. Arbuckle. Amos 17.0 User's Guide. Chicago: SPSS Inc; 2008.

35. Weiber, Mühlhaus. Strukturgleichungsmodellierung [Structural Equation Modeling]. Berlin: Springer; 2009.

36. Raykov, Marcoulides. A first course in structural equation modeling. Mahwah: Lawrence Erlbaum Associates; 2006.

37. George, Mallery. SPSS for Windows Step by Step: A Simple Guide and Reference, 18.0 Update. Boston: Allyn & Bacon; 2010.

38. Huber, Ronchetti. Robust Statistics. Hoboken, NJ: John Wiley & Sons; 2009.

39. Grissom, Kim. Effect sizes for research. Univariate and multivariate applications. New York: Routledge; 2012.

40. Enders. Applied missing data analysis. New York: Guilford; 2010.

41. Donders, van der Heijden, Stijnen, Moons. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol 2006 Oct;59(10):1087-91. doi: 10.1016/j.jclinepi.2006.01.014. PMID: 16980149.

42. Sterne, White, Carlin, Spratt, Royston, Kenward, Wood, Carpenter. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009;338:b2393. PMID: 19564179. PMCID: PMC2714692.

43. O'Neil, Penrod, Bornstein. Web-based research: methodological variables' effects on dropout and sample characteristics. Behav Res Methods Instrum Comput 2003 May;35(2):217-26. PMID: 12834076.

44. Kongsved, Basnov, Holm-Christensen, Hjollund. Response rate and completeness of questionnaires: a randomized study of Internet versus paper-and-pencil versions. J Med Internet Res 2007;9(3):e25. doi: 10.2196/jmir.9.3.e25. PMID: 17942387. PMCID: PMC2047288.

45. Leece, Bhandari, Sprague, Swiontkowski, Schemitsch, Tornetta, Devereaux, Guyatt. Internet versus mailed questionnaires: a controlled comparison (2). J Med Internet Res 2004 Oct 29;6(4):e39. doi: 10.2196/jmir.6.4.e39. PMID: 15631963. PMCID: PMC1550620.

46. van den Berg, Overbeek, van der Pal, Versluys, Bresters, van Leeuwen, Lambalk, Kaspers, van Dulmen-den Broeder. Using web-based and paper-based questionnaires for collecting data on fertility issues among female childhood cancer survivors: differences in response characteristics. J Med Internet Res 2011;13(3):e76. doi: 10.2196/jmir.1707. PMID: 21955527. PMCID: PMC3222164.

47. Rhodes, Bowie, Hergenrather. Collecting behavioural data using the world wide web: considerations for researchers. J Epidemiol Community Health 2003 Jan;57(1):68-73. PMID: 12490652. PMCID: PMC1732282.

48. Joinson. Social desirability, anonymity, and Internet-based questionnaires. Behav Res Methods Instrum Comput 1999 Aug;31(3):433-8. PMID: 10502866.

49. Houston, Cooper, Kahn, Toser, Ford. Screening the public for depression through the Internet. Psychiatr Serv 2001 Mar;52(3):362-7. PMID: 11239106.

50. Klovning, Sandvik, Hunskaar. Web-based survey attracted age-biased sample with more severe illness than paper-based survey. J Clin Epidemiol 2009 Oct;62(10):1068-74. doi: 10.1016/j.jclinepi.2008.10.015. PMID: 19246177.