Published on 15.08.11 in Vol 13, No 3 (2011): Jul-Sep
A Web-Based Computerized Adaptive Testing (CAT) to Assess Patient Perception in Hospitalization
Background: Many hospitals have adopted mobile nursing carts that can be easily rolled up to a patient’s bedside to access charts and help nurses perform their rounds. However, few papers have reported data on the use of wireless computers on wheels (COW) at patients’ bedsides to collect questionnaire-based information on patients’ perceptions of their hospitalization at discharge.
Objective: The purpose of this study was to evaluate the relative efficiency of computerized adaptive testing (CAT) and the precision of CAT-based measures of perceptions of hospitalized patients, as compared with those of nonadaptive testing (NAT). An Excel module of our CAT multicategory assessment is provided as an example.
Method: A total of 200 patients who were discharged from the hospital responded to the CAT-based 18-item inpatient perception questionnaire on COW. The number of questions administered was recorded, and the responses were calibrated using the Rasch model. The results were compared with those from NAT to show the advantage of CAT over NAT.
Results: Patient measures derived from CAT and NAT were highly correlated (r = 0.98) and their measurement precisions were not statistically different (P = .14). CAT required fewer questions than NAT (an efficiency gain of 42%), suggesting a reduced burden for patients. There were no significant differences between groups in terms of gender and other demographic characteristics.
Conclusions: CAT-based administration of surveys of patient perception substantially reduced patient burden without compromising the precision of measuring patients’ perceptions of hospitalization. The Excel module of animation-CAT on the wireless COW that we developed is recommended for use in hospitals.
J Med Internet Res 2011;13(3):e61
- Computerized adaptive testing;
- computer on wheels;
- classic test theory;
- item response theory;
- nonadaptive testing
As computer technology and health care become more integrated, many hospitals have adopted mobile nursing carts that can be easily rolled up to a patient’s bedside to access charts and help nurses perform their rounds [- ]. Besides increasing efficiency by including basic functions such as billing records and by decreasing the number of trips nurses need to take to the medication room [ ], the carts can reduce patient burden by allowing patients to answer questions on activities of daily living using computerized adaptive testing (CAT) [ ]. However, few papers have reported data on the bedside use of wireless computers on wheels (COW) to collect questionnaire-based information on patients’ perceptions of hospitalization at discharge. This question matters because collecting patients’ feedback on their experiences has become an important part of patient involvement and participation in health care [ - ].
Gathering Feedback Efficiently From Patients
Two newer modes of survey administration have been reported to make surveys more accessible to those who cannot read or write [ ]: automated telephone surveys using an interactive voice response system, and Internet-like visualizations for completing questionnaires online. In medical practice, hospital staff usually hand a questionnaire to patients at the end of their visit and ask them to complete it before leaving the hospital. The Picker Institute Europe [ ] sends questionnaires annually to a random sample of eligible patients who have been discharged from the hospital. Both of these methods are less timely and efficient than using wireless COW to collect data on patients’ perspectives at discharge.
Computer Assessment and Computer-Adaptive Testing
Using wireless COW at a patient’s bedside is an efficient way of gathering feedback from patients immediately. Traditional paper-and-pencil or computer-based questionnaires (nonadaptive testing [NAT]) impose a large respondent burden because patients are required to answer all the questions. In contrast, CAT-based tests developed using item response theory (IRT) [ ] can achieve a degree of measurement precision similar to NAT using only about half the test length [ , - ]. Most studies of IRT- and CAT-based tests have evaluated efficiency and precision for CAT with dichotomous items. Whether CAT with polytomously scored items (CAT as defined in this study) can be incorporated with wireless COW in hospitals for gathering feedback from patients remains to be investigated.
In classical test theory, raw scores (or linear transformation scores, eg, T score) are usually adopted as respondent measures. However, subsequent parametric statistical analyses, such as computing mean, variance, correlation coefficient, and Cronbach alpha [, ], would be incorrect because raw scores are not on an additive interval scale [ ].
To overcome this obstacle, the IRT-based Rasch model [ ] was developed; it specifies a probabilistic relationship between a person’s level of a latent trait (commonly referred to as ability or measure) and an item’s property (difficulty or threshold). Both person ability and item difficulty (calibrated in log odds units, or logits) are located along the same continuum. Under the Rasch model, a useful scale (or questionnaire) is usually examined against 3 important criteria: unidimensionality, item fit, and item invariance (ie, absence of differential item functioning [ ]). These criteria are detailed in Smith et al [ ]. Many published studies [ , - ] have used the Rasch model to develop CAT in clinical settings, but none has incorporated an Internet-based polytomously scored CAT to gather feedback from patients in hospitals.
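For readers who want the model in executable form, a minimal Python sketch of the response probabilities follows. It assumes Andrich’s rating scale model (the polytomous Rasch variant used in this study for items scored 0 to 4), in which all items share the same step thresholds; the function names and any threshold values used with them are hypothetical, not the IPQ-18 calibration.

```python
import math
from typing import List, Sequence


def rasch_dichotomous(theta: float, delta: float) -> float:
    """P(X = 1) under the dichotomous Rasch model; theta and delta are in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def rsm_category_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0..m) under Andrich's rating scale model.

    theta: person measure; delta: overall item difficulty;
    taus: step (threshold) parameters shared by all items (m = len(taus)).
    """
    # Exponent for category k is sum_{j<=k} (theta - delta - tau_j); the empty sum for k = 0 is 0
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    # Subtract the largest exponent before exponentiating to avoid overflow
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]
```

With a single threshold at 0, the rating scale model reduces to the dichotomous Rasch model, which gives a quick sanity check of the implementation.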
The purpose of this study was to evaluate the relative efficiency of an Internet-based polytomously scored CAT and the precision of CAT-based measures of perceptions of hospitalized patients, as compared with those measured by NAT. An Excel (Microsoft Corporation, Redmond, WA, USA) module of our CAT multicategory assessment is provided as an example.
The study sample was recruited from inpatients at a 1333-bed medical center in southern Taiwan. Patients who had been discharged were selected randomly by the digit code of their invoice number during each morning and afternoon interval from Monday through Friday in summer 2010.
As an incentive for participation, patients were offered a gift card for US $3.20 good for purchases at 7-11 convenience stores. A total of 200 patients either completed the questionnaire on COW themselves or were helped by a trained volunteer (if they were unable to personally complete the questionnaire); proxies were allowed because most of those helping patients carry out their discharge procedure were the patients’ family members or friends. Time spent by each patient was recorded in Excel after they completed the questionnaire. This study was approved and monitored by the Research and Ethical Review Board of the Chi-Mei Medical Center, Tainan, Taiwan.
Tool: CAT-Format Questionnaire
We designed the 18-item CAT questionnaire in Excel based on an 18-item inpatient perception questionnaire (IPQ-18) ; see ). The unidimensionality, local independence, item fit, and differential item functioning of the IPQ-18 have previously been examined with the Rasch model and reported [ ].
Data collected from the patients included demographic characteristics (gender, treatment department, age, and person completing the survey, ie, proxy or patient); perception measure in logits; number of items administered; and infit and outfit mean square (MNSQ) fit statistics (indicators of response patterns on the IPQ-18 scale [ ]) (see , , and ).
Outfit and Infit Statistics
Outfit statistics are based on the unweighted sum of squared standardized residuals and are sensitive to unexpected outlying, off-target, and low-information responses, whereas infit statistics are based on the information-weighted sum of squared standardized residuals and are sensitive to unexpected inlying patterns among informative, on-target observations [ ]. Smith [ ] found that the Rasch outfit MNSQ (expected value 1.0 [ ]) has higher power than its infit counterpart. An outfit MNSQ of 2.0 or greater for a patient indicates a possibly aberrant response pattern [ ].
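As an illustration of the two statistics, the sketch below computes infit and outfit MNSQ for one person under the rating scale model: outfit is the plain mean of squared standardized residuals, and infit divides the summed squared residuals by the summed model variances. It is a minimal sketch, not WINSTEPS output; the function names, thresholds, and item difficulties in the example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rsm_category_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Rating scale model category probabilities for categories 0..len(taus)."""
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def person_fit(theta: float, responses: Sequence[int], deltas: Sequence[float],
               taus: Sequence[float]) -> Tuple[float, float]:
    """Return (infit MNSQ, outfit MNSQ) for one person's answered items."""
    sq_resid_total = 0.0   # sum of squared raw residuals (infit numerator)
    variance_total = 0.0   # sum of model variances (infit denominator)
    z_sq_total = 0.0       # sum of squared standardized residuals (outfit)
    for x, delta in zip(responses, deltas):
        probs = rsm_category_probs(theta, delta, taus)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        resid_sq = (x - expected) ** 2
        sq_resid_total += resid_sq
        variance_total += variance
        z_sq_total += resid_sq / variance
    outfit = z_sq_total / len(responses)      # unweighted mean of squared z
    infit = sq_resid_total / variance_total   # information-weighted mean of squared z
    return infit, outfit
```

A patient would then be flagged when `outfit >= 2.0`, following the cut-off given in the text; for example, a respondent who gives the lowest ratings to the easiest items and the highest ratings to the hardest ones produces an outfit far above 2.0.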
CAT Procedures and Features
We programmed a Visual Basic for Applications (VBA) module in Microsoft Excel and on the Internet (http://www.healthup.org.tw/cat.asp, http://www.webcitation.org/60xWv6p6d) that complies with several rules and criteria for CAT-based testing on COW (, ). The person separation reliability (similar to Cronbach alpha) calculated from the original paper [ ] was 0.94 (person measure mean 2.64, SD 2.09). Based on these values, the CAT stopping rule for the standard error of measurement was set at 0.51 (SD × sqrt(1 – reliability) = 2.09 × sqrt(1 – 0.94) ≈ 0.51).
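The stopping threshold follows from the usual relation between a person SD, a reliability coefficient, and the standard error of measurement. A minimal Python check of the arithmetic, using the values reported above (the function name is hypothetical):

```python
import math


def se_stop_threshold(person_sd: float, reliability: float) -> float:
    """Standard error of measurement implied by a person SD and a reliability coefficient."""
    return person_sd * math.sqrt(1.0 - reliability)


threshold = se_stop_threshold(2.09, 0.94)
print(round(threshold, 3))   # 0.512
```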
We also set a second stopping rule requiring a minimum of 10 items (10/18, 56%), because CAT achieves measurement precision similar to NAT with only about half the test length [, - ]. The initial question was selected from the pool of 18 items according to the patient’s overall perception of satisfaction with their hospitalization. After 3 items had been answered, provided the responses were not all 0 or all 4 (extreme response patterns, for which no finite maximum likelihood estimate exists), the provisional person measure was estimated by maximizing the log-likelihood function with an iterative Newton-Raphson procedure [ , ] ( ). The next question selected was the one with the highest information value, evaluated at the provisional person measure, among the remaining unanswered questions. The details of the CAT procedures are shown in and .
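The procedure above can be sketched in Python as follows. This is not the authors’ VBA module, and several details are our assumptions: hypothetical shared thresholds, a caller-chosen starting item (standing in for the overall-satisfaction question), a provisional measure of 0 until a finite estimate exists, a clamp on the Newton-Raphson step, and a rule that stops once at least 10 items have been answered and the standard error is at most 0.51, or when the 18-item pool is exhausted. Under the rating scale model, item information equals the model variance of the item score.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rsm_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Rating scale model category probabilities for categories 0..len(taus)."""
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_info(theta: float, delta: float, taus: Sequence[float]) -> Tuple[float, float]:
    """Expected item score and item information (= score variance) at theta."""
    probs = rsm_probs(theta, delta, taus)
    expected = sum(k * p for k, p in enumerate(probs))
    info = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
    return expected, info


def mle_theta(responses: Sequence[int], deltas: Sequence[float], taus: Sequence[float],
              theta: float = 0.0, max_iter: int = 50, tol: float = 1e-6) -> Tuple[float, float]:
    """Newton-Raphson maximum likelihood estimate of theta and its standard error."""
    for _ in range(max_iter):
        stats = [expected_and_info(theta, d, taus) for d in deltas]
        gradient = sum(x - e for x, (e, _) in zip(responses, stats))  # dlogL/dtheta
        info = sum(i for _, i in stats)                                # -d2logL/dtheta2
        step = max(-1.0, min(1.0, gradient / info))  # step clamp is an added safeguard
        theta += step
        if abs(step) < tol:
            break
    info = sum(expected_and_info(theta, d, taus)[1] for d in deltas)
    return theta, 1.0 / math.sqrt(info)


def run_cat(answer: Callable[[int], int], deltas: Sequence[float], taus: Sequence[float],
            start_item: int, se_stop: float = 0.51,
            min_items: int = 10) -> Tuple[float, float, List[int]]:
    """Administer items adaptively; return (theta, se, administered item indices)."""
    top_category = len(taus)
    asked: List[int] = []
    responses: List[int] = []
    theta, se = 0.0, math.inf  # assumption: theta stays at 0 until a finite MLE exists
    item = start_item
    while True:
        asked.append(item)
        responses.append(answer(item))
        extreme = all(r == 0 for r in responses) or all(r == top_category for r in responses)
        if len(asked) >= 3 and not extreme:
            theta, se = mle_theta(responses, [deltas[i] for i in asked], taus, theta)
        if (len(asked) >= min_items and se <= se_stop) or len(asked) == len(deltas):
            return theta, se, asked
        remaining = [i for i in range(len(deltas)) if i not in asked]
        # Maximum-information selection at the provisional person measure
        item = max(remaining, key=lambda i: expected_and_info(theta, deltas[i], taus)[1])
```

In use, `answer` would wrap the on-screen question; in a simulation it can return a rounded expected score for a known person measure.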
Comparison of Efficiency Between NAT and CAT
Two indicators were used to examine CAT efficiency in this study: (1) whether the number of questions needed was significantly smaller than for NAT (18 questions), and (2) whether the precision of person measures was lower than that for NAT. We used paired t tests to make these two statistical inferences.
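For illustration, the paired t statistic on per-patient differences (eg, CAT versus NAT standard errors) can be computed with the Python standard library as below; the P value then comes from a t distribution with n – 1 degrees of freedom, for example via `scipy.stats.ttest_rel`. The function name is hypothetical, and the study itself used SPSS.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.fmean(diffs)
    se_d = statistics.stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean_d / se_d, len(diffs) - 1
```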
Accordingly, a NAT-based perception measure had to be estimated for each patient as if he or she had answered all 18 items. The following steps were therefore taken: (1) we used a standard item response-generation method [- ] to generate responses to all 18 questions for each patient, given the question difficulty parameters (in ) and the patient’s perception measure (from CAT), and (2) we applied the resulting rectangular matrix of 18 questions × 200 persons to re-estimate NAT measures for each patient using WINSTEPS software (version 3.72.0; Winsteps.com, Chicago, IL, USA), with the 18 question difficulties anchored in WINSTEPS through an iafile (shown in ).
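Step (1) can be sketched as inverse-CDF sampling from the rating scale model: draw a uniform number and return the first category whose cumulative probability exceeds it. This is one common way to generate Rasch data and may differ in detail from the cited methods; the function names, thresholds, and seed are hypothetical, and step (2), the anchored re-estimation in WINSTEPS, is not reproduced here.

```python
import math
import random
from typing import List, Optional, Sequence


def rsm_category_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Rating scale model category probabilities for categories 0..len(taus)."""
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_responses(thetas: Sequence[float], deltas: Sequence[float],
                       taus: Sequence[float], seed: Optional[int] = None) -> List[List[int]]:
    """Generate a persons x items matrix of rating-scale responses (0..len(taus))."""
    rng = random.Random(seed)
    matrix = []
    for theta in thetas:
        row = []
        for delta in deltas:
            u = rng.random()
            cumulative = 0.0
            category = len(taus)  # fallback in case rounding leaves cumulative just below u
            for k, p in enumerate(rsm_category_probs(theta, delta, taus)):
                cumulative += p
                if u < cumulative:
                    category = k
                    break
            row.append(category)
        matrix.append(row)
    return matrix
```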
SPSS software for Windows (Version 12, SPSS, Chicago, IL) was used for all statistical analysis.
Data on patient gender, age, treatment department, and proxy respondent were collected. Categorical variables were reported as frequencies and percentages and were examined with chi-square tests.
For continuous variables, CAT and NAT measures were compared using the Pearson correlation coefficient. Patient perception measures obtained by CAT were compared between groups using t tests or analysis of variance (ANOVA). Completion times were log-transformed before averaging, and the mean (SD) was back-transformed to seconds with the exponential function. All analyses used a significance level of .05.
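Averaging on the log scale and back-transforming gives the geometric mean. The text does not specify how the SD was back-transformed, so the sketch below reports the geometric SD as a multiplicative factor, which is an assumption; the function name and the example times are hypothetical, not study data.

```python
import math
import statistics
from typing import Sequence, Tuple


def log_scale_summary(seconds: Sequence[float]) -> Tuple[float, float]:
    """Return (geometric mean in seconds, geometric SD as a multiplicative factor)."""
    logs = [math.log(t) for t in seconds]
    geometric_mean = math.exp(statistics.fmean(logs))
    geometric_sd = math.exp(statistics.stdev(logs))
    return geometric_mean, geometric_sd


gm, gsd = log_scale_summary([30, 60, 120])  # illustrative times only
```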
As seen in , there were no significant associations between gender and the other demographic characteristics (ie, treatment department, age, and respondent type [patient or proxy]). Of the inpatients we approached, 13% (31/231) declined to respond to the CAT questions for personal reasons, despite the incentive offered. CAT and NAT measures were highly correlated (r = 0.98).
Mean time spent by patients on CAT was 54.91 seconds (SD 8.00; maximum 76; minimum 33). As shown in , CAT required substantially fewer questions than NAT (P < .001). NAT required each of the 200 patients to respond to all 18 questions, yielding a total of 3600 responses. CAT required a total of 2083 responses, meaning that on average a patient answered 10.42 questions. Thus, compared with NAT, CAT achieved an efficiency gain in test length of 0.42 (defined as 1 minus the ratio of total responses in CAT to total responses in NAT: 1 – 2084/3600).
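The efficiency arithmetic can be checked in a few lines of Python. The sketch uses the total of 2083 CAT responses; substituting 2084, as in the formula above, also gives 0.42 after rounding. The function name is hypothetical.

```python
def efficiency_gain(cat_responses: int, nat_responses: int) -> float:
    """1 minus the ratio of total CAT responses to total NAT responses."""
    return 1.0 - cat_responses / nat_responses


n_patients, n_items = 200, 18
nat_total = n_patients * n_items   # every patient answers all 18 items under NAT
cat_total = 2083                   # total CAT responses reported in the text
gain = efficiency_gain(cat_total, nat_total)
print(nat_total, round(cat_total / nat_total, 2), round(gain, 2))  # 3600 0.58 0.42
```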
Regarding measurement precision, person measures from CAT did not differ statistically from those from NAT (P = .14). ANOVA showed that patient perception measures did not differ across demographic groups, and t tests showed that they also did not differ between genders ().
Total person mean 0.71 logits (SD 1.62); median 0.59; coefficient of skewness 0.103 (P = .54); coefficient of kurtosis –0.89 (P = .03); the D’Agostino-Pearson omnibus test did not reject normality (P = .09).
The results of this study indicate that CAT-based testing using COW can reduce patient (or proxy) burden: it required about 42% fewer questions than NAT while achieving a very similar degree of measurement precision.
What This Adds to What Was Known
Consistent with the literature [, - , ], the efficiency of CAT was supported. We confirmed that the CAT-based IPQ-18 on COW requires significantly fewer questions to measure patient perception than NAT, but does not compromise precision of measurement.
What is the Implication, What Should be Changed
Using an animated Excel CAT module on COW to help hospital staff gather patient feedback efficiently and precisely is technically feasible. An outfit MNSQ of 2.0 or greater can be used to flag patients whose responses are distorted or abnormal, that is, whose response patterns contain many more unexpected responses than the model predicts and are therefore very likely to be careless, mistaken, or awkward [, , , ]. Thus, Rasch-based CAT makes problematic responses easier to detect, whereas inspecting problematic feedback from patients using classical test theory is rather difficult.
Strength of This Study
There are 2 major forms of standardized assessment in clinical settings [ ]: (1) lengthy questionnaires that achieve precise assessment but require significant time and training to administer, and (2) rapid short-form scales that briefly screen for the most common symptoms, using cut-off points to identify degrees of impairment [ , ]. CAT combines the advantages of both forms: precision and efficiency. This paper reports several achievements, including using the Rasch rating scale model [ ] (instead of a dichotomous model) to design CAT for a perception survey, applying CAT on COW, and offering an Excel module with an animation prototype (demonstrated in or http://www.healthup.org.tw/cat.asp). This module and the accompanying files can complement interactive voice response and online Internet-like visualization [ ], which have seen limited use for completing questionnaires, and can help put CAT into clinical practice.
We conducted an actual CAT-based survey instead of CAT with simulations. This study demonstrates the whole procedure of a CAT-based survey, from the beginning of data collection (and ) through the end of the evaluation report ( ), and fulfills the goal of creating a Web-CAT with graphs and animations, as stated in our previous paper [ ].
Limitations of the Study
Several issues should be considered more thoroughly in further studies. First, only 200 patients were surveyed with the IPQ-18; the generalizability of the findings needs to be investigated in studies with different samples and different instruments. Second, there is a potential sampling bias in this study. Those who completed the IPQ-18 CAT on COW tended to be younger, and proxies often responded on behalf of patients, because respondents were selected randomly by the digit code of the invoice number at discharge, when family members or friends usually carry out the discharge procedure. The proportion of proxies, who are assumed to be healthier and more capable of filling out a questionnaire, was very high (183/200, 91.5%; see ). The sample therefore mostly does not reflect patients’ own perspectives on hospitalization, which may have affected the results shown in . Third, patient burden was measured only by the number of questions administered. Other indices, such as the time and effort required for test administration and the accessibility of the hospital, may be collected where available [ , ].
In addition, requiring at least 10 items as one of the stopping rules might have inflated the CAT test length to some extent. Consistent with this, the CAT test length was about 58% that of NAT, slightly longer than the roughly half test length reported in previous studies [, - ].
A large variety of behavior-change techniques and other methods to promote exposure to interventions have been used [ ]. A remaining concern is how to encourage patients (or proxies) to complete surveys before they are discharged from the hospital. Offering reward points or coupons good for credit toward another service is recommended, because perception surveys differ from clinical scales administered by clinicians, for which patients themselves can see the benefits to their health.
A telephone survey with CAT-based administration or patient self-report on the Internet (demonstrated at http://www.healthup.org.tw/cat.asp) can be combined with the CAT on COW for gathering feedback from patients easily, quickly, and efficiently.
Many issues should be addressed in the future, including the limitations noted above. For example, using CAT on COW at patients’ bedsides to gather their feedback before discharge from the hospital may reduce sampling bias (eg, a high proportion of proxy respondents) and warrants further study. Surveying perceptions of hospital services through CAT-based telephone interviews or Internet self-report is encouraged as a complement to CAT on COW and to questionnaires mailed to discharged patients, such as the Picker Institute Europe’s annual survey.
One of the important advantages of CAT scoring is that the item pool can be expanded without changing the metric [ ]. CAT administrators may expand the IPQ-18 item pool or replace items with other kinds of questions, as in the Excel spreadsheet example. Note that (1) the overall item and step (threshold) difficulties of the questionnaire must be calibrated in advance using Rasch analysis (eg, the IPQ-18 in this study was examined by Rasch analysis in a previous paper [ ]), and (2) picture and voice files for each question should be prepared and stored in an appropriate folder so that they can be presented together with the corresponding question in the animation module of CAT.
CAT-based administration of surveys of patient perception reduces patient burden without compromising measurement precision. The Excel module for animation-CAT on COW connected to a mainframe computer is recommended for assessing patients’ perceptions of their experience in the hospital.
This study was supported by Grant 98cm-kmu-18 from the Chi Mei Medical Center, Taiwan.
Conflicts of Interest
Chien, Lai, and Chou collected all data, generated the database, designed and performed the statistical analysis, and wrote the manuscript. Wang and Huang contributed to the development of the study design and advised on the statistical analysis. The analysis and results were discussed by all authors together. Chien contributed to the Excel programming, helped to interpret the results, and helped to draft the manuscript. All authors read and approved the final manuscript.
Multimedia Appendix 1
Excel VBA module for CAT delivering results to the website through an Internet address. ZIP file (Zip Archive), 1141 KB
Multimedia Appendix 2
Comprehensive overview of Rasch models and the CAT process. PDF file (Adobe PDF File), 405 KB
Multimedia Appendix 3
Screenshot of the module with an animation-CAT design. WMV file (Windows Media Video File), 16,264 KB
- Chien TW, Wu HM, Wang WC, Castillo RV, Chou W. Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale: tool development and simulation. Health Qual Life Outcomes 2009;7:39 [FREE Full text] [CrossRef] [Medline]
- Briggs B. Point of care on a roll. Health Data Manag 2003 Dec;11(12):48-50. [Medline]
- Lavin M, Sierzega G, Pucklavage D, Kleinbach D, Gogal C, Bokovoy J. Carts and care. Roll out safer medication delivery and smoother workflow with mobile technology. Nurs Manage 2007 Nov;Suppl Pharmacy:16-18. [CrossRef] [Medline]
- Davies AR, Ware JE. Involving consumers in quality of care assessment. Health Aff (Millwood) 1988;7(1):33-48 [FREE Full text] [Medline]
- Chien TW, Wang WC, Wang HY, Lin HJ. Online assessment of patients' views on hospital performances using Rasch model's KIDMAP diagram. BMC Health Serv Res 2009;9:135 [FREE Full text] [CrossRef] [Medline]
- Chien TW, Wang WC, Lin SB, Lin CY, Guo HR, Su SB. KIDMAP, a web based system for gathering patients' feedback on their doctors. BMC Med Res Methodol 2009;9:38 [FREE Full text] [CrossRef] [Medline]
- Ritter P, Lorig K, Laurent D, Matthews K. Internet versus mailed questionnaires: a randomized comparison. J Med Internet Res 2004 Sep 15;6(3):e29 [FREE Full text] [CrossRef] [Medline]
- Lord FM. Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum; 1980.
- Wainer HW, Dorans NJ, Flaugher R, Green BF, Mislevy RJ, Steinberg L, et al. Computerized adaptive testing: a primer. Hillsdale, N.J.: L. Erlbaum Associates; 1990.
- Embretson S, Reise S. Item response theory for psychologists. Mahwah, N.J.: L. Erlbaum Associates; 2000:158-186.
- Weiss DJ. Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement 1982;6:473-492. [CrossRef]
- Crocker L, Algina J. Introduction to classical and modern test theory. New York: Holt, Rinehart, and Winston; 1986.
- Nunnally JC, Bernstein IH. Psychometric theory. New York: McGraw-Hill; 1994.
- Wright BD, Linacre JM. Observations are always ordinal; measurements, however, must be interval. Arch Phys Med Rehabil 1989 Nov;70(12):857-860. [Medline]
- Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press; 1980.
- Holland PW, Wainer H. Differential item functioning. Hillsdale: Lawrence Erlbaum Associates; 1993.
- Smith AB, Wright P, Selby PJ, Velikova G. A Rasch and factor analysis of the Functional Assessment of Cancer Therapy-General (FACT-G). Health Qual Life Outcomes 2007;5:19 [FREE Full text] [CrossRef] [Medline]
- Velozo CA, Wang Y, Lehman L, Wang JH. Utilizing Rasch measurement models to develop a computer adaptive self-report of walking, climbing, and running. Disabil Rehabil 2008;30(6):458-467. [CrossRef] [Medline]
- Lehman LA, Woodbury M, Shechtman O, Wang YC, Pomeranz J, Gray DB, et al. Development of an item bank for a computerised adaptive test of upper-extremity function. Disabil Rehabil 2011 Mar 14. [CrossRef] [Medline]
- Öztuna D, Elhan AH, Küçükdeveci AA, Kutlay S, Tennant A. An application of computerised adaptive testing for measuring health status in patients with knee osteoarthritis. Disabil Rehabil 2010;32(23):1928-1938. [CrossRef] [Medline]
- Elhan AH, Oztuna D, Kutlay S, Küçükdeveci AA, Tennant A. An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain. BMC Musculoskelet Disord 2008;9:166 [FREE Full text] [CrossRef] [Medline]
- Linacre JM, Wright BD. Chi-square fit statistics. Rasch Meas Trans 1994;8(2):360.
- Smith RM. Fit analysis in latent trait measurement models. J Appl Meas 2000;1(2):199-218. [Medline]
- Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas 2002;3(1):85-106. [Medline]
- Kieffer KM, Reese RJ. A reliability generalization study of the Geriatric Depression Scale. Educational and Psychological Measurement 2002;62(6):969-994. [CrossRef]
- Harwell M, Stone CA, Hsu TC, Kirisci L. Monte Carlo studies in item response theory. Applied Psychological Measurement 1996;20:101-125. [CrossRef]
- Macdonald P, Paunonen SV. A Monte Carlo comparison of item and person statistics based on item response theory versus classical test theory. Educational and Psychological Measurement 2002;62:921-943. [CrossRef]
- Chien TW, Lin SJ, Wang WC, Leung HW, Lai WP, Chan AL. Reliability of 95% confidence interval revealed by expected quality-of-life scores: an example of nasopharyngeal carcinoma patients after radiotherapy using EORTC QLQ-C 30. Health Qual Life Outcomes 2010;8:68 [FREE Full text] [CrossRef] [Medline]
- Linacre JM. How to simulate Rasch data. Rasch Meas Trans 2007;21(3):1125.
- Ware JE, Kosinski M, Bjorner JB, Bayliss MS, Batenhorst A, Dahlöf CG, et al. Applications of computerized adaptive testing (CAT) to the assessment of headache impact. Qual Life Res 2003 Dec;12(8):935-952. [Medline]
- Eack SM, Singer JB, Greeno CG. Screening for anxiety and depression in community mental health: the beck anxiety and depression inventories. Community Ment Health J 2008 Dec;44(6):465-474. [CrossRef] [Medline]
- Ramirez Basco M, Bostic JQ, Davies D, Rush AJ, Witte B, Hendrickse W, et al. Methods to improve diagnostic accuracy in a community mental health setting. Am J Psychiatry 2000 Oct;157(10):1599-1605 [FREE Full text] [Medline]
- Shear MK, Greeno C, Kang J, Ludewig D, Frank E, Swartz HA, et al. Diagnosis of nonpsychotic patients in community clinics. Am J Psychiatry 2000 Apr;157(4):581-587 [FREE Full text] [Medline]
- Andrich D. A rating formulation for ordered response categories. Psychometrika 1978;43:561-573.
- Chien TW, Lai WP, Lu CW, Wang WC, Chen SC, Wang HY, et al. Web-based computer adaptive assessment of individual perceptions of job satisfaction for hospital workplace employees. BMC Med Res Methodol 2011;11:47 [FREE Full text] [CrossRef] [Medline]
- Brouwer W, Kroeze W, Crutzen R, de Nooijer J, de Vries NK, Brug J, et al. Which intervention characteristics are related to more exposure to internet-delivered healthy lifestyle promotion interventions? A systematic review. J Med Internet Res 2011;13(1):e2 [FREE Full text] [CrossRef] [Medline]
- Aday LA. Designing and conducting health surveys: a comprehensive guide. San Francisco: Jossey-Bass Publishers; 1996.
|ANOVA: analysis of variance|
|CAT: computerized adaptive testing|
|COW: computers on wheels|
|IPQ: inpatient perception questionnaire|
|IRT: item response theory|
|MNSQ: mean square|
|NAT: nonadaptive testing|
|VBA: Visual Basic for Applications|
Edited by G Eysenbach; submitted 26.02.11; peer-reviewed by T Bond, CL Hsieh, A Smith; comments to author 13.04.11; revised version received 05.05.11; accepted 09.05.11; published 15.08.11
©Tsair-Wei Chien, Wen-Chung Wang, Sheng-Yun Huang, Wen-Pin Lai, Julie Chi Chow. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.08.2011.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.