Psychometric Properties of the Chinese Warwick-Edinburgh Mental Well-being Scale in Medical Staff: Cross-sectional Study

Background: Worldwide, mental well-being is a critical issue for public health, especially among medical staff; it affects professionalism, efficiency, quality of care delivery, and overall quality of life. Nevertheless, assessing mental well-being is complex. Objective: This study aimed to evaluate the psychometric properties of the Chinese-language version of the 14-item Warwick-Edinburgh Mental Well-being Scale (WEMWBS) in medical staff recruited mainly from 6 hospitals in China and to provide a reliable measure of positive mental well-being. Methods: A cross-sectional online survey of medical staff from 15 provinces in China was conducted from May 15 to July 15, 2020. Confirmatory factor analysis (CFA) was conducted to test the structure of the Chinese WEMWBS. Spearman correlations between the Chinese WEMWBS and the 5-item World Health Organization Well-Being Index (WHO-5) were used to evaluate convergent validity. The Cronbach α and split-half reliability (λ) represented internal consistency. A graded response model was adopted for an item response theory (IRT) analysis. We report discrimination, difficulty, item characteristic curves (ICCs), and item information curves (IICs). ICCs and IICs were used to estimate reliability and validity based on the IRT analysis. Results: A total of 572 participants from 15 provinces in China completed the Chinese WEMWBS. The CFA showed that the 1D model was satisfactory. Internal consistency reliability was excellent, with α=.965 and λ=0.947, and the item-scale correlation coefficients ranged from r=0.727 to r=0.900. The correlation between the Chinese WEMWBS and the WHO-5 was significant, at r=0.746. The average variance extracted was 0.656, and the composite reliability was 0.964, indicating good convergent validity. The discrimination of the Chinese WEMWBS items ranged from 2.026 to 5.098.
The ICCs illustrated that the orders of the category thresholds for the 14 items were satisfactory. Conclusions: The Chinese WEMWBS showed good psychometric properties and can measure well-being in medical staff. (J Med


Introduction

Background
Mental well-being is a public health concern worldwide; adequate mental well-being is associated with better health-related quality of life and longer life expectancy [1]. In recent years, the mental well-being of employees in several occupations has gained substantial attention [2][3][4][5][6]. A meta-analysis revealed that numerous health care workers had various psychological problems [7]. It is well known that medical staff experience many work-related stresses (eg, prolonged and irregular working hours, night shifts, high-intensity work, emotional exhaustion, chronicity of care, and moral conflicts), which may negatively influence their mental well-being, causing depression, anxiety, sleeping disorders, and other problems. Impaired mental well-being can affect health care providers' professionalism, quality of care delivery, efficiency, and overall quality of life [8,9].
Moreover, it has been reported that the overall mental health status of Chinese medical staff is unfavorable [10,11]. This finding suggests that the mental well-being of medical staff is critically important to public health [12,13]. For this reason, it is crucial to measure the mental health status of medical staff and identify work-related risk factors to protect their well-being [14].
The Warwick-Edinburgh Mental Well-being Scale (WEMWBS) is a relatively new, short, acceptable scale that has been translated into several languages [15][16][17][18]. It has demonstrated excellent reliability, good validity, and internal consistency [19]. Studies of public mental health have confirmed the WEMWBS's ability to offer rigor in psychological evaluations [20]; it focuses on protective and promoting factors that can provide a rational basis for the orientation of policy makers formulating interventions [21].
Previous studies have reported the psychometric performance of the Chinese-language version of the WEMWBS in clinical and nonclinical settings in China, but all have had limitations [15,22]. Research by Liu et al [23] appears to be the earliest psychometric analysis of the Chinese WEMWBS; however, 2 issues need addressing. First, their paper was written in Chinese, making it difficult to read for non-Chinese-speaking investigators and impeding comparisons of China with other countries. Second, the age of the study participants ranged from 60 to 97 years, which may have introduced information and selection bias. The generalizability of the findings from Dong et al [22] is limited, because the 191 patients with chronic heart failure in that study came from 1 hospital in a Chinese city. A study by Fung [24] and an earlier study by Dong et al [15] were limited because all respondents were university students recruited from either a single university or a single hospital nursing internship program in a Chinese city; this could have caused information and selection bias in these studies' assessment of the psychometric properties of the WEMWBS. A study by Waqas et al [25] explored the reliability and validity of the WEMWBS in Pakistan; Taggart et al [26] investigated the WEMWBS in a targeted sample of minority ethnic groups living in the UK who self-identified as Chinese or Pakistani by background. Additionally, no previous investigation has combined classical test theory (CTT) with item response theory (IRT), specifically a graded response model (GRM), to evaluate the psychometric properties of the WEMWBS. A comprehensive method and a more representative sample that covers participants from both southern and northern areas are needed to assess the performance of the Chinese WEMWBS.

Objective of the Study
We administered the Chinese WEMWBS to medical staff to evaluate its psychometric properties and to explore and promote the use of this mental well-being questionnaire, which is suited to the Chinese context. We aim to provide theoretical support for improving the mental well-being of medical staff.

Ethics Approval
All participants provided informed consent before participation, and the Medical Ethics Committee of the Second Affiliated Hospital of Wenzhou Medical University approved the study (LCKY2019-288).

Instruments
Data were collected via a self-administered online questionnaire. The first section collected sociodemographic characteristics, including age, marital status, gender, body weight (in kilograms), height (in meters), professional status, and education level. The second section examined lifestyle habits, including working hours, night shifts per week, smoking history, drinking history, consumption of vegetables and fruit, physical exercise, and self-reported personality. The third section examined mental well-being using the WEMWBS and self-perceived quality of life (QoL) using the 36-Item Short Form Health Survey, Version 2 (SF-36 v2). The WEMWBS is a 14-item ordinal scale that measures 3 aspects of mental well-being: positive psychological function, emotion, and interpersonal relationship satisfaction. All items were scored on a 5-point Likert scale: 1 (never), 2 (occasionally), 3 (sometimes), 4 (often), and 5 (always). The total score ranged from 14 to 70, with higher scores representing stronger subjective well-being. The SF-36 v2 is a 36-item structured scale that summarizes respondents' QoL across 8 dimensions: physical functioning (10 items), role-physical (4 items), bodily pain (2 items), general health (5 items), vitality (4 items), social functioning (2 items), role-emotional (3 items), and mental health (5 items). The physical component summary and the mental component summary are 2 summary scores derived from the 8 dimensions. In addition to the 8 dimensions listed above, the SF-36 v2 includes 1 further item, reported health transition, which measures overall change in health status over the past year.
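The WEMWBS total score described above is a simple sum of the 14 item responses. A minimal sketch in Python follows; the function name `wemwbs_total` is hypothetical and is not part of the study's analysis code.

```python
def wemwbs_total(responses):
    """Sum 14 item responses, each coded 1-5, into a total score (14-70)."""
    if len(responses) != 14:
        raise ValueError("expected 14 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    return sum(responses)


# A respondent who answers 3 on every item scores 42.
print(wemwbs_total([3] * 14))  # 42
```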

Statistical Analysis
We used EpiData (version 3.1; EpiData Association) for double entry and data management. Data analysis was carried out using SPSS (version 27.0; IBM Corp) and R (version 4.1.1; R Foundation for Statistical Computing). Means and SDs were calculated for continuous data, and frequencies and percentages were calculated for categorical data.

Dimensionality Test
Principal component analysis of the Chinese WEMWBS was used to independently test the 1D hypothesis; the 1D structure of the model is considered well supported when the first eigenvalue accounts for more than 50% of the total variance.
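As an illustration of the 50% criterion, the share of variance carried by the first principal component can be computed from an item correlation matrix. The sketch below uses plain power iteration rather than a statistics package; the function name and the toy matrix are assumptions for illustration only, and the method presumes a correlation matrix with predominantly positive entries.

```python
import math


def first_eigenvalue_share(corr, iters=500):
    """Share of total variance carried by the largest eigenvalue of a
    correlation matrix, estimated by power iteration."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        eigenvalue = norm  # converges to the dominant eigenvalue
    # The trace of a correlation matrix equals the number of items.
    return eigenvalue / n


# Toy 3-item matrix (not study data) with all inter-item correlations 0.5.
toy = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
share = first_eigenvalue_share(toy)
print(share > 0.5)  # True: the first component exceeds the 50% criterion
```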

Ceiling Effect and Floor Effect
A ceiling or floor effect is present when a substantial proportion of subjects receive the scale's highest or lowest score. Measurement scales with ceiling or floor effects may have questionable validity, reliability, and responsiveness. We used a threshold of 20% to indicate a ceiling or floor effect.
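The check described above amounts to counting the proportion of responses at the lowest and highest possible values and comparing each with the 20% threshold. A minimal sketch, with hypothetical function names and made-up scores:

```python
def floor_ceiling_rates(scores, minimum, maximum):
    """Return the proportions of scores at the minimum and at the maximum."""
    n = len(scores)
    floor_rate = sum(1 for s in scores if s == minimum) / n
    ceiling_rate = sum(1 for s in scores if s == maximum) / n
    return floor_rate, ceiling_rate


def exceeds_threshold(rate, threshold=0.20):
    """Flag an effect when the rate reaches the 20% threshold."""
    return rate >= threshold


# Made-up item responses on the 1-5 scale (not study data).
responses = [1, 2, 3, 3, 4, 4, 4, 5, 2, 3]
floor_rate, ceiling_rate = floor_ceiling_rates(responses, 1, 5)
print(exceeds_threshold(floor_rate), exceeds_threshold(ceiling_rate))
```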

Item Analysis
Item analysis determines the effectiveness of each item and its ability to discriminate between respondents. The process is to sum the item scores for each participant, divide the participants into high-score and low-score groups (with the 27% and 73% quantiles of the total score as the boundaries), and then use a 2-tailed t test to identify differences in each item's score between the groups. If there is a significant difference, the item is appropriately designed; otherwise, the item has a questionable ability to discriminate between respondents and should be deleted or revised.
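The extreme-groups procedure can be sketched as follows. This is an illustration under stated assumptions: the cut points are taken by simple rank position, the statistic is Welch's t (the study does not state which variant of the t test was used), and the P value is omitted because it requires the t distribution from a statistics package. All names and data are hypothetical.

```python
import math
import statistics


def extreme_group_t(item_scores, total_scores, lower=0.27, upper=0.73):
    """Welch t statistic comparing one item's scores between the high-score
    and low-score groups defined by quantiles of the total score."""
    ranked = sorted(total_scores)
    n = len(ranked)
    low_cut = ranked[int(lower * (n - 1))]
    high_cut = ranked[int(upper * (n - 1))]
    low = [x for x, t in zip(item_scores, total_scores) if t <= low_cut]
    high = [x for x, t in zip(item_scores, total_scores) if t >= high_cut]
    se = math.sqrt(statistics.variance(high) / len(high)
                   + statistics.variance(low) / len(low))
    return (statistics.mean(high) - statistics.mean(low)) / se


# Made-up data for 10 respondents (not study data).
totals = [20, 22, 25, 30, 35, 40, 45, 50, 55, 60]
item = [1, 2, 1, 2, 3, 3, 4, 4, 5, 5]
print(round(extreme_group_t(item, totals), 2))
```

A large positive statistic indicates that the item separates the high-score and low-score groups in the expected direction.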

Reliability Analysis: Internal Consistency of the Scale
We used the Cronbach α and split-half reliability (λ) to represent internal consistency reliability. The former indicates the homogeneity of the items in the scale; we considered α=.7 as the threshold above which the scale showed the desired reliability. The latter measures consistency between 2 halves of the items, split either into the first and second halves or into odd-numbered and even-numbered items according to their serial numbers. Generally, a correlation coefficient of r≥0.70 is considered acceptable.
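Both coefficients can be computed directly from a respondent-by-item score matrix. The sketch below is illustrative only: it uses the standard Cronbach α formula and, as an assumption, the Spearman-Brown correction for the split-half coefficient, which may differ from the λ coefficient reported by the study's software. Function names and data are hypothetical.

```python
import statistics


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(rows, odd_even=True):
    """Split-half reliability with the Spearman-Brown correction."""
    k = len(rows[0])
    if odd_even:
        first, second = range(0, k, 2), range(1, k, 2)
    else:
        first, second = range(0, k // 2), range(k // 2, k)
    a = [sum(r[i] for i in first) for r in rows]
    b = [sum(r[i] for i in second) for r in rows]
    r_ab = pearson(a, b)
    return 2 * r_ab / (1 + r_ab)


# Made-up responses from 5 people to 4 items (not study data).
data = [[1, 2, 1, 2], [2, 2, 3, 2], [3, 3, 3, 4], [4, 5, 4, 4], [5, 5, 4, 5]]
print(cronbach_alpha(data) > 0.7, split_half(data) > 0.7)
```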

Test-Retest Reliability
The test-retest reliability of the WEMWBS scale was estimated within a 2-week interval by comparing 2 sets of scores using the intraclass correlation coefficient.

Construct Validity
Confirmatory factor analysis (CFA) of item responses was implemented within a structural equation modeling framework, using the weighted least-squares method, to test the hypothesized unidimensionality of the WEMWBS. Statistical analysis of correlations was performed using SAS (version 9.4; SAS Institute Inc), initially assuming no correlation between the residuals. A stepwise strategy was then used to add the matrix elements with the highest dependencies until sufficient fit statistics were achieved.
The target levels of the goodness-of-fit index and the adjusted goodness-of-fit index (corrected for degrees of freedom) were >0.9 and >0.8, respectively.
A root mean square error of approximation (RMSEA) below the accepted level of 0.06 [27] indicates only a small amount of unexplained deviation. A chi-square statistic with P<.05 indicates a considerable amount of actual covariance between measurements that the model cannot explain [28]. Nevertheless, large sample sizes inflate the chi-square statistic, which makes it unsuitable as the sole criterion of fit [29].

Compatible Validity
This parameter refers to the extent to which the scores of the new scale are related to the scores of another scale with the same content and known validity. If the compatibility coefficient is high, the 2 scales measure the same content, and the new scale is equally effective. Given the overlapping scope of these 2 scales, we hypothesized a strong correlation between the WEMWBS and the 5-item World Health Organization Well-Being Index (WHO-5) scale for capturing mental well-being, with a coefficient above r=0.7.

Convergence Validity
Convergence validity refers to the similarity of measurement results when different measurement methods are used to assess the same construct. The evaluation indices usually include composite reliability (CR), factor loading, and average variance extracted (AVE), where an AVE greater than 0.5 and a CR greater than 0.7 indicate that convergence validity is acceptable.
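For a single-factor model with standardized loadings, AVE and CR are commonly computed as the mean squared loading and as the squared sum of loadings divided by that quantity plus the summed error variances. The sketch below assumes standardized loadings and uncorrelated errors; the function name and the example loadings are hypothetical and are not the values in the study's tables.

```python
def ave_and_cr(loadings):
    """AVE and CR from standardized loadings on a single factor."""
    squared = [l * l for l in loadings]
    ave = sum(squared) / len(loadings)
    sum_loadings = sum(loadings)
    error_variance = sum(1 - s for s in squared)
    cr = sum_loadings ** 2 / (sum_loadings ** 2 + error_variance)
    return ave, cr


# Fourteen illustrative loadings of 0.8 each (not study data).
ave, cr = ave_and_cr([0.8] * 14)
print(ave > 0.5, cr > 0.7)  # True True
```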

IRT Analysis
IRT, also known as latent trait theory, is a modern psychometric theory proposed to compensate for the limitations of CTT. According to an exploratory factor analysis under CTT, the Chinese WEMWBS is a 1D scale. Therefore, in this study, the responses of the 572 participants to the WEMWBS on a 5-point Likert-type scale were analyzed with the Samejima GRM [30]. A discrimination parameter (referred to as a), a difficulty parameter (referred to as b), item characteristic curves (ICCs), and item information curves (IICs) were used to screen the items. The discrimination parameter evaluates the strength of the relationship between each item and the scale; the difficulty parameter locates the point on the latent trait continuum at which the item best distinguishes between respondents. Each item has 5 levels; we used level 1 as a reference and set the remaining 4 levels as difficulty levels. The difficulty parameters were calculated between levels 1 and 2, 2 and 3, 3 and 4, and 4 and 5, denoted as thresholds: ≥2, ≥3, ≥4, and 5.
When the discrimination parameter is <0.4 or >3 and the difficulty parameter falls outside the range of -3 to 3, the item should be considered for deletion. The model simulates ICCs for each response option of the 14 items. The pattern is considered ideal when the first and fifth ICCs change monotonically and the second, third, and fourth ICCs are approximately normally distributed. The more ideal the ICC distribution, the greater the corresponding item information. Moreover, a larger item information function results in greater measurement accuracy. Item screening was then carried out. When an item did not meet the requirement for 3 or more parameters, it was considered for deletion based on professional knowledge and expert opinion. These calculations were performed using Stata/MP (version 14.0 for Mac; StataCorp LP).
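To make the roles of the discrimination parameter a and the 4 thresholds concrete, the following sketch evaluates the category probabilities (the ICCs) and the item information (the IIC) of a graded response model at a given latent trait level. It follows the standard logistic form of the Samejima GRM; the function names and the parameter values are hypothetical and are not the estimates from this study.

```python
import math


def _cumulative(theta, a, thresholds):
    """P(response >= k) for each boundary, bracketed by 1 and 0."""
    inner = [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Probability of each ordered response category at trait level theta."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP/dtheta)^2 / P."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(thresholds) + 1):
        p = cum[k] - cum[k + 1]
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += dp * dp / p
    return info


# Hypothetical item: a = 2, thresholds for >=2, >=3, >=4, and 5.
probs = grm_category_probs(0.0, 2.0, [-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
print(round(grm_item_information(0.0, 2.0, [-1.5, -0.5, 0.5, 1.5]), 3))
```

Evaluating these functions over a grid of theta values traces out the ICCs and the IIC for an item; in this parameterization, items with larger a tend to give steeper and more informative curves.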

Descriptive Statistics of the Scale
The total sample of 572 medical staff had a mean score for the Chinese WEMWBS of 38.47 (95% CI 37.45-39.61; SD 13.23; skewness 0.449; kurtosis -0.486) and a median score of 37, indicating a slightly right-skewed score distribution (Figure 1). An independent-sample t test showed no difference in the total WEMWBS score by gender (t 1 =-1.477; P=.14). A Pearson correlation analysis did not indicate any significant relationship between the score for mental well-being and age; therefore, further validation analyses did not include participant age.

Item Analysis
As shown in Table 1, the scores for each item were significantly different between the high-score and low-score groups (P<.001), meaning that all 14 items could differentiate the 2 groups well and that none should be discarded. The correlation coefficient between each item and the total score of the instrument ranged from r=0.727 to r=0.900. As seen in Table 2, none of the items reached a rate of 20%, suggesting that there were no ceiling or floor effects.

Reliability Analysis
Internal consistency reliability was good (Cronbach α=.965). The corrected item-total correlation values of the items were all greater than 0.5, indicating a good correlation between the items and the scale (Table 3). Two weeks after completing the questionnaire, 35 subjects completed it again; the test-retest reliability was 0.810, indicating that the scale had good stability. The split-half reliability of the scale was λ=0.947 when the items were split into the first and second halves by serial number and λ=0.970 when they were split into odd-numbered and even-numbered items.
Table 4 shows factor loadings for the 14 items, which ranged from 0.714 for item 4 to 0.903 for item 8.

CFA Results
A minimum average partial (MAP) analysis showed that the WEMWBS had a 1D structure. The smallest average squared partial correlation was 0.02221, and the smallest average fourth-power partial correlation was 0.00100. According to the revised MAP test [31], the number of factors was 1.
We conducted a CFA test of the hypothesized single-factor structure of the Chinese WEMWBS and measured the goodness of fit of the single-factor confirmatory model. Assuming that there was no correlation between the residuals, the initial model fit poorly. The χ 2 /df was 8.437; the comparative fit index (CFI) was 0.927; the RMSEA was 0.114; for the normed fit index (NFI), delta 1 was 0.918; for the relative fit index (RFI), rho 1 was 0.903; for the incremental fit index (IFI), delta 2 was 0.927; and for the Tucker-Lewis index (TLI), rho 2 was 0.914.
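The reported RMSEA can be cross-checked against the reported χ 2 /df and sample size using the common formula RMSEA = sqrt(max(χ 2 /df - 1, 0)/(N - 1)). The sketch below assumes that formula; some software divides by N rather than N - 1, which does not change the result at 3 decimal places here. The function name is hypothetical.

```python
import math


def rmsea_from_ratio(chi2_over_df, n):
    """RMSEA from the chi-square/df ratio and the sample size n."""
    return math.sqrt(max(chi2_over_df - 1, 0) / (n - 1))


# Values reported for the initial single-factor model: ratio 8.437, N = 572.
print(round(rmsea_from_ratio(8.437, 572), 3))  # 0.114
```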

Compatible Validity
There was a significant positive correlation between the Chinese WEMWBS and the WHO-5, with a correlation coefficient of 0.746 (95% CI 0.722-0.794; P<.01).

Composite Reliability and Convergent Validity
A CFA showed that the AVE value was 0.674 (ie, greater than 0.5). The CR value was 0.966 (ie, greater than 0.7), suggesting that the scale had good convergence validity.
Table 5 shows the results of the GRM analysis. The discrimination parameters of the items ranged from 2.026 to 5.098, indicating that the items distinguished well between individuals with low and high levels of the latent trait. The item difficulty of thresholds ≥2, ≥3, ≥4, and 5 ranged from 1.06 to 1.73, 0 to 0.23, 0.56 to 1.06, and 1.12 to 1.66, respectively. The ICCs and IICs for the Chinese WEMWBS are shown in Figures 2 and 3, respectively. The ICCs demonstrated that the sequence of the category thresholds for the 14 items was as predicted, suggesting that all response categories were adequate for placing a participant on the scale. The IICs displayed a multimodal distribution. The curve for item 8 was the steepest, and this item provided more information than the other 13 items. The curve for item 4 was the flattest, indicating that this item provided the least information.

Principal Findings
This is the first study to combine CTT with an IRT-based GRM to evaluate the psychometric properties of the Chinese-language version of the WEMWBS in a sample of medical staff. Our results confirm the initial hypothesis that the WEMWBS is 1D. Since its establishment in 2006, the WEMWBS has been used in trials of patients and the general population with commendable results according to CTT and the Rasch model [32,33]. Given the broad and complicated spectrum of psychometric approaches other than CTT, each with its own evaluations and statistical analyses in diverse models [34], we adopted the GRM to evaluate the contribution of the 14 items and their responses to the assessment of subjective well-being (SWB).

Comparisons With Previous Studies
The mean score for the Chinese version of the WEMWBS used in this study was 38.47 (SD 13.23), which is lower than WEMWBS scores in medical staff surveys in other countries (eg, the United Kingdom [35], Pakistan [25,36], and Northern Ireland [37]). This discrepancy may be due to the data having been collected during the outbreak of COVID-19, meaning that the SWB of the medical staff would have been impacted to a certain extent [38]. Moreover, with the aging population of China, medical staff are under a great deal of pressure and need to master multidisciplinary knowledge and skills even as their work intensity increases [39].
The original 1D structure of the WEMWBS, as confirmed by previous studies in other countries [24,27,29], was not fully supported by earlier research in mainland China. This outcome was expected; some studies [28,40] identified a 2D structure that differed from the original assumption.
Researchers have pointed to differences between Eastern and Western cultures to explain this: the original meaning of the individual items might be changed in translated versions, and this alteration could affect how the target population interprets them [22]. Furthermore, previous studies [33] treated the ordinal Likert responses as interval data, adding the 14 individual item scores to produce a total score. Bartram [35] found that using only a CFA may lead to misinterpretation, because the total score is ordinal, and the intervals between scores are not necessarily equal. The unidimensional structure was not without problems in this study. First, the model fit was insufficient, because the χ 2 /df was greater than 5, and the RMSEA was greater than 0.08. Only the NFI, TLI, and CFI values supported the unidimensionality of the model. However, the AVE was greater than 0.5, and the CR was greater than 0.8, suggesting acceptable convergence validity. Second, the 1D model's factor loadings for the 14 items were similar to those of the 2-factor model. Third, considering that the number of factors according to the revised MAP test was 1 [31], we adopted the 1D structure. A well-defined structure for the Chinese WEMWBS would facilitate IRT analyses in the future. Administering the Chinese WEMWBS based on IRT could strengthen its sensitivity and precision, helping to ensure that the items reflect the participants' SWB levels.
The proportion of participants selecting the options "sometimes" and "often" was high in this study, suggesting that most respondents had relatively good SWB. To test the accuracy of the results, we examined the 14 items for floor and ceiling effects; we did not find extreme ceiling or floor effects, indicating that the measurement was reliable. There have been no previous reports on the distribution of responses to the WEMWBS in mainland China. In addition, the Chinese version of the WEMWBS displayed outstanding reliability, with a Cronbach α of .96, higher than the values reported in other studies of Chinese and other language versions [18,19,28,29,41].
The GRM was the best-match IRT model in this study. No previous studies have used the GRM to evaluate the psychometric properties of the WEMWBS. Our study reinforces the use of IRT models and supports existing studies on the psychometric evaluation of the WEMWBS with IRT methods.
The GRM analysis demonstrated that the overall performance of the WEMWBS items was satisfactory. The ICCs showed that the response categories of all the items were ordered and that all categories were presumably at the same point on the continuum [42].

Prospects for Application of the Chinese WEMWBS
Mental health assessment has drawn increasing attention from the Chinese government. In 2017, the Chinese government released the first guidelines to improve mental health in schools, workplaces, and hospitals. The WEMWBS has proven to be a convenient and valuable psychometric tool for academics, medical professionals, and other prominent stakeholders to measure the SWB of medical staff [43,44]. The Chinese WEMWBS has good reliability and validity with comprehensive and understandable content [15,24,26,45].

Limitations
There are some limitations to this study. First, our investigation concentrated on hospitals in Zhejiang and Hunan provinces, and most participants were nurses, suggesting some selection bias. Follow-up research needs a larger sample that includes therapists, physicians, and surgeons to assess the psychometric properties of the Chinese WEMWBS. Second, the sample size was only 572, which is less than 1000; this may have reduced the precision of the IRT model estimates. A larger sample size is needed in future research to confirm our findings. Third, we did not distinguish medical staff with anxiety or depression from other staff when calculating the psychometric properties of the Chinese WEMWBS, which may have made it more difficult to demonstrate the scale's validity. The performance of the Chinese WEMWBS should be further assessed in distinct staff groups.

Conclusion
This study examined the Chinese version of the WEMWBS in detail and evaluated its psychometric properties in a group of medical staff. We found that the Chinese WEMWBS has good reliability and validity and that it could be used as a reliable tool to evaluate the SWB of medical staff. It is critical to adopt measures that enable the decision-making departments of hospitals to reduce work pressure, improve the SWB of clinical medical staff, improve patient satisfaction, and promote the development of the medical industry in a favorable direction.