Published in Vol 28 (2026)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/84617.
Characterization of Models for Identifying Physical and Cognitive Frailty in Older Adults With Diabetes: Systematic Review and Meta-Analysis


1 School of Basic Medical Sciences and School of Nursing, Chengdu University, No. 2025, Chengluo Avenue, Chengdu, China

2 School of Nursing, Yibin Vocational College of Medicine and Health, Yibin, China

3 Nursing Department, The Fourth People's Hospital of Yibin, Yibin, China

4 Rehabilitation College, Sichuan Health Rehabilitation Vocational College, Zigong, China

5 Medical and Nursing College, Yibin Vocational College of Medicine and Health, Yibin, China

6 Nursing Department, Wenjiang District People's Hospital, No. 86 Kangtai Road, Wenjiang District, Chengdu, Sichuan Province, China

*these authors contributed equally

Corresponding Author:

Ru Gao, PhD


Background: Physical frailty and cognitive frailty are increasingly recognized as critical geriatric syndromes among older adults with diabetes, contributing to adverse outcomes such as disability, hospitalization, and mortality. Early identification of individuals at high risk is therefore essential for timely prevention and intervention. Although a growing number of prediction models have been developed for this population, evidence regarding their methodological rigor, predictive performance, and generalizability remains fragmented.

Objective: This study aims to evaluate and characterize existing models for detecting or predicting physical frailty and cognitive frailty in older adults with diabetes.

Methods: PubMed, Embase, Web of Science, China National Knowledge Infrastructure (CNKI), Wanfang, and VIP databases were searched from their inception to December 2025. Retrospective, cross-sectional, and prospective studies that developed or validated models predicting frailty or cognitive frailty in older adults with diabetes were included. The Prediction Model Study Risk Of Bias Assessment Tool (PROBAST) was used to assess risk of bias and applicability. Random effects meta-analyses using the Hartung-Knapp-Sidik-Jonkman method were conducted to synthesize model performance, including the pooled area under the receiver operating characteristic curve (AUC). Heterogeneity was explored through subgroup and sensitivity analyses. Small study effects were evaluated using funnel plots, the Egger test, and the Deeks funnel plot asymmetry test.

Results: A total of 24 studies comprising 32 diagnostic models were included. The overall pooled analysis demonstrated an AUC of 0.851 (95% CI 0.820‐0.882) with a 95% prediction interval of 0.710‐0.992, sensitivity of 0.810 (95% CI 0.740‐0.850), and specificity of 0.850 (95% CI 0.810‐0.890). Statistical comparisons in the modeling approach revealed that logistic regression models achieved a significantly higher pooled AUC (0.850) compared with machine learning models (0.785; P=.003). Similarly, retrospective studies demonstrated superior performance, with an AUC of 0.900 compared with 0.843 for cross-sectional studies (P=.03). Conversely, no significant differences were observed across subgroups stratified by data source (P=.42), patient characteristics (P=.77), validation methods (P=.16), or specific outcomes (P=.94). The most common predictors identified were depression, age, and regular exercise; however, all included studies were assessed as having a high risk of bias.

Conclusions: To our knowledge, this review provides the first comprehensive synthesis of models for risk stratification of physical frailty and cognitive frailty in older adults with diabetes. The findings indicate that existing models demonstrate satisfactory discrimination; specifically, CIs confirmed a robust average effect, while prediction intervals suggested that performance in future settings, though variable, is likely to remain acceptable. However, clinical utility is currently constrained by high risk of bias and limited external validation. Future research must prioritize rigorous, prospective, multicenter studies adhering to standard reporting guidelines (eg, TRIPOD [Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis]) to establish valid, generalizable, and clinically actionable prognostic instruments.

Trial Registration: PROSPERO CRD420251019308; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251019308

J Med Internet Res 2026;28:e84617

doi:10.2196/84617



Background

Diabetes mellitus has evolved into one of the most critical global public health challenges of the 21st century. According to the International Diabetes Federation (IDF), approximately 589 million adults were living with diabetes globally in 2024, with projections indicating this number could rise to 853 million by 2050 [1]. In the absence of optimal management, patients with diabetes are predisposed to micro- and macrovascular complications that significantly shorten life expectancy [2,3]. Recent data from the Global Burden of Disease study indicate that the prevalence of diabetes increases with age, reaching 24.4% among individuals aged ≥75 years [4]. Older adults are especially vulnerable to diabetes-related complications due to greater medical complexity and a higher likelihood of frailty compared with younger populations [5].

Frailty is regarded as a consequence of the age-related decline in the function and reserve of multiple organs, particularly those of the neuromuscular, endocrine, and immune systems [6]. Notably, frailty is common among patients with diabetes, with reported prevalence rates ranging from 10.4% to 20.8% across studies [7-10]. Furthermore, previous studies indicated that individuals with diabetes have an approximately 1.6-fold higher risk of developing frailty than those without diabetes [11]. Moreover, frailty frequently co-occurs with cognitive impairment [12]; their simultaneous presence is termed cognitive frailty, a distinct clinical entity that represents a crucial subtype of frailty requiring specific attention.

Once established, frailty typically follows a progressive trajectory, increasing the likelihood of adverse clinical outcomes such as falls, incontinence, rapid functional decline, pressure ulcers, and delirium [13-17]. In addition to these risks, frailty is linked to higher rates of hospitalization, emergency department visits, prolonged inpatient stays, and mortality [18,19]. Of particular concern is that the coexistence of physical and cognitive impairment further amplifies these risks, leading to greater adverse outcomes [20,21].

Evidence suggests a bidirectional relationship between diabetes and frailty, often creating a cycle where each condition exacerbates the other [22]. The presence of physical or cognitive frailty introduces significant complexity to diabetes management [23]. In frail patients, physiological deterioration and multi-organ dysfunction fundamentally alter the pharmacokinetics of antihyperglycemic agents [24]. Specifically, sarcopenia, increased adiposity, and compromised renal or hepatic clearance heighten the susceptibility to adverse drug events, such as hypoglycemia and unintended weight loss. Additionally, the decreased caloric intake typical of this population further aggravates the risk of hypoglycemia and hinders recovery from hypoglycemic events [25,26].

In recent years, physical frailty and cognitive frailty have been increasingly conceptualized as dynamic and potentially preventable or reversible conditions, especially when identified at an early stage [27,28]. In patients with diabetes, nonpharmacological interventions, including structured physical activity, nutritional optimization, and multimodal strategies, have demonstrated potential benefits for mitigating frailty progression. Consequently, early identification of individuals at high risk has become a cornerstone of effective prevention and management strategies. To this end, diagnostic and prognostic models designed to detect physical or cognitive frailty integrate multiple demographic, clinical, and psychosocial factors to estimate an individual’s risk profile. These models support health care professionals in stratifying risk, facilitating timely and targeted interventions, and optimizing the allocation of health care resources.

However, the clinical application of these models may be hindered due to insufficient evidence regarding their performance, risk of bias, and applicability in routine practice. Although individual studies exist, no systematic review has yet comprehensively evaluated these models for both physical frailty and cognitive frailty in older adults with diabetes. Therefore, it is essential to conduct a systematic review that thoroughly assesses the methodological quality and clinical applicability of existing models.

Objectives

The aim of this systematic review and meta-analysis was to evaluate the methodological quality and clinical utility of existing models designed for the identification or prediction of physical frailty and cognitive frailty in older adults with diabetes. The specific aims included the following: (1) to determine the characteristics and most frequent predictors of risk prediction models developed for physical frailty and cognitive frailty in this population; (2) to analyze the methodological limitations and risk of bias of these models using the Prediction Model Study Risk Of Bias Assessment Tool (PROBAST); and (3) to investigate the pooled predictive performance of these tools to assess their potential for real-world clinical implementation.


Search Strategy and Selection Criteria

This systematic review and meta-analysis was registered on PROSPERO (CRD420251019308). The study followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) expanded checklist [29] and the PRISMA extension for diagnostic test accuracy (PRISMA-DTA) [30], while the literature search was conducted and reported in accordance with PRISMA-S (PRISMA Search) [31]. A comprehensive literature search was conducted across the PubMed, EMBASE, Web of Science, China National Knowledge Infrastructure (CNKI), Wanfang, and VIP databases, covering records from database inception to December 2025. We developed the search strategy based on the PITROS (Participants, Index Test, Target Conditions, Reference Standard, Outcomes, Settings) framework (Table S1 in Multimedia Appendix 1). The strategy combined Medical Subject Headings (MeSH) with free-text terms (Table S2 in Multimedia Appendix 1). The search strategy was independently evaluated by another librarian in accordance with the PRESS (Peer Review of Electronic Search Strategies) guidelines. In addition, references of relevant studies, guidelines, and reviews were manually searched, and citation tracking was performed using the Web of Science database to identify other relevant studies. No study registries were searched.

In clinical settings, prediction encompasses both diagnostic models (estimating the probability of a particular condition being present) and prognostic models (forecasting the likelihood of future outcomes) [32,33]. This review included all primary studies describing the development and/or validation of prediction models, tools, or scores for estimating the risk of physical frailty or cognitive frailty in older adults with diabetes. The inclusion criteria were (1) participants aged ≥60 years with diabetes, either diabetes alone or diabetes with other comorbidities or complications; (2) construction of a predictive model for identifying physical frailty or cognitive frailty in individuals with diabetes; (3) a retrospective, cross-sectional, or prospective study design; and (4) publication in English or Chinese. The exclusion criteria were (1) duplicate publications; (2) reviews, case reports, or conference abstracts; (3) studies for which the full text could not be obtained; (4) studies that did not provide usable data; and (5) studies in which the model contained only a single predictor.

Data Extraction

Duplicate records were identified and removed using the reference management software EndNote X9. We then excluded records unrelated to the research topic by screening titles and abstracts. Finally, the full texts were reviewed to identify studies that met the inclusion criteria and none of the exclusion criteria. After screening, a data extraction form was devised in accordance with the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies (CHARMS) [34]. The extracted items included (1) basic study information including author, year of publication, study location, study design, and source of data used; (2) patient characteristics including sample size and patient diagnosis; (3) number of predictors, predictor type, most important predictors, and predictor screening methods; (4) model characteristics including outcome indicators, modeling methods, model validation methods, and methods for handling missing data; (5) model presentation; and (6) model performance as measured using the area under the curve (AUC), sensitivity, and specificity. In studies where multiple models were developed and the best-performing model was explicitly reported, we included the best model in the analysis. For studies that reported multiple models without specifying a preferred one, we selected the model with the highest AUC to represent the study. We prioritized internally validated estimates, resorting to development performance or external validation data only when internal estimates were unavailable. For studies with unclear or incomplete data, attempts were made to contact the corresponding authors. To ensure the consistency and accuracy of the final data, two researchers (XW and SM) independently extracted the data, and the extracted results were compared and checked. Inconsistencies were resolved through discussion and consultation, and a third researcher (RG) was asked to adjudicate when necessary.

Quality Assessment

Two reviewers independently used PROBAST [35] to appraise the risk of bias and the applicability of each included study. Disagreements were resolved by discussion or by consulting a third reviewer. PROBAST evaluates risk of bias across 4 domains: participants, predictors, outcome, and analysis. Each domain is rated as high, unclear, or low risk of bias. The applicability evaluation covers 3 of these domains (participants, predictors, and outcome) and follows a process similar to the risk of bias assessment.

Statistical Analysis

Meta-analyses were conducted using Stata 18 (the midas and metan commands) and R version 4.3.2 (the metafor package). Using the AUC values reported for the models, we calculated the pooled AUC and produced an AUC forest plot. An AUC below 0.7 signified inadequate discrimination, an AUC from 0.7 to 0.8 denoted moderate discrimination, and an AUC exceeding 0.8 suggested excellent discrimination [36]. Additionally, sample size, sensitivity, and specificity were extracted for the models that reported them. The numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) for each model were back-calculated from the formulas sensitivity = TP/(TP + FN) and specificity = TN/(FP + TN). Pooled sensitivity and pooled specificity were computed based on TP, FP, FN, and TN, and corresponding forest plots of sensitivity and specificity were constructed. Subsequently, a summary receiver operating characteristic curve was generated. The degree of heterogeneity across the models was evaluated using the Q test and quantified using the I² statistic (where an I² value <25% signifies low heterogeneity, between 25% and 50% indicates moderate heterogeneity, and >50% denotes high heterogeneity) [37]. To account for between-study heterogeneity and provide more robust variance estimation, we used the Hartung-Knapp-Sidik-Jonkman method for a random effects meta-analysis on the logit scale [38]. Where heterogeneity was significant, it was explored using subgroup and sensitivity analyses, or the results were summarized descriptively. The presence of small study effects was evaluated using the Egger test, funnel plots, and the Deeks funnel plot [39]. A P value <.05 was deemed to indicate statistical significance.
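The back-calculation of the 2×2 table from the two formulas above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the example inputs are hypothetical, and it assumes that the number of events (participants with the target condition) is known alongside the total sample size, with counts rounded to the nearest integer.

```python
def two_by_two(n, events, sensitivity, specificity):
    """Reconstruct TP, FP, FN, TN from summary statistics.

    Uses sensitivity = TP/(TP + FN) and specificity = TN/(FP + TN),
    where TP + FN = events and FP + TN = n - events.
    Counts are rounded, so small rounding error is possible.
    """
    non_events = n - events
    tp = round(sensitivity * events)      # true positives
    fn = events - tp                      # false negatives
    tn = round(specificity * non_events)  # true negatives
    fp = non_events - tn                  # false positives
    return tp, fp, fn, tn


# Hypothetical model: 200 participants, 60 with frailty,
# sensitivity 0.80 and specificity 0.85.
tp, fp, fn, tn = two_by_two(200, 60, 0.80, 0.85)
print(tp, fp, fn, tn)
```

Because the counts are rounded, the reconstructed sensitivity and specificity may differ slightly from the reported values when the sample is small.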


Study Selection

The initial search identified a total of 4873 records. After removing duplicates, 3124 records remained. Following title and abstract screening, 88 studies were selected for full-text review. We could not retrieve 4 studies. During the full-text assessment, 18 studies were excluded because the study population did not have diabetes mellitus. Additionally, 16 studies were excluded because the outcome variable was not physical frailty or cognitive frailty; 16 were excluded because they were nonoriginal studies, such as reviews and meta-analyses; and 10 were excluded due to an inappropriate study design. Ultimately, 24 studies [40-63] were included in this meta-analysis. The literature screening process and results are illustrated in Figure 1.
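The counts in this paragraph can be reconciled with a few lines of arithmetic. The sketch below only restates the numbers reported above; the variable names and category labels are illustrative.

```python
# Counts as reported in the study selection paragraph.
records_identified = 4873
records_after_deduplication = 3124
full_text_candidates = 88
not_retrieved = 4

# Full-text exclusions by reason.
excluded = {
    "population without diabetes": 18,
    "outcome not physical or cognitive frailty": 16,
    "nonoriginal study": 16,
    "inappropriate study design": 10,
}

duplicates_removed = records_identified - records_after_deduplication
included = full_text_candidates - not_retrieved - sum(excluded.values())
print(duplicates_removed, included)
```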

Figure 1. Literature screening flow chart. CNKI: China National Knowledge Infrastructure.

Characteristics of Included Studies

The specific characteristics of the included studies are detailed in Tables 1 and 2. A total of 24 studies reported 32 diagnostic models for physical frailty and cognitive frailty in older adults with diabetes. The publication years of these papers spanned from 2023 to 2025. All the studies were conducted in China. The number of participants in the included studies ranged from 152 to 1436. The prevalence of frailty varied from 10.1% to 51.2%, while that of cognitive frailty ranged from 20.3% to 62.1%. Regarding study design, 21 studies [41-46,48-60,62,63] used a cross-sectional design, while 3 were retrospective studies [40,47,61]. In addition, 18 studies [41-48,50,51,53-59,63] were conducted at single centers, whereas 6 studies [40,49,52,60-62] were multicenter studies. In terms of the target population, 7 studies [40-42,57,58,60,61] included all types of diabetes, 14 studies [43,45,47-51,53-56,59,62,63] focused specifically on patients with type 2 diabetes, and 3 studies [44,46,52] included patients with diabetes and other comorbidities or complications. Physical frailty was the primary outcome in 13 studies [40-42,44,45,51,52,55,57,58,60-62], while the remaining 11 studies [43,46-50,53,54,56,59,63] investigated cognitive frailty. For frailty screening, 22 studies [40-51,53-57,59-63] used the Fatigue, Resistance, Ambulation, Illnesses, and Loss of Weight (FRAIL) scale, while 2 studies [52,58] used the Tilburg Frailty Indicator. To evaluate cognitive function in the 11 cognitive frailty studies, the Montreal Cognitive Assessment was the predominant tool used in 10 studies [43,46-50,53,54,59,63], whereas the Mini-Mental State Examination was used in only 1 study [56]. Regarding data handling, missing data were not reported in 9 studies [42,43,45-47,49,51,56,59], 3 studies [40,57,62] used imputation methods, and 12 studies [41,44,48,50,52-55,58,60,61,63] excluded participants with missing data. 
Continuous variables were maintained as continuous in 14 studies [40,42-44,47,51,52,55-59,61,62] and transformed into categorical variables in 10 studies [41,45,46,48-50,53,54,60,63]. The majority of models were presented as nomograms: Specifically, 13 studies [40,43,45,48,50,51,54-57,59,61,63] presented results solely as nomograms, 4 studies [41,42,44,46] provided both nomograms and full equations, 1 study [52] presented a nomogram and risk chart, and 1 study [58] presented a nomogram and decision tree. Additionally, 3 studies [53,60,62] developed risk sum scores, and 2 studies [47,49] provided logistic regression (LR) equations.

Table 1. Overview of the basic characteristics of included studies (n=24) identifying physical and cognitive frailty in older adults with diabetes.
| Author | Year | Outcome | Definition of outcome | Sample size, n | Event rate, n (%) | Modeling algorithms used | Population | Internal validation | Predictors, n | Model presentation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wu et al [62] | 2025 | Frailty | FRAIL^a | 509 | 148 (29.1) | LR^b, SVM^c, GBM^d, RF^e, CatBoost | T2DM^f | Cross-validation | 7 | Sum score |
| Xiao et al [61] | 2025 | Frailty | FRAIL | 1107 | 113 (10.2) | LR | Diabetes | Random split | 7 | Nomogram |
| Wang et al [55] | 2025 | Frailty | FRAIL | 152 | 47 (31.1) | LR | T2DM | Random split | 4 | Nomogram |
| Du et al [45] | 2024 | Frailty | FRAIL | 458 | 83 (18.1) | LR | T2DM | Bootstrap | 8 | Nomogram |
| Tang et al [51] | 2024 | Frailty | FRAIL | 566 | 213 (37.6) | LR | T2DM | Random split | 6 | Nomogram |
| Wang et al [52] | 2024 | Frailty | TFI^g | 491 | 216 (44.0) | LR, NN^h | Diabetes with diabetic foot | Random split | 8 | Nomogram and risk chart |
| Dang [42] | 2024 | Frailty | FRAIL | 360 | 115 (32.0) | LR | Diabetes | Random split (+ external) | 5 | Nomogram and full equation |
| Xi [57] | 2024 | Frailty | FRAIL | 338 | 130 (38.5) | LR | Diabetes | Random split | 6 | Nomogram |
| Cheng [41] | 2024 | Frailty | FRAIL | 317 | 118 (37.2) | LR | Diabetes | Bootstrap | 5 | Nomogram and full equation |
| Yin [58] | 2024 | Frailty | TFI | 379 | 194 (51.2) | LR, DT^i | Diabetes | Bootstrap | 8 | Nomogram and decision tree |
| Zheng [60] | 2024 | Frailty | FRAIL | 380 | 112 (29.5) | RF, SVM, KNN^j | Diabetes | Cross-validation | 7 | Sum score |
| Bu et al [40] | 2023 | Frailty | FRAIL | 1436 | 145 (10.1) | LR | Diabetes | Random split | 7 | Nomogram |
| Dong et al [44] | 2023 | Frailty | FRAIL | 485 | 211 (43.5) | LR | Diabetes with diabetic retinopathy | Bootstrap | 7 | Nomogram and full equation |
| Ma et al [49] | 2025 | Cognitive frailty | FRAIL, MoCA^k | 253 | 76 (30.0) | LR | T2DM | None | 5 | Full equation |
| Wang et al [53] | 2025 | Cognitive frailty | FRAIL, MoCA | 202 | 80 (39.6) | DT | T2DM | Random split | 11 | Sum score |
| Liang et al [46] | 2024 | Cognitive frailty | FRAIL, MoCA | 265 | 93 (35.1) | LR | Diabetes with COPD^l | Random split (+ external) | 7 | Nomogram and full equation |
| Yu and Yu [63] | 2024 | Cognitive frailty | FRAIL, MoCA | 430 | 132 (30.7) | LR | T2DM | Bootstrap (+ external) | 7 | Nomogram |
| Zhang et al [59] | 2024 | Cognitive frailty | FRAIL, MoCA | 215 | 66 (30.7) | LR | T2DM | Bootstrap | 5 | Nomogram |
| Liu [47] | 2024 | Cognitive frailty | FRAIL, MoCA | 220 | 137 (62.1) | LR | T2DM | Random split | 4 | Full equation |
| Deng et al [43] | 2023 | Cognitive frailty | FRAIL, MoCA | 315 | 87 (27.6) | LR | T2DM | Random split | 6 | Nomogram |
| Wang and Xu [54] | 2023 | Cognitive frailty | FRAIL, MoCA | 262 | 85 (32.4) | LR | T2DM | Bootstrap | 8 | Nomogram |
| Wang et al [56] | 2023 | Cognitive frailty | FRAIL, MMSE^m | 321 | 85 (26.5) | LR | T2DM | Bootstrap | 5 | Nomogram |
| Liu [48] | 2023 | Cognitive frailty | FRAIL, MoCA | 483 | 98 (20.3) | LR | T2DM | Random split | 6 | Nomogram |
| Meng [50] | 2022 | Cognitive frailty | FRAIL, MoCA | 508 | 117 (23.0) | LR | T2DM | Bootstrap (+ external) | 6 | Nomogram |

^a FRAIL: Fatigue, Resistance, Ambulation, Illness, and Loss of Weight scale.

^b LR: logistic regression.

^c SVM: support vector machine.

^d GBM: gradient boosting machine.

^e RF: random forest.

^f T2DM: type 2 diabetes mellitus.

^g TFI: Tilburg Frailty Indicator.

^h NN: neural network.

^i DT: decision tree.

^j KNN: k-nearest neighbors.

^k MoCA: Montreal Cognitive Assessment.

^l COPD: chronic obstructive pulmonary disease.

^m MMSE: Mini-Mental State Examination.

Table 2. Methodological and clinical characteristics of included studies (n=24) identifying physical frailty and cognitive frailty in older adults with diabetes.
| Characteristic | Studies, n (%) |
| --- | --- |
| Study design | |
|   Retrospective study | 3 (13) |
|   Cross-sectional study | 21 (88) |
| Source of data used | |
|   Single center | 18 (75) |
|   Multicenter | 6 (25) |
| Missing data handling | |
|   Not reported | 9 (38) |
|   Exclusion | 12 (50) |
|   Imputation | 3 (13) |
| Handling of continuous data | |
|   Continuous | 14 (58) |
|   Categorical or dichotomous | 10 (42) |
| Feature selection | |
|   Univariate analysis | 6 (25) |
|   Multivariate analysis | 7 (29) |
|   Univariate analysis and multivariate analysis | 11 (46) |
| Calibration method | |
|   Hosmer-Lemeshow test | 2 (8) |
|   Calibration plot | 4 (17) |
|   Hosmer-Lemeshow test and calibration plot | 14 (58) |
|   None | 4 (17) |
| Validation method | |
|   Internal validation | 19 (79) |
|   External validation and internal validation | 4 (17) |
|   None | 1 (4) |

Characteristics of Included Prediction Models

Among the 32 included models, LR was the most commonly used algorithm. LR was used in 22 models, while 10 models used machine learning (ML) techniques, including random forest (n=2), support vector machines (n=2), decision trees (n=2), k-nearest neighbors (n=1), CatBoost (n=1), gradient boosting machine (n=1), and neural networks (n=1). Model discrimination was reported for all models, with AUC values ranging from 0.703 to 0.983 (Table S3 in Multimedia Appendix 1). Specificity and sensitivity were reported in 17 studies [40-42,44,46-50,52,53,56-58,60-62] involving 25 models. Specifically, sensitivity ranged from 0.102 to 0.955, and specificity varied from 0.505 to 0.990. Model calibration was not reported in 4 studies [52,53,60,62]. Both the Hosmer-Lemeshow test and calibration plots were used in 14 studies [40-44,48,50,51,54-59], 2 studies [47,49] used only the Hosmer-Lemeshow test, and 4 studies [45,46,61,63] used only calibration plots. Regarding model validation, 1 study [49] developed models without validation, and 19 studies [40,41,43-45,47,48,51-62] conducted only internal validation without external validation.

Features

All features covered a wide range of factors, including sociodemographic characteristics, lifestyle factors, health-related factors, mental health status, laboratory test indicators, and anthropometric measurements. A total of 33 features were involved in the studies. The number of features incorporated into each study varied from 4 to 11. Among the features, the 5 most frequently occurring were depression, age, regular exercise, social activity, and duration of diabetes. The frequency distribution of all features is illustrated in Figure 2.

Figure 2. Frequency of predictors in the included studies. GFR: glomerular filtration rate; HbA1c: glycated hemoglobin; L/A: ratio of serum leptin to adiponectin.

Quality Assessment

Overview

The PROBAST tool was used to assess the risk of bias and the applicability of the included prediction model studies (Figure 3 and Table S4 in Multimedia Appendix 1). According to the established criteria, all 24 studies, which encompassed 32 models, were identified as having a high risk of bias. In terms of applicability, 13 studies [43,44,46-50,52-54,56,59,63], including 14 models, were deemed to have high concerns regarding applicability. Conversely, the remaining 11 studies [40-42,45,51,55,57,58,60-62], which included 18 models, were considered to have low concerns regarding applicability. Notably, 4 studies [52,58,60,62] included multiple models each; however, there was no difference in the quality assessment results between the models within these studies.

Figure 3. Prediction Model Study Risk Of Bias Assessment Tool (PROBAST) risk of bias (ROB) and applicability assessment for all included studies.

Risk of Bias Assessment

Within the participant domain, 4 studies [40,43,47,61] were rated as having a high risk of bias. Of these, 3 studies [40,47,61] were rated as high risk owing to their study designs, while the remaining study [43] was rated as such because it excluded specific subgroups, which could alter the performance of the prediction model. In the predictor domain, 2 studies [40,63] were assessed as having a high risk of bias due to the use of outcome information in the evaluation of predictors, and 1 study [55] was rated as having an unclear risk of bias because the researchers did not report whether they used the same assessment measures when evaluating the predictors. In the outcome domain, 3 studies [40,62,63] had a high risk of bias because the definition of outcomes included ≥1 predictor, 2 studies [52,53] were rated as having a high risk of bias due to the potentially inappropriate time interval between predictor assessment and outcome determination, and 1 study [49] was deemed to be at unclear risk of bias because it did not report the method of outcome classification. In the analysis domain, all studies were judged to have a high risk of bias. Current guidance recommends that studies developing predictive models achieve at least 20 events per variable (EPV). However, 13 studies [43,45,46,48,49,53-56,59-61,63] did not meet this requirement. Moreover, 10 studies [41,45,46,48-50,53,54,60,63] transformed continuous variables into categorical variables, either in part or entirely, and the authors did not report whether standard definitions were used for the categorization; 1 study [40] excluded some participants without adequate justification. Regarding the handling of missing data, 12 studies [41,42,44,48,50,52-54,58,60,61,63] directly excluded cases with missing data, while 9 studies [43,45-47,49,51,55,56,59] did not explicitly report whether data were missing. In addition, 6 studies [44,49,50,52,54,59] selected variables based solely on univariate analysis; 3 studies [52,53,60] did not comprehensively assess the predictive performance of their models, using only discrimination measures without calibration; 6 studies [41,44,48,50,55,57] did not evaluate the risk of overfitting, underfitting, or optimism that could bias the apparent performance of their predictive models; 1 study [49] developed models without validation; and 19 studies [40,41,43-45,47,48,51-62] conducted only internal validation without external validation.
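The EPV criterion mentioned above can be illustrated with event counts and predictor numbers taken from Table 1. This is a rough sketch under a stated assumption: the denominator here is the number of predictors in the final model, whereas a formal EPV usually counts all candidate predictor parameters, which the review does not list. The function name and the selection of three studies are illustrative only.

```python
EPV_THRESHOLD = 20  # minimum events per variable cited in the text


def events_per_variable(events, predictors):
    """Events divided by the number of predictors (final-model count assumed)."""
    return events / predictors


# (events, predictors in final model) as listed in Table 1.
examples = {
    "Wang et al [55]": (47, 4),
    "Wang et al [53]": (80, 11),
    "Wu et al [62]": (148, 7),
}

for study, (events, predictors) in examples.items():
    epv = events_per_variable(events, predictors)
    status = "meets" if epv >= EPV_THRESHOLD else "below"
    print(f"{study}: EPV = {epv:.1f} ({status} threshold)")
```

Under this simplification, the two Wang et al studies fall below the threshold and Wu et al [62] exceeds it, which agrees with the classification reported in the text for these three studies.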

Applicability Risk Assessment

In the participant domain, 3 studies [44,46,52] raised high concerns regarding applicability because they included individuals with other comorbidities or complications.

Meta-Analysis

A random effects meta-analysis using the Hartung-Knapp-Sidik-Jonkman method was performed to evaluate predictive performance at both the study and model levels. At the study level (24 studies), the overall pooled AUC was 0.851 (95% CI 0.820‐0.882), with a 95% prediction interval (PI) of 0.710 to 0.992 (P<.001; I²=92.0%; Figure 4). At the model level (32 models), the overall pooled AUC was 0.829 (95% CI 0.802‐0.856), with a 95% PI of 0.686 to 0.972 (P<.001; I²=92.5%; Figure S1 in Multimedia Appendix 1).
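For readers unfamiliar with the pooling approach, the sketch below shows one way a random effects estimate with the Hartung-Knapp adjustment can be computed on the logit scale. It is not the authors' code (they report using Stata and the R metafor package). Several choices are assumptions made for illustration: the between-study variance uses the DerSimonian-Laird estimator, the standard error of logit(AUC) comes from the delta method, the t critical value is supplied by the caller to keep the example dependency-free, and the input AUCs and standard errors are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_auc_hksj(aucs, ses, t_crit):
    """Pool AUCs on the logit scale with a Hartung-Knapp-adjusted CI.

    aucs   : AUC estimates, one per study
    ses    : standard errors of the AUCs on the original scale
    t_crit : 97.5th percentile of the t distribution with k - 1 df
    """
    k = len(aucs)
    y = [logit(a) for a in aucs]
    # Delta method: SE(logit AUC) = SE(AUC) / (AUC * (1 - AUC)).
    v = [(se / (a * (1.0 - a))) ** 2 for a, se in zip(aucs, ses)]

    # Fixed-effect weights, Cochran Q, and I-squared.
    w = [1.0 / vi for vi in v]
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (an assumption here).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects estimate with Hartung-Knapp variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    var_hk = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_re, y)) / (
        (k - 1) * sum(w_re)
    )
    half_width = t_crit * math.sqrt(var_hk)

    return {
        "auc": inv_logit(mu),
        "lower": inv_logit(mu - half_width),
        "upper": inv_logit(mu + half_width),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical inputs for three studies; 4.303 is the 97.5th percentile
# of the t distribution with 2 degrees of freedom.
result = pool_auc_hksj([0.80, 0.85, 0.90], [0.02, 0.02, 0.02], t_crit=4.303)
print(result)
```

Pooling on the logit scale keeps the back-transformed interval inside the 0 to 1 range, which is one reason it is often preferred for AUCs close to 1.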

Figure 4. Forest plot of the random effects meta-analysis of pooled area under the curve (AUC) estimates for 29 validation models [40-62]. HKSJ: Hartung-Knapp-Sidik-Jonkman.

Additional data on sample size, sensitivity, and specificity were extracted from 17 studies to calculate TP, FP, FN, and TN (Table S5 in Multimedia Appendix 1). Based on these values, the pooled sensitivity was 0.810 (95% CI 0.740‐0.850; I²=92.26%), and the pooled specificity was 0.850 (95% CI 0.810‐0.890; I²=92.44%); the corresponding forest plots are presented in Figure 5B. Furthermore, a summary receiver operating characteristic curve was generated, as depicted in Figure 5A. These results indicate significant heterogeneity across the models regarding AUC, sensitivity, and specificity.

Figure 5. (A) Summary receiver operating characteristic curve and (B) forest plots of the random effects meta-analysis of the sensitivity and specificity for 22 validation models [40-42,44,46-50,52,53,56-58,60-62]. AUC: area under the curve; SENS: sensitivity; SPEC: specificity; SROC: summary receiver operating characteristic.

Sensitivity Analysis

Sensitivity analyses were undertaken by sequentially removing one study at a time. The point estimates obtained after excluding any single study all fell within the 95% CI of the overall effect size (Figure 6A). This indicates that the removal of any individual study did not significantly influence the pooled AUC. Therefore, the combined results were relatively stable.

Figure 6. Assessment of robustness, heterogeneity, and small study effects in the meta-analysis of 24 studies evaluating physical frailty and cognitive frailty in older adults with diabetes: (A) sensitivity analysis, (B) subgroup forest plot stratified by modeling approach (32 models), and (C) funnel plot [40-63]. AUC: area under the curve; HKSJ: Hartung-Knapp-Sidik-Jonkman; LR: logistic regression; ML: machine learning.

Subgroup Analysis

Subgroup analyses were performed based on modeling approach, study design, data source, population, outcome, and validation method (Table 3 and Figure 6B). Pooled AUC differed significantly between modeling approaches (P=.003): LR models yielded a higher pooled AUC (0.850) than ML models (0.785). Pooled AUC also differed by study design (P=.03), with retrospective studies showing higher discrimination (AUC=0.900) than cross-sectional studies (AUC=0.843). However, no statistically significant differences were found across subgroups stratified by data source (P=.42), population (P=.77), validation method (P=.16), or outcome (P=.94).

Table 3. Subgroup analysis of the pooled area under the curve (AUC) of studies of physical frailty and cognitive frailty in older adults with diabetes to explore potential sources of heterogeneity (n=24).
Subgroup | Studies, n | AUC (95% CI) | I², % | P value
Population | | | 0 | .77
  Diabetes | 21 | 0.849 (0.818-0.879) | |
  Diabetes with comorbidities or complications | 3 | 0.866 (0.757-0.975) | |
Study design | | | 79.6 | .03
  Cross-sectional study | 21 | 0.843 (0.812-0.875) | |
  Retrospective study | 3 | 0.900 (0.861-0.939) | |
Data source | | | 0 | .42
  Multicenter | 6 | 0.871 (0.815-0.926) | |
  Single center | 18 | 0.844 (0.809-0.878) | |
Outcome | | | 0 | .94
  Physical frailty | 13 | 0.851 (0.806-0.896) | |
  Cognitive frailty | 11 | 0.849 (0.812-0.886) | |
Validation | | | 42.5 | .16
  Random split | 12 | 0.875 (0.836-0.914) | |
  Bootstrap | 9 | 0.829 (0.779-0.879) | |
  Cross-validation | 2 | 0.803 (0.710-0.896) | |
  None | 1 | 0.896 (0.842-0.950) | |
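
To make the between-subgroup test in Table 3 more concrete, the following Python sketch compares two subgroup estimates using a Q statistic with 1 degree of freedom. It is an approximation under stated assumptions: standard errors are back-calculated from the reported 95% CIs with a normal quantile, the function names are hypothetical, and whether the authors computed the test exactly this way is not stated in this section. Applied to the study design subgroups, it lands close to the reported P=.03 and an I² of roughly 80%.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate standard error from a 95% CI (assumes a normal-based CI)."""
    return (upper - lower) / (2 * Z_975)


def q_between_two(est_a: float, se_a: float, est_b: float, se_b: float):
    """Test for a difference between two subgroup estimates.

    Returns (Q, P value, I^2 in %), where Q follows a chi-square
    distribution with 1 degree of freedom under the null hypothesis.
    """
    q = (est_a - est_b) ** 2 / (se_a ** 2 + se_b ** 2)
    p = math.erfc(math.sqrt(q / 2.0))          # chi-square (df=1) upper tail
    i2 = max(0.0, (q - 1.0) / q) * 100 if q > 0 else 0.0
    return q, p, i2


# Study design subgroups as reported in Table 3.
q, p, i2 = q_between_two(
    0.843, se_from_ci(0.812, 0.875),   # cross-sectional studies
    0.900, se_from_ci(0.861, 0.939),   # retrospective studies
)
print(f"Q = {q:.2f}, P = {p:.3f}, I2 = {i2:.1f}%")
```

Small discrepancies from the tabulated values are expected because the CIs are rounded to 3 decimals and may not be normal based.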

Small Study Effects Assessment

We examined small study effects using the Egger test, funnel plots, and the Deeks funnel plot. The Egger test yielded a coefficient of −1.07 (P=.40), indicating no statistically significant small study effects. This result aligns with the visual inspection of the contour-enhanced funnel plot (Figure 6C), which displayed a relatively symmetrical distribution of the studies, suggesting no obvious small study effects among the included studies. Furthermore, the Deeks funnel plot asymmetry test yielded a nonsignificant P value of .94, providing no evidence of significant small study effects (Figure S2 in Multimedia Appendix 1).
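
As a rough illustration of the Egger regression referred to above, the sketch below regresses each study's standardized effect (estimate divided by its standard error) on its precision (1 divided by the standard error); the intercept is the asymmetry coefficient. The function name and inputs are hypothetical and are not data from this review. The P value, which comes from a t distribution with k-2 degrees of freedom, is omitted because the Python standard library has no t distribution.

```python
import math
from typing import List, Tuple


def egger_intercept(effects: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Egger regression: returns (intercept, SE of intercept, t statistic)."""
    n = len(effects)
    if n < 3:
        raise ValueError("at least 3 studies are needed")
    y = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1.0 / s for s in ses]                  # precisions
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)           # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat


# Hypothetical AUCs and standard errors (not data from this review).
b0, se_b0, t = egger_intercept([0.80, 0.85, 0.90, 0.82, 0.88],
                               [0.02, 0.03, 0.05, 0.025, 0.04])
print(f"intercept = {b0:.2f}, SE = {se_b0:.2f}, t = {t:.2f}")
```

An intercept near 0 is what a symmetrical funnel plot would produce; the sign and size of a nonzero intercept indicate the direction and extent of asymmetry.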

However, these results must be interpreted with caution. Funnel plot symmetry or nonsignificant statistical tests do not definitively rule out publication bias. Conversely, asymmetry may arise from heterogeneity or methodological limitations, not merely from publication bias. Therefore, the potential for publication bias cannot be entirely excluded, particularly due to limitations in study selection. Although our search strategy was designed to be global, the final pool of eligible studies consisted exclusively of research conducted in China. Additionally, the restriction of inclusion criteria to English and Chinese languages may have introduced language bias by excluding relevant data published in other languages.


Principal Findings

To our knowledge, this is the first systematic review and meta-analysis to comprehensively evaluate the performance of prediction models for physical frailty and cognitive frailty specifically in older adults with diabetes. Our analysis included 24 studies encompassing 32 models. At the study level, the pooled AUC was 0.851 (95% CI 0.820‐0.882), while the model-level analysis yielded a similarly high pooled AUC of 0.829 (95% CI 0.802‐0.856). In addition, the pooled sensitivity was 0.810, and the pooled specificity was 0.850, indicating that these models demonstrated reasonable discriminative performance for identifying physical frailty and cognitive frailty in this high-risk population.

A notable finding from our subgroup analysis was the difference in performance based on the modeling approach (P=.003). LR models yielded a higher pooled AUC (0.850) than ML models (0.785). However, this finding should be interpreted with caution. Consistent with the systematic review by Christodoulou et al [64], we observed no consistent performance advantage of ML over LR analysis in this dataset. The apparent disparity may be attributed to the smaller number of ML studies included, heterogeneity in populations and predictors, and potential risk of bias, rather than an inherent superiority of LR. For instance, Wang et al [52] and Yin [58] developed models using both approaches and found LR performed slightly better. Ultimately, there is no “one-size-fits-all” modeling method; performance often depends on the specific data structure and clinical context. Therefore, future research should prioritize rigorous comparisons of multiple modeling approaches—including proper hyperparameter tuning for ML—to identify the optimal strategy for specific physical frailty and cognitive frailty prediction scenarios. Additionally, we observed that retrospective studies yielded significantly higher AUC values (0.900) than cross-sectional studies (0.843; P=.03). This phenomenon likely stems from the inherent selection bias and better data quality control often present in retrospective cohorts, potentially leading to overoptimistic performance estimates.

The diagnostic models included in this review have meaningful clinical implications, facilitating a shift toward the precise identification and risk stratification of physical frailty and cognitive frailty in clinical settings. Our analysis revealed that the most frequently used features (depression, age, regular exercise, social activity, and diabetes duration) provide concrete metrics for identifying these concurrent conditions. Notably, depression emerged as the most consistent and prominent feature across multiple studies, highlighting its strong correlation with both physical frailty and cognitive frailty. Existing evidence indicates a bidirectional relationship between depression and these conditions, potentially mediated through inflammatory pathways, endocrine dysregulation, and overlapping symptoms such as fatigue and psychomotor slowing [65,66]. Consequently, assessing mental health is not only vital for psychological well-being but also serves as a critical entry point for identifying patients who may already be experiencing physical frailty or cognitive frailty.

Advanced age was also a predominant feature, consistent with reports by Kong et al [9] and Wang et al [67]. This association likely reflects immunosenescence, chronic inflammation, and metabolic dysregulation, which collectively contribute to sarcopenia and functional decline [68-70]. These findings suggest that age remains a fundamental stratification factor, warranting heightened clinical vigilance for physical and cognitive frailty in older cohorts.

Regular exercise was identified as a powerful discriminatory feature. Physiologically, physical activity is known to reduce inflammation, preserve muscle function, and support cognitive health [71]. In the context of these diagnostic models, the absence of regular exercise serves as a robust, easily accessible clinical marker for detecting potential physical frailty and cognitive frailty. This allows health care providers to efficiently target vulnerable populations in community settings where elaborate geriatric assessments may be impractical.

Similarly, lower levels of social activity emerged as a significant indicator. This suggests that social isolation often co-occurs with physical frailty and cognitive frailty, making social history a valuable component of the risk stratification process for older adults with diabetes.

Finally, the duration of diabetes was a frequent feature in the included models. The strong association between longer disease duration and physical frailty and cognitive frailty likely reflects the cumulative burden of chronic hyperglycemia, insulin resistance, and related complications over time [8,26]. As diabetes duration increases, physiological and cognitive reserves decline, thereby increasing the probability of concurrent physical frailty and cognitive frailty. These insights emphasize that patients with a long history of diabetes represent a high-risk group requiring prioritized screening and comprehensive management.

Comparison With Prior Work

Previous systematic reviews and narrative summaries on physical frailty or cognitive frailty in older adults with diabetes have primarily concentrated on estimating prevalence, identifying associated risk factors, or describing frailty phenotypes, rather than systematically evaluating multivariable prediction or identification models [8,9,72,73]. Moreover, most prior reviews did not attempt a quantitative synthesis of model performance, likely due to the substantial methodological and clinical heterogeneity across studies, which is commonly encountered in the evaluation of prediction models. Consistent with this literature, we observed considerable variation among included studies in terms of study populations, definitions of physical frailty and cognitive frailty, predictor selection, model development strategies, and validation approaches. These differences reflect the evolving and fragmented nature of model development in this field and pose challenges for direct comparison across studies. Nevertheless, by conducting a meta-analysis of discrimination performance, particularly through the synthesis of the AUC, our study provides a quantitative overview of the overall performance of existing models for identifying physical frailty and cognitive frailty in older adults with diabetes.

Heterogeneity

Substantial heterogeneity was observed across the included studies (I²>90% for sensitivity, specificity, and AUC). This is a common challenge in diagnostic meta-analyses and may be attributed to variations in study design, population characteristics, and modeling methodologies. Our subgroup analysis identified that the modeling approach (ML vs LR) and study design (retrospective vs cross-sectional) were significant sources of heterogeneity (P<.05). However, other factors such as data source (single vs multicenter) and outcome definitions (physical frailty vs cognitive frailty) did not significantly contribute to the observed variance. It is also important to note the variability in feature selection across studies. With 33 different features identified—ranging from depression and age to regular exercise—the lack of a standardized set of predictors likely contributes to the heterogeneity in model performance. This diversity reflects the multifaceted pathophysiology of frailty in diabetes but complicates the direct comparison of models.

Importantly, beyond the conventional I² statistic, we further quantified between-study heterogeneity using 95% PIs, which provide a clinically meaningful estimate of the expected range of model performance in future settings. Although I² values exceeding 90% indicate substantial relative heterogeneity, they do not convey the absolute extent to which predictive performance may vary across populations and clinical contexts. In contrast, PIs directly address this limitation by reflecting the dispersion of true effects on the original AUC scale. At the study level, the pooled AUC of 0.851 was accompanied by a 95% PI ranging from 0.710 to 0.992, indicating that, although predictive performance may vary considerably across different real-world settings, most future applications are still likely to achieve at least acceptable discrimination.
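
For readers unfamiliar with prediction intervals, a common formulation is the pooled estimate plus or minus a t quantile with k-2 degrees of freedom multiplied by the square root of the sum of the between-study variance and the squared standard error of the pooled estimate. The Python sketch below applies that formula. It is illustrative only: the standard error (about 0.015) is back-calculated from the reported CI, the between-study variance of 0.0044 is a hypothetical value chosen so that the result lands near the reported interval, and whether the authors used exactly this formula is not stated in this section.

```python
import math
from typing import Tuple


def prediction_interval(mu: float, se_mu: float, tau2: float,
                        t_crit: float) -> Tuple[float, float]:
    """95% prediction interval for the true effect in a new setting.

    mu: pooled estimate; se_mu: its standard error;
    tau2: between-study variance; t_crit: 97.5th percentile of the
    t distribution with k-2 degrees of freedom (supplied by the caller).
    """
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return mu - half_width, mu + half_width


# k = 24 studies -> 22 degrees of freedom -> t quantile of about 2.074.
# se_mu and tau2 are assumptions for illustration, not reported values.
low, high = prediction_interval(mu=0.851, se_mu=0.015, tau2=0.0044, t_crit=2.074)
print(f"95% PI: {low:.3f} to {high:.3f}")
```

Because the between-study variance enters directly, the PI is much wider than the CI whenever heterogeneity is large, which is the pattern seen here.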

Methodological Quality and Risk of Bias

Evaluation using the PROBAST checklist indicated that all included studies exhibited a high risk of bias, predominantly in the analysis domain. Consequently, the pooled performance estimates reported in this review should be interpreted with caution, likely representing “best-case scenarios” or optimistic estimates rather than robust predictions of real-world performance.

In the participant domain, specifically regarding data sources, retrospective designs were identified as a source of bias for a few studies, although the overarching issue remains the analytical approach. We recommend using prospective data or registry data for model development in future optimization efforts to reduce the risk of bias arising from data sources. Additionally, the evaluation of model applicability indicated that certain studies included not only patients with diabetes but also those with other comorbidities or complications. These factors limited the applicability of the respective models to the general diabetes population. In the predictor domain, some studies inappropriately incorporated outcome-related information into the predictor assessment, leading to information leakage and inflated model performance. In addition, inadequate reporting regarding the consistency of predictor measurement raised concerns about reproducibility in at least one study. In the outcome domain, several studies were judged to have a high risk of bias due to problematic outcome definitions.

The analysis domain had the highest frequency of a high risk of bias, with all studies rated as high risk in this domain. According to the PROBAST assessment tool, an EPV ≥20 is commonly used as a heuristic to indicate an adequate sample size for developing prediction models. In this review, 13 studies had an EPV <20, which may suggest an increased risk of bias related to model overfitting. However, relying solely on fixed rules of thumb may be insufficient; therefore, future studies should prioritize formal, model-tailored sample size calculations (eg, approaches proposed by Riley et al [74]) to ensure precise estimation and adequate statistical power. During the predictor selection process, several studies relied solely on univariate screening, which often fails to identify confounding factors and can lead to model overfitting. Therefore, predictor selection should not depend solely on univariate screening but should also draw on clinical knowledge. In addition, an increasing number of studies are using least absolute shrinkage and selection operator (LASSO) regression to handle high-dimensional data and select potential variables. By introducing a penalty term, LASSO regression shrinks coefficient estimates toward zero and can remove weak predictors entirely, thereby stabilizing model estimates and decreasing the likelihood of overfitting [75]. The handling of missing data was also suboptimal. Cases with missing data were directly excluded in 12 studies, which can introduce bias and reduce statistical power, while 9 studies failed to report how missing data were handled. Only a minority of studies used appropriate methods such as multiple imputation. Accurate data reporting and careful handling of missing observations help reduce bias in model estimates, and we recommend that future studies strengthen the management of missing data to ensure the integrity of the study. When dealing with continuous variables, transforming continuous data into categorical variables for modeling may lead to a substantial loss of predictive information [76]; however, 10 studies in our review performed such transformations without reporting standard definitions. Although such transformation can make a model more convenient to apply during the clinical dissemination phase, it should be done with caution during development. Crucially, the majority of models lacked external validation. Although they demonstrated good discrimination in derivation cohorts, their performance remains inherently tied to their specific development settings.
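
The EPV heuristic discussed above reduces to a simple ratio, sketched below in Python. The sketch assumes the common convention of dividing the size of the smaller outcome group by the number of candidate predictor parameters; the function names and the example cohort are hypothetical.

```python
def events_per_variable(n_events: int, n_nonevents: int,
                        n_candidate_predictors: int) -> float:
    """EPV: size of the smaller outcome group per candidate predictor parameter."""
    if n_candidate_predictors <= 0:
        raise ValueError("n_candidate_predictors must be positive")
    return min(n_events, n_nonevents) / n_candidate_predictors


def meets_epv_heuristic(n_events: int, n_nonevents: int,
                        n_candidate_predictors: int,
                        threshold: float = 20.0) -> bool:
    """True if the EPV reaches the rule-of-thumb threshold (default 20)."""
    return events_per_variable(n_events, n_nonevents, n_candidate_predictors) >= threshold


# Hypothetical development cohort: 120 frail and 380 non-frail participants,
# with 8 candidate predictor parameters considered during model building.
print(events_per_variable(120, 380, 8))   # -> 15.0
print(meets_epv_heuristic(120, 380, 8))   # -> False
```

Counting candidate parameters rather than the predictors retained in the final model matters here, because variable screening itself consumes information from the data.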

Therefore, the primary implication of this review is methodological: Rather than endorsing specific existing tools for immediate clinical use, we emphasize the urgent need for better-designed research. Future studies must strictly adhere to TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines and PROBAST standards—specifically ensuring adequate sample sizes, appropriate handling of continuous variables and missing data, and rigorous external validation in independent, geographically distinct populations—to develop robust and transportable prediction models.

Limitations

This study has several limitations that warrant consideration. First, the predominance of cross-sectional and retrospective designs among the included studies restricts the scope of these models to the diagnostic identification of prevalent frailty. Consequently, they function as concurrent screening tools rather than prognostic instruments for predicting future incidence, precluding the ability to infer causal relationships between predictors and outcomes. Second, according to the PROBAST assessment, all included studies exhibited a high risk of bias, particularly within the analysis domain. This methodological weakness likely results in overoptimistic performance estimates and limits the transportability of these models to diverse clinical populations. Third, substantial statistical heterogeneity (I²) was observed, stemming from variations in study design and modeling methodologies. We calculated 95% PIs to provide a more clinically meaningful estimate of the expected range of model performance in future settings. Finally, all included studies were conducted in China, and the review was restricted to English and Chinese literature. Although this reflects the rapid emergence of this research focus within China, the lack of geographic and ethnic diversity limits the generalizability of our findings to other global populations and health care systems.

Conclusion

This review provides the first comprehensive synthesis of models for risk stratification of physical and cognitive frailty in older adults with diabetes. The findings indicate that existing models demonstrate satisfactory pooled discriminative performance. Although the CIs confirm a robust average effect, the 95% PIs indicate that predictive performance in future real-world settings is expected to vary across clinical contexts, yet is likely to remain within an acceptable range. Nevertheless, the clinical utility of these models is currently constrained by significant methodological limitations. The identified models rely heavily on readily available clinical and psychosocial predictors, such as depression, age, regular exercise, and social activity, suggesting that early risk stratification is feasible in routine practice. However, the evidence is underpinned by a pervasive high risk of bias, primarily due to analytical shortcomings, small sample sizes, and a lack of rigorous external validation. Furthermore, the predominance of cross-sectional designs and the geographic restriction of studies to China limit the generalizability of these tools to broader global populations and their ability to function as true prognostic instruments for future risk. Consequently, although current models show promise for screening and identifying prevalent physical and cognitive frailty, they are not yet sufficiently robust for widespread deployment in diverse clinical settings. Future research must pivot from developing new, redundant models to conducting robust, prospective, multicenter studies that adhere strictly to TRIPOD guidelines. Emphasis should be placed on external validation and the development of longitudinal prognostic tools to ensure reliable, transportable, and clinically actionable risk stratification for this vulnerable population.

Acknowledgments

The authors declare that generative artificial intelligence (AI) was not used in the creation of this manuscript.

Funding

This study was funded by grants from the nursing research project of Sichuan Province (number H23003) and the research project of the Science and Technology Department of Sichuan Province (number 2024ZYD0338). The funders played no part in the study design, data collection, analysis, interpretation of the results, or the writing of the manuscript.

Disclaimer

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Data Availability

The datasets generated or analyzed during this study are available from the corresponding author on reasonable request.

Authors' Contributions

Conceptualization: XW

Data curation: LL, RZ, QJ

Formal analysis: XW, XX, HC

Funding acquisition: YL, RZ, RG

Methodology: XW, SM, XX, RG

Project administration: RG

Resources: LL

Software: QJ

Writing - original draft: XW, XX

Writing - review & editing: XW, XM, XX, LL, HC, YL, RZ, QJ, SL, RG, SM

Conflicts of Interest

None declared.

Multimedia Appendix 1

Inclusion criteria, search terms, performance measures, and assessments of bias.


Checklist 1

PRISMA checklist.


  1. Genitsaridi I, Salpea P, Salim A, et al. 11th edition of the IDF Diabetes Atlas: global, regional, and national diabetes prevalence estimates for 2024 and projections for 2050. Lancet Diabetes Endocrinol. Feb 2026;14(2):149-156. [CrossRef] [Medline]
  2. Kobayashi E, Linden-Santangeli NJ, Chan N, et al. Longitudinal metabolic trajectories in diabetes prevention program participants reveal subgroups with varying micro- and macrovascular complication risks. Diabetes Care. Oct 1, 2025;48(10):1704-1712. [CrossRef] [Medline]
  3. Xu X, Wang F, Liao S, Liu J, Xiao L. Tele-cognitive behavioral therapy for the treatment of diabetes-related distress in individuals with diabetes mellitus: systematic review and meta-analysis of randomized controlled trials. J Med Internet Res. Dec 24, 2025;27:e80476. [CrossRef] [Medline]
  4. Ong KL, Stafford LK, McLaughlin SA. Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021. Lancet. Jul 15, 2023;402(10397):203-234. [CrossRef] [Medline]
  5. O’Neil H, Todd A, Pearce M, Husband A. What are the consequences of over and undertreatment of type 2 diabetes mellitus in a frail population? A systematic review. Endocrinol Diabetes Metab. Mar 2024;7(2):e00470. [CrossRef] [Medline]
  6. Clegg A, Young J, Iliffe S, Rikkert MO, Rockwood K. Frailty in elderly people. Lancet. Mar 2013;381(9868):752-762. [CrossRef]
  7. Wu Y, Xiong T, Tan X, Chen L. Frailty and risk of microvascular complications in patients with type 2 diabetes: a population-based cohort study. BMC Med. Dec 8, 2022;20(1):473. [CrossRef] [Medline]
  8. Hanlon P, Fauré I, Corcoran N, et al. Frailty measurement, prevalence, incidence, and clinical implications in people with diabetes: a systematic review and study-level meta-analysis. Lancet Healthy Longev. Dec 2020;1(3):e106-e116. [CrossRef] [Medline]
  9. Kong LN, Lyu Q, Yao HY, Yang L, Chen SZ. The prevalence of frailty among community-dwelling older adults with diabetes: a meta-analysis. Int J Nurs Stud. Jul 2021;119:103952. [CrossRef] [Medline]
  10. Huang ST, Chen LK, Hsiao FY. Clinical impacts of frailty on 123,172 people with diabetes mellitus considering the age of onset and drugs of choice: a nationwide population-based 10-year trajectory analysis. Age Ageing. Jul 1, 2023;52(7):afad128. [CrossRef] [Medline]
  11. Walston J, McBurnie MA, Newman A, et al. Frailty and activation of the inflammation and coagulation systems with and without clinical comorbidities: results from the Cardiovascular Health Study. Arch Intern Med. Nov 11, 2002;162(20):2333-2341. [CrossRef] [Medline]
  12. Zhang X, Bo Y, Li Z, et al. Association between frailty and cognitive function: a pooled analysis of three ageing cohorts. Transl Psychiatry. Nov 18, 2025;15(1):486. [CrossRef]
  13. Li CL, Stanaway FF, Lin JD, Chang HY. Frailty and health care use among community-dwelling older adults with diabetes: a population-based study. Clin Interv Aging. 2018;13:2295-2300. [CrossRef] [Medline]
  14. Ferri-Guerra J, Aparicio-Ugarriza R, Salguero D, et al. The association of frailty with hospitalizations and mortality among community dwelling older adults with diabetes. J Frailty Aging. 2020;9(2):94-100. [CrossRef] [Medline]
  15. Wang X, Chen Z, Li Z, et al. Association between frailty and risk of fall among diabetic patients. Endocr Connect. Oct 2020;9(10):1057-1064. [CrossRef] [Medline]
  16. Santulli G, Sabatelli G, Wang B, et al. Interplay between frailty and cardiometabolic disorders: from pathophysiology to clinical implications. Cardiovasc Diabetol. Dec 8, 2025;25(1):1. [CrossRef] [Medline]
  17. Zhang RH, Wang J, Wang Y, et al. Frailty and transitions across cardiometabolic disease states: evidence from multistate models in a 16-year Chinese cohort. NPJ Aging. Nov 29, 2025;12(1):4. [CrossRef] [Medline]
  18. Fan J, Yu C, Guo Y, et al. Frailty index and all-cause and cause-specific mortality in Chinese adults: a prospective cohort study. Lancet Public Health. Dec 2020;5(12):e650-e660. [CrossRef] [Medline]
  19. Haapanen MJ, Jansson Sigfrids F, Ylinen A, et al. Frailty outperforms conventional risk factors for predicting complications and death in type 1 diabetes. Diabetes Res Clin Pract. Dec 2025;230:112984. [CrossRef] [Medline]
  20. Guo X, Pei J, Ma Y, et al. Cognitive frailty as a predictor of future falls in older adults: a systematic review and meta-analysis. J Am Med Dir Assoc. Jan 2023;24(1):38-47. [CrossRef] [Medline]
  21. Ren M, Guo H, Guo Y, Guo W, Zhu L. The risk prediction models for cognitive frailty in the older people in China: a systematic review and meta-analysis. BMC Geriatr. May 22, 2025;25(1):365. [CrossRef] [Medline]
  22. Si H, Zhang Y, Zhao P, et al. Bidirectional relationship between diabetes and frailty in middle-aged and older adults: a systematic review and meta-analysis. Arch Gerontol Geriatr. Aug 2025;135:105880. [CrossRef] [Medline]
  23. Cheng M, He M, Ning L, et al. The impact of frailty on clinical outcomes among older adults with diabetes: a systematic review and meta-analysis. Medicine (Abingdon). 2024;103(26):e38621. [CrossRef]
  24. Sinclair AJ, Pennells D, Abdelhafiz AH. Hypoglycaemic therapy in frail older people with type 2 diabetes mellitus-a choice determined by metabolic phenotype. Aging Clin Exp Res. Sep 2022;34(9):1949-1967. [CrossRef] [Medline]
  25. Ji CH, Huang XQ, Li Y, Muheremu A, Luo ZH, Dong ZH. The relationship between physical activity, nutritional status, and sarcopenia in community-dwelling older adults with type 2 diabetes: a cross-sectional study. BMC Geriatr. Jun 7, 2024;24(1):506. [CrossRef] [Medline]
  26. Sinclair AJ, Abdelhafiz AH. Metabolic impact of frailty changes diabetes trajectory. Metabolites. Feb 16, 2023;13(2):295. [CrossRef] [Medline]
  27. Zhong W, Huang W, Deng H, Qiu S, Yang Q, Jia H. A randomized controlled trial to assess the efficacy of standardized tai chi in prefrail older adults with immunosenescence: design and protocol. BMC Complement Med Ther. Jan 3, 2025;25(1):1. [CrossRef] [Medline]
  28. Sobrinho ACDS, de Paula Venancio RC, da Silva Rodrigues G, et al. Systematic review of interventions for pre-frail and frail older adults: evidence from clinical trials on frailty levels. Arch Gerontol Geriatr. Jul 2025;134:105851. [CrossRef] [Medline]
  29. Page MJ, Moher D, Bossuyt PM, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. Mar 29, 2021;372:n160. [CrossRef] [Medline]
  30. McInnes MDF, Moher D, Thombs BD, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: the PRISMA-DTA statement. JAMA. Jan 23, 2018;319(4):388-396. [CrossRef] [Medline]
  31. Rethlefsen ML, Kirtley S, Waffenschmidt S, et al. PRISMA-S: an extension to the PRISMA statement for reporting literature searches in systematic reviews. Syst Rev. Jan 26, 2021;10(1):39. [CrossRef] [Medline]
  32. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. Jan 1, 2019;170(1):51-58. [CrossRef] [Medline]
  33. Moons KGM, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. Jan 6, 2015;162(1):W1-W73. [CrossRef] [Medline]
  34. Moons KGM, de Groot JAH, Bouwmeester W, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. Oct 2014;11(10):e1001744. [CrossRef] [Medline]
  35. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. Jan 1, 2019;170(1):W1-W33. [CrossRef] [Medline]
  36. Šimundić AM. Measures of diagnostic accuracy: basic definitions. EJIFCC. Jan 2009;19(4):203-211. [Medline]
  37. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. Sep 6, 2003;327(7414):557-560. [CrossRef] [Medline]
  38. IntHout J, Ioannidis JPA, Borm GF. The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol. Feb 18, 2014;14:25. [CrossRef] [Medline]
  39. Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. Sep 13, 1997;315(7109):629-634. [CrossRef] [Medline]
  40. Bu F, Deng XH, Zhan NN, et al. Development and validation of a risk prediction model for frailty in patients with diabetes. BMC Geriatr. Mar 27, 2023;23(1):172. [CrossRef] [Medline]
  41. Cheng YM. Construction of a predictive model for the risk of frailty in hospitalized elderly patients with diabetes mellitus [Master’s thesis]. Gannan Medical University; 2024. URL: https:/​/oversea.​cnki.net/​KCMS/​detail/​detail.​aspx?dbcode=CMFD&dbname=CMFD202501&filename=1024557771.​nh&uniplatform=OVERSEA&v=aCGchnfEAAyXVFPDXN9Dm_iJvJYtdsledsnnFwYOOKq65nJCZmCdm-7Y3NSbCg_U [Accessed 2026-01-22] [CrossRef]
  42. Dang X. Study on the construction of a model for predicting the risk of debility in hospitalized elderly patients with diabetes mellitus [Master’s thesis]. Changchun University of Chinese Medicine; 2024. URL: https:/​/oversea.​cnki.net/​KCMS/​detail/​detail.​aspx?dbcode=CMFD&dbname=CMFD202501&filename=1024551416.​nh&uniplatform=OVERSEA&v=saoWsbwFh_hMUrvE2hYKMbmPvq6dTUrm6EUZP7P8Emfni9qewTW8gjWKg0liRN7b [Accessed 2026-01-22] [CrossRef]
  43. Deng YH, Li N, Wang Y, Xiong C, Zou X. Risk factors and prediction nomogram of cognitive frailty with diabetes in the elderly. Diabetes Metab Syndr Obes. 2023;16:3175-3185. [CrossRef] [Medline]
  44. Dong XT, Wang XH, Sheng YH, Wang GP, Gu T. Construction of frailty risk prediction model in elderly patients with diabetic retinopathy. Journal of Gannan Medical University. 2023;43(12):1275-1281. [CrossRef]
  45. Du J, Zhang D, Chen Y, Zhang W. Development of a prediction model for frailty among older Chinese individuals with type 2 diabetes residing in the community. Public Health Nurs. 2024;41(6):1271-1280. [CrossRef] [Medline]
  46. Liang MY, Li R, Feng L, Qian WP. Construction and verification of a risk prediction model for cognitive frailty in older patients with chronic obstructive pulmonary disease and diabetes mellitus. J Int Med Res. Sep 2024;52(9):3000605241274211. [CrossRef] [Medline]
  47. Liu XX, Fan XZ. Establishment and validation of a prediction model for cognitive frailty in elderly patients with type 2 diabetes mellitus. Anhui Medical Journal. 2024;45(12):1543-1548. [CrossRef]
  48. Liu Y. Construction and verification of cognitive frailty risk prediction model in elderly patients with type 2 diabetes [Master’s thesis]. Tianjin University of Traditional Chinese Medicine; 2023. URL: https:/​/oversea.​cnki.net/​KCMS/​detail/​detail.​aspx?dbcode=CMFD&dbname=CMFD202501&filename=1023871608.​nh&uniplatform=OVERSEA&v=bMx3cCr3OPvyjITKNyqnW0JcHM8BIfzLFhPkowxsujZYch6jIjA1m7lsencsQrYF [Accessed 2026-01-22] [CrossRef]
  49. Ma SM, Ni LL, Guo L, Gu LP. Construction of a risk prediction model for cognitive decline in elderly patients with diabetic foot ulcers. Psychological Monthly. 2025;20(2):52-54. URL: https://www.xlykzz.com/CN/10.19738/j.cnki.psy.2025.02.014 [Accessed 2026-01-23] [CrossRef]
  50. Meng L. Research on the influencing factors of cognitive frailty and the construction of risk prediction model in elderly diabetic patients [Master’s thesis]. Zunyi Medical University; 2023. URL: https://oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CMFD&dbname=CMFD202501&filename=1024333258.nh&uniplatform=OVERSEA&v=OQcVHZS3INx0BTBGPkZnRQy4Deev1Fhcs49_DXQJd8j1MnacoqsS81_n7RhEcusZ [Accessed 2026-01-26] [CrossRef]
  51. Tang QF, Wang JJ, Zhao HY, Wang HY. Establishment and validation of a frailty risk prediction model for elderly patients with diabetes mellitus. Chinese Nursing Research. 2024;38(17):3065-3071. [CrossRef]
  52. Wang BJ, Liang Q, Liu Y, Cheng Y, Zhang CM. Construction and validation of frailty risk prediction model in elderly patients with diabetic foot. Mil Nurs. 2024;41(5):6-10. [CrossRef]
  53. Wang SJ, Tan TT, Wang QQ, et al. Construction of a risk prediction model for cognitive decline in elderly hospitalized patients with type 2 diabetes based on decision tree. Sichuan Medical Journal. 2025;46(2):204-210. [CrossRef]
  54. Wang XW, Xu YL. Prediction of cognitive decline among elderly patients with type 2 diabetes mellitus. China Preventive Medicine Journal. 2023;35(12):1037-1042. [CrossRef]
  55. Wang Z, Zheng HF, Liang LL. Analysis of risk factors of elderly patients with type 2 diabetes complicated with frailty and establishment of prediction model. Int J Geriatr. 2025;46(2):162-168. [CrossRef]
  56. Wang ZJ, Chen JY, Li MX. Establishment of a nomogram predictive model based on serum leptin-to-adiponectin ratio for cognitive frailty in elderly patients with type 2 diabetes mellitus. Shandong Medical Journal. 2023;63(34):31-37. [CrossRef]
  57. Xi MX. Construction of prediction model for frailty in hospitalized elderly patients with diabetes mellitus [Master’s thesis]. Yan’an University; 2024. URL: https://oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CMFD&dbname=CMFD202501&filename=1024455798.nh&uniplatform=OVERSEA&v=S5OEfMlXAp2MuMsWXc_GQtzuxBN-77f0i0NyJbd51w8xQlUaiOwjOVtxUElm5mcS [Accessed 2026-01-22] [CrossRef]
  58. Yin YY. Construction and verification of frailty risk prediction model in elderly diabetic patients [Master’s thesis]. Hebei University; 2024. URL: https://oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CMFD&dbname=CMFD202501&filename=1024530021.nh&uniplatform=OVERSEA&v=SApAmgAzcplPDX5BRNmZPhLMUnWViopteNW4E_KeL-FHGr3Rn323ES_7vpnrm8Sb [Accessed 2026-01-22] [CrossRef]
  59. Zhang YJ, Tian XF, Zhang H, Tang ZY, Zhang ZY. Influencing factors of cognitive decline in elderly patients with type 2 diabetes mellitus and hypertension and construction of nomogram model for predicting its risk. Practical Journal of Cardiac Cerebral Pneumal and Vascular Disease. 2024;32(12):49-54. URL: https://www.sciengine.com/cfs/files/pdfs/view/1008-5971/CEE14B30ABC34483AABDA493EE7A72AF.pdf [Accessed 2026-01-26] [CrossRef]
  60. Zheng XM. Research on the construction of frailty prediction model for elderly hospitalized patients with diabetes based on machine learning algorithm. Yangtze University; 2024. URL: https://oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CMFD&dbname=CMFD202501&filename=1024659916.nh&uniplatform=OVERSEA&v=iU31hKCfxDkcQQIIuCxhedEIj2mwaaiQ5ki4TyGPmRJytm6uY7RU4TpTMk0kFlYT [Accessed 2026-01-22] [CrossRef]
  61. Xiao RF, Wang R, Xu L, Xu MJ. Risk factors for the development of frailty in Chinese elderly diabetic patients and the predictive value of the nomogram model. Journal of Lanzhou University (Medical Sciences). 2025;51(10):20-25. [CrossRef]
  62. Wu JQ, Fang SZ, Li K, et al. Construction and validation of frailty risk prediction model for hospitalized elderly patients with type 2 diabetes based on machine learning and SHAP explainability. Journal of Nursing (China). 2025;32(11):7-12. [CrossRef]
  63. Yu Q, Yu H. Development and validation of a risk prediction model for cognitive frailty in elderly patients with type 2 diabetes mellitus. J Clin Nurs. Aug 2025;34(8):3261-3275. [CrossRef] [Medline]
  64. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. Jun 2019;110:12-22. [CrossRef] [Medline]
  65. Chu W, Chang SF, Ho HY, Lin HC. The relationship between depression and frailty in community-dwelling older people: a systematic review and meta-analysis of 84,351 older adults. J Nurs Scholarsh. Sep 2019;51(5):547-559. [CrossRef] [Medline]
  66. Borges MK, Aprahamian I, Romanini CV, et al. Depression as a determinant of frailty in late life. Aging Ment Health. Dec 2021;25(12):2279-2285. [CrossRef] [Medline]
  67. Wang Q, Wang J, Dai G. Prevalence, characteristics, and impact on health outcomes of frailty in elderly outpatients with diabetes: a cross-sectional study. Medicine (Baltimore). 2023;102(47):e36187. [CrossRef]
  68. Ferrucci L, Fabbri E. Inflammageing: chronic inflammation in ageing, cardiovascular disease, and frailty. Nat Rev Cardiol. Sep 2018;15(9):505-522. [CrossRef] [Medline]
  69. Zeng M, Li Y, Zhu Y, Sun Y. Inflammatory markers and clinical factors as key independent risk factors for frailty: a retrospective study. BMC Geriatr. Jun 4, 2025;25(1):404. [CrossRef] [Medline]
  70. Straub RH. Interaction of the endocrine system with inflammation: a function of energy and volume regulation. Arthritis Res Ther. Feb 13, 2014;16(1):203. [CrossRef] [Medline]
  71. Sutkowy P, Woźniak A, Mila-Kierzenkowska C, et al. Physical activity vs. redox balance in the brain: brain health, aging and diseases. Antioxidants (Basel). Dec 30, 2021;11(1):95. [CrossRef] [Medline]
  72. Wen L, Lu Y, Li X, An Y, Tan X, Chen L. Association of frailty and pre-frailty with all-cause and cardiovascular mortality in diabetes: three prospective cohorts and a meta-analysis. Ageing Res Rev. Apr 2025;106:102696. [CrossRef] [Medline]
  73. Lyu Q, Guan CX, Kong LN, Zhu JL. Prevalence and risk factors of cognitive frailty in community-dwelling older adults with diabetes: a systematic review and meta-analysis. Diabet Med. Jan 2023;40(1):e14935. [CrossRef] [Medline]
  74. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. Mar 18, 2020;368:m441. [CrossRef] [Medline]
  75. Zarringhalam K, Degras D, Brockel C, Ziemek D. Robust phenotype prediction from gene expression data using differential shrinkage of co-regulated genes. Sci Rep. Jan 19, 2018;8(1):1237. [CrossRef] [Medline]
  76. Rhemtulla M, Brosseau-Liard PÉ, Savalei V. When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychol Methods. Sep 2012;17(3):354-373. [CrossRef] [Medline]


AUC: area under the receiver operating characteristic curve
CHARMS: Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies
CNKI: China National Knowledge Infrastructure
EPV: events per variable
FN: false negative
FP: false positive
FRAIL: Fatigue, Resistance, Ambulation, Illnesses, and Loss of Weight
IDF: International Diabetes Federation
LASSO: least absolute shrinkage and selection operator
LR: logistic regression
MeSH: Medical Subject Headings
ML: machine learning
PI: prediction interval
PITROS: Participants, Index Test, Target Conditions, Reference Standard, Outcomes, Settings
PRESS: Peer Review of Electronic Search Strategies
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PROBAST: Prediction Model Study Risk Of Bias Assessment Tool
TN: true negative
TP: true positive
TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis


Edited by Stefano Brini; submitted 23.Oct.2025; peer-reviewed by Abhishek Shivanna, Chen-Yang Su; final revised version received 04.Jan.2026; accepted 05.Jan.2026; published 29.Jan.2026.

Copyright

© Xia Wang, Shujie Meng, Xiang Xiao, Liu Lu, Hongyan Chen, Yong Li, Rong Zhang, Qiwu Jiang, Shan Liu, Ru Gao. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 29.Jan.2026.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.