JMIR - Open Peer-Review: A Multidimensional Comparative Study of Multiple Mainstream Large Language Models in Perioperative Consultation for Hypospadias Surgery, and other submissions


Latest Submissions Open for Peer Review

JMIR has been a leader in applying openness, participation, collaboration and other "2.0" ideas to scholarly publishing, and since December 2009 offers open peer review articles, allowing JMIR users to sign themselves up as peer reviewers for specific articles currently considered by the Journal (in addition to author- and editor-selected reviewers).

For a complete list of all submissions across all JMIR journals as well as partner journals, see JMIR Preprints

Note that this is not a complete list of submissions, as authors can opt out. The list below shows recently submitted articles where submitting authors have not opted out of open peer review and where the editor has not yet made a decision. (Note that this feature is for reviewing specific articles; if you just want to sign up as a reviewer and wait for the editor to contact you when articles match your interests, please sign up as a reviewer using your profile.)

To assign yourself to an article as reviewer, you must have a user account on this site (if you don't have one, register for a free account here) and be logged in (please verify that your email address in your profile is correct).

Add yourself as a peer reviewer to any article by clicking the '+Peer-review Me!+' link under each article. Full instructions on how to complete your review will be sent to you via email shortly after. Do not sign up as peer-reviewer if you have any conflicts of interest (note that we will treat any attempts by authors to sign up as reviewer under a false identity as scientific misconduct and reserve the right to promptly reject the article and inform the host institution).

The standard turnaround time for reviews is currently 2 weeks, and the general aim is to give constructive feedback to the authors and/or to prevent publication of uninteresting or fatally flawed articles. Reviewers will be acknowledged by name if the article is published, but remain anonymous if the article is declined.

The abstracts on this page are unpublished studies - please do not cite them (yet). If you wish to cite them/wish to see them published, write your opinion in the form of a peer-review!

Tip: Include the RSS feed of the JMIR submissions on this page on your homepage, blog, or desktop RSS reader to stay informed about current submissions!


If you follow us on Twitter, we will also announce new submissions under open peer-review there.

Titles/Abstracts of Articles Currently Open for Review:

A Multidimensional Comparative Study of Multiple Mainstream Large Language Models in Perioperative Consultation for Hypospadias Surgery
Date Submitted: Feb 12, 2026

Open Peer Review Period: Feb 13, 2026 - Apr 10, 2026
Peer Review Me
Background: Hypospadias is a common congenital malformation requiring surgical correction, with caregivers facing significant perioperative information needs. Large language models (LLMs) offer a potential solution for health education, yet their performance in pediatric urological consultations remains unexplored. Objective: This study aimed to evaluate the performance of five LLMs—ChatGPT-4o, Gemini-2.5-Pro, OpenEvidence, Zhipu Qingyan, and DeepSeek—in addressing core concerns of caregivers during the perioperative period for children with hypospadias, and to assess their application value and limitations in pediatric urological clinical health education. Methods: A prospective, non-interventional, cross-sectional design was employed. A question bank was developed based on literature and clinical practice, with 10 high-priority questions selected via questionnaire screening for the test set. Seven experts (6 dimensions) and 32 caregivers (4 dimensions) were prospectively recruited to evaluate responses using a double-blind forced-order ranking method (reverse scoring from 1 to 5), with reference authenticity being verified. Results: Significant differences existed across dimensions among the five models (P<.001). Gemini-2.5-Pro demonstrated overall superior performance, ranking first in both expert evaluations (median 5.0 [IQR 4.0-5.0]) and caregiver evaluations (median 4.0 [IQR 3.0-5.0]), with outstanding structural capabilities; DeepSeek ranked second (median 4.0), demonstrating relatively consistent ratings across socioeconomic strata and superior emotional support compared to ChatGPT-4o (P=.001); OpenEvidence scored lowest (nearly 50% “poor” ratings), exhibiting poor readability but reliable evidence sources. Conclusions: Gemini-2.5-Pro offers the most comprehensive quality for perioperative hypospadias consultations. DeepSeek and Zhipu Qingyan demonstrate strong rapport in home care guidance but require strict control of literature hallucination risks. 
Perioperative care for hypospadias should adopt a tiered human-machine collaboration model based on clinical risk stratification, balancing communication efficiency and health education safety.
Smartphone-based assessment of physical activity and cardiovascular health: Findings from the Dutch MyHeart Counts study
Date Submitted: Feb 12, 2026

Open Peer Review Period: Feb 13, 2026 - Apr 10, 2026
Peer Review Me
Background: Fitness and physical activity patterns are key predictors of cardiovascular disease. Traditionally, these factors have been assessed through participant self-report, which is prone to recall bias and inaccuracy. Smartphone-based monitoring provides a scalable and objective alternative for measuring physical activity, offering improved accuracy over conventional assessment methods. Objective: To evaluate the feasibility of smartphone-based cardiovascular research in the Netherlands and to examine associations between objectively measured physical activity, perceived activity, functional capacity, life satisfaction, and cardiovascular risk. Methods: Adults in the Netherlands were recruited via the MyHeart Counts iPhone app between August 2022 and December 2023. Within the app, participants completed surveys, passively shared motion sensor data, and were invited to perform a smartphone-based 6-minute walk test (6MWT). Perceived activity was compared with sensor-measured activity and actual activity (sensor-measured with supplemented self-reported unrecorded activity). Multivariable linear regression assessed associations between activity and 6MWT performance and between activity and life satisfaction. Perceived cardiovascular risk was compared with the difference between heart age and actual age. Results: Of 518 enrolled participants (median age 58 years; 72% female), 93% shared data beyond demographics. Median engagement duration was 27 days, and 58% completed at least one full consecutive week of motion tracking. Perceived activity weakly correlated with both sensor-measured activity (ρ = 0.15, P = .01) and with actual activity (ρ = 0.15, P = .01). Median perceived activity was 3.5 hours/week, significantly higher than sensor-measured activity (0.9 hours/week; mean difference 2.9 hours, 95% CI 2.2–3.7; P < .001).
In contrast, median actual activity was 3.2 hours/week and did not differ significantly from perceived activity (mean difference 0.7 hours, 95% CI −0.2 to 1.6; P = .11), indicating no significant over- or underestimation when unrecorded activity was accounted for. Sensor-measured physical activity was associated with longer 6MWT distance (+10.1 m per hour; 95% CI 3.9-16.4, P = 0.002). No association was observed between sensor-measured activity and life satisfaction. Perceived cardiovascular risk correlated with the difference between heart age and actual age (ρ = 0.41; P < 0.001). Conclusions: Smartphone-based cardiovascular monitoring is feasible in a European adult population and yields valid functional correlates of physical activity. However, incomplete phone carriage substantially limits sensor-only activity estimates, underscoring the need for hybrid measurement strategies. These findings support the use of smartphone platforms for scalable cardiovascular research, while highlighting persistent challenges in engagement and measurement completeness.
Impact of Large Language Model-Generated versus Clinician-Generated Advice on Resuscitation Preferences and Chinese-Language Readability in Advanced Cancer Patients in the Emergency Department: A Randomised Controlled Trial
Date Submitted: Feb 12, 2026

Open Peer Review Period: Feb 13, 2026 - Apr 10, 2026
Peer Review Me
Background: For patients with advanced cancer in the emergency department (ED), decisions regarding life-sustaining treatments (LST) are critical and hinge on clear communication of complex prognoses. While large language models (LLMs) can synthesize clinical information, their comparative effectiveness against clinicians in shaping real patient preferences, and the readability of their outputs, remain unproven. Objective: This study aimed to determine if LLM-generated advice is non-inferior to clinician-generated advice in changing patient resuscitation preferences. Secondarily, we compared the Chinese-language readability of the advice using a validated formula with a clinical cutoff and assessed patient satisfaction. Methods: We conducted a three-arm, parallel, randomized controlled non-inferiority trial. 189 adult patients with advanced cancer in the ED were assigned to review structured advice generated by: (1) a senior clinician, (2) ChatGPT-5.0 Mini, or (3) DeepSeek. The primary outcome was the change in score on the Cancer Advanced Care Preferences Scale. Secondary outcomes included text readability score (assessed by a validated Chinese health literacy formula) and patient satisfaction. Results: A total of 189 participants were enrolled and completed the study. In the primary non-inferiority analysis, the change in resuscitation preference scores for the DeepSeek group was non-inferior to that of the clinician group (mean difference: -0.095 points, 95% CI: -0.750 to 0.560; lower limit > -1.7 margin). Similarly, ChatGPT-5.0 Mini was also non-inferior to the clinician group (mean difference: 0.349 points, 95% CI: -0.237 to 0.935; lower limit > -1.7 margin). Regarding secondary outcomes, a significant difference in readability was found among the three groups (Kruskal-Wallis H(2)=129.36, p<0.001). 
Post-hoc comparisons indicated that texts from DeepSeek had the highest median readability score (7.53, IQR: 7.39-7.62), followed by ChatGPT-5.0 Mini (5.93, IQR: 5.60-6.23), and clinician-generated texts (5.51, IQR: 5.29-5.74), with all pairwise differences being significant (p<0.001). However, no significant difference in patient satisfaction was observed across the groups (H(2)=1.10, p=.578). Conclusions: LLM-generated advice was non-inferior to clinician advice in influencing resuscitation preferences. Its superior readability and higher patient satisfaction highlight the potential of LLMs as a scalable tool to support complex decision-making in time-pressured ED settings.
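The non-inferiority criterion reported in this abstract can be illustrated with a minimal sketch: an arm is treated as non-inferior when the lower limit of the 95% CI for the mean difference lies above the negative margin. The function name is hypothetical, and the direction of the difference (LLM arm minus clinician arm) is an assumption; the confidence limits and the 1.7-point margin are the values quoted in the abstract.

```python
def is_non_inferior(ci_lower: float, margin: float) -> bool:
    """Return True when the CI lower limit lies above -margin."""
    return ci_lower > -margin


# Non-inferiority margin quoted in the abstract (points on the preference scale).
MARGIN = 1.7

# 95% CI limits (lower, upper) for the mean difference versus the clinician arm,
# as reported in the abstract. The sign convention is an assumption.
reported_ci = {
    "DeepSeek": (-0.750, 0.560),
    "ChatGPT-5.0 Mini": (-0.237, 0.935),
}

results = {
    arm: is_non_inferior(lower, MARGIN)
    for arm, (lower, _upper) in reported_ci.items()
}

for arm, ok in results.items():
    print(f"{arm}: non-inferior = {ok}")
```

Only the lower limit enters the decision; the upper limit would matter for a superiority claim, which the abstract does not make for the primary outcome.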
Patient-generated health data in lung cancer symptom management and health promotion: clinical practice, challenges and future directions
Date Submitted: Feb 12, 2026

Open Peer Review Period: Feb 13, 2026 - Apr 10, 2026
Peer Review Me
Patient-generated health data (PGHD) refers to health-related information collected by patients themselves, serving as a vital supplement to traditional clinical data. In the era of big data, the potential of PGHD in the long-term management of chronic diseases and cancer is increasingly recognised, with its clinical application becoming a key issue in the digital health field. The proliferation of smart devices and wearable technology, improvements in sensor performance, and rapid advancements in artificial intelligence have made the collection of PGHD more convenient. Existing clinical evidence preliminarily indicates that PGHD may alleviate symptom burden in lung cancer patients and enhance the quality of cancer care. However, significant challenges remain in effectively integrating PGHD with clinical data, conducting reliable analyses of vast PGHD datasets, and ultimately incorporating it into routine clinical practice. Furthermore, regulatory bodies, healthcare institutions, and device manufacturers must collaboratively establish policies and standards to safeguard patient data security and privacy. While leveraging digital tools for PGHD collection, attention must also be paid to economic costs and technical barriers to broaden coverage and promote health equity. The potential and application models of PGHD in the long-term management of lung cancer patients warrant further exploration. Against this backdrop, this paper proposes a WeChat Official Account-based model for PGHD collection and remote management, aimed at implementing sustainable symptom monitoring and health guidance for lung cancer patients. This approach seeks to advance the widespread clinical application of PGHD and further explore its potential value in promoting patient self-management and improving quality of life.
Attitudes and Needs of Healthcare Providers Toward Artificial Intelligence-Assisted Pediatric Palliative Care: A Mixed-Methods Study
Date Submitted: Feb 12, 2026

Open Peer Review Period: Feb 12, 2026 - Apr 9, 2026
Peer Review Me
Background: While the transformative potential of artificial intelligence (AI) in healthcare is widely acknowledged, its application in highly sensitive, humanistic domains like pediatric palliative care (PPC) remains largely unexplored. Objective: To explore the attitudes and needs of healthcare providers regarding AI-assisted PPC, with the goal of informing future development and implementation of AI systems in this field. Methods: This was an explanatory sequential mixed-methods study consisting of a nationwide cross-sectional questionnaire survey (March–April 2025) followed by qualitative semi-structured interviews (August–October 2025). The quantitative study aimed to investigate PPC healthcare providers' experiences, attitudes, and needs for the application of AI. Participants included team members of all recognized PPC teams in mainland China. The qualitative study aimed to explore in greater depth the potential future roles of AI in this field, as well as the features of an ideal AI-assisted tool for PPC. Potential interviewees were recruited from the pool of quantitative survey respondents. Results: Among 352 survey respondents, most (58.24%) reported moderate familiarity with AI, with large language models being the most commonly used (79.55%). Among large language model users, over half (57.50%) reported using them for clinical purposes. Attitudes were generally positive: 67.05% believed AI's benefits would outweigh drawbacks, and 78.98% considered its implementation feasible. The most desired applications were patient/family education (78.41%) and symptom management (73.01%). Interviews with 17 providers revealed three themes: (1) clinical roles and boundaries; (2) elements for clinical integration; and (3) challenges in development and deployment. Conclusions: This study reveals that PPC providers express positive attitudes and strong demand for AI-assisted clinical work.
Furthermore, the research clarifies appropriate roles for AI, outlines elements for clinical integration, and highlights potential challenges in development and integration. This study provides evidence for the feasibility of AI application in PPC and offers guidance for the future development and deployment of AI tools.
Automating Frailty Identification in Older Adults: A scoping review of Natural Language Processing and Explainable Artificial Intelligence methods
Date Submitted: Feb 11, 2026

Open Peer Review Period: Feb 12, 2026 - Apr 9, 2026
Peer Review Me
Background: Frailty is a multidimensional clinical syndrome characterized by diminished physiologic reserve and increased vulnerability to stressors, thus putting older adults at higher risk of adverse outcomes (e.g., falls, mental and physical disability, hospitalization, mortality) in response to even minor stress events. Frailty can be reversed or at least attenuated if detected early, yet early identification remains challenging in primary care due to time- and resource-intensive assessment methods. Artificial intelligence (AI) offers promise in automating frailty identification at the point of care. Natural Language Processing (NLP) is particularly valuable for extracting frailty indicators from rich text data stored in electronic health records, but its limited interpretability has prompted growing interest in augmenting the NLP processes with the use of explainable AI (XAI) techniques. Although NLP and XAI methods have been applied for chronic disease identification, their use for frailty identification has not yet been systematically examined. Objective: This scoping review aimed to synthesize current evidence on the use of NLP and XAI methods for automating frailty identification in older adults. Methods: Peer-reviewed studies published in English between January 2015 and November 2025 were eligible if they applied AI, NLP, or XAI methods to identify frailty in adults aged ≥50 years using real-world health data from OECD or OECD-partner countries. Searches were performed in PubMed and Google Scholar and supplemented by screening bibliographies of identified studies. Data were extracted using a standardized form that captured study characteristics, sample size, data sources, and specific aspects of the AI models, and NLP and XAI methods used. Results: We identified 24 studies that satisfied the eligibility criteria. While all studies used AI approaches to identify frailty, only six used neural network-based models. 
Logistic regression was the most frequently used AI method (n=14), and only one study employed Bidirectional Encoder Representations from Transformers (BERT). Seven studies relied on both structured and unstructured data, two relied exclusively on structured data only, and the rest relied exclusively on unstructured data. Seven studies used NLP methods, seven used XAI methods, and only one integrated both. Only two studies reported deploying their models in real clinical settings. Conclusions: AI-based approaches show promise for automating frailty identification, yet current applications remain limited by reliance on traditional machine learning models, underuse of NLP and XAI methods, and very little real-world deployment. Future work should focus on developing explainable NLP models, facilitating access to large volumes of unstructured data, and developing standardized frameworks for the systematic evaluation of NLP and XAI methods. Coordinated efforts across clinical, technical, and regulatory domains are essential to develop scalable, transparent, and clinically meaningful AI systems for frailty identification.
Pulse Pressure and Gastrointestinal Bleeding Risk in Anticoagulated Patients With Atrial Fibrillation: A Real-World Analysis From REACHnet
Date Submitted: Feb 11, 2026

Open Peer Review Period: Feb 12, 2026 - Apr 9, 2026
Peer Review Me
Background: Anticoagulated patients with atrial fibrillation (AF) face significant bleeding risks, which current risk scores inadequately predict. Pulse pressure (PP), a marker of arterial stiffness, may offer additional prognostic value. Objective: This study aimed to evaluate whether elevated PP independently predicts major bleeding events. Methods: We conducted a retrospective cohort study using electronic health records from 4,935 AF patients on oral anticoagulation (2010–2019) in the REACHnet network. PP was calculated from outpatient blood pressure readings and analyzed in tertiles and as a continuous variable. Kaplan-Meier curve and log-rank test were conducted to assess the association between PP and clinical outcomes. Cox regression models further adjusted for demographics, comorbidities, systolic blood pressure, medications, and the ORBIT bleeding score. Results: Over a median 5-year follow-up, 677 patients (13.7%) experienced major bleeding. GI bleeding was significantly more frequent in the highest PP tertile (p = 0.007), while intracranial and other bleeding types showed no significant differences. Each 10 mmHg increase in PP was associated with a 15% higher risk of GI bleeding (HR: 1.014; p = 0.042), and this association remained significant after adjusting for systolic blood pressure and the ORBIT score (OR: 1.013 per mmHg; p = 0.028). PP was not significantly associated with intracranial, other, or overall bleeding. Conclusions: Pulse pressure independently predicts gastrointestinal bleeding in anticoagulated AF patients, even after accounting for traditional bleeding risk factors. These findings support the inclusion of PP in future risk stratification models and clinical monitoring strategies. Clinical Trial: N/A
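A reader may wonder how a hazard ratio of 1.014 corresponds to a 15% higher risk per 10 mmHg. A minimal sketch, assuming the reported HR of 1.014 is expressed per 1 mmHg (the abstract does not state the unit for that figure): in a Cox model the log hazard is linear in the covariate, so the ratio for a k-unit increase is the per-unit ratio raised to the power k. The function name is hypothetical.

```python
def rescale_hazard_ratio(hr_per_unit: float, units: float) -> float:
    """Convert a per-unit hazard ratio to a hazard ratio per `units` units."""
    return hr_per_unit ** units


# Assumption: 1.014 is the hazard ratio per 1 mmHg of pulse pressure.
hr_per_10_mmhg = rescale_hazard_ratio(1.014, 10)
percent_increase = (hr_per_10_mmhg - 1) * 100

print(f"HR per 10 mmHg: {hr_per_10_mmhg:.3f}")
print(f"Relative increase in hazard: {percent_increase:.1f}%")
```

Under that assumption the rescaled ratio is roughly 1.15, which matches the 15% figure in the abstract.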
Evaluation approaches used to assess the quality of weight loss apps: a systematic review
Date Submitted: Feb 11, 2026

Open Peer Review Period: Feb 12, 2026 - Apr 9, 2026
Peer Review Me
Background: Mobile applications (apps) have emerged as a convenient and accessible solution to support weight management. More than 28,000 apps related to weight loss are available across various platforms. However, there is a lack of understanding of the most effective approach to evaluate the quality of these apps. Existing studies have focused only on popular apps or specific user groups. Objective: To identify the approaches employed to assess the quality of weight loss apps and to determine which app features are considered important for enhancing their effectiveness. Methods: This systematic review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. A comprehensive literature search was carried out across four databases: PubMed, Embase, Medline, and Web of Science. As inclusion criteria, studies were eligible if they specifically assessed weight-loss apps among healthy adult users (aged ≥18 years) and were published between January 1, 2019, and June 30, 2024. Studies were excluded if they focused on non-digital interventions, were not in English, involved clinical, military, or athletic populations, or were review articles, meta-analyses, conference abstracts, or reports. Search terms were derived from the concepts of quality, weight loss, and mobile applications. Data extraction focused on the approaches used to evaluate app quality. Results: Eleven studies met the inclusion criteria, evaluating a total of 46 distinct weight loss apps. Seven generic app evaluation approaches and two supporting frameworks were identified, with the most frequently used being the Mobile App Rating Scale (MARS) (n=39 apps), Evidence-based Strategies (EBS) Assessment (n=25 apps), and Six Sigma (n=25 apps). Only two approaches, MARS and the System Usability Scale (SUS), have been validated to evaluate mobile apps. A total of 8 feature categories were identified as present in the apps across the studies.
The most frequently observed were nutrition education (5), self-monitoring and tracking (5), exercise content & tools (5), behavioural support (4), social features (4), coaching and feedback (3), planning and goal setting (3), and technical functionality (2). Nine features were also recommended by the study authors to enhance app effectiveness through behaviour change. These features are progress reports (4), self-monitoring (2), reminders (3), gamification (2), expert monitoring (1), comprehensive nutrition databases (3), food entry options (2), barcode scanning of calorie content (2), and affordability (2). Only the first five are associated with behaviour change elements as per the BCT Taxonomy framework. Conclusions: A range of approaches are currently employed to evaluate the quality of weight loss apps. This review identified seven commonly used evaluation approaches and two supporting frameworks, with MARS being the most frequently applied. Additionally, this study identifies a set of common and key features that should be prioritised in the development of weight loss apps for adults living with obesity to potentially enhance their overall effectiveness.
The Impact of a Behavior Change Wheel-Based Personalized mHealth Intervention on Physical Activity Participation in Older Adults with Mild Cognitive Impairment: A Feasibility Study
Date Submitted: Feb 11, 2026

Open Peer Review Period: Feb 11, 2026 - Apr 8, 2026
Peer Review Me
Background: Mild cognitive impairment (MCI) is recognized as a critical stage for dementia prevention. Physical activity is an important intervention to prevent cognitive decline, but challenges remain in improving or maintaining cognitive function in older adults with MCI through increased physical activity. Personalized mobile health (mHealth) promotion strategies based on the Behaviour Change Wheel (BCW) hold promise for enhancing physical activity levels in this population. Objective: This study aims to evaluate the feasibility and preliminary effectiveness of a personalized mobile application (App) named ActiveAide, developed based on the BCW framework, for promoting physical activity among older adults with MCI. Methods: This feasibility study employed a single-arm, pre- and post-test design. Eighteen participants received an 8-week personalized intervention via ActiveAide. Feasibility measures included recruitment rate, retention rate, App usage data, App usability evaluation, and user experience with the App. Effectiveness measures encompassed physical activity level, physical fitness, physical activity self-efficacy, and social support. Quantitative data were analyzed using paired-sample t-tests and Wilcoxon signed-rank tests, while qualitative data underwent content analysis. Results: The study achieved a recruitment rate of 90.9% and a retention rate of 90%. The mean strategy completion rate was 78.5%, with a mean of 71 App accesses. The mean System Usability Scale (SUS) score was 74.86 ± 8.81, indicating good usability. Qualitative interviews identified three themes: strengths of ActiveAide, limitations of ActiveAide, and suggestions to improve ActiveAide.
Post-intervention, statistically significant improvements were observed in participants’ physical activity level (P<0.001), physical activity self-efficacy (P<0.001), VO2max (P<0.001), strength assessment score (P=0.002), and body composition measures including total physical score (P<0.001), fat mass (P=0.001), and body fat percentage (P<0.001). No significant change was found in the level of social support. Conclusions: The personalized mHealth application ActiveAide, developed based on the BCW framework, demonstrated good feasibility and preliminary effectiveness in promoting physical activity among older adults with MCI. Future research could further optimize the application’s features and employ more rigorous designs, such as randomized controlled trials, to validate its long-term efficacy and generalizability.
Semantic Layer in Health Care: The Art of Riding a Bicycle
Date Submitted: Feb 10, 2026

Open Peer Review Period: Feb 11, 2026 - Apr 8, 2026
This manuscript needs more reviewers Peer Review Me
Health data interoperability is the central hill climb in contemporary digital health. Hospitals often accumulate data like mismatched spare parts, catalogued inconsistently, and difficult to re-use across care. The landscape of non-annotated source systems, legacy data warehouses that lack interoperable data models, the coexistence of multiple terminologies with divergent scopes, the operational turbulence of system migrations, and the persistent challenges of metadata catalogues and versioning set a starting point to a journey in building a semantic layer that makes data Findable, Accessible, Interoperable, and Reusable (FAIR), and that remains robust as terminologies evolve. Terminology updates are complex and their terms, classifications, and regulations continually change. This viewpoint article gives an exemplary historical overview at a Swiss university hospital, highlights the relevance of key decisions and projects and contrasts local conditions with the Swiss and European context. It notes perspectives of large clinical information systems and highlights organizational implications, tools and models needed, and the challenge of legacy data. It dives into project work of ontology creation. The discussion reflects on achievements and the future illustrating the cadence and resilience required to ride interoperable data “around the world”. Key Message. Achieving healthcare interoperability requires balancing diverse standards, terminologies, and data governance. The FAIR principles provide a framework. Organizational commitment to these practices is essential.
Health Discourse Regarding Syrian Refugees in Türkiye on Twitter: A Longitudinal Sentiment and Stance Analysis Study
Date Submitted: Feb 10, 2026

Open Peer Review Period: Feb 11, 2026 - Apr 8, 2026
This manuscript needs more reviewers Peer Review Me
Background: Since 2011, Türkiye has become the primary destination for Syrian refugees. While healthcare is a fundamental human right, public discourse surrounding refugee health services can influence policy and social cohesion. Objective: The objective of our study was to examine 14 years of Turkish health-related discourse on platform X (formerly Twitter) to identify evolving sentiment, stance, and key grievances. Methods: From a dataset of 4.5 million tweets (2009-2022), 116,172 health-related posts were identified. We employed a fine-tuned Turkish BERT-based large language model to perform multi-task classification for sentiment, stance, and health topics. Tweets were categorized into five domains as Provision of Healthcare Services, Financing and Coverage, Human Resources, Public Health and Disease Prevention, and Access to Medications and Pharmaceutical Services. Lift scores and heatmaps were used to analyze the relationship between the keywords and public attitudes. Results: The fine-tuned Turkish BERT model achieved high classification performance with a weighted F1 score of 0.85 for sentiment and 0.8 for stance detection. Public discourse shifted from neutral or positive tones in 2011 to overwhelming negativity over time. By 2021, negative sentiment reached 79.9%, and anti-refugee stance peaked at 78.3%. Prominent topics evolved from Provision of Healthcare Services (47.5% in 2011) to Public Health and Disease Prevention (57.3% in 2021) and Human Resources (34.6% in 2022). High lift scores revealed that anti-refugee stances were strongly associated with keywords such as ‘appointment’, ‘vaccine’, and ‘free’. Conclusions: There is a marked and consistent rise in anti-refugee sentiment within Turkish digital health discourse, often fueled by misinformation and perceived systemic strain. 
Public health authorities should prioritize evidence-based communication strategies to counter digital polarization and ensure the legibility of health policies for the host population.
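The lift scores mentioned in this abstract can be illustrated with a short sketch. It assumes the conventional association-rule definition, lift = P(keyword and stance) / (P(keyword) × P(stance)); the abstract does not state the exact formula the authors used. The function name and all counts below are hypothetical and are not taken from the study.

```python
def lift(n_both: int, n_keyword: int, n_stance: int, n_total: int) -> float:
    """Lift of a keyword with respect to a stance label.

    Values above 1 mean the keyword co-occurs with the stance more often
    than expected if the two were independent.
    """
    p_both = n_both / n_total
    p_keyword = n_keyword / n_total
    p_stance = n_stance / n_total
    return p_both / (p_keyword * p_stance)


# Hypothetical counts for illustration only.
example_lift = lift(
    n_both=300,      # posts containing the keyword AND labelled with the stance
    n_keyword=400,   # posts containing the keyword
    n_stance=5000,   # posts labelled with the stance
    n_total=10000,   # all posts
)

print(f"lift = {example_lift:.2f}")
```

With these made-up counts the keyword appears with the stance about one and a half times as often as independence would predict.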
Public Online Discussions of CAR T-cell Cancer Therapy: Unpacking the Hype
Date Submitted: Feb 9, 2026

Open Peer Review Period: Feb 10, 2026 - Apr 7, 2026
This manuscript needs more reviewers Peer Review Me
Background: Chimeric antigen receptor (CAR) therapy is a novel cell editing technology and innovative form of cancer immunotherapy. An individual’s immune cells (T-cells) are removed from the body, engineered to target and limit the growth of cancer cells, and reinfused into the patient’s body. The one-time treatment is expensive ($500,000 plus hospital costs), and requires specialized care to treat and manage the associated side effects, such as cytokine release syndrome (CRS), and other serious health issues including cognitive confusion, infertility, secondary malignancies, and compromised long term quality of life. At the same time, CAR T has been highly successful for patients with advanced blood cancers and no remaining treatment options. The CAR T landscape is changing rapidly, and product approvals have outpaced the capacity for researchers to collect long term evidence related to survival or predictive biomarkers that might better prioritize patients. Because CAR T is offered exclusively in urban cancer centres with access to cell manufacturing capacity, equitable access has been challenging. At the same time there is considerable demand and social hype about CAR T as a cancer cure despite the risks and uncertainty of the technology. Objective: We aimed to determine the dominant perspectives and nature of the information on CAR T-cell therapy available to the public in the online environment. Methods: In this qualitative study, we conducted a comprehensive search of websites including professional, medical, corporate, health-based, news media, and blogs to capture the diversity of online sources and their perspectives presenting information on CAR T-cell therapy. Fifty-one webpages met the study criteria and comprised the data set in this review. The content of the sites was reviewed and analyzed using a critical and interpretive descriptive lens. 
Results: We classified the website information into four dominant major themes characterizing CAR T-cell therapy: 1) patient stories of success, magic and hope; 2) medical science explainers; 3) economic perspectives; and 4) ethical discussions and complex arguments. With the exception of the sites that presented ethical discussions and complex information, the online environment positioned CAR T as revolutionary, curative, and the future of cancer treatment. Side effects were generally minimized, and collective dilemmas such as sustainability for the healthcare system, equitable access, and issues of prioritization were frequently sidelined or absent. Conclusions: The persuasive tone of online CAR T information combined with the increasingly blurred distinctions between research and care in genetic medical technologies suggests that obtaining informed consent or refusal may place too much onus on individual patients. In an evolving technological landscape such as CAR T, determining the acceptable risks and benefits is a question that ethically requires broader, as well as more inclusive, societal deliberation.
Consensus Statement on Digital Health and ADHD by The European Network for ADHD (EUNETHYDIS): a modified Delphi study
Date Submitted: Feb 7, 2026

Open Peer Review Period: Feb 9, 2026 - Apr 6, 2026
Peer Review Me
Background: Digital technologies are becoming an important part of healthcare, including for individuals with ADHD. Digital health innovations present valuable opportunities to provide flexible and tailored support for their diverse needs, together with significant challenges. Attentional, organisational, and motivational characteristics associated with ADHD may affect how individuals engage with digital tools. Potential risks include additional access barriers, exclusion of underserved groups, and diminished quality of care. To help reduce these risks, the development, evaluation, and implementation of digital tools must be person-centred and guided by a comprehensive understanding of diverse needs of all stakeholders. Objective: To advance research in this area, a multidisciplinary panel of ADHD specialists, technology experts, and individuals with lived experience of ADHD was formed. The panel worked together to agree on key priorities and considerations for developing, evaluating, and implementing digital technologies for ADHD. Recommendations are designed to be shared with the wider research community and to guide innovations in ADHD digital health to improve care. Methods: A modified Delphi approach was used to develop consensus. Key statements were drafted, building on discussions held during The European Network for ADHD (EUNETHYDIS) Special Interest Group (SIG) meeting in 2024. An Expert Panel that included additional key stakeholders was convened. Draft statements were shared with Panel members via a two-round Delphi survey and discussion meetings, with final statements co-produced by the Panel. Insights from multiple perspectives were incorporated, and consensus agreement sought. Refined statements were shared with EUNETHYDIS members for ratification. Panel members were invited to contribute as co-authors. Results: An expert panel of 30 members (21 EUNETHYDIS SIG members, 9 invited experts) co-produced 30 consensus statements on ADHD and digital health. 
Agreement ranged from 78.5-100% for the first round (19 statements), and 96.4-100% for the second round (30 statements). Final statements covered four topic areas: Opportunities and aspirations, Development and evaluation, Implementation, and Risks and unintended consequences. These were ratified in September 2025 by the EUNETHYDIS. Conclusions: This consensus process provides the first comprehensive set of key considerations for digital health care for people with ADHD and demonstrates the feasibility of achieving expert agreement on complex, rapidly evolving topics, such as digital health. Future work should focus on translating these considerations into more specific and practical implementation frameworks, identifying priorities, and connecting them to real-life stories and empirical evidence.
Measuring Substance Use with Ecological Momentary Assessment: A Systematic Review of Methods and Key Recommendations for a Methodological and Reporting Framework
Date Submitted: Feb 3, 2026

Open Peer Review Period: Feb 6, 2026 - Apr 3, 2026
This manuscript needs more reviewers Peer Review Me
Background: Substance use disorders account for a significant portion of the disease burden attributed to mental health globally, but measurement remains suboptimal. Studies assessing substance use typically rely on retrospective recall, often over long periods of time. However, the episodic, contextual, and event- or time-contingent nature of substance use calls into question the validity of these traditional retrospective measurement methods. One method to overcome these limitations is ecological momentary assessment (EMA). EMA methods repeatedly sample participant behaviours and experiences in real time, in the context in which they occur. Objective: This review aimed to systematically identify studies using EMA in substance use measurement, provide a comprehensive overview of the EMA methods used, and propose a draft framework of reporting and methodological recommendations for future EMA studies in this field. Methods: Studies published between 2018 and 2023 were sourced from PubMed, Medline, Scopus, and PsycINFO via Ovid databases on 31st January 2023 using terms related to EMA, digital phenotyping, passive sensing, daily diary and specific terms for each drug type. Studies that actively or passively assessed thoughts and/or behaviour, in the participants’ natural environment/daily lives, in a repeated manner, at or close to the behaviour of interest (substance use), using either automatic prompts or notifications were included. Studies were included for all populations, any age, in any setting, any study design, including RCTs or experimental designs. This study was preregistered on PROSPERO (CRD42023400418). Results: The search identified 7053 articles of which 858 were reviewed in full, and 273 (n = 70,831 participants) were included and extracted. Most studies were conducted in the United States (80%) and focused on alcohol (78%) and cannabis use (30%) with or without the presence of other substance use. 
Alcohol and cannabis were the pair of substances most often measured together, co-occurring in 44 (16%) studies. Psychedelics (2%) were particularly understudied using EMA methods. PCP, bath salts, and inhalants were each measured in only one study. We found limited reporting consistency with respect to compliance, completion windows, attrition rates, survey duration and data collection technologies in EMA substance use studies. Sensing data were measured in a limited number of studies. Conclusions: While EMA is a powerful tool for capturing dynamic behaviours, inconsistencies in reporting and design transparency persist. Improving reporting practices, integrating smart sensing and wearables, monitoring compliance, and expanding EMA to underexplored substances such as psychedelics will be critical to enhancing data quality and advancing the field.
Addressing the Challenges in Using Synthetic Data for Health Research: Application to Cardiology
Date Submitted: Feb 5, 2026

Open Peer Review Period: Feb 6, 2026 - Apr 3, 2026
Peer Review Me
Synthetic data (SD) has emerged as a promising tool for advancing cardiology research by enabling data access, enhancing patient privacy, and supporting the development of machine learning models. By generating artificial patient records that reflect real-world distributions, SD can accelerate clinical research, improve model performance for rare cardiovascular conditions, and facilitate transnational collaborations that would otherwise be restricted by data sharing barriers. Despite these advantages, the increasing use of SD raises important ethical, regulatory, and methodological concerns that remain insufficiently addressed. Key challenges include assessing the validity and generalizability of synthetic datasets, understanding their limitations in representing complex and heterogeneous patient populations, and preventing the amplification of existing biases in cardiovascular care. Regulatory frameworks such as GDPR and HIPAA safeguard privacy but do not fully account for emerging risks such as re-identification or data leakage, leaving uncertainty regarding the use of SD in evidence generation for medical devices or therapeutic evaluation. Technical constraints, including the reliability of generative models and the difficulty of capturing nuanced clinical trajectories, further limit the clinical applicability of SD. As cardiology increasingly intersects with artificial intelligence and digital health technologies, ensuring rigorous methodological standards, transparent validation, and clear governance mechanisms is essential to harness SD responsibly. This Viewpoint highlights the opportunities and blind spots associated with SD and virtual patients in cardiology and underscores the need for harmonized regulatory guidance and ethical safeguards to support their meaningful integration into research and clinical practice.
Occam’s Razor in AI-assisted complex diagnosis: a comparative effectiveness study of single large language models versus multi-agent systems in resource-constrained primary care settings
Date Submitted: Feb 5, 2026

Open Peer Review Period: Feb 6, 2026 - Apr 3, 2026
This manuscript needs more reviewers Peer Review Me
Background: Primary care physicians in resource-constrained settings, particularly within low-income and middle-income countries (LMICs), frequently encounter a "diagnostic gap" when managing complex, rare, or multisystemic pathologies. While Large Language Models (LLMs) demonstrate significant potential to augment clinical reasoning, current state-of-the-art solutions rely predominantly on high-bandwidth cloud infrastructure, limiting their deployment in regions with unstable internet connectivity and strict data sovereignty regulations. Objective: The prevailing technological consensus in computer science suggests that "Agentic Workflows" or Multi-Agent Systems (MAS), which orchestrate multiple models to simulate collective reasoning, inherently offer superior accuracy and safety compared to single models. However, the comparative efficacy, safety, and cost-effectiveness of complex MAS versus single localised models in offline, hardware-limited environments remain unproven. Methods: We conducted a prospective comparative benchmarking study using the DiagnosisArena dataset, comprising 915 complex clinical cases across 28 medical specialties. To simulate a secure, offline primary care environment, we evaluated five locally deployed single open-source LLMs (GPT-oss-20b, Llama3.1-70B, Qwen3-32B, DeepSeek-R1-32B, Gemma3-27B) against two Multi-Agent architectures: a Standard voting ensemble and a novel hierarchical Adaptive Weighted System. All models were hosted on a local server (4×NVIDIA A100) using the Dify platform. Performance was adjudicated against a Reference Standard established by the consensus of three board-certified physicians using a dual-metric system: a 10-point Diagnostic Recall Scale and a comprehensive Hallucination/Safety Index. Inference latency and computational resource utilisation were recorded to assess cost-effectiveness. 
Results: Contrary to the hypothesis that architectural complexity yields diagnostic precision, single high-performance models significantly outperformed complex ensembles. The single GPT-oss-20b model achieved the highest Diagnostic Recall Score (mean 4.68 [SD 3.82]), statistically surpassing the Adaptive Weighted Multi-Agent System (4.13 [SD 3.43]; p<0.001) and smaller models such as Gemma3-27B (2.89 [SD 3.89]; p<0.001). The Adaptive System, despite utilising dynamic routing, failed to outperform the median score of human physicians (4.22 [SD 3.62]; p=0.432). Furthermore, the inclusion of mid-tier models in the adaptive workflow introduced an "ensemble degradation" effect, significantly lowering the Safety Score compared to the single GPT-oss-20b model (4.99 vs 5.50; p<0.001) and reducing the rate of Top-1 correct diagnoses from 51.58% to 46.89%. Crucially, the single GPT-oss-20b model demonstrated superior efficiency with an average inference time of 30 seconds per case, compared to 200 seconds for the Standard Multi-Agent System—representing an 85% reduction in latency. Conclusions: In the context of clinical diagnosis, architectural complexity does not equate to clinical utility. We identified a phenomenon of "ensemble degradation," where integrating mid-tier models into ensembles dilutes the reasoning capabilities of strong base models through the introduction of diagnostic noise. For global health equity, implementation strategies should prioritise "Lean AI"—localising a single, robust open-source model—rather than orchestrating computationally expensive agent swarms. This approach provides a safer, more accurate, and scientifically validated path for bridging the diagnostic gap in resource-constrained primary care.
Consensus-Based Recommendations for Optimizing Diversified TCM Data Collection during Clinical Work
Date Submitted: Feb 5, 2026

Open Peer Review Period: Feb 6, 2026 - Apr 3, 2026
Peer Review Me
Background: A growing volume of traditional Chinese medicine (TCM) clinical data can be collected by software and equipment, forming diversified TCM data. Collection of TCM diagnosis and treatment data typically takes place alongside clinical work. However, given the limited time, space, and human resources available in clinical work, collecting diversified TCM data is difficult, which may affect the quality of the collected data. Objective: To develop recommendations for optimizing diversified TCM data collection. Methods: A working group comprising 12 members was established. Based on previous survey findings regarding the burden of clinical data collection, the group developed a preliminary list of recommendations for optimizing diversified TCM data collection. A Delphi survey was conducted to investigate consensus levels (using a 5-point Likert scale for importance evaluation) on the list items, and open-ended opinions were also collected. If experts in the first round proposed additions, deletions, or modifications, or if consensus was lacking on certain items, a further round was conducted to obtain the experts' agreement rate on the related items. Results: A total of 86 experts from China, the United Kingdom, and Singapore completed two rounds of surveys. Following the first Delphi survey, all items achieved agreement scores above 4, with coefficients of variation (CV) below 0.2. The working group revised 12 items based on open-ended opinions and resubmitted them for agreement assessment. All revised items achieved agreement rates of over 95%. Following the two-round survey process, the final version of the recommendations comprises 5 primary domains, 11 sub-domains, and 25 items. Conclusions: This study formulated recommendations for optimizing diversified TCM data collection. 
It is hoped that these recommendations will help clinical data collectors consider data collection in advance during the design phase.
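As a rough illustration of the consensus criteria reported in this abstract (agreement score above 4 on a 5-point Likert scale, coefficient of variation below 0.2), the check for a single item might be sketched as follows. The ratings, the function name, and the use of the sample standard deviation are assumptions made for illustration; the manuscript summary does not specify them.

```python
from statistics import mean, stdev

def item_consensus(ratings, min_mean=4.0, max_cv=0.2):
    """Return (mean, cv, reached) for one Delphi item rated on a 5-point scale."""
    m = mean(ratings)
    # Coefficient of variation: sample SD divided by the mean (assumed definition).
    cv = stdev(ratings) / m
    return m, cv, (m > min_mean and cv < max_cv)

# Hypothetical importance ratings from ten experts for one item.
ratings = [5, 5, 4, 5, 4, 5, 5, 4, 5, 5]
m, cv, reached = item_consensus(ratings)
print(f"mean={m:.2f}, CV={cv:.3f}, consensus={reached}")
```

A lower CV indicates that experts' ratings cluster more tightly around the mean, which is why it is commonly paired with a mean-score threshold.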
How pandemics have reshaped respiratory virus data landscape in Europe? A scoping review
Date Submitted: Feb 5, 2026

Open Peer Review Period: Feb 6, 2026 - Apr 3, 2026
This manuscript needs more reviewers Peer Review Me
Background: Acute respiratory infections caused by influenza, respiratory syncytial virus (RSV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remain a major public health challenge in Europe. Although surveillance systems for these pathogens are well established, the past two decades have seen a rapid diversification of data streams supporting surveillance and research. This expanding and increasingly complex data landscape, combined with fragmentation across institutions, sectors, and countries, may limit timely evidence synthesis and effective public health decision-making. Objective: This scoping review aimed to identify and characterize data sources used for surveillance and research on influenza, RSV, and SARS-CoV-2 across 12 European countries over the past 20 years, and to examine their evolution over time, their alignment with research objectives, and geographic variation in data availability and use. Methods: We conducted a scoping review using an objective-driven analytical framework. Empirical reports published between January 2005 and September 2025 were identified in Medline, Web of Science, and Embase. Eligible reports focused on influenza, RSV, or SARS-CoV-2 and included data from Western (France, Belgium, Germany, Netherlands), Northern (Denmark, England, Finland, Sweden), Southern (Italy, Spain), and Eastern Europe (Poland, Romania). Clinical and interventional studies were excluded. Reports were classified according to four research objectives: epidemiological monitoring; evaluation of interventions; assessment of disease burden and health outcomes; and analyses of population adherence and trust toward public health measures. Data sources were grouped into nine categories, including surveillance systems, electronic health records (EHRs), registries, claims, surveys, digital, environmental, and integrated datasets. Results: A total of 2,564 empirical reports were included. 
Over time, respiratory virus research relied on an increasingly diverse set of data streams. While surveillance systems remained central, particularly for epidemiological monitoring, their relative dominance declined. From 2020 onward, there was a marked expansion in the use of EHRs, registries, claims data, digital sources, and linked or integrated datasets, alongside increased use of open-access data. Data source use varied by research objective: surveillance data predominated in monitoring and intervention evaluation; EHRs in studies of risk factors and treatment effectiveness; surveys in seroprevalence and public trust analyses; and claims data in assessments of economic burden. Substantial geographic disparities were observed. Northern European countries more frequently used linked and multi-source datasets, whereas Western and Southern Europe relied more often on open-access or single-source data. Conclusions: Respiratory virus surveillance and research in Europe have expanded and diversified substantially over the past two decades, particularly after the Coronavirus disease 2019 (COVID-19) pandemic. However, access to advanced and integrated data streams remains uneven across countries. Strengthening preparedness for future respiratory virus threats will require sustained investment in interoperable data infrastructures, improved data governance, and the responsible use of artificial intelligence to integrate heterogeneous data sources.
Application of Multimodal Large Language Models in Cutaneous Lesion Recognition of Talaromycosis and Cryptococcosis
Date Submitted: Feb 4, 2026

Open Peer Review Period: Feb 5, 2026 - Apr 2, 2026
Peer Review Me
Background: Talaromycosis and cryptococcosis are prevalent in Southern China and Southeast Asia and are frequently misclassified due to overlapping lesion morphology and limited access to confirmatory testing. Objective: To evaluate the zero-shot diagnostic performance of multimodal large language models in identifying and differentiating cutaneous lesions of talaromycosis and cryptococcosis. Methods: Published clinical photographs of cutaneous lesions of talaromycosis and cryptococcosis were systematically retrieved up to 31 August 2025, and seven representative multimodal large language models were benchmarked under a strictly zero-shot setting using a standardized prompt template and a predefined output schema. Latency, unanswerable/invalid response rates, and diagnostic performance were evaluated using accuracy, precision, sensitivity, specificity, F1-score, and Matthews correlation coefficient. For explanation quality assessment, model-generated texts were independently rated by two clinicians across five dimensions, and hallucination events were quantified. Results: In total, 214 articles (95 for talaromycosis and 119 for cryptococcosis), including 244 talaromycosis cutaneous lesion images and 236 cryptococcosis cutaneous lesion images, were collected for zero-shot evaluation. Most models achieved acceptable recognition performance, with ChatGPT-5 performing best. In the comprehensive performance comparison, ChatGPT-5 ranked first across six indicators but exhibited relatively lower sensitivity. Evaluation of the output text quality demonstrated that the diagnostic texts generated by GPT-5 were excellent. The EQI was 70.08, with a hallucination rate of 21.76%. Conclusions: ChatGPT-5 demonstrates feasibility in the recognition of cutaneous lesions of talaromycosis and cryptococcosis under zero-shot conditions and can serve as a potential tool for assisting in the analysis of infectious skin disease images.
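For readers less familiar with the metrics named in this abstract, the sketch below computes accuracy, precision, sensitivity, specificity, F1-score, and the Matthews correlation coefficient from a two-class confusion matrix. The counts are hypothetical and are not taken from the manuscript; the function name is likewise an assumption.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Standard two-class metrics from confusion-matrix counts."""
    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)   # also called recall
    specificity = safe_div(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }

# Hypothetical counts, treating one disease as the positive class.
metrics = binary_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so a model can rank well on it while still showing comparatively low sensitivity.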
Impact of intelligent robot-aided task-oriented physiotherapy on patients’ musculoskeletal system compared with constant-support robot-aided treatment and physiotherapist-guided therapy – development and usability study
Date Submitted: Feb 4, 2026

Open Peer Review Period: Feb 5, 2026 - Apr 2, 2026
Peer Review Me
Background: Task-oriented rehabilitation supported by exoskeletons has the potential to increase therapy intensity, personalization, and accessibility. However, to achieve fully automatic treatment, robotized systems need to analyze therapy in a more sophisticated way than by following reference trajectories alone. Objective: This study investigates the effects of an intelligent, context-aware control algorithm for an upper-limb rehabilitation exoskeleton on patients’ musculoskeletal engagement, compared with constant-admittance robot-assisted therapy and conventional physiotherapist-guided treatment. Methods: A single-session experimental study was conducted with 34 adult participants performing six activities of daily living under three therapy modes: robot-assisted therapy with constant admittance, robot-assisted therapy with an intelligent assist-as-needed algorithm, and physiotherapist-guided therapy. Muscle activity was assessed using surface electromyography of eight upper-limb muscle groups, while joint kinematics were recorded using inertial measurement units. Metrics included EMG power, muscle activation time, joint range of motion, and burst duration similarity indices. Statistical comparisons were performed using the t-test or the Mann-Whitney U test, depending on data normality. Results: Results indicate that the intelligent control strategy engages the musculoskeletal system at least as effectively as constant-admittance control across all exercises. At the same time, more motion control is given to the patient, which is preferable for neuroplasticity training. Compared with physiotherapist-guided therapy, robot-assisted treatment with intelligent control elicited significantly higher and more consistent muscular engagement. Intelligent assistance also modified joint-level motion patterns by reducing compensatory movements, particularly in shoulder–elbow coupling, while maintaining functional task execution. 
Muscle activation timing patterns during intelligent robot-assisted therapy were more consistent with robotic control than with manual therapy, reflecting altered movement strategies. Conclusions: These findings demonstrate that context-aware, intelligent control in rehabilitation exoskeletons can promote active patient participation, reduce compensatory behaviors, and maintain physiologically meaningful muscle engagement. The proposed approach exceeds the results of recent similar studies, being a promising step toward effective, minimally supervised, task-oriented rehabilitation. Clinical Trial: The experiments were carried out under the KB/132/2024 approval of the Bioethical Committee of the Medical University of Warsaw (https://komisja-bioetyczna.wum.edu.pl/). Written informed consent was obtained from all of the subjects involved in this study.
An Electronic Medical Record-Embedded Large Language Model for Acute Pancreatitis Diagnosis, Severity, and Prognosis
Date Submitted: Feb 4, 2026

Open Peer Review Period: Feb 5, 2026 - Apr 2, 2026
Peer Review Me
Background: Early diagnosis, accurate severity assessment of acute pancreatitis (AP), and prediction of progression to severe acute pancreatitis (SAP) are critical. We evaluated an electronic medical record (EMR)-embedded large language model (LLM) for these tasks. Methods: The LLM reviewed the earliest AP hospitalization records of 261 adults and answered three prompts (diagnosis, severity, and risk of progression to SAP). Results: Of the 261 patients, 224 (85.8%) had mild AP (MAP), 30 (11.5%) had moderately severe AP (MSAP), and 7 (2.7%) had SAP. The LLM diagnosed AP with 89.3% sensitivity and 100.0% positive predictive value (PPV). Severity classification was inconsistent (MAP sensitivity 49.1%, MSAP 66.7%, SAP 42.9%). For progression prediction from initial MAP, the LLM showed high sensitivity (87.5%) but low accuracy (26.8%); the Bedside Index for Severity in Acute Pancreatitis (BISAP) had higher accuracy (95.5%) but low sensitivity (12.5%). In MSAP, the LLM sensitivity was 85.7% versus 0% for BISAP. Conclusions: An EMR-embedded LLM can detect AP and identify many who progress to SAP, but specificity and severity classification require improvement.
Exploring breast cancer patients’ digital health information behaviours across the illness trajectory: a qualitative study informed by Uncertainty Management Theory
Date Submitted: Feb 3, 2026

Open Peer Review Period: Feb 4, 2026 - Apr 1, 2026
Peer Review Me
Background: The digital transformation of healthcare is reshaping how breast cancer patients access and use information, yet little is known about how their digital information behaviours evolve across the illness trajectory. Objective: To explore stage-specific digital health information behaviours and the cognitive, emotional and social factors shaping decision-making. Methods: Design: Descriptive qualitative study informed by Uncertainty Management Theory (UMT). Setting: A tertiary hospital in Shanghai, China. Participants: Fifteen women with breast cancer. Methods: Semi-structured, face-to-face interviews were conducted with purposive sampling across diagnostic, treatment and recovery phases; data were analysed using directed and inductive content analysis within a UMT framework. Results: Five themes emerged, highlighting shifts from passive reception to active screening, complementary use of search engines, social media and AI tools, and the role of trust, emotion and social context in information acceptance or rejection. Conclusions: Digital health information behaviours are dynamic and stage-specific, suggesting phase-tailored, nurse-led digital support.
Digital Physical Exercise Interventions for Cognitive Functions in Older Adults: Systematic Review and Bayesian Network Meta-Analysis
Date Submitted: Feb 3, 2026

Open Peer Review Period: Feb 4, 2026 - Apr 1, 2026
Peer Review Me
Background: Digital physical exercise interventions offer a scalable solution to combat age-related cognitive decline. While various modalities exist, their comparative effectiveness across different cognitive domains remains unclear, necessitating a systematic evaluation to guide clinical practice. Objective: This study aims to evaluate and rank the comparative effectiveness of different digital physical exercise interventions—including immersive VR (IVR_E), non-immersive exergames (NI_ExG), remote exercise (RE), and VR combined with cognitive training (VR_EC)—on global cognition, executive function, and memory function in older adults. Methods: We conducted a systematic review and Bayesian network meta-analysis of randomized controlled trials (RCTs) published between January 1, 2010, and April 30, 2025. Data sources included PubMed, Embase, and Web of Science. Eligible studies involved older adults (aged ≥60 years) and compared digital physical exercise interventions against routine interventions (RI) or non-intervention (NI). The primary outcomes were global cognition, executive function, and memory function. We estimated standardized mean differences (SMDs) and ranked interventions using the surface under the cumulative ranking curve (SUCRA). Results: A total of 41 RCTs involving 2919 participants were included. For global cognition, IVR_E emerged as the most effective intervention (SUCRA=96.6%), followed by NI_ExG (SUCRA=76.4%); both modalities were significantly superior to RI. Regarding executive function, RE (SUCRA=73.8%) and NI_ExG (SUCRA=69.3%) ranked highest. Notably, NI_ExG was the only intervention to demonstrate a statistically significant improvement over RI in this domain, while IVR_E showed no significant advantage. For memory function, IVR_E was the dominant intervention (SUCRA=82.8%) and was the only modality significantly more effective than RI. 
Subgroup analyses further indicated that a cumulative training dose exceeding 1000 minutes is critical for observing significant improvements in memory function. Conclusions: Digital physical exercise interventions significantly enhance cognitive function in older adults, but their optimal application is domain-specific. IVR_E appears most effective for global cognition and memory, likely due to high immersion and standardization. Conversely, NI_ExG and RE are preferable for enhancing executive function, potentially offering more scalable alternatives for home-based care. Future interventions targeting memory improvement should ensure sufficient cumulative training duration. Clinical Trial: PROSPERO CRD42025103014
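The SUCRA values quoted in this abstract summarize each intervention's rank distribution as a single number between 0 and 1 (often shown as a percentage). A minimal sketch of the standard calculation follows: the cumulative ranking probabilities are averaged over the first a-1 ranks, where a is the number of interventions. The rank probabilities used here are hypothetical, not values from the manuscript.

```python
def sucra(rank_probs):
    """Surface under the cumulative ranking curve for one treatment.

    rank_probs[k] is the probability that the treatment holds rank k+1,
    where rank 1 is best. The probabilities should sum to 1.
    """
    n_treatments = len(rank_probs)
    cumulative = 0.0
    area = 0.0
    # Sum the cumulative probabilities over ranks 1 .. a-1.
    for p in rank_probs[:-1]:
        cumulative += p
        area += cumulative
    return area / (n_treatments - 1)

# Hypothetical rank probabilities for one intervention among three.
example = sucra([0.7, 0.2, 0.1])
print(f"SUCRA = {example:.1%}")
```

A treatment that is certain to rank first has a SUCRA of 1, and one that is certain to rank last has a SUCRA of 0, so SUCRA orders interventions but does not by itself indicate statistical significance.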
Changing Technology Use, Confidence, and Support Needs Among Older Adults in UK Retirement Villages: A Mixed Methods Study
Date Submitted: Feb 3, 2026

Open Peer Review Period: Feb 4, 2026 - Apr 1, 2026
Peer Review Me
Background: Assistive technologies can support independent living among older adults, but uptake is often constrained by attitudes and confidence. The COVID‑19 lockdowns accelerated technology use across all age groups, offering a natural experiment to examine changes in adoption. Objective: This study aimed to examine changing patterns of technology use in older adults and to provide insight into how service providers can support technology use that promotes independence and well-being. Methods: Two cross‑sectional surveys were conducted in UK retirement villages, one before the pandemic (2020) and one after lockdowns (2023), to assess technology attitudes and use. Semi‑structured interviews with eight participants in a technology trial scheme provided qualitative insights. Results: Technology adoption increased significantly between 2020 and 2023, with older adults reporting greater confidence and comfort in digital use. Self‑education and informal support from family or friends were the most common pathways to adoption. Age‑related differences in confidence observed in 2020 were no longer apparent in 2023, although gender disparities persisted. Interviewees emphasized usefulness and accessibility as key drivers of sustained engagement. Conclusions: Findings demonstrate that the pandemic catalyzed lasting increases in technology adoption among older adults, including increased confidence and ownership. These results provide evidence for housing providers and policymakers to embed accessible technologies and targeted support in retirement communities, thereby enhancing independence and quality of life in later life.
Social media influencer marketing as a clinical trial recruitment modality: A tutorial informed by one study’s approach
Date Submitted: Feb 3, 2026

Open Peer Review Period: Feb 4, 2026 - Apr 1, 2026
Social media influencer marketing is a digital advertisement strategy that is growing in popularity. Its use has been documented in consumer purchasing behavior but is yet to be described for clinical trial recruitment. In this tutorial, we describe the steps we followed to develop and deploy a social media influencer advertisement for the recruitment of participants into the Groceries for Residents of Southeastern USA to Stop Hypertension (GoFreshSE) trial. We also provide a preparation framework for other study teams that would like to use this modality for their own clinical trial recruitment. We used Cameo Business to identify potentially relevant influencers to hire by selecting influencers who were popular in the 3 geographic areas from which GoFreshSE is recruiting. We narrowed down the list of possible influencers by selecting those who had ≥100,000 followers on their respective social media platforms (for a wide reach) and charged ≤$3,000 per video. We ultimately selected a former football coach, who 4 days later provided a high-quality video of himself reading an institutional review board-approved script. We utilized open source, commercially available tools to edit the video and deployed the 44-second-long video on Facebook and Instagram using Meta’s Advertising platform. Social media influencer marketing through the Cameo Business platform is a rapid mechanism to develop clinical trial influencer recruitment videos.
Ternary Card Hypercube Pooling for PCR Testing in Future Pandemics: Tutorial on How to Identify infected patients by efficient pooling strategy
Date Submitted: Feb 3, 2026

Open Peer Review Period: Feb 4, 2026 - Apr 1, 2026
Background: Sample pooling is an essential strategy for optimizing polymerase chain reaction (PCR) resources during infectious disease outbreaks, especially in the beginning. While high-dimensional hypercube pooling strategies—such as those recently highlighted in Nature—offer superior efficiency in low-prevalence settings, they are difficult to implement in practice. The human cognitive and physical limitation to three-dimensional environments makes manual execution of four- or five-dimensional sample arrays prone to significant operational error. Objective: To develop and evaluate a novel "Ternary Card Hypercube Pooling" strategy that simplifies the implementation of multidimensional pooling, making it accessible for laboratory personnel without compromising mathematical efficiency. Methods: We integrated logic from ternary card games (based on sets of three attributes) to create a visual and physical framework for hypercube pooling. This method maps high-dimensional coordinates onto a simplified "card" system, allowing laboratory technicians to organize and track samples using intuitive pattern recognition rather than complex multidimensional mapping. Results: The Ternary Card method successfully translates the efficiency of hypercube pooling into a user-friendly workflow. It maintains the high performance of traditional hypercubic algorithms—allowing for rapid identification of positive samples in a single step in the majority of cases—while significantly reducing the risk of manual pipetting errors and the need for specialized automated equipment. Conclusions: The Ternary Card Hypercube Pooling strategy bridges the gap between theoretical mathematical efficiency and practical laboratory application. By reducing the complexity of sample handling, this method provides a scalable solution for increasing PCR throughput in response to future pandemics, particularly in resource-limited settings. Clinical Trial: NA
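The abstract above does not spell out the card mapping, so the following is only a generic sketch of hypercube pooling with side length 3, not the authors' Ternary Card method. It assumes each sample is labelled by its base-3 digits and that one pool is formed per (dimension, digit) pair; the function names `to_ternary`, `build_pools`, `positive_pools`, and `decode` are hypothetical.

```python
from itertools import product


def to_ternary(index, dims):
    """Return the base-3 digits of index, least significant digit first."""
    digits = []
    for _ in range(dims):
        digits.append(index % 3)
        index //= 3
    return tuple(digits)


def build_pools(dims):
    """Map each pool key (dimension, digit) to the sample indices it contains."""
    pools = {(d, v): [] for d in range(dims) for v in range(3)}
    for index in range(3 ** dims):
        for d, v in enumerate(to_ternary(index, dims)):
            pools[(d, v)].append(index)
    return pools


def positive_pools(pools, infected):
    """Simulate testing: a pool is positive if it contains any infected sample."""
    return {key for key, members in pools.items() if infected & set(members)}


def decode(positive, dims):
    """Return every sample index consistent with the set of positive pools."""
    per_dim = [[v for v in range(3) if (d, v) in positive] for d in range(dims)]
    candidates = []
    for coords in product(*per_dim):
        candidates.append(sum(v * 3 ** d for d, v in enumerate(coords)))
    return sorted(candidates)


# Assumed example: 4 dimensions, so 81 samples are covered by 12 pools.
pools = build_pools(4)
single = decode(positive_pools(pools, {50}), 4)
double = decode(positive_pools(pools, {0, 80}), 4)
```

With one infected sample, the positive pools fix one digit per dimension and `single` contains only that sample, which corresponds to identification in a single step. With two infected samples the positive pools can be consistent with several candidates, so `double` lists the samples that would need a follow-up round.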
From Pilot Trap to Infrastructure: A Governance Framework for Clinical AI Institutionalization in Health Systems
Date Submitted: Feb 2, 2026

Open Peer Review Period: Feb 3, 2026 - Mar 31, 2026
Background: Despite increasing technical maturity, most clinical artificial intelligence (AI) systems remain confined to pilot or experimental settings, rarely achieving sustained integration into routine healthcare delivery. The persistence of this "pilot trap" is driven primarily by structural and institutional constraints rather than algorithmic performance limitations. Objective: To develop a governance framework that enables the transition of clinical artificial intelligence (AI) from project-based experimentation to durable institutional infrastructure, informed by the establishment of a provincial-level AI platform within a policy-oriented healthcare system in China. Methods: An 18-month real-world institutionalization process of the Hebei Provincial Clinical AI Platform was examined, encompassing the formation of a dedicated Medical AI laboratory, designation as a provincial engineering center, acquisition of regulatory authorizations, and deployment of structured clinical application pathways. Framework construction was grounded in systematic analysis of governance arrangements, policy legitimacy mechanisms, and translational implementation trajectories observed throughout the institutionalization process. Results: The framework comprises six interdependent modules encompassing institutional carrier formation, data and computational infrastructure, ethical and regulatory governance, interdisciplinary operational coordination, translational scaling and regional dissemination, and continuous evaluation. Implementation evidence indicates that governance architecture functions as a prerequisite to, rather than a consequence of, technical deployment. Organizational anchoring, external legitimacy, and coordinating capacity enable AI systems to operate as enduring institutional infrastructure rather than transient technological experiments. 
The framework reframes clinical AI from an algorithmic artifact to an embedded institutional capability, redirecting implementation logic from technical performance metrics toward governance maturity. Conclusions: Sustainable clinical AI implementation is associated with governance-first rather than technology-first strategies. Effective institutionalization requires the concurrent establishment of organizational ownership, policy legitimacy, and coordinating mechanisms prior to large-scale deployment. Although derived from a policy-oriented healthcare context in China, the core governance functions demonstrate potential transferability across health systems, with institutional mechanisms varying by context while functional requirements remain comparatively stable. The framework offers an operational architecture for health systems seeking AI as infrastructure rather than episodic experimentation. Clinical Trial: NA.
Cultural Adaptation of a mobile health application for Aboriginal and/or Torres Strait Islander mothers and families: A Qualitative Study
Date Submitted: Feb 2, 2026

Open Peer Review Period: Feb 3, 2026 - Mar 31, 2026
Background: Co-design ensures cultural safety of health interventions for Aboriginal and/or Torres Strait Islander communities. However, an intervention developed with one Indigenous community may not be suitable for another geographically and culturally distinct community. Objective: This study aimed to culturally adapt content and features of a mobile health (mHealth) application co-created by communities in one Australian state to better meet the needs of mothers and caregivers of Aboriginal and/or Torres Strait Islander children aged 0-18 years and health professionals in another state. Methods: The study followed the stages of the cultural adaptation stepwise model by Barrera et al. Mothers/caregivers of Aboriginal and/or Torres Strait Islander children aged 0-5 years and their health professionals were recruited from multiple community sites. Data were collected through culturally appropriate yarning circles or interviews facilitated by Aboriginal research staff. Qualitative data were transcribed and inductively analysed to generate themes. The feedback was translated into practical changes that were applied to the mHealth application. Results: Data saturation was achieved after yarning circles with 21 women and seven health professionals. Nine themes were generated from mothers/caregivers’ data: 1) cultural relevance and sensitivity, 2) linking with culturally appropriate services, 3) use of lay language and more audio-visual content, 4) concerns with mobile data usage, 5) perceptions about the current content of the Jarjums app, 6) raising children, 7) safety, 8) health and wellbeing of mothers and caregivers, and 9) coordinating health care. Four themes were generated from data collected from health professionals: 1) favourable features of the app, 2) potential barriers to the use of the app, 3) healthcare system access issues, and 4) recommended modifications. 
Based on feedback received, the mHealth application changes included the addition of information on healthy relationships and raising children, more visual content, and localized service directories for different categories of care and support. Conclusions: A co-designed, culturally sensitive mHealth application is likely to support Aboriginal and/or Torres Strait Islander families facing health disparities due to disruption of Indigenous culture, and provides a foundation for a potential clinical trial for effectiveness evaluation and wider implementation.
Extracting Quality of Life Information from Forum Posts Using Open-Source Large Language Models: Feasibility Study
Date Submitted: Feb 2, 2026

Open Peer Review Period: Feb 3, 2026 - Mar 31, 2026
Background: Quality of Life (QoL) questionnaires are an established instrument designed to assess overall wellbeing and quality of life of patients. They are important in predicting the outcome of the disease and understanding the needs of individual patients. However, their repeated collection imposes substantial burden on both patients and clinical professionals. Many patients seek emotional support and mutual exchange in online communities for peer-support, where they frequently share detailed descriptions of symptoms and treatment experiences, addressing topics covered in QoL questionnaires. The emergence of large language models (LLMs) uncovers potential for automatic extraction of relevant QoL information from patient-generated text. Objective: The aim of this study is to evaluate and compare various open-source LLMs and optimization approaches for automated extraction of QoL information from forum posts. Methods: The dataset consisted of 2,683 English-language posts from breast cancer patients recruited on Inspire.com online communities, manually annotated with sentence-level text spans indicating whether and where posts contained information relevant to 53 QoL questions from EORTC QLQ-C30 and QLQ-BR23 questionnaires. Eleven open-source LLMs (8B-70B parameters) were evaluated in a zero-shot setup, generating 4,452 post-question predictions per model under two input conditions: post-only and post with additional context. For the best-performing model, additional experiments assessed the impact of chain-of-thought prompting, instruction optimization, few-shot prompting and parameter-efficient fine-tuning. For correctly classified yes/no instances, the overlap between model-generated evidence and human-annotated spans was evaluated. Results: Across 11 evaluated LLMs, GPT-OSS 20B achieved the highest macro F1-score (0.79) in the zero-shot post-only setting. Providing additional context consistently reduced performance of all models. 
Model size did not correlate with F1-score, with several mid-sized models (14B-30B) outperforming 70B models. For GPT-OSS 20B, chain-of-thought prompting did not improve performance (0.77). Instruction optimization produced results similar to the baseline in both zero-shot and few-shot settings (0.78-0.80). Bootstrap few-shot prompting with random search achieved the highest score overall (0.81). Parameter-efficient fine-tuning decreased performance (0.71). Most classification errors occurred in semantically broad or ambiguous terms and the fallback question. For correctly predicted yes/no answers, model-generated evidence matched or partially matched human-annotated spans in 89% of cases. Conclusions: Open-source LLMs are a promising tool for extracting QoL information that aligns with standardized questionnaire responses from online health forums. Mid-sized models achieved the highest accuracy, particularly in zero-shot, post-only settings. Few-shot prompting can further improve the results. Models were also able to generate evidence spans that closely matched human annotations. However, they consistently struggled with ambiguous and semantically overlapping terms. Overall, automated extraction of QoL information from patient-generated content may offer a faster, lower-cost and low-burden complement to traditional QoL questionnaires, given that limitations such as symptom ambiguity are addressed in future work.
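For readers unfamiliar with the macro F1-score reported above, here is a minimal sketch of how it can be computed for yes/no predictions. The abstract does not describe the evaluation code, so the labels, the example data, and the function names `f1_for_class` and `macro_f1` are assumptions for illustration only.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class, treating that class as the positive label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical annotations and predictions for six post-question pairs.
y_true = ["yes", "yes", "no", "no", "no", "no"]
y_pred = ["yes", "no", "no", "no", "no", "yes"]
score = macro_f1(y_true, y_pred)  # per-class F1: "yes" 0.5, "no" 0.75
```

Because each class contributes equally regardless of how often it occurs, macro averaging penalizes a model that does well only on the more frequent label.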
Web-based, open-source LGBTQ+ Affirming Care Education for Primary Healthcare Providers: A Descriptive Analysis
Date Submitted: Feb 1, 2026

Open Peer Review Period: Feb 2, 2026 - Mar 30, 2026
This manuscript needs more reviewers
Background: Affirming Care for lesbian, gay, bisexual, transgender, and queer (LGBTQ+) populations refers to culturally and clinically competent healthcare that recognizes specific health needs and provides respectful, inclusive, equitable, and non-discriminatory services that are supportive of diverse identities. LGBTQ+ populations face greater discrimination in healthcare, leading to higher levels of unmet health needs than the general population. Very few primary care practices in the United States have training for staff and clinicians on LGBTQ+ healthcare needs. Despite the growing need for LGBTQ+ affirming care, there are no national standards or requirements for LGBTQ+ cultural competence training for primary-care healthcare providers in the United States. Objective: This study explores the accessibility and quality of online ‘grey literature’ providing LGBTQ+ affirming and culturally competent care information for primary care providers in the United States. Grey literature is produced by government, academic, business, and industry sources in formats not controlled by commercial publishing. Methods: We conducted a Google search of grey literature to identify readily available resources and training materials. Two thousand websites were screened. Those published in a language other than English or before January 1, 2014, as well as those that were peer-reviewed literature or behind a paywall, were excluded. Fifty-four websites met the inclusion criteria for a full-text review. Results: We identified six themes from the existing academic literature: (1) affirming physical and visual environments, (2) sexual orientation and gender identity (SOGI) data collection, (3) training on LGBTQ+ health needs, (4) anti-discrimination policies, (5) appropriate, relevant services for LGBTQ+ patients, and (6) use of inclusive language. 
We then applied these themes as a deductive coding framework to the web-based sources and, during analysis, two additional sub-themes emerged: (1) staff diversity, (2) health inequalities and inequities. Findings revealed that not every web-based source addressed all themes. This unequal distribution of coverage across these themes means that providers must consult multiple web-based sources to obtain a comprehensive understanding. Additionally, existing grey literature resources often lacked depth, technical detail, and practical guidance, making it difficult for primary care providers to access actionable information on LGBTQ+ affirming care. ‘Training on LGBTQ+ health needs’ was the most frequently covered theme, and ‘SOGI data collection’ was the least addressed. Study limitations included geolocation biases and embedded advertisements in the Google search results. Conclusions: The study highlights that grey literature is insufficient for self-guided training. We recommend integrating formal LGBTQ+ affirming care training into medical and nursing curricula, as well as professional associations and continuing education, particularly amid growing federal and state-level restrictions on LGBTQ+ healthcare.
Effectiveness of enhanced computerised physician order entry and clinical decision support system in optimizing medication safety in special populations: a systematic review and meta-analysis
Date Submitted: Feb 1, 2026

Open Peer Review Period: Feb 2, 2026 - Mar 30, 2026
This manuscript needs more reviewers
Background: The consequences of medication errors are substantial as they pose a significant threat to high-risk populations, including paediatric, neonatal and geriatric patients. Computerised Provider Order Entry (CPOE) systems and clinical decision support systems (CDSS) are increasingly implemented to reduce medical errors by automating prescribing processes and providing real-time decision support. While alerts have been shown to provide value, barriers to widespread implementation exist in the form of alert fatigue and usability problems. Objective: This systematic review and meta-analysis assessed the effectiveness of CPOE and CDSS in reducing medication errors across diverse populations and clinical environments. Methods: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, with four databases searched up to February 2025 for studies evaluating the effects of CPOE and CDSS implementation on medication error in paediatric and geriatric populations. We included only cohort and prospective studies, not restricted by language or country of publication. Single measures of continuous outcomes on medication error rates were extracted from each study. Comprehensive Meta-Analysis (CMA) software was then used to perform separate analyses comparing the outcome pre- and post-CPOE/CDSS implementation. A random-effects meta-analysis was conducted, with subgroup analyses to assess differences by population, healthcare setting, and system design. The Newcastle–Ottawa Scale was used for quality appraisal. Forest plots and funnel plots were applied for pooled results and publication bias assessment. Results: Fourteen studies met the inclusion criteria (paediatric: n = 12; geriatric: n = 2), all rated as good quality. In paediatrics, 10 of 12 studies reported significant reductions in medication errors post-implementation. 
Pooled analysis showed error rates were almost threefold higher pre-implementation (OR = 2.97; 95% CI 2.81–3.14), with substantial heterogeneity (I² = 94%) but consistent positive direction of effect. In geriatrics, both studies demonstrated significant reductions with no heterogeneity (I² = 0%) (OR = 2.45; 95% CI 2.29–2.62), though evidence remains limited in scope and setting due to the small number of studies. Descriptive synthesis indicated that CPOE/CDSS can intercept high severity errors, such as overdoses of high-risk medications, before reaching patients, although most studies assessed potential rather than actual harm. Meta‑regression showed study location as a significant moderator, with greater effects in North American studies compared to those conducted in Asia. No publication bias was detected, but regional variation suggests that contextual factors such as healthcare infrastructure and informatics maturity influence system effectiveness. Conclusions: CPOE/CDSS significantly reduces medication errors in special populations, with strong and consistent benefits in paediatrics and promising but limited evidence in geriatrics. Despite heterogeneity in paediatric studies, the direction of effect was uniformly positive. The systems also show potential to reduce the severity of harmful errors, although robust evidence on actual patient harm is lacking. Optimising and tailoring CPOE/CDSS to specific patient populations and healthcare settings, while addressing alert fatigue and workflow integration, are essential to maximise impact. Further research should expand the geriatric and neonatal evidence base, assess long-term outcomes and explore advanced decision support capabilities to enhance patient safety and clinical impact.
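To illustrate how an odds ratio and its 95% confidence interval of the kind reported above are obtained for a single pre/post comparison, here is a minimal sketch using the standard log (Woolf) method. The counts are hypothetical, the function name `odds_ratio_ci` is an assumption, and this is a single-study calculation, not the random-effects pooling the review describes.

```python
import math


def odds_ratio_ci(errors_pre, ok_pre, errors_post, ok_post, z=1.96):
    """Odds ratio (pre vs post) with a Wald confidence interval on the log scale."""
    odds_ratio = (errors_pre * ok_post) / (ok_pre * errors_post)
    # Standard error of log(OR) from the four cell counts of the 2x2 table.
    se = math.sqrt(1 / errors_pre + 1 / ok_pre + 1 / errors_post + 1 / ok_post)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical study: 300 erroneous orders out of 1,000 before implementation,
# 120 out of 1,000 afterwards.
or_value, ci_low, ci_high = odds_ratio_ci(300, 700, 120, 880)
```

An odds ratio above 1 with a confidence interval that excludes 1 indicates higher odds of a medication error before implementation than after it.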
Application of Ecological Momentary Assessment in Maternal Health Management: A Scope review
Date Submitted: Feb 1, 2026

Open Peer Review Period: Feb 2, 2026 - Mar 30, 2026
This manuscript needs more reviewers
Background: Ecological momentary assessment (EMA) enables real-time, repeated evaluation of participants' emotions, thoughts, and behavioral patterns in natural settings. It effectively mitigates the retrospective bias inherent in traditional surveys and facilitates a longitudinal understanding of health status. However, its feasibility, practicality, and methodological details for monitoring and promoting maternal health remain unclear. Objective: To conduct a scoping review of studies on the application of EMA in maternal health management, providing a reference for future research and further promotion of maternal and infant health. Methods: Using the Joanna Briggs Institute (JBI) scoping review guidelines as the methodological framework, we searched the Web of Science, PubMed, CINAHL, Embase, Cochrane Library, China National Knowledge Infrastructure (CNKI), China Biomedical Literature Database, Wanfang Database, and VIP Database. The search covered publications from the inception of each database to December 2025, and the included studies were subjected to a comprehensive analysis. Results: The search yielded 2,989 publications, of which 14 were ultimately included. The findings were summarized across three dimensions: study design characteristics (publication year, country, and study design features, such as sample size, study population, and outcome measure type); EMA data collection methods (EMA schedule characteristics, such as monitoring cycle, duration, and data sampling methods, such as fixed-time, random-time, or event-based sampling); and EMA response-related outcomes (participation rate and response rate). Conclusions: The EMA effectively mitigates the recall bias inherent in traditional assessment methods, offering novel approaches to enhance the quality of maternal health management. 
This enables longitudinal monitoring of maternal experiences in natural settings, facilitating the early identification of abnormal physiological, psychological, and behavioral issues during pregnancy and postpartum. This allows timely intervention to safeguard maternal and infant health. Future research should refine EMA study designs and implementation formats to fully leverage their potential in promoting maternal health and personalized interventions for maternal-infant wellness. Clinical Trial: Trial Registration: OSF Registries 10.17605/OSF.IO/GMFKZ
Enhancing Healthcare Interoperability Using Large Language Models: A Generative Proof-of-Concept Framework to Extract Medical Information from Unstructured Clinical Text
Date Submitted: Jan 29, 2026

Open Peer Review Period: Feb 2, 2026 - Mar 30, 2026
Background: Unstructured clinical text remains a major barrier to interoperable data reuse and large-scale secondary analysis in healthcare. Large language models (LLMs) have the potential to automate the extraction of structured clinical information; however, their application is limited by the scarcity of high-quality annotated training data. Objective: To address these limitations, this study aims to develop and validate a scalable, privacy-preserving framework that utilizes synthetic data generated from structured Fast Healthcare Interoperability Resources (FHIR) to fine-tune open-source LLMs for the effective extraction of interoperable clinical information from unstructured text. Methods: We evaluated an LLM–based pipeline for extracting structured clinical information from cancer-related discharge letters and mapping it to representations compatible with Fast Healthcare Interoperability Resources (FHIR). To enable large-scale supervised training, we developed a random sample generator that creates synthetic discharge letters using Qwen3 235B by randomly sampling and aggregating structured FHIR data from 41,175 cancer patients. The resulting synthetic discharge letters (n=75k) were paired with their originating structured data, forming a large-scale dataset for fine-tuning MedGemma 27B. Evaluation was conducted on the synthetic test dataset (n=7,500), real-world discharge letters (n=30) which are evaluated by physicians and a medical student, and a comparative one-shot approach using open-source models (Qwen3, LLaMA, and GPT-OSS). Results: The fine-tuned model achieved high extraction performance across multiple clinical entities, including full ICD diagnosis codes (F1 = 0.84), tumor-related information (0.99), laboratory values (0.99), medication names and dosages (0.99), and ATC medication codes (0.94). 
Extraction of procedure-related information was more challenging but remained reliable, with F1 scores of 0.63 for OPS codes and 0.90 for procedure descriptions. In a one-shot comparison of general-purpose LLMs with the fine-tuned model, the fine-tuned model consistently outperformed general-purpose LLMs in nearly all extraction categories. When applied to real-world discharge letters, performance remained robust, with F1 scores of 78.9% for ICD diagnoses, 86.1% for tumor-related information, 93% for medications, and 61.3% for procedures. Conclusions: These results demonstrate that synthetic text generation from structured clinical data enables effective and scalable training of LLMs for extracting interoperable, multi-entity clinical information from unstructured documentation.
Assessment of a Digital Health Platform Using Web Analytics and User Experience Measurements: An Evaluation Study Based on RE-AIM
Date Submitted: Jan 30, 2026

Open Peer Review Period: Feb 2, 2026 - Mar 30, 2026
This manuscript needs more reviewers
Background: In recent years, the field of digital health has grown exponentially, leading to notable benefits such as easier access to health-related information, but also to content saturation and misinformation. Thus, it is crucial to identify digital health tools that provide meaningful value and assess their real-world impact. Objective: This pre-registered study’s goal was to quantitatively assess the LONDI platform, a German platform designed for different user groups supporting children with learning disorders. This assessment focused on user groups of mental health professionals (i.e., learning therapists and school psychologists), and was grounded in four of the five RE-AIM-framework dimensions: reach, adoption, implementation, and maintenance. Methods: Data were collected over a 10-month period, between May 1, 2024, and March 1, 2025. The reach dimension was measured via a pop-up questionnaire (N=1324), collecting demographic and professional experience data. The adoption dimension was measured via a second pop-up questionnaire (N=160), measuring user experience (UX) and reuse intention for the platform’s help system. The implementation dimension was measured via web analytics (N=37,133), measuring reading time for pages intended for mental health professionals. Moreover, this dimension was also assessed by comparing chatbot engagement rates with industry benchmarks. The maintenance dimension was measured via web analytics as well, comparing the usage in the previous (N=20,496) and the current platform version (N=37,133) in terms of number and location of users, time spent on the platform, number of actions per visit, and used devices and software. Results: Of the users who filled out the first pop-up questionnaire, 22% and 10.64% stated that they were learning therapists or school psychologists, respectively, exceeding their percentage in the German population (< 0.01%). 
The second pop-up questionnaire revealed an overall mean UX score of 1.46, surpassing the benchmark average, and UX ratings predicted intention to reuse. Time spent on the pages intended for mental health professionals was below the time needed to read them. The 0.18% rate of chatbot engagement was very low compared with industry benchmarks of 35-40%. Usage changed in the two compared time periods, and most strikingly, there was an 81.2% increase in the number of users. Conclusions: The study provides evidence of the LONDI platform’s optimal public health impact in terms of the reach, adoption, and maintenance RE-AIM-framework dimensions. Further research and endeavors are needed to better understand and improve the platform’s impact in terms of the implementation dimension.
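As a quick arithmetic check of the reported increase in users, the sketch below computes the relative change between the two web-analytics sample sizes given in the Methods. It assumes that N=20,496 and N=37,133 are the user counts for the previous and current platform versions; the function name `percent_change` is hypothetical.

```python
def percent_change(previous, current):
    """Relative change from previous to current, expressed in percent."""
    return (current - previous) / previous * 100


# Counts taken from the abstract's Methods section (assumed to be user counts).
increase = round(percent_change(20496, 37133), 1)
print(increase)  # 81.2
```

Under that assumption the two figures are consistent with the 81.2% increase stated in the Results.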
Balancing Value and Risk: Clinicians’ Perceptions and Adoption of AI-Enabled Clinical Decision Support Systems
Date Submitted: Jan 29, 2026

Open Peer Review Period: Jan 30, 2026 - Mar 27, 2026
This manuscript needs more reviewers
Background: The increasing adoption of Artificial Intelligence (AI) in healthcare, particularly within Clinical Decision Support Systems (CDSSs), is transforming clinical practice and decision-making. Although AI-CDSSs hold the potential to improve diagnostic accuracy, operational efficiency, and patient outcomes, their implementation also creates ethical, technical, and regulatory concerns, affecting healthcare professionals’ willingness to adopt these systems. Objective: Building on a value-based perspective, the study integrates the Unified Theory of Acceptance and Use of Technology (UTAUT) framework as determinants of perceived benefits and a risk-based perception model as determinants of perceived risks to develop a unified model exploring clinicians’ behavioural intention to adopt AI-enabled CDSSs. Methods: A self-administered cross-sectional survey was distributed to licensed healthcare professionals to examine how validated factors influence perceptions of risks and benefits. Responses were collected from 215 clinicians across Italy and the United Kingdom. Recruitment was undertaken using email invitations, attendance at academic conferences, and direct approaches within healthcare settings. Results: Perceived Benefits were found to be the strongest positive predictor of clinicians’ intentions to use AI-enabled CDSSs (β=.45, p<.001), whereas perceived risks had a significant negative effect (β=-.18, p=.002). Performance Expectancy and Facilitating Conditions significantly increased the adoption intentions, whereas Effort Expectancy and Social Influence were not significant. Among the risk antecedents, Perceived Performance Anxiety, Communication Barriers, and Liability Concerns were significant predictors of Perceived Risks. The model explained 46% of the variance in the intention to use AI-enabled CDSSs. 
Conclusions: The findings offer theoretical and practical insights into human factors influencing AI adoption in clinical practice, underscoring the importance of value alignment, professional accountability and institutional readiness, and highlighting the need to foster clinician trust in AI tools beyond the boundaries of technical performance.
Adoption of asynchronous secure messaging in hospital-based ambulatory specialty clinics in Ontario, Canada: A convergent mixed-methods study
Date Submitted: Jan 29, 2026

Open Peer Review Period: Jan 30, 2026 - Mar 27, 2026
Background: The COVID-19 pandemic significantly increased adoption of virtual care, including patient-to-provider secure messaging. However, this surge has heightened physician workload and burnout and has raised concerns about message appropriateness and liability among physicians. Objective: This study characterizes secure messaging use in Canadian hospital-based specialty care and explores the experiences of healthcare providers, administrative staff, and patients. Methods: We employed a convergent mixed-methods design, analyzing aggregated electronic health record (EHR) usage data and qualitative interview data. The study was conducted at Women’s College Hospital in Toronto, Canada, across four high-messaging specialty clinics: mental health, rheumatology, dermatology, and surgery. Quantitative data (Oct 2019-Oct 2022) detailed message volumes, response patterns, and timing. Semi-structured interviews explored messaging workflows, barriers, and facilitators. Data were analyzed separately, then integrated to identify areas of convergence and divergence. Results: Message volumes surged post-pandemic, particularly in mental health. The monthly message rate per patient varied, with higher rates in mental health and rheumatology. Physicians reported negative experiences due to increased workload, lack of compensation, and inadequate integration into clinical workflows. High patient-to-physician ratios and limited nursing support for message triage were associated with a poor messaging experience. Patients and administrative staff valued messaging for its convenience, accessibility, and efficiency. A key finding was the poor engagement of all user groups in decisions regarding messaging implementation. Conclusions: The study highlights a disconnect between the high perceived value of secure messaging for patients and administrative staff and the negative experiences of physicians. 
Successful implementation requires thoughtful integration into care models, clear guidelines for patient use, and proper triage and "channel management" to guide patients to appropriate visit modalities. Future research should explore triaging algorithms as part of a digital front door, specialty-specific variations and the crucial role of nursing staff in message management.
Quality of life of people living with dementia residing in nursing homes: A study using natural language processing to analyse observational data
Date Submitted: Jan 22, 2026

Open Peer Review Period: Jan 29, 2026 - Mar 26, 2026
Peer Review Me
Background: Quality of life (QoL) plays a crucial role in dementia care, yet QoL and its dynamic, context-dependent nature can be difficult to capture in people living with dementia due to challenges in memory and communication and limitations of self-reported QoL instruments. Observational tools such as the Maastricht Electronic Daily Life Observation (MEDLO) provide narrative descriptions of the daily life of people living with dementia in nursing homes. However, the MEDLO tool was not developed to assess QoL specifically, and it remains unclear to what extent its narrative descriptions reflect aspects of QoL. Analysing these narrative descriptions is labour-intensive and time-consuming. Recent advances in natural language processing (NLP), including Large Language Models, offer potential to analyse these narrative descriptions at scale. Objective: The study aims to gain insight into the QoL in people living with dementia residing in nursing homes in the Netherlands, using NLP to interpret narratives of daily life in existing MEDLO data. Methods: This study conducted a secondary analysis of existing MEDLO observational data from 151 people living with dementia residing in Dutch long-term care. Narrative data had been documented by trained observers, describing activities, interactions, settings and emotional expressions. For analysis, a local secure pipeline was developed in which GPT-4o-mini was deployed for NLP tasks. The pipeline comprised three analytical steps: (1) N-gram frequency analysis to identify common language patterns, (2) sentiment analysis of positive and negative expressions per QoL domains, and (3) topic modelling to group semantically related terms and map them to QoL domains. Outputs were iteratively refined through prompt engineering and validated through expert review for coherence and contextual relevance. Results: A total of 5,622 narratives (50,106 words) from 151 observed people living with dementia were analysed. 
The narratives were short, averaging 8.5 words per narrative. N-gram frequency analysis identified frequent documentation of passive activity (sits at the table) in limited indoor settings (living room). Emotional well-being was often described in positive terms (smiles, laughs), whereas explicitly negative expressions (cries, distress) occurred less frequently. Weighted sentiment analysis showed that, although fewer in number, negative expressions carried a stronger intensity, resulting in an overall predominance of negative sentiment across all QoL domains. Topic modelling identified eight coherent clusters, most of which mapped onto multiple QoL domains, underscoring QoL’s multidimensionality. Conclusions: NLP identified predominantly passive activities in indoor settings with little variation, yet people living with dementia were often described with positive affect, underscoring both the complexity of QoL in dementia and the influence of documentation practices. In practice, NLP could help translate everyday care documentation into actionable information that guides more responsive, person-centred dementia care.
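The N-gram frequency step mentioned in this abstract can be illustrated with a minimal sketch. This is plain token counting, not the authors' GPT-4o-mini pipeline; the function name `ngram_counts` and the English example narratives are hypothetical and are not taken from the MEDLO data.

```python
from collections import Counter

def ngram_counts(narratives, n=2):
    """Count word n-grams across a list of short narrative strings."""
    counts = Counter()
    for text in narratives:
        tokens = text.lower().split()
        # Slide a window of width n over the tokens of one narrative.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

# Hypothetical example narratives (not study data).
narratives = [
    "sits at the table",
    "sits at the table and smiles",
    "walks to the living room",
]
trigram_counts = ngram_counts(narratives, n=3)
for ngram, freq in trigram_counts.most_common(2):
    print(" ".join(ngram), freq)
```

Counting within each narrative separately avoids creating n-grams that span two unrelated observations.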
A Personalized Exercise Assistant using Reinforcement Learning (PEARL): Results from a four-arm Randomized-controlled Trial
Date Submitted: Jan 27, 2026

Open Peer Review Period: Jan 28, 2026 - Mar 25, 2026
Peer Review Me
Background: Consistent physical inactivity among adults and adolescents poses a major global health challenge. Mobile health (mHealth) interventions, particularly Just-in-Time Adaptive Interventions (JITAIs), offer a promising avenue for scalable and personalized physical activity promotion. However, developing and evaluating such adaptive interventions at scale, while integrating robust behavioral science, presents methodological hurdles. Objective: The PEARL study aimed to assess the feasibility and effectiveness of a reinforcement learning (RL) algorithm, informed by health behavior change theory (COM-B), to personalize the content and timing of physical activity nudges via the Fitbit app compared to fixed and random nudging strategies, and to a control group with no nudges. Methods: We conducted a large-scale, four-arm randomized controlled trial (RCT) enrolling 13,463 Fitbit users. Participants were randomized to: (1) Control (no nudges); (2) Random (random content/timing); (3) Fixed (logic based on baseline COM-B survey); and (4) RL (adaptive algorithm). The primary outcome was the change in average daily step count from baseline to 2 months. Secondary outcomes included user engagement and survey responses regarding capability, opportunity, and motivation. Results: 7,711 participants were included in the primary analysis (mean age 42.1 years; 86.3% female). At 1 month, the RL group showed a significant increase in daily steps compared to Control (+296 steps, P<.001), Random (+218 steps, P=.005), and Fixed (+238 steps, P=.002) groups. At 2 months, the RL group sustained a significant increase against the Control (+210 steps, P=.01). Generalized estimating equation (GEE) models confirmed a sustained significant increase in the RL group (+208 steps, P=.002). In exit surveys, the RL group reported higher favorable responses regarding nudge customization (37%) compared to other groups. 
Conclusions: This study demonstrates the feasibility and early efficacy of using RL to personalize digital health nudges at scale. While long-term retention remains a challenge, the adaptive approach outperformed static behavioral rules, showcasing the promise of dynamic personalization in a real-world mHealth setting. Clinical Trial: doi: 10.17605/OSF.IO/TW7UP
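The abstract does not describe the RL algorithm itself. As a rough illustration of what adaptively choosing nudge content and timing can mean, the following sketch uses a textbook epsilon-greedy bandit, which is a swapped-in simplification and not the PEARL method. The class name, the arm labels, and the reward definition are all hypothetical.

```python
import random

class EpsilonGreedyNudger:
    """Textbook epsilon-greedy bandit over (content, timing) nudge options."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward

    def choose(self):
        # Explore with probability epsilon, otherwise pick the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Incremental update of the mean reward observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Hypothetical arms: (nudge content, time of day).
arms = [("motivation", "morning"), ("opportunity", "evening")]
nudger = EpsilonGreedyNudger(arms, epsilon=0.0)
# Hypothetical reward: 1.0 if the user walked more after the nudge, else 0.0.
nudger.update(("opportunity", "evening"), 1.0)
print(nudger.choose())
```

A production system would condition on user context rather than keep a single global estimate per arm; the sketch only shows the explore/exploit loop.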
An AI-Based Smart Nursing Ward Model for Enhanced Recovery After Thoracic Surgery: A Historical Controlled Trial
Date Submitted: Jan 27, 2026

Open Peer Review Period: Jan 28, 2026 - Mar 25, 2026
This manuscript needs more reviewers Peer Review Me
Background: Due to surgical trauma and the impact of the disease, patients undergoing thoracic surgery often experience a series of postoperative symptom burdens, which affect their recovery. Traditional perioperative care has drawbacks. Objective: To evaluate the impact of an AI-based personalized smart nursing ward management model on postoperative recovery outcomes in patients undergoing thoracic surgery. Methods: According to patients' admission sequence, patients who met the inclusion criteria were divided into a control group (n=303) and an intervention group (n=240). The control group adopted the routine nursing mode of general wards, while the intervention group implemented the AI-based personalized smart nursing ward management model on the basis of the routine nursing provided to the control group. Results: Data from all 543 enrolled patients were analyzed. Compared with the control group (n=303) receiving routine care, the intervention group (n=240) had a significantly shorter median hospital stay (9.0 days vs 12.0 days) and chest tube indwelling time (5.0 days vs 7.0 days), as well as lower total hospitalization costs (¥61,032.87 vs ¥72,859.90) (all P < .001). The postoperative pulmonary complication rate was also significantly lower in the intervention group (3.8% vs 12.2%, P < .001). Furthermore, patient satisfaction was higher (98.53% vs 91.28%), and nurses' daily step count was reduced (12,359.52 vs 18,692.74 steps) in the intervention group (both P < .001). Conclusions: The AI-based smart nursing model effectively promotes postoperative recovery and offers an innovative management approach for thoracic surgery.
The Impact of AI-driven tools on Breastfeeding Outcomes: Systematic Review and Meta-Analysis
Date Submitted: Jan 27, 2026

Open Peer Review Period: Jan 27, 2026 - Mar 24, 2026
This manuscript needs more reviewers Peer Review Me
Background: The current global breastfeeding landscape presents both progress and challenges. The rise of artificial intelligence (AI) has emerged as a promising new strategy to enhance breastfeeding practices. Objective: To evaluate the impact of AI-driven tools on breastfeeding practices and outcomes. Methods: We searched PubMed, Web of Science, Cochrane Library, Embase, and CINAHL from inception to October 2025 for randomized controlled trials (RCTs) and quasi-experimental studies. The risk of bias in individual studies was assessed using the Cochrane risk of bias tool for randomized controlled trials (RoB 2) and the risk of bias in non-randomized studies of interventions tool (ROBINS-I). Data were extracted independently by two reviewers and combined using Review Manager 5.4 and R-4.5.2 to obtain pooled results via random-effects models, with subgroup analyses based on intervention type, timing of implementation, population characteristics, and country income level. Results: This review included 39 studies with 10,735 participants from 15 countries. AI-driven tools increased exclusive breastfeeding (EBF) rates (at <3 months: relative risk [RR] 1.21, 95% CI 1.13-1.29, P<.001, I²=56%; at 3–6 months: RR 1.54, 95% CI 1.29-1.85, P<.001, I²=69%; at ≥6 months: RR 1.47, 95% CI 1.22-1.77, P<.001, I²=78%), breastfeeding self-efficacy (BSE) (standardized mean difference [SMD] 0.41, 95% CI 0.04-0.78, P=.03, I²=93%), and breastfeeding knowledge (SMD 1.69, 95% CI 0.54-2.84, P=.004, I²=98%). Conclusions: AI-driven tools effectively increase exclusive breastfeeding rates, breastfeeding self-efficacy, and breastfeeding knowledge. Future studies are needed to provide stronger evidence about clinical care interventions. Clinical Trial: PROSPERO CRD420251233352; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251233352
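For readers unfamiliar with how a random-effects pooled estimate and I² are obtained, here is a minimal sketch. The abstract does not state which between-study variance estimator was used; the DerSimonian-Laird estimator is assumed here, and the log relative risks and variances below are hypothetical values, not data from the review.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool study effects (e.g. log RRs) with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, standard error, tau^2, I^2 in percent).
    """
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2

# Hypothetical log relative risks and their variances for four studies.
log_rr = [0.10, 0.25, 0.40, 0.15]
var = [0.010, 0.020, 0.015, 0.012]
pooled, se, tau2, i2 = dersimonian_laird(log_rr, var)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"RR {math.exp(pooled):.2f} (95% CI {math.exp(low):.2f}-{math.exp(high):.2f}), I2={i2:.0f}%")
```

Pooling is done on the log scale and exponentiated at the end, which is why the confidence interval for a relative risk is asymmetric around the point estimate.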
‘Carer-as-Sensor’ in Decentralized Trials: Passive Sensing Data Accuracy, Parkinson’s, and Observers
Date Submitted: Jan 26, 2026

Open Peer Review Period: Jan 27, 2026 - Mar 24, 2026
Peer Review Me
Background: Parkinson's clinical trials depend on patient-reported outcomes, often overlooking the vital role of carers in collaboratively tracking symptom progression. This is a potential limitation for decentralized clinical trials aimed at measuring real-world, free-living symptoms with sensors, such as wearables and cameras in the home. Objective: The primary objective of our study was to inform the design of a multimodal sensor platform for decentralized clinical trials. Methods: A qualitative study was conducted with an inductive approach using semistructured interviews with a cohort of people with Parkinson's. Results: This study of 18 participants (14 people diagnosed with Parkinson's, 4 spouses/informal carers) found that carers, household members, and peers take a central role in helping people with Parkinson's make sense of and manage their symptoms. Our participants relied on others to help with completing tasks and understanding their symptoms through comparison to others, using their Carer-as-Sensor. While our participants mostly viewed their relationships with others positively, this could lead to negative impacts on oneself. Participants could prioritize household needs over their health by not taking medication or risking a chance of falling, or even avoiding being around others to prevent their Parkinson's being on display to reduce carer burden. Conclusions: Our results argue that an 'outsider' and 'insider' approach to reporting symptoms can identify symptoms that are not noticed by people with Parkinson's, or withheld from carers. These form household-centred recommendations more broadly for the design of tracking and annotation strategies in the context of decentralized clinical trials and new innovations in AI to support the capture of nuanced and subtle changes in symptoms.
Working conditions in hospital associated with perceived usability of an established and a newly adopted electronic health record system: a repeated cross-sectional study of medical doctors and nurses
Date Submitted: Jan 26, 2026

Open Peer Review Period: Jan 26, 2026 - Mar 23, 2026
Peer Review Me
Background: Poor usability of electronic health record (EHR) systems is associated with workflow inefficiencies, patient safety risks, and burnout among health professionals. Health professionals are exposed to various work conditions, but the associations with perceived EHR usability are unknown. Objective: To examine whether medical doctors’ and nurses’ usability perceptions of an established electronic patient record (EPR) system and a newly adopted EHR system differ by work schedules, type of employment (full-time or part-time), work pace, and number of clinical settings. Methods: In the established EPR system, nurses were more likely to report low ease-of-use if they worked three-shift rotations (odds ratio [OR] 2.21, 95% CI: 1.34-3.65 vs. daytime), part-time (OR 1.63, 95% CI: 1.20-2.21 vs. full-time), or faced very high work pace (OR 1.25, 95% CI: 1.42-3.58 vs. low work pace). Following EHR adoption, medical doctors and nurses reported a median (IQR) SUS score of 17.5 (7.5-32.5) and 32.5 (17.5-50.0), respectively. Both medical doctors and nurses reported lower SUS scores when they faced very high work pace compared to low work pace, with mean differences of -8.56 (95% CI: -12.60 to -4.51) and -8.43 (95% CI: -14.10 to -2.76), respectively. Part-time employed nurses reported 2.72 points (95% CI: -4.93 to -0.52) lower SUS scores than full-time employed nurses, and nurses working across 3-4 clinical settings reported 2.99 points (95% CI: -5.52 to -0.46) lower SUS scores than nurses working across 1-2 settings. Results: A total of 543 medical doctors and 1,869 nurses participated. In the established EPR system, nurses were more likely to report low ease-of-use if they worked three-shift rotations (odds ratio [OR] 2.21, 95% CI: 1.34-3.65 vs. daytime), part-time (OR 1.63, 95% CI: 1.20-2.21 vs. full-time), or faced very high work pace (OR 1.25, 95% CI: 1.42-3.58 vs. low). 
Following EHR adoption, medical doctors and nurses reported a median (IQR) SUS score of 17.5 (7.5-32.5) and 32.5 (17.5-50.0), respectively. Both medical doctors and nurses reported lower SUS scores when they faced very high work pace compared to low work pace, with mean differences of -8.56 (95% CI: -12.60 to -4.51) and -8.43 (95% CI: -14.10 to -2.76), respectively. Part-time employed nurses reported 2.72 points (95% CI: -4.93 to -0.52) lower SUS scores than full-time employed nurses, and nurses working across 3-4 clinical settings reported 2.99 points (95% CI: -5.52 to -0.46) lower SUS scores than nurses working across 1-2 settings. Conclusions: These findings suggest that system usability perceptions differ by work conditions, particularly work pace. Although these results could guide tailored implementation strategies, ensuring adequate EHR usability architecture is likely to be as important.
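To make the reported SUS scores easier to interpret, the sketch below shows the conventional scoring of the 10-item System Usability Scale, which yields values from 0 to 100 in steps of 2.5. It assumes the study used the standard questionnaire and scoring, which the abstract does not confirm; the function name `sus_score` and the example responses are hypothetical.

```python
def sus_score(responses):
    """Score one respondent's 10 SUS items (each rated 1-5) on the 0-100 scale."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs exactly 10 responses, each between 1 and 5")
    total = 0
    for item, rating in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5

# Hypothetical respondent who rates every item as neutral (3).
neutral = sus_score([3] * 10)
print(neutral)
```

Under this scoring a fully neutral respondent lands at the middle of the scale, which gives some context for the medians reported after EHR adoption.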
We Are in control: A survey study of public perceptions of a personal data space for citizen-centered health data governance
Date Submitted: Jan 23, 2026

Open Peer Review Period: Jan 25, 2026 - Mar 22, 2026
Peer Review Me
Background: Personal Data Spaces (PDS) are increasingly promoted as digital infrastructures that enable citizen participation in health data governance by strengthening transparency and individual control over personal health data. Despite growing policy and technological attention, empirical evidence remains limited on whether citizens view PDS as acceptable and desirable governance instruments, how they evaluate different types of data and purposes of data use, and which factors shape public support. Objective: The objective of this study was to examine how citizens evaluate We Are, a proposed citizen-centered Personal Data Space model in Flanders, Belgium, and to assess overall support, reasons for endorsement, preferences for control versus transparency, acceptability of storing different types of health data, and acceptance of different purposes of data use. Methods: We conducted an online survey among adults aged 18-79 years in Flanders, Belgium (N=1,041). The sample was quota-based and representative for gender, age, education, province, and urbanization level. Participants evaluated the We Are model after reading a description. Measures included overall evaluation of the model, reasons for support, preferences for transparency and control, willingness to store medical versus lifestyle data, and willingness to share data across vignette-based scenarios varying purpose of use and recipient type. Data were analyzed using t-tests, linear regression, and mixed models with repeated measures. Results: Overall evaluations of We Are were moderately positive (Mean 2.51 on a 1-4 scale) and did not differ significantly from the scale midpoint (t(1040)=0.70, P=.24). Sociodemographic characteristics explained little variance in support, whereas understanding of the We Are model and psychographic factors substantially increased explained variance (R² increased from .03 to .24). 
Higher trust in technology was positively associated with support, while stronger privacy attitudes and privacy-related fears were negatively associated. Respondents valued control more strongly than transparency for both general personal data (t(1040)=-10.37, P<.001) and health data (t(1040)=-12.47, P<.001). Medical data were considered more acceptable to store than lifestyle data (Δ=0.38, P<.001). Both personal and public benefits motivated support, but commercial data use reduced willingness to share, particularly when framed around individual gain rather than collective benefit. Conclusions: Citizens view PDS as potentially valuable instruments for health data governance, but their support is conditional and shaped by understanding and psychographic factors rather than by sociodemographic factors. PDS can contribute to meaningful citizen participation only when technological features are embedded in governance arrangements that provide real agency, credible safeguards, and demonstrable public value.
Secondary Use of Health Data as a Core Capability in Medical Informatics
Date Submitted: Jan 23, 2026

Open Peer Review Period: Jan 25, 2026 - Mar 22, 2026
Peer Review Me
The European Health Data Space represents a landmark regulatory success in enabling the secondary use of health data for research, innovation, and policy within a trusted and interoperable framework. This Viewpoint discusses how strategic alliances—such as UNINOVIS—and translational research ecosystems, with IBIMA as a driving hub, operationalize this regulation by aligning governance, infrastructure, and applied data science. Together, they illustrate how European health data policy can be translated into real-world evidence generation and sustained clinical and societal impact.
Development of a novel musculoskeletal hypothesis in the ADVANCE cohort: application of sparse Group Factor Analysis methodology
Date Submitted: Jan 23, 2026

Open Peer Review Period: Jan 25, 2026 - Mar 22, 2026
Peer Review Me
Background: Musculoskeletal conditions are a leading global cause of disability, yet the factors influencing long-term musculoskeletal health, particularly following trauma, remain incompletely understood. Machine learning could be applied to identify previously unknown patterns in large-scale multimodal datasets. Objective: Test the ability of a new sparse Group Factor Analysis method to uncover hidden patterns in large-scale multi-modal datasets and generate testable, clinically relevant hypotheses. Methods: This study applies sparse Group Factor Analysis, a hierarchical unsupervised machine learning method, to the ADVANCE cohort—a longitudinal dataset of 1445 UK Afghanistan War servicemen—to identify latent structures in multimodal clinical data. Study 1 validated the approach by rediscovering known group-level patterns between combat-injured and non-injured participants, including poorer outcomes in pain, mobility, and bone health among those with lower limb loss. Study 2 explored the Injured, non-amputee subgroup without prespecified labels to identify new hypothesis-generating clusters that could subsequently be tested using standard hypothesis testing methods. Results: A subgroup of 125 individuals with worse musculoskeletal outcomes was uncovered. This group had greater body mass, higher injury severity, and a higher prevalence of head injury. These findings led to a novel hypothesis: that head injury, including potential traumatic brain injury, is associated with long-term musculoskeletal deterioration. This hypothesis is supported by literature in both athletic and military populations and will be tested in follow-up analyses. Conclusions: Our findings demonstrate how sparse Group Factor Analysis, combined with clinical insight, can uncover hidden patterns in large-scale datasets and generate testable, clinically relevant hypotheses that inform prevention, treatment, and rehabilitation strategies.
Design Requirements for Web-Based Digital Therapeutics in Chronic Kidney Disease: A Mixed-Methods Study Integrating Patient and Clinician Perspectives
Date Submitted: Jan 23, 2026

Open Peer Review Period: Jan 25, 2026 - Mar 22, 2026
Peer Review Me
Background: Chronic kidney disease (CKD) requires sustained self-management involving complex medication regimens, dietary restrictions, and symptom monitoring. These demands pose substantial challenges to medication adherence and daily disease management. Digital therapeutics (DTx) have the potential to support CKD self-management; however, CKD-specific design requirements informed by both patient and clinician perspectives remain insufficiently explored. Objective: This study aimed to identify key design requirements for CKD-specific digital therapeutics by integrating patient-reported self-management challenges with nephrologist perspectives on clinical needs and implementation considerations. Methods: A convergent mixed-methods study was conducted at a tertiary academic hospital. Quantitative data were collected through a structured survey of 60 adults with non–dialysis-dependent CKD to assess medication adherence challenges, digital health needs, and age-related differences. Qualitative data were obtained through focus group interviews with 19 nephrologists and analyzed using thematic analysis. Quantitative and qualitative findings were integrated to identify convergent priorities and design implications for CKD-specific DTx. Results: None of the patients reported prior experience with CKD-specific digital health applications, although 70% perceived a need for such tools. Younger patients (<60 years) expressed significantly greater interest in digital therapeutics than older patients (83.9% vs 55.2%, P=.015). Common patient-reported challenges included managing multiple medications (36.7%), irregular medication schedules (30.0%), and difficulty understanding medication timing relative to meals (28.3%). Nephrologists emphasized the importance of personalized medication reminders, comprehensive medication information (including adverse effects and nephrotoxic risks), symptom-monitoring systems, and features supporting dietary and lifestyle management. 
Integration findings highlighted the need for user-friendly, age-sensitive interfaces, data security, and clinically actionable feedback mechanisms. Conclusions: By integrating patient and nephrologist perspectives, this mixed-methods study identifies key design considerations for CKD-specific digital therapeutics. These findings provide formative, design-informed evidence to guide the early development of patient-centered and clinically relevant digital therapeutics for CKD.
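The age-group comparison reported above (83.9% vs 55.2%) can be sketched as a 2x2 test. The abstract gives neither the group sizes nor the test used, so both are assumptions here: the counts 26 of 31 younger and 16 of 29 older patients are reconstructed from the reported percentages and the total of 60, and a Pearson chi-square test without continuity correction is assumed.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Assumed counts: interested / not interested, younger (<60 years) vs older patients.
chi2, p = pearson_chi2_2x2(26, 5, 16, 13)
print(f"chi2={chi2:.2f}, P={p:.3f}")
```

With these assumed counts the result is close to the reported P value, which suggests the reconstruction is plausible, but it does not establish which test the authors ran.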
Co-Producing a Coach-Supported Digital Intervention to Promote Cognitive Health in underserved Older Adults: ENHANCE (TailorEd iNtervention for brain HeAlth aNd Cognitive Enrichment)
Date Submitted: Jan 22, 2026

Open Peer Review Period: Jan 23, 2026 - Mar 20, 2026
Peer Review Me
Background: Digital multidomain interventions hold promise for dementia risk reduction; however, populations at higher dementia risk, including those experiencing socioeconomic and educational disadvantage, remain underrepresented in trials, and engagement with digital interventions often declines over time. Co-production and blended models that combine digital tools with human support may improve reach, acceptability, usability, and sustained engagement. Designing interventions that are usable and acceptable for individuals facing structural, educational, or digital barriers (underserved groups) is therefore likely to produce solutions that are both accessible and scalable for the wider older adult population. Objective: To describe the co-production process used to develop ENHANCE—a coach-supported digital intervention targeting ten modifiable dementia risk factors in older adults from underserved groups—and report key outputs and lessons learned for equitable digital prevention design. Methods: We co-produced ENHANCE between July 2023 and February 2025 using a multi-stage development process guided by the Medical Research Council framework for complex interventions and the Double Diamond design model. The Person-Based Approach informed user-centred guiding principles (key design objectives), while behaviour change content was operationalised using behavioural change theories. Co-production followed four phases. The Discovery phase explored barriers to engagement with existing digital materials and identified candidate components for each dementia risk-factor module. The Define phase translated these insights into guiding principles and blueprints of each risk-factor module integrated with behavioural change components. The Design phase involved iterative co-production and usability testing of prototypes. The Delivery phase evaluated a high-fidelity prototype through a one-week usability study with coaching support. 
Contributors included 162 research participants recruited from underserved community settings, 33 patient and public involvement contributors, and 4 human–computer interaction experts. Throughout development, co-production focused on reducing literacy, digital confidence, and cultural barriers to maximise usability across diverse older adult populations. Results: Co-production produced (1) evidence-informed module strategies for targeted dementia risk factors; (2) a set of guiding principles to ensure low-literacy, culturally relevant, and accessible content, supporting both equity of access and wider population usability; (3) a meadow-themed app integrating tailored check-ins, educational videos, cognitive training games, and in-app messaging; and (4) a structured coaching model, including onboarding, brief follow-up, and accompanying coaching manuals. Iterative testing and refinement improved navigation, simplified language, reduced text burden, and ensured the use of familiar and accessible game formats, resulting in a feasibility-ready prototype. Conclusions: ENHANCE is a co-produced, coach-supported digital intervention designed to be accessible for underserved older adults at increased dementia risk, with design features intended to support accessibility, engagement, and scalability across the wider ageing population. The development process illustrates how integrating co-production with behavioural science and usability methods can support principled intervention design for equitable digital dementia prevention. Clinical Trial: ISRCTN17060879
Natural Language Processing for Electronic Health Records in Scandinavian Languages: Norwegian, Swedish, and Danish
Date Submitted: Jan 21, 2026

Open Peer Review Period: Jan 22, 2026 - Mar 19, 2026
Peer Review Me
Background: Clinical natural language processing (NLP) refers to computational methods for extracting, processing, and analyzing unstructured clinical text data, and holds a huge potential to transform healthcare. The advancement of deep learning, augmented by the recent emergence of transformers, has been pivotal to the success of NLP across various domains. This success is largely attributed to the end-to-end training capabilities of deep learning systems. Further, advances in instruction tuning have enabled Large Language Models (LLMs) like OpenAI’s GPT to perform tasks described in natural language. While these advancements have dramatically improved capabilities in processing languages like English, these benefits are not always equally transferable to under-resourced languages. In this regard, this review aims to provide a comprehensive assessment of the state-of-the-art NLP methods for the mainland Scandinavian clinical text, thereby providing an insightful overview of the landscape for clinical NLP within the region. Objective: The study aims to perform a systematic review to comprehensively assess and analyze the state-of-the-art NLP methods for the Scandinavian clinical domain, thereby providing an overview of the landscape for clinical language processing within the Scandinavian languages across Norway, Denmark, and Sweden. Generally, the review aims to provide a practical outline of various modeling options, opportunities, and challenges or limitations, thereby providing a clear overview of existing methodologies and potential avenues for future research and development. Methods: A literature search was conducted in various online databases, including PubMed, ScienceDirect, Google Scholar, ACM Digital Library, and IEEE Xplore between December 2022 and March 2024. The search considers peer-reviewed journal articles, preprints, and conference proceedings. 
Relevant articles were initially identified by scanning titles, abstracts, and keywords, which served as a preliminary filter in conjunction with inclusion and exclusion criteria, and were further screened through a full-text eligibility assessment. Data were extracted according to predefined categories, established from prior studies and further refined through brainstorming sessions among the authors. Results: The initial search yielded 217 articles. The full-text eligibility assessment was independently carried out by five of the authors and resulted in 118 studies, which were critically analyzed. Any disagreements among the authors were resolved through discussion. Out of the 118 articles, 17.8% (n=21) focus on Norwegian clinical text, 61.0% (n=72) on Swedish, 13.6% (n=16) on Danish, and 7.6% (n=9) focus on more than one language. Generally, the review identified positive developments across the region despite some observable gaps and disparities between the languages. There are substantial disparities in the level of adoption of transformer-based models. In essential tasks such as de-identification, there is significantly less research activity focusing on Norwegian and Danish compared to Swedish text. Further, the review identified a low level of sharing of resources such as data, experimentation code, and pre-trained models, and a low rate of adaptation and transfer learning in the region. Conclusions: The review presented a comprehensive assessment of the state-of-the-art Clinical NLP in mainland Scandinavian languages and shed light on potential barriers and challenges. The review identified a lack of shared resources, e.g., datasets and pre-trained models, inadequate research infrastructure, and insufficient collaboration as the most significant barriers that require careful consideration in future research endeavors. The review highlights the need for future research in resource development, core NLP tasks, and de-identification. 
Generally, we foresee that the findings presented will help shape future research directions by shedding some light on areas that require further attention for the rapid advancement of the field in the region.
Photoplethysmography in Healthcare: An Umbrella Review of Clinical Applications, Validation, and Evidence Gaps
Date Submitted: Jan 20, 2026

Open Peer Review Period: Jan 21, 2026 - Mar 18, 2026
Peer Review Me
Background: Photoplethysmography (PPG) is widely used in consumer and clinical devices for heart rate, rhythm, sleep, respiratory, and hemodynamic monitoring. However, rapid expansion of applications has produced a fragmented evidence base with heterogeneous methods and variable validation quality. Objective: To synthesize and critically appraise systematic reviews evaluating PPG-based applications in healthcare, map major clinical domains and methodological practices, and identify limitations and priorities for future research. Methods: A protocolized umbrella review (PROSPERO CRD420251015845) was conducted across six databases. Systematic reviews and meta-analyses involving human PPG applications were included. Screening, extraction, and AMSTAR-2 quality assessment were performed in duplicate following PRISMA-S and PRIOR guidelines. Results: Fifty-nine systematic reviews were included. PPG showed consistent accuracy for resting heart-rate monitoring and strong performance for opportunistic atrial fibrillation screening when paired with confirmatory ECG. HRV estimation, stress monitoring, sleep assessment, neonatal and maternal monitoring, and metabolic applications showed emerging but heterogeneous evidence. Cuffless blood pressure estimation remains limited by calibration dependence, motion sensitivity, and poor generalizability. Remote PPG (rPPG) achieves good accuracy under controlled lighting but degrades with motion, light variability, and darker skin pigmentation. Across domains, performance was typically higher in controlled environments and attenuated in free-living settings. Common methodological limitations included small samples, inconsistent reporting of device and preprocessing details, lack of external validation, algorithm opacity, and underrepresentation of diverse populations. 
Conclusions: PPG is approaching clinical maturity for atrial fibrillation screening and resting heart-rate monitoring, while other applications remain earlier in development. Safe integration into practice requires confirmatory ECG for rhythm abnormalities, awareness of bias sources, and adherence to transparent reporting. Future progress depends on multicenter longitudinal studies, real-world validation, diverse benchmark datasets, standardized metrics, and improved reproducibility across devices and algorithms. PPG holds promise as a scalable component of digital health infrastructure when developed and evaluated with methodological rigor. Clinical Trial: PROSPERO Registration: CRD420251015845
Adolescent’s Perspectives and Experiences with Dietary Mobile Health Apps: A Scoping Review
Date Submitted: Jan 19, 2026

Open Peer Review Period: Jan 20, 2026 - Mar 17, 2026
Peer Review Me
Background: Smartphones play a central role in adolescents’ daily lives, making dietary mobile health (mHealth) apps, tools that provide nutrition education and track eating behaviors, a promising avenue for influencing dietary habits. While numerous studies have examined the impact of mHealth apps on diet, few have investigated adolescents’ perspectives and experiences with these tools. Objective: This scoping review aimed to synthesize the evidence and map the research gaps on adolescents’ perspectives (positive or negative) and experiences (attitudes, barriers, and facilitators) of using dietary mHealth apps on their smartphones. Methods: A systematic scoping review was conducted according to the 5-stage framework by Arksey and O’Malley. PsycINFO, Embase, Medline, Web of Science, and CINAHL were searched for articles published from 2012 to 2023, including mixed-methods studies, that focused on adolescents (10-19 years of age) and reported perspectives (positive or negative) and experiences (attitudes, barriers, and facilitators) related to dietary app use. Articles that were not specific to diet, not research studies, or not written in English were omitted. Results: Of the 590 abstracts screened, 17 studies met the eligibility criteria. Ten studies assessed the usability, feasibility and acceptability of standalone or multi-component dietary mHealth apps, while nine examined app likability and effectiveness. Thematic analysis revealed seven overarching themes: (1) Technical Functionality and Usability; (2) Appreciation of Nutritional Education and Content Depth; (3) Importance of Social Connection, Feedback and Support; (4) Values of Entertainment and Gamification; (5) Significance of Personal Goals, Motivation and Tracking; (6) Interest for Simple Design and Interface; and (7) Perceived Effectiveness of Dietary mHealth Apps. Positively perceived features included food identification, tracking and gamification elements. 
Common barriers included technical difficulties, tracking inaccuracies, complex information delivery and limited social engagement. Facilitators to app use were ease of navigation, targeted information, social interaction, rewards and goal setting. Suggested improvements focused on tracking accuracy, interface design, feedback mechanisms and notification options. Overall, adolescents perceived effective apps as those that raised awareness of eating habits and supported improvements in dietary intake. Conclusions: This scoping review highlights that adolescents’ experiences with dietary mHealth apps are shaped by technical functionality, usability, social engagement, personalization, and gamification. While these features can enhance engagement, barriers such as tracking inaccuracies, technical issues, and limited social interaction reduce app effectiveness. Understanding these perspectives is critical for designing apps that are not only informative but also appealing and sustainable for adolescent users.
AI-Powered Health Chatbot and Plate Recognition for Weight Loss and Health Literacy in Adults With Overweight: Quasi-Experimental Case-Control Study
Date Submitted: Jan 16, 2026

Open Peer Review Period: Jan 18, 2026 - Mar 15, 2026
This manuscript needs more reviewers Peer Review Me
Background: Obesity remains a pressing global health issue. Research suggests that better health literacy can support obesity management. This study tested digital interventions combining healthy eating guidelines with AI and mobile tools, including a ChatGPT-powered LINE chatbot for daily education and an AI food plate recognition system for calorie tracking and meal suggestions. Objective: This study aims to evaluate the efficacy of an integrated digital intervention, combining YOLOv5-based AI food plate recognition and a ChatGPT-powered LINE chatbot, on weight reduction (BMI) and health literacy among overweight and obese adults. Methods: The study used a quasi-experimental intervention case-control design. Both the control and intervention groups received basic health education through app notifications and used an AI food plate recognition tool to estimate their nutritional intake. Only the intervention group could access an AI weight-loss chatbot for timely suggestions. Questionnaire data were collected from users at several points during the intervention. Results: Eighty participants were enrolled. The intervention group demonstrated significantly greater reductions in BMI (β = −1.32; 95% CI, −1.56 to −1.09; P < .001) and improvements in health literacy (β = 4.71; 95% CI, 3.86 to 5.56; P < .001) versus controls. Physical activity (step count β = 1,926.5; 95% CI, 1,209.3 to 2,643.7; P < .001) and weekly exercise time (β = 0.56; 95% CI, 0.21 to 0.92; P = .002) also increased, while late-night snacking decreased (β = −0.45; 95% CI, −0.81 to −0.08; P = .017). The intervention group consistently outperformed the control group across key health measures. However, the AI chatbot alone lacked significant effects on primary outcomes. Conclusions: This integrated digital intervention effectively promotes weight loss and health literacy. 
Given the strong short-term efficacy, future research should employ randomized designs, larger sample sizes, and longer follow-ups to establish long-term weight maintenance and address potential influences such as the Hawthorne effect. It also highlights the need to further develop interactive, personalized health education tools and optimize AI food plate recognition systems to improve health literacy and weight management.
Relationship Between Awareness, Knowledge, and Anxiety to Cyber Behavior During Times of Crisis: Cross-Sectional Study
Date Submitted: Jan 15, 2026

Open Peer Review Period: Jan 16, 2026 - Mar 13, 2026
Peer Review Me
Background: During crises, individuals increasingly rely on digital platforms for information, communication, and emotional support. Cyber behavior, which encompasses online engagement, security practices, and information sharing, is shaped by cognitive and emotional factors such as awareness, knowledge, and anxiety. Understanding these relationships is crucial for promoting digital resilience and well-being during wartime and other large-scale emergencies. Objective: This study sought to examine how cybersecurity awareness, knowledge, and crisis-related anxiety influence cyber behavior and well-being during a national crisis. Drawing on the Protection Motivation Theory (PMT), the study further explored how cognitive and affective responses interact to shape individuals’ online engagement patterns and subsequent psychological outcomes. Methods: A cross-sectional online survey was conducted among 512 Israeli adults aged 18-65 during the ongoing war period (January 2024). Standardized psychometric instruments were used, including the WHO Well-Being Index, DASS-21 Stress subscale, and the Connor-Davidson Resilience Scale (CD-RISC-10). Media engagement was assessed across ten distinct digital activities. Data analysis employed a comprehensive approach, including cluster analysis, exploratory factor analysis (EFA), regression modeling, and path analysis. Results: Cluster analysis yielded two distinct segments: a high media engagement cluster and a low media engagement cluster. Participants in the high-engagement group reported significantly higher stress levels and greater utilization of digital media for news consumption, social networking, and charitable donations (p < .001). Furthermore, exploratory factor analysis revealed three salient dimensions of media usage: active, passive, and institutional. Path analysis indicated that stress was a positive predictor of all forms of media engagement. 
In predicting well-being, active media use (β = .12, p = .006) and resilience (β = .30, p < .001) were positively associated, whereas passive media use demonstrated a marginally negative association (β = -.08, p = .078). Conclusions: Cyber behavior during wartime is demonstrably influenced by both cognitive awareness and emotional stress. Specifically, while anxiety and stress tend to increase online engagement, overexposure to digital media may simultaneously undermine well-being. Therefore, enhancing cyber literacy, cultivating emotional resilience, and promoting balanced media consumption are crucial strategies that can mitigate psychological distress and significantly strengthen digital resilience during crises.
Intelligent Identification of Pressure Injuries Using Multi-modal Deep Learning: A Scoping Review
Date Submitted: Jan 13, 2026

Open Peer Review Period: Jan 14, 2026 - Mar 11, 2026
Peer Review Me
Background: The global prevalence of pressure injuries is high, and they can cause severe infections or death. Accurate staging is vital for effective intervention. Deep learning streamlines pressure injury assessment, enhances efficiency, and yields practical, accurate results. This scoping review summarized research on multi-modal deep learning for intelligent pressure ulcer recognition. Objective: It systematized models, training methods, and outcomes to identify the best systems for rapid detection and automated staging of pressure ulcers. The goal is to enhance the timeliness, accuracy, and objectivity of diagnosis. Methods: We searched the following databases and sources: PubMed, the Cochrane Library, IEEE Xplore, and Web of Science. The scoping review was conducted in accordance with the JBI Scoping Review Methodology Group’s guidance and reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines. The study protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) on 12 December 2025 (registration number: CRD420251251573). Results: Fifteen articles were included, involving 26 models: AlexNet; VGG16; ResNet18; DenseNet121; SE-Swin Transformer; Cascade R-CNN; vision transformer (ViT); ConvNextV2; EfficientNetV2; MetaFormer; TinyViT; CCM; BCM; ResNext + wFPN; SE-Inception; Mask R-CNN; SE-ResNext101; Faster R-CNN; ResNet50; ResNet152; DenseNet201; EfficientNet-B4; YOLOv5; Inception-ResNet-v2; InceptionV3; MobileNetV2. The training methodology for intelligent pressure ulcer recognition models involves establishing an image database, processing images, and constructing the recognition model. Different models exhibit varying accuracy rates in staging pressure ulcers, with overall accuracy ranging from 54.84% to 93.71%. The DenseNet121 model achieved the highest recognition accuracy of 93.71%, while VGG16 was the most widely applied. 
The same model demonstrated significant variations in recognition accuracy across different studies. Conclusions: The multi-modal and deep learning-based intelligent recognition model for pressure injuries demonstrates high overall accuracy, enabling rapid automated staging of such injuries. Future research may explore optimized intelligent assistance systems to enhance the accuracy, objectivity, and efficiency of pressure injury diagnosis.
Radiomics-based AI for predicting and prognosticating VETC in hepatocellular carcinoma: a systematic review and meta-analysis
Date Submitted: Jan 6, 2026

Open Peer Review Period: Jan 7, 2026 - Mar 4, 2026
Peer Review Me
Background: Vessels encapsulating tumor clusters (VETC) are a distinct vascular pattern associated with aggressive behavior and poor prognosis in hepatocellular carcinoma (HCC). Preoperative identification of VETC is crucial for treatment planning but currently relies on invasive pathological examination. Radiomics-based artificial intelligence (AI) offers a potential noninvasive solution, yet evidence regarding its diagnostic and prognostic accuracy has not yet been synthesized. Objective: We aimed to systematically evaluate the diagnostic performance and prognostic value of radiomics-based AI models for noninvasively predicting VETC status in patients with HCC. Methods: We systematically searched PubMed, Embase, Web of Science, and the Cochrane Library for studies published up to July 11, 2025. Studies developing or validating AI models using medical imaging (contrast-enhanced MRI [CEMRI], contrast-enhanced CT [CECT], contrast-enhanced ultrasound [CEUS], or [18F]FDG PET/CT) to predict pathologically confirmed VETC status in HCC patients were included. Study quality was assessed using the PROBAST+AI tool. Diagnostic accuracy (sensitivity, specificity, AUC) and prognostic value for early recurrence (hazard ratio [HR]) were pooled using random-effects models. Results: Fourteen studies involving 729 patients in internal and 581 in external validation cohorts were analyzed. AI models based on CEMRI demonstrated the highest diagnostic accuracy, with a pooled AUC of 0.87 (95% CI 0.84-0.90), sensitivity of 0.82 (95% CI 0.75-0.88), and specificity of 0.77 (95% CI 0.71-0.82). Models using other modalities (CECT, PET/CT, CEUS) showed moderate to good performance. Prognostically, HCC patients classified as VETC-positive by AI had a significantly higher risk of early recurrence (pooled HR 2.34, 95% CI 1.93-2.84). 
Conclusions: Radiomics-based AI models, particularly those using CEMRI, are promising for the noninvasive prediction of VETC and offer valuable prognostic stratification for early recurrence risk in HCC. However, significant heterogeneity and the retrospective nature of current studies limit the strength of evidence. Prospective, multicenter validation is required to confirm clinical utility. Clinical Trial: PROSPERO CRD420251167155
Digital Transformation in Healthcare: Are we on the right track?
Date Submitted: Dec 26, 2025

Open Peer Review Period: Dec 29, 2025 - Feb 23, 2026
Peer Review Me
Digital transformation in healthcare is gaining increasing prominence, despite the observed challenges in its implementation. The envisioned benefits, together with the growing need for better healthcare, are motivating academia, organizations, regulatory agencies, and governments to develop more effective digital healthcare solutions. Through extensive debates among the authors and supported by a narrative literature review, this paper discusses how digital transformation is being conducted in the healthcare sector. Our discussion draws on sociotechnical systems theory and is organized along three social (people, culture, and goals) and three technical (processes/procedures, infrastructure, and technology) dimensions. Overall, we argue that both social and technical dimensions present elements that have been either encouraging or discouraging the progress of healthcare digital transformation. The identification of current trends in such (on- and off-track) elements allowed the formulation of propositions for future testing and validation. This approach can help establish better government policies, foster private initiatives, and shift regulatory guidelines to support a successful digital transformation in health systems. Lastly, from a research perspective, we outline some opportunities for further interdisciplinary investigation in the field, promoting advances in the understanding of healthcare digital transformation.
Commercialization of Online Cancer Information in South Korea: Examining Covert Promotional Cancer-related Posts Across Two Major Search Engines
Date Submitted: Dec 25, 2025

Open Peer Review Period: Dec 25, 2025 - Feb 19, 2026
Peer Review Me
Background: Internet search engines serve as primary gateways for cancer information, yet the commercialization of health content within organic search results remains understudied. While covert promotional content—such as native advertising and stealth marketing—has been documented in various contexts, systematic comparisons across structurally divergent search platforms are lacking. Objective: This study examined the prevalence, distribution, and information quality characteristics of covert promotional cancer-related content across Naver and Google, South Korea's two dominant search engines, which have fundamentally different platform architectures. Methods: A two-phase cross-sectional content analysis was conducted. Phase 1 employed natural language processing to identify 33 cancer-related keywords from 1,400 preliminary posts. Phase 2 systematically collected 5,848 posts in October 2023, yielding 919 unique posts (598 from Naver and 321 from Google) that covered seven major cancer types, representing over 70% of Korean cancer incidence. Two trained coders analyzed promotional status, intensity, institutional sources, and information quality indicators (citation practices, information depth, and source attribution), with inter-coder reliability exceeding κ=.80. Chi-square tests examined the associations between platform and cancer type. Results: Covert promotional content appeared in 48.6% (447/919) of analyzed posts, with significantly higher prevalence on Google (54.2%, 174/321) than Naver (45.7%, 273/598; χ²₁=5.78, p=.016). Platform differences were pronounced: Naver promotional posts predominantly originated from blogs (96.0%, 262/273) and exhibited full promotional intensity (52.1%, 126/242), while Google posts primarily came from hospital websites (81.0%, 141/174) with simple institutional identification (57.8%, 52/90). 
Institutional source distribution varied significantly by platform (χ²₅=215.714, P<.001): traditional medicine institutions dominated Naver (99.2%, 119/120), whereas university-affiliated hospitals predominated on Google (85.0%, 96/113). Information quality differed substantially: indirect citation was more common on Google (81.6%, 142/174) than Naver (58.6%, 160/273; χ²₁=25.653, P<.001), while comparative informational depth was higher on Google (55.7%, 97/174) versus Naver (19.4%, 53/273; χ²₂=64.683, P<.001). Conclusions: Covert promotional cancer content is pervasive in Korean search results, with platform architecture systematically shaping promotional patterns, institutional sources, and information quality rather than reflecting deliberate marketing strategies. These findings underscore the need for platform-sensitive regulation and enhanced digital health literacy to protect vulnerable cancer information seekers from commercial exploitation embedded within ostensibly neutral search environments.
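The platform comparison reported in this abstract (174/321 promotional posts on Google versus 273/598 on Naver) can be checked with a 2x2 chi-square test. The following is a minimal sketch, not the authors' analysis code: the function name is hypothetical, and the use of the Yates continuity correction is an assumption that the abstract does not state, although it does reproduce the reported statistic.

```python
def yates_chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Applies the Yates continuity correction (an assumption; the abstract
    does not say which variant of the test was used).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Closed form: n * (|ad - bc| - n/2)^2 / (product of the four margins)
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected ** 2 / (row1 * row2 * col1 * col2)


# Counts taken from the abstract: promotional posts / total posts per platform
google_promo, google_total = 174, 321
naver_promo, naver_total = 273, 598

chi2 = yates_chi_square_2x2(
    google_promo, google_total - google_promo,
    naver_promo, naver_total - naver_promo,
)
print(round(chi2, 2))  # 5.78
```

Without the continuity correction the same table gives a somewhat larger statistic, so the correction appears to be what the reported value reflects.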
Large Language Models in Colorectal Cancer: A Systematic Review
Date Submitted: Dec 22, 2025

Open Peer Review Period: Dec 23, 2025 - Feb 17, 2026
Peer Review Me
Background: The growing complexity of colorectal cancer (CRC) management requires advanced tools for integrating multimodal data and clinical knowledge. Large language models (LLMs) offer a promising approach to address these challenges through sophisticated natural language processing and reasoning capabilities. Objective: This systematic review evaluates the current applications, performance, and practical implications of LLMs across the continuum of CRC care, from screening to treatment decision support. Methods: We searched six databases (PubMed, Embase, Web of Science, Scopus, CINAHL, Cochrane) up to November 1, 2025, following PRISMA guidelines. Included studies were original research investigating LLM applications specific to CRC, with extractable outcome data. Quality was assessed using QUADAS-2, PROBAST, and ROBINS-I tools by two independent reviewers. Results: Following the screening of 1,261 records, 34 studies met the inclusion criteria, all published between 2023 and 2025. The synthesis highlighted the utility of LLMs in automating data extraction from clinical texts, supporting patient education, aiding diagnostic processes, and assisting in clinical decision-making, with growing evidence of their emerging visual interpretation and multimodal capacities. The effectiveness of these models was significantly influenced by prompt design, which varied from basic zero-shot queries to specialized fine-tuning techniques. While the overall methodological quality of the included studies was deemed adequate, assessments identified recurring concerns regarding insufficient control of biases and inadequate reporting on data security measures. 
Conclusions: LLMs demonstrate tangible potential to augment CRC care, particularly in structuring unstructured data and providing clinical decision support. However, translating this potential into practice requires solutions for domain adaptation, multimodal integration, and rigorous prospective validation to ensure reliability and safety in real-world settings. Clinical Trial: PROSPERO CRD420251248261; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251248261.
Gender Bias and Assignment Consistency in Large Language Models for Clinical Decision-Making: Comparative Evaluation Study
Date Submitted: Dec 22, 2025

Open Peer Review Period: Dec 22, 2025 - Feb 16, 2026
Peer Review Me
Background: The integration of large language models (LLMs) into healthcare holds promise to enhance clinical decision-making, yet their susceptibility to biases remains a critical concern. Gender has long influenced physician behaviors and patient outcomes, raising concerns that LLMs assuming human-like roles, such as clinicians or medical educators, may replicate or amplify gender-related biases. Objective: To evaluate the consistency of LLM responses across different assigned genders (personas) regarding both diagnostic outputs and model judgments on the clinical relevance or necessity of patient gender. Methods: Using case studies from the New England Journal of Medicine (NEJM) Challenge, we assigned genders (female, male, or unspecified) to multiple open-source and proprietary LLMs. We evaluated their response consistency across LLM-gender assignments regarding both LLM-based diagnosis and models’ judgments on the clinical relevance or necessity of patient gender. For representative models with high diagnostic accuracy, we further evaluated consistency across question difficulty tiers and clinical specialties. Results: All models showed high diagnostic consistency across assigned LLM genders (range of consistency rates: 91.45%–97.44%), though this did not always correspond to diagnostic accuracy (e.g., GPT-4.1: 97.44% consistency, 0.943 accuracy; Gemma-2B: 97.44% consistency, 0.478 accuracy). In contrast, judgments on the clinical importance of patient gender showed marked inconsistency: consistency rates ranged from 58.97% to 90.6% for relevance judgements and from 78.63% to 98.29% for necessity judgements. Stratified by difficulty tier and specialty, the open-source model (LLaMA-3.1-8B) in particular showed statistically significant differences across LLM genders regarding both relevance and necessity judgements. 
Conclusions: Despite stable diagnostic outputs, LLMs varied substantially in their assessments of patient gender’s clinical importance across gendered personas. These findings present an underexplored bias that could undermine the reliability of LLMs in clinical practice, underscoring the need for routine checks of identity-assignment consistency when interacting with LLMs to ensure reliable and equitable AI-supported clinical care. Clinical Trial: not applicable
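The abstract above reports consistency rates across gendered personas but does not define the metric. One plausible reading, sketched below, is the share of cases for which a model returns the same answer under every assigned persona. The function name, the persona labels as dictionary keys, and the toy answers are all hypothetical and serve only as an illustration.

```python
def consistency_rate(answers_by_persona):
    """Fraction of cases where all personas gave the same answer.

    answers_by_persona maps a persona label to a list of answers,
    one per case, with cases in the same order for every persona.
    """
    answer_lists = list(answers_by_persona.values())
    n_cases = len(answer_lists[0])
    if any(len(answers) != n_cases for answers in answer_lists):
        raise ValueError("every persona must answer the same set of cases")
    # A case counts as consistent when the set of answers has one element
    consistent = sum(
        1 for i in range(n_cases)
        if len({answers[i] for answers in answer_lists}) == 1
    )
    return consistent / n_cases


# Toy data (hypothetical): four cases answered under three personas
toy_answers = {
    "female": ["A", "B", "C", "D"],
    "male": ["A", "B", "C", "A"],
    "unspecified": ["A", "B", "C", "D"],
}
rate = consistency_rate(toy_answers)
print(f"{rate:.2%}")  # 75.00%
```

Under this definition a model can be highly consistent and still inaccurate, which matches the abstract's observation that consistency did not always correspond to diagnostic accuracy.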
