The leading peer-reviewed journal for digital medicine and health and health care in the internet age. 

Latest Submissions Open for Peer Review

JMIR has been a leader in applying openness, participation, collaboration and other "2.0" ideas to scholarly publishing. Since December 2009 it has offered open peer review, allowing JMIR users to sign themselves up as peer reviewers for specific articles currently under consideration by the Journal (in addition to author- and editor-selected reviewers).

For a complete list of all submissions across all JMIR journals as well as partner journals, see JMIR Preprints

Note that this is not a complete list of submissions, as authors can opt out. The list below shows recently submitted articles where the submitting authors have not opted out of open peer review and where the editor has not yet made a decision. (Note that this feature is for reviewing specific articles. If you just want to sign up as a reviewer and wait for the editor to contact you when articles match your interests, please sign up as a reviewer using your profile.)

To assign yourself to an article as reviewer, you must have a user account on this site (if you don't have one, register for a free account here) and be logged in (please verify that your email address in your profile is correct).

Add yourself as a peer reviewer to any article by clicking the '+Peer-review Me!+' link under each article. Full instructions on how to complete your review will be sent to you via email shortly after. Do not sign up as peer-reviewer if you have any conflicts of interest (note that we will treat any attempts by authors to sign up as reviewer under a false identity as scientific misconduct and reserve the right to promptly reject the article and inform the host institution).

The standard turnaround time for reviews is currently 2 weeks, and the general aim is to give constructive feedback to the authors and/or to prevent publication of uninteresting or fatally flawed articles. Reviewers will be acknowledged by name if the article is published, but remain anonymous if the article is declined.

The abstracts on this page are unpublished studies - please do not cite them (yet). If you wish to cite them/wish to see them published, write your opinion in the form of a peer-review!

Tip: Include the RSS feed of the JMIR submissions on this page on your homepage, blog, or desktop RSS reader to stay informed about current submissions!

JMIR Submissions under Open Peer Review

If you follow us on Twitter, we will also announce new submissions under open peer-review there.

Titles/Abstracts of Articles Currently Open for Review:

  • Refined Exclusion in Medical AI: Reframing Algorithmic Fairness as Data Justice and Patient Safety Governance

    Date Submitted: May 25, 2026
    Open Peer Review Period: May 27, 2026 - Jul 22, 2026

    Medical artificial intelligence (AI) systems are often evaluated through aggregate performance metrics and output-level fairness measures. However, clinically meaningful harms may remain hidden when systems perform well on average while underperforming for data-poor, underrepresented, or structurally marginalized populations. This Viewpoint uses the concept of refined exclusion to synthesize a recurring pattern in medical AI: systems may appear technically successful at the population level while transferring uncertainty, misclassification, delayed recognition, or reduced clinical reliability to groups that are less visible within training data, validation cohorts, proxy definitions, and deployment workflows. Drawing on representative cases from population health management, chest radiograph AI, dermatology, computational pathology, and foundation model applications, we argue that refined exclusion should not be treated merely as algorithmic bias or a defect of model outputs. Rather, it reflects a data governance failure with direct implications for patient safety. Moving beyond output-centered algorithmic fairness, we propose data justice as a governance foundation for medical AI, organized across distributional, procedural, and substantive dimensions. We further outline operational checkpoints across the medical AI lifecycle, including subgroup learnability assessment, data provenance documentation, local validation, procurement-stage accountability, explainability-based proxy audits, post-deployment subgroup monitoring, and patient participation. Reframing refined exclusion as a patient safety problem shifts the central governance question from “Is this model accurate on average?” to “For whom is this system safe, reliable, and clinically accountable?”

  • Background: Patient-reported outcome measure (PROM) completion is hindered by patient-level barriers—including motor, sensory, cognitive, and motivational constraints—that risk insufficient participation and non-response bias. While technology-enabled approaches such as multimodal speech assistance hold promise for reducing these barriers, assistance is a complex interaction: it can both alleviate and introduce barriers depending on how well it aligns with patients’ routines and needs. Objective: This qualitative study explores how patients perceive the advantages and disadvantages of AI-based speech assistance for PROM collection, focusing on how assistance functionalities interact with individual barriers and completion practices. Methods: We conducted semi-structured qualitative interviews with 96 psychosomatic and neurological rehabilitation outpatients, embedded in a pragmatic cross-randomised controlled trial. Participants completed PROMs with and without an AI-based speech assistance system offering speech output, speech input, and guidance by a socially interactive agent (SIA) that was physically, virtually, or voice-only embodied. The system was iteratively refined during data collection to address usability and performance issues. We included a broad sample to reflect real-world care settings, including patients without reported barriers. Using inductive content analysis (61 codes, grouped into 4 overarching and 9 subthemes), we examined perceived advantages and disadvantages of the three main assistance functionalities and multimodal interaction. Reporting followed the COREQ guideline. Results: The speech output function emerged as the most widely valued assistance feature, with many patients reporting improved concentration, question comprehension, and deeper engagement with item content. The social agent was described as making the interaction more engaging and less monotonous, while at the same time not evoking social pressure.
Speech input was perceived as helpful by some, especially those with motor impairments or a preference for verbal expression. However, each function also introduced challenges: speech output disrupted reading routines for some, the social agent was perceived as distracting or unnecessary by others, and speech input was criticised for recognition errors, inefficiency, and privacy concerns. Conclusions: AI-based speech assistance for PROM collection offers significant potential to reduce barriers and enhance patient engagement, but its effectiveness depends on alignment with individual needs, preferences and routines. While speech output proved broadly beneficial, speech input and socially interactive agents require careful design to avoid introducing new barriers, particularly for marginalised groups. Configurable, modular assistance systems that adapt to diverse user preferences and impairments are essential for equitable implementation. Future research should focus on inclusive co-design and longitudinal studies to refine these technologies for real-world clinical use. Clinical Trial: German Clinical Trials Register-ID: DRKS00035213

  • Background: Adolescent depression is clinically heterogeneous, and the presence of mixed features – defined as subthreshold manic symptoms co-occurring with a depressive episode – complicates diagnosis and treatment. Intensive longitudinal monitoring using wrist-worn actigraphy and daily ecological momentary assessment (EMA) may capture behavioral and experiential signatures that differentiate depression with mixed features (Mixed-Dep) from depression without mixed features (NoMix-Dep), but evidence in adolescents remains limited. Objective: This study aimed to examine whether multimodal digital monitoring using wrist-worn actigraphy and daily ecological momentary assessment of mood and energy can distinguish adolescents with depression with mixed features from those with depression without mixed features, and to identify dynamic energy–activity patterns specific to mixed depression. Methods: Ninety-eight adolescents (ages 12–18; 37 Mixed-Dep, 31 NoMix-Dep, 30 healthy controls) from the longitudinal Mood & Brain Circuitry in Adolescence (MBA) study wore wrist-worn actigraphy devices and completed daily mood and energy self-reports using the Mood and Energy Thermometer (MET) over two weeks. Group classification was defined based on the K-SADS-PL Mania Rating Scale. Dynamic within-person associations among mood, energy, and activity were estimated using generalized estimating equations with a first-order autoregressive working correlation structure, controlling for sleep duration, age, sex, and weekday/weekend status. Results: Both depressed groups showed lower overall activity and greater minimum activity suppression compared to healthy controls (mean activity: F = 32.67, p < 0.001), with NoMix-Dep showing lower minimum activity than Mixed-Dep (Min2: F = 17.91, p < 0.001; Min4: F = 23.37, p < 0.001). 
Mixed-Dep participants had significantly higher positive and negative energy scores (EnergyPosMax: F = 10.12, p < 0.001; EnergyNegMax: F = 91.93, p < 0.001), shorter wake after sleep onset (F = 3.67, p = 0.03), and higher sleep efficiency (F = 7.03, p < 0.01) than NoMix-Dep. Mood scores did not differ between depressed groups. Energy–mood associations were largely similar across groups. Energy–activity temporal coupling differed markedly: NoMix-Dep showed same-day congruent coupling (high energy predicted high activity), while Mixed-Dep showed an inverted lagged pattern (high energy today predicted lower activity tomorrow). Similar group-differential patterns were observed for mood–activity associations. Conclusions: An inverted, lagged energy–activity coupling represents a novel digital phenotype distinguishing mixed from non-mixed adolescent depression. Energy dysregulation, more than mood, differentiates the two depressed subgroups, with implications for scalable EMA-based screening and earlier identification of mixed features in clinical settings.

  • Background: Artificial intelligence (AI) is increasingly integrated into prostate cancer diagnostics, with the potential to improve accuracy and efficiency. However, it also raises important questions about the conditions and barriers that may influence its successful implementation in this clinical context. Objective: This study examined how patients and healthcare professionals perceive the integration of AI in prostate cancer diagnostics, with particular attention to its impact on clinical relationships and the roles of patients and physicians. Methods: A sequential explanatory mixed-methods design was used. Quantitative data were collected through an online questionnaire administered to patients with localized prostate cancer (N=51). Descriptive analyses focused on perceived benefits, willingness to use AI, and associated concerns. Qualitative data were collected through focus groups and semi-structured interviews with patients (n=16) and physicians (n=11). Data were analyzed using iterative, inductive thematic analysis. Results: Quantitative findings showed that despite recognizing the potential benefits of AI, patients remained divided regarding the use of such tools in their own care. Qualitative findings suggest that this hesitation cannot be explained solely in terms of perceived performance or utility. Rather than simply reducing complexity in clinical decision-making, AI appeared to reconfigure the certainties on which trust within the patient–physician relationship is established. This reconfiguration was reflected across epistemic, ethical, and role-related dimensions. Patients emphasized difficulties in understanding AI-generated knowledge, whereas clinicians focused on issues of reliability, validation, and clinical relevance. Ethical concerns centered on responsibility, which was consistently attributed to physicians, while errors made by AI were perceived as less acceptable than those made by clinicians. 
Role-related uncertainties were reflected in ambivalent patient positions: while some participants sought more information to remain involved in decision-making, others preferred to rely on physicians, reflecting variation in how patients engage with complex clinical information. AI was generally viewed as a supportive tool rather than a replacement for clinical judgement, while its integration was associated with evolving professional roles, including increased demands for interpretation, communication, and oversight. Conclusions: The integration of AI in prostate cancer diagnostics is shaped not only by its technical performance, but by how it reconfigures trust within the patient–physician relationship. Rather than eliminating uncertainty, AI redistributes it across knowledge, responsibility, and social roles. Ensuring that AI contributes positively to clinical practice therefore requires careful attention to clinician oversight, communication, and the relational context in which decisions are made. Clinical Trial: NCT07074405 (ClinicalTrials.gov)

  • Exploring the Impact of Social Media Use on Anxiety Symptoms in Healthcare Workers during the COVID-19 Pandemic

    Date Submitted: May 21, 2026
    Open Peer Review Period: May 22, 2026 - Jul 17, 2026

    Background: Healthcare workers experienced significant mental health challenges, particularly anxiety, during the COVID-19 pandemic. Although social media became a primary source of information and connection, it was also a potential source of stress. The influence of social media on healthcare workers’ anxiety is not well-understood. Objective: This study examined associations between social media use and anxiety symptoms among healthcare workers during the COVID-19 pandemic. Methods: We conducted a cross-sectional analysis of data from the 2021 UC COVID study (N=427 healthcare workers). Anxiety symptoms were assessed using the Generalized Anxiety Disorder-2 (GAD-2). Social media use across five platforms (Twitter, Facebook, Instagram, other social media platforms, and other media sources) was evaluated using confirmatory factor analysis within a structural equation modeling framework. The confirmatory factor analysis supported a single latent factor representing overall social media use, with all platform indicators loading significantly (p=<0.01) with moderate loadings (0.33–0.59). Model fit was acceptable (χ²(5)=12.49, p=<0.01), indicating that the five observed variables coherently reflected a unified social media use construct. Logistic regression models estimated associations between overall social media use and anxiety symptoms, both unadjusted and adjusted for demographic, occupational, and health-related characteristics. Results: Of the 427 healthcare workers, Facebook was utilized the most with 54% of respondents utilizing the platform at least once a day. A total of 29% reported clinically relevant anxiety symptoms (GAD-2 ≥ 3). Overall, higher social media use was significantly associated with anxiety symptoms (OR=1.77, CI: 1.15-2.73). 
Older age was significantly associated with lessened anxiety symptoms (aOR=0.97, CI: 0.95–0.99). Healthcare workers with a history of mental health diagnoses reported higher levels of anxiety symptoms (aOR=2.38, CI: 1.38–4.09). Non-Hispanic, non-White healthcare workers reported fewer anxiety symptoms compared to White healthcare workers (aOR=0.49, CI: 0.27–0.91). Participants reporting higher income had significantly lower odds of anxiety symptoms than those in the lower-income group (aOR = 0.38, CI: 0.18–0.81). Conclusions: Social media use during the pandemic was associated with elevated anxiety symptoms among healthcare workers. However, the rapidly evolving digital landscape underscores the need for continued research. Future studies should include emerging social media sources (e.g., TikTok, Reddit, YouTube) and repeat factor analyses as digital behaviors shift over time. Longitudinal and mixed-methods approaches are necessary to understand patterns and methods of social media use, accounting for misinformation and disinformation, emotional involvement, and content types such as photos and videos. Larger and more diverse healthcare worker samples, stronger mental health measures (e.g., GAD-7, depression, and burnout scales), and analyses stratified by clinical role and work environment will be essential to guide interventions that support healthcare worker well-being in a post-pandemic era with new challenges.

  • Background: The high prevalence of sedentary lifestyles and non‑communicable diseases in Malaysia calls for scalable physical activity interventions. Hence, in this study, we leverage the potential benefits of social media, particularly Instagram, for exercise promotion. Objective: This pilot study examined the acceptability, observed changes, and predictors of improvement associated with an Instagram‑based exercise promotion among sedentary adults in Klang Valley, Malaysia. Methods: A total of 56 sedentary adults (34 females, 22 males) were recruited; 50 completed the 12‑week intervention (mean sedentary behaviour 7.30±2.75 hours/day; retention rate 89.3%). Participants joined a private Instagram page delivering cardiorespiratory‑focused exercise content every two days. Pre‑ and post‑intervention assessments included anthropometry, body composition (InBody 370), 6‑Minute Walk Test (6MWT), and Client Satisfaction Questionnaire‑8 (CSQ‑8). Results: Significant pre‑post changes were observed in body weight (mean change -2.05±2.88 kg, P<.001), BMI (-0.83±1.11 kg/m², P<.001), body fat percentage (-2.23±1.91%, P<.001), and 6MWT distance (67.82±40.81 m, P<.001). The mean total CSQ‑8 score was 27.02±4.91 (out of 32), indicating high satisfaction. Baseline body fat percentage, baseline 6MWT distance, and gender were associated with the degree of functional change (R²=0.71). Conclusions: This pilot study suggests that an Instagram‑based intervention is acceptable and may be associated with positive health changes among sedentary adults. These findings support the need for a definitive randomised controlled trial in the future.

  • Background: Large language models (LLMs) have shown considerable potential in intelligent healthcare consultation. However, their application in Traditional Chinese Medicine (TCM) gynecology remains limited by semantic gaps between colloquial patient descriptions and professional TCM reasoning, as well as risks of hallucinated medical content. Objective: We proposed MAGR-TCM, a knowledge graph-powered multi-agent retrieval-augmented generation framework for home-based TCM consultation and preliminary risk assessment. Methods: A domain-specific knowledge graph containing 10,231 entities and 32,051 relationships was constructed from 741 curated clinical case records. The framework integrates four specialized agents for question analysis, risk routing, graph reasoning, and response evaluation. Model performance was evaluated using the RAGAS framework and a double-blind expert assessment on 60 independent cases, including a safety stress-test with 10 emergency "Red Flag" scenarios. Results: MAGR-TCM achieved the best overall performance among baseline models, with an average RAGAS score of 0.900 and a consultation professionalism score of 0.904. The proposed framework demonstrated strong factual consistency (Faithfulness: 0.821) and comprehensive diagnostic accuracy (0.952), approaching the performance of human experts. In safety stress testing, MAGR-TCM achieved 100% emergency identification accuracy and the lowest unsafe recommendation rate (0.240) among all evaluated AI systems. Conclusions: The proposed MAGR-TCM framework demonstrates the potential of integrating knowledge graphs and multi-agent reasoning to support interpretable and safety-aware TCM consultation. The system serves as a reliable methodological prototype for intelligent home-based health management and preliminary risk assessment.

  • Global trends in digital behavior change interventions for overweight/obesity: a bibliometric and scoping review

    Date Submitted: May 20, 2026
    Open Peer Review Period: May 21, 2026 - Jul 16, 2026

    Background: Obesity represents a major global public health challenge. Digital behavior change interventions (DBCIs) have emerged as scalable, technology-enabled strategies for delivering evidence-based behavioral interventions using behavior change techniques (BCTs). However, current evidence remains fragmented regarding global research trends and the multi-dimensional distribution of BCTs within DBCIs across populations, intervention types, and health outcomes. Objective: This study aims to explore DBCIs among overweight and obese adults, focusing on temporal trends in research and patterns of BCT utilization. Methods: A combined bibliometric analysis and scoping review was conducted based on publications from the Web of Science Core Collection up to 2025. Publication trends, global collaboration patterns, digital technologies, BCT usage, intervention outcomes, and evidence gaps were systematically analyzed. Results: Research on DBCIs for obesity has grown rapidly since 2007, with leading contributions from high-income countries, accompanied by strengthened international collaboration and a gradual shift toward interdisciplinary and integrated digital health approaches. BCTs are typically applied in combination, with self-monitoring (79.4%) and goal setting (73.7%) as the core techniques, mainly targeting diet and physical activity. Their distribution varies significantly across digital technology types, targeted behaviors, clinical outcomes, and comorbid conditions. Conclusions: Current DBCIs prioritize behavioral self-regulation and cardiometabolic risk improvement. To enhance long-term sustainability and real-world effectiveness, future interventions should adopt a theory-driven framework, integrate psychological and physiological components, and implement personalized adaptive designs. Furthermore, integrating a big data-enabled systems paradigm of behavior will enable more dynamic, mechanism-informed, and proactive DBCIs for obesity management.

  • Use of Large Language Models (LLMs) in a Large Subspecialty Practice

    Date Submitted: May 20, 2026
    Open Peer Review Period: May 21, 2026 - Jul 16, 2026

    Background: Large language models (LLMs) are increasingly being incorporated into clinical practice for tasks such as rapid evidence retrieval, documentation support, and clinical decision-making. However, real-world data on clinician adoption, trust, verification practices, and perceived ethical or security concerns remain limited. Objective: To evaluate real-world use of LLMs among clinicians at a large academic medical center and assess perceptions regarding usefulness, reliability, ethical appropriateness, data security, and verification practices. Methods: We conducted a web-based cross-sectional survey of clinicians within the Department of Medicine at Mayo Clinic (Rochester, Minnesota, USA) between December 2025 and February 2026. Eligible participants included attending physicians, nurse practitioners, physician assistants, residents, and fellows. The survey evaluated awareness and clinical use of LLMs, frequency and context of use, perceived usefulness and ease of use, trust and data security perceptions, verification practices, behavioral intention to use, and comparisons with traditional point-of-care reference tools. Descriptive statistics were used to summarize responses, and associations between years of clinical experience and LLM use were assessed using chi-square tests. Results: A total of 254 clinicians completed the survey (response rate 11.6%). Awareness of LLMs was high (248/254, 97.6%), and 227/246 (92.3%) respondents aware of LLMs reported clinical use. Daily use was reported by 103/222 (46.4%) respondents, while 196/222 (88.3%) reported at least weekly use. OpenEvidence was the most commonly used clinical platform (187/227, 82.4%). LLMs were primarily used for rapid evidence retrieval (174/227, 76.7%), support in complex clinical scenarios (84/227, 37.0%), and guideline summarization (75/227, 33.0%). 
Additional reported uses included drafting clinical communications, summarizing patient histories, educational activities, and research-related tasks. Most respondents considered LLM use ethically appropriate (202/220, 91.8%) and regarded outputs as generally reliable, although confidence in data security was lower (115/217, 53.0%). Verification practices varied, with 120/217 (55.3%) reporting always or often verifying outputs. Many respondents rated LLMs more favorably than traditional reference tools such as UpToDate and PubMed. Reported use did not differ significantly across years of clinical experience. Conclusions: LLMs were widely used among respondents for both clinical and administrative tasks at a large academic medical center. Clinicians reported frequent use across diverse workflows, particularly for rapid information retrieval and support with complex clinical questions. Although perceptions of ethical appropriateness and usefulness were generally favorable, variability in verification practices and lower confidence in data security highlight the need for institutional guidance, governance frameworks, and education to support safe and consistent use of LLMs in clinical practice.

  • Background: Digital phenotyping uses passively collected digital-sensing data to characterize real-world behavioral patterns. Such data may help identify everyday lifestyles that are relevant to mental well-being, but most prior approaches have used variable-centered methods that focus on single behaviors rather than person-centered combinations of behaviors across daily life. Objective: This study aimed to examine whether digitally captured behavioral and environmental data can be used to derive meaningful lifestyle profiles and whether these profiles are associated with mental well-being. Methods: The study used a two-week intensive longitudinal design with a German quota sample of 553 adults (Mage = 42.27, SD = 12.89; 44.4% female). Across the study period, participants contributed 7,635 person-days of smartphone-recorded data on social app use, mobility, physical activity, screen use, ambient loudness, and brightness, along with self-reported mental well-being and Big Five personality traits. We used an innovative two-level latent profile analysis to simultaneously identify day-level profiles at Level 1 and person-level profiles at Level 2. We then examined associations between person-level lifestyle profiles and mental well-being, including whether these associations were moderated by Big Five personality traits. Results: The analysis identified eight day-level profiles and seven person-level profiles. One person-level profile characterized by lighter phone usage combined with heavier physical activity reported greater positive functioning, an important aspect of mental well-being, than another profile characterized by extensive mobility combined with intensive social app use. Personality traits did not significantly moderate the associations between lifestyle profiles and mental well-being. Conclusions: These insights advance digital phenotyping by showing that interpretable, person-centered lifestyle profiles could reflect aspects of mental well-being. 
Potential clinical implications, including transparent monitoring and multi-behavior interventions, are discussed.

  • The Applicability of Smart Wearables in Neurological Disorders: A Scoping Review

    Date Submitted: May 19, 2026
    Open Peer Review Period: May 20, 2026 - Jul 15, 2026

    Background: Neurological disorders present an increasing health burden worldwide and cause significant impairments which impede physical, social and cognitive function. Wearable technologies show promise for bolstering health monitoring in neurological populations due to their ease of use and accessibility, yet their applicability for use in measuring relevant parameters remains unclear. Objective: This scoping review aims to map the current evidence surrounding the use of smart wearable technologies in people with neurological disorders. Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR), studies were systematically reviewed from five electronic databases: MEDLINE, EBSCOHost, Cochrane Library, IEEE Xplore, and Web of Science. Key features, limitations and potential clinical applications of these devices were identified. Two independent reviewers screened and selected the studies. Two reviewers then summarised the selected studies using an Excel data extraction sheet and used the NIH Risk of Bias tool to critically appraise them. Results: Of seventy-nine studies included in this review, twenty-seven focused on stroke, thirty-four on Parkinson’s Disease (PD), eight on Multiple Sclerosis (MS), seven on dementia, two each on Traumatic Brain Injury (TBI) and Amyotrophic Lateral Sclerosis (ALS), one each on epilepsy and Progressive Supranuclear Palsy (PSP), three on Spinal Cord Injury (SCI), and one on Huntington’s Disease (HD). Conclusions: Smart wearables demonstrate accuracy and feasibility, particularly in stroke and PD, with most studies focusing on physical parameters such as gait patterns. Future research should include psychosocial and physiological outcomes, use larger and more standardised samples, and address underrepresented neurological conditions, to better define the broader applicability of smart wearables. Clinical Trial: 10.17605/OSF.IO/ZHCPE

  • Background: Individuals who are affected by an injury to the brain or spinal cord at a young age will have to manage a complex health situation with a variety of functional and cognitive challenges. The users’ level of health literacy can be detrimental for their ability to navigate their daily life. In today’s situation, where health information and health services are increasingly disseminated digitally, digital tools can help users make proactive choices and enhance autonomy. However, to realize this potential, it is important to know which tools are available, in what regard they are tailored for end-users, and if they are developed and evaluated with scientific evidence. Objective: In this scoping review we aimed to map available digital tools in the form of mobile applications or interactive e-learning resources for a target population of children, adolescents and young adults with injury to the brain and/or spinal cord, and to assess evidence for the usability and efficacy of such tools to enhance health literacy. Methods: We conducted a scoping review following the Joanna Briggs Institute framework for scoping reviews and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews). Searches involved a systematic search in key bibliographic databases, in grey literature, in specific search engines to identify mobile applications, and in selected relevant websites of organizations. To be eligible for inclusion, sources had to address our target population and involve digital tools with active engagement of the end-user. Results: Of 612 identified scientific records from databases, two studies were eligible for data extraction. The grey literature search resulted in six more eligible publications, resulting in a final number of eight included published papers. The searches for mobile applications identified 114 apps for consideration, of which 30 were included. 
Our findings show that the evidence base for the use and efficacy of digital applications enhancing health literacy for our target population is sparse. We identified very little reported user experience. The number of tailored apps was low, suggesting that general health apps are very dominant compared to diagnosis- and age-specific apps. Our findings imply that it is difficult to navigate and understand privacy management and security of available apps from the information given in app stores. Conclusions: Published evidence for the usefulness and efficacy of digital tools in the form of mobile applications and interactive e-learning websites for improving health literacy in our target population of children, adolescents and young adults with injury to the brain or spinal cord is sparse. Compared to the abundant number of available general health and wellness-apps, there are few tailored digital tools with active end-user involvement. To evaluate privacy management and security of available apps is a challenge. Clinical Trial: Open Science Framework https://osf.io/k4r35/overview

  • Background: Large language models (LLMs) have demonstrated potential as auxiliary tools in digital health scenarios, such as depression management. However, their effectiveness depends on their ability to meet both rigorous professional standards and individualised patient needs. Currently, a gap exists in research that systematically evaluates the quality of LLM responses from both medical and patient perspectives, hindering the development of “patient-centred” medical artificial intelligence. Objective: This study aimed to develop a dual-paradigm evaluation framework that integrates professional-safety and experience-practicality perspectives, and to systematically compare how clinicians and patients evaluate LLM-generated responses to common questions about depression, in order to identify communication features that can bridge the cognitive divide. Methods: We selected the 10 most frequently asked questions from patients with depression and generated responses using four mainstream Chinese LLMs (DeepSeek-V3.2, GLM-4.6, Qwen-3-Max, and Kimi-k2-thinking). Ten psychiatrists and 130 clinically diagnosed patients with depression were invited to independently conduct blind scoring from their respective professional or experiential perspectives across six evaluation dimensions. Results: Significant differences were found between healthcare providers and patients across all evaluation dimensions (p < 0.05), with the greatest perceptual gap observed in “safety boundaries and risk awareness” (effect size r = 0.38). Key findings include: (1) The symbiosis of safety and empathy: From the patient’s perspective, perceived “safety” of a response was highly positively correlated with its “linguistic approachability” (ρ > 0.5), in stark contrast to the negative correlation observed in the physician group (ρ = -0.289). This suggests that safety warnings incorporating expressions of empathy are more likely to gain patient acceptance and trust. 
(2) Structural differences in evaluation logic: Patients tended to evaluate “clarity”, “practicality”, and “approachability” as an integrated whole (strong positive correlations), whereas doctors were able to assess “medical accuracy” as an independent core metric. Conclusions: Based on these findings, this study proposes that “bridging communication” should serve as the core developmental paradigm for future medical AI. This paradigm emphasises that an effective AI response requires a delicate balance between professional rigour and individual relevance, centring on two key transformations: translating standardised medical language into personal narratives that resonate with patients’ lived experiences, and transforming structured knowledge into actionable, personalised guidance. The best-performing models in this study (GLM-4.6 and Kimi-k2-thinking) demonstrated preliminary evidence of this “bridging” characteristic in their responses. This study not only evaluates existing models but, more importantly, provides a crucial theoretical framework and empirical basis for building the next generation of medical AI assistants that possess genuine communicative intelligence, empower patients, and support clinical practice. Clinical Trial: NONE

  • Background: Other infectious diarrhea (OID) remains an important public health concern in China because of its high incidence, marked seasonality, and substantial burden, particularly among children. Accurate short-term forecasting and early warning are important for timely public health response. However, previous OID forecasting studies have mainly relied on reported case data, and the added value of multisource indicators remains insufficiently evaluated. Objective: This study aimed to develop and evaluate a multisource CNN-BiLSTM-SE Attention model for short-term forecasting and early warning of reported other infectious diarrhea cases in Chongqing, China. Methods: Daily OID case counts in Chongqing from January 2015 to June 2025 were collected, together with meteorological variables and Baidu search indices related to infectious diarrhea. After data normalization, Pearson correlation analysis and random forest variable-importance analysis were used for predictor selection. A CNN-BiLSTM-SE Attention hybrid model was developed to integrate multisource data, extract local temporal patterns, model temporal dependencies, and recalibrate informative feature channels. Forecasting performance was evaluated using RMSE, MAE, MAPE, and R², and compared across different input settings and benchmark models. In addition, 5-day-ahead predictions were converted into binary warning signals using training-set 75th and 90th percentile thresholds, and compared with a persistence baseline. Results: Under the full-input setting, the CNN-BiLSTM-SE Attention model achieved the best predictive performance, with an R² of 0.7828, RMSE of 35.418, MAE of 25.411, and MAPE of 17.27%. Compared with the case-only model, R² increased by 0.0326, while RMSE and MAE decreased by 2.560 and 1.643, respectively. The proposed model also outperformed random forest, XGBoost, CNN, and LSTM. 
In the threshold-based early-warning evaluation, the full-input model showed better overall warning performance than the persistence baseline at both the 75th and 90th percentile thresholds. Conclusions: The CNN-BiLSTM-SE Attention hybrid model improved short-term forecasting of reported OID case counts in Chongqing. Integrating epidemiological, meteorological, and internet search data provided complementary information, suggesting potential utility for OID surveillance, forecasting, and early warning.
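
The threshold-based warning step described above can be illustrated with a small sketch. This is not the authors' code: the function names, the strict "greater than" rule, the linear-interpolation percentile, and the definition of the persistence baseline (the forecast for day t+h equals the observed count on day t) are all assumptions made for illustration.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def warning_signals(counts, threshold):
    """Binary signal: 1 where a (forecast or observed) count exceeds the threshold.

    Whether the original study used > or >= is not stated; > is an assumption.
    """
    return [1 if c > threshold else 0 for c in counts]

def persistence_forecast(series, horizon):
    """Assumed naive baseline: the count `horizon` days ahead equals today's count.

    Element i is the forecast for day i + horizon of `series`.
    """
    return series[:-horizon]

def hit_rate(predicted, actual):
    """Share of true warning days that were also flagged by the forecast."""
    true_days = [p for p, a in zip(predicted, actual) if a == 1]
    return sum(true_days) / len(true_days) if true_days else float("nan")

# Hypothetical daily case counts, invented purely for illustration.
train_counts = [80, 95, 110, 130, 150, 170, 160, 140, 120, 100]
threshold_75 = percentile(train_counts, 75)
threshold_90 = percentile(train_counts, 90)
```

The thresholds come from the training set only, so the warning rule is fixed before any test-period forecast is scored; the same `warning_signals` function can then be applied to model forecasts, persistence forecasts, and observed counts for a like-for-like comparison.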

  • Background: Medication nonadherence remains a major global health challenge, contributing to preventable disease, hospitalizations, and healthcare costs. Mobile health (mHealth) applications incorporating gamification and financial incentives have shown potential to improve adherence; however, most research has focused on patient perspectives, with limited understanding of how non-patient stakeholders perceive their feasibility, risks, and implementation. Understanding non-patient stakeholder perspectives in relation to patient viewpoints is essential for informing future policy development and establishing practical, industry-supported safeguards that protect consumers while enabling innovation. Objective: This study aimed to explore non-patient stakeholder perspectives on the use of gamification and financial incentives in mHealth apps for medication adherence and to integrate these with previously reported patient perspectives to inform consensus-based design and policy considerations. Methods: A mixed-methods study was conducted using a modified virtual Nominal Group Technique (vNGT). Non-patient stakeholders across healthcare, industry, and policy sectors in Australia were recruited. Data collection involved a pre-session survey followed by online focus groups. Qualitative responses were analyzed using thematic analysis supported by AI-assisted coding. Consensus statements derived from themes were rated during the focus groups. Additional prompts were used to elicit further discussion where consensus was not immediately achieved. Results: A total of 20 participants were included in the study. Six key themes were identified: tailored gamification for adherence, financial incentives as a contested motivator, designing for diversity and inclusion, usability barriers to engagement, trust through data governance, and validated and sustainable innovation. These informed 24 consensus statements, of which 54% (13/24) achieved unanimous agreement. 
Stakeholders strongly endorsed personalization, simplicity, and transparent data practices, while expressing nuanced concerns regarding the ethical use, sustainability, and potential unintended consequences of financial incentives. Compared with prior patient findings, the participants demonstrated substantial alignment on core design principles but contributed additional system-level considerations related to feasibility, scalability, and regulation. Conclusions: Non-patient stakeholders largely reinforce patient priorities while extending them with critical perspectives on implementation, governance, and sustainability. Gamification and financial incentives are viewed as potentially effective but require careful, ethically grounded design to balance engagement with long-term motivation and trust. These findings support the development of stakeholder-informed guidelines for responsible mHealth innovation and highlight the importance of integrating patient and system-level perspectives in digital health design. Future research should prioritize co-designed longitudinal studies utilizing apps with gamification and a range of incentive offers with clear redemption processes to evaluate the long-term impact on medication adherence across diverse patient populations.

  • Background: High-performing perioperative prediction models have not consistently translated into clinical benefit, in part because model outputs must be delivered through clinical decision support systems (CDSS) that align with anesthesia workflows and end-user needs. Objective: To identify anesthesia professionals’ requirements for perioperative CDSS and use these findings to inform the design specification of a user-centered perioperative CDSS. Methods: This user-centered study was conducted in four sequential phases: translation of a previously validated explainable machine-learning model into candidate CDSS functions; three rounds of focus group–based iterative prototyping; a nationwide cross-sectional questionnaire survey; and CDSS finalization based on iterative prototyping and survey findings. The survey assessed requirements for information display, alerting, explainability, intervention support, and workflow integration among anesthesia-related professionals in China. Results: Three rounds of focus group discussion and iterative prototyping generated a preliminary prototype comprising candidate modules for information display, alerting, explainability, intervention support, and workflow integration. A total of 2401 valid questionnaires were analyzed. Respondents generally preferred direct risk presentation, probability-based alerting, interpretable displays of modifiable risk factors, actionable intervention support, and integration within existing clinical platforms. These findings informed the final specification of an integrated CDSS within the anesthesia information system, including dynamic risk prediction, threshold-based alerting, explainable risk attribution, and evidence-informed intervention recommendations. 
Conclusions: In this user-centered design study, anesthesia professionals identified key requirements for perioperative CDSS, including direct information display, clinically meaningful alerts, explainable risk-factor presentation, actionable recommendations, and workflow integration. These findings may inform the translation of perioperative prediction models into decision support tools that are more usable and acceptable in routine anesthesia practice.

  • Background: The platform-based economy has expanded rapidly through the integration of digital platforms into sectors such as transportation, delivery, and freelance work. Platform labor combines features of precarious employment and digitalized work organization, encompassing both location-based and web-based work. However, the occupational health implications of platform work remain insufficiently understood, particularly regarding how risks differ across platform worker groups. Objective: This study aimed to explore how platform workers experience their working conditions and how platform work affects their health, wellbeing, and safety. Methods: A participatory photovoice study was conducted with platform-based taxi drivers, delivery couriers, and freelancers living in Stockholm. Between September and November 2022, 16 participants were recruited into three groups (5–6 participants per group). Across five sessions, participants documented their working lives through photographs and discussed them collectively, generating 105 photographs in total. Data were analyzed collaboratively to identify key themes and recommendations related to working conditions, health, and wellbeing. Results: Participants identified 14 themes representing major determinants of health, wellbeing, and safety at work, as well as 23 recommendations for improving working conditions. Workers reported exposure to both platform-specific risks, including algorithmic management and digital surveillance, and traditional occupational risks such as psychosocial strain, ergonomic challenges, and traffic-related hazards. Experiences differed substantially across platform work types. Delivery and taxi drivers reported greater exposure to physical and traffic-related risks, whereas freelancers emphasized psychosocial demands and digital work intensification. Economic insecurity and costs associated with maintaining work equipment emerged as common challenges across all groups. 
Attitudes toward flexibility, autonomy, and algorithmic management also varied between worker categories. Conclusions: This study highlights important similarities and differences in working conditions and health risks across platform work types. The findings suggest that research and occupational health interventions targeting platform workers should differentiate between specific forms of platform labor to better capture the diversity of workers’ experiences and exposures.

  • Background: Co-creation is increasingly used in health research, public health, and participatory initiatives to support inclusive, collaborative, and evidence-informed problem-solving. However, the integration of digital technologies into co-creation processes remains fragmented and largely ad hoc, with limited frameworks available to guide technology selection, evaluation, and development. Objective: This study aimed to develop the Co-Tech Taxonomy, an empirically grounded evaluative framework for assessing digital technologies used in co-creation and participatory digital health ecosystems. Methods: Using the Nickerson–Varshney–Muntermann (NVM) taxonomy-building method, the taxonomy was developed through the analysis of six foundational conceptual and empirical frameworks related to co-creation, participatory processes, and digital technologies. The taxonomy was subsequently refined through iterative empirical classification of 84 technologies used in co-creation contexts. Results: The final taxonomy consists of seven functional dimensions: governance, inclusivity, methodology, collaboration, engagement, data management, and cognitive support. Each dimension is operationalised across three progressive levels of co-creation alignment. The empirical mapping revealed that current digital ecosystems remain insufficiently aligned with participatory collaboration requirements, particularly regarding governance, inclusivity, and AI-supported cognitive facilitation. While communication and data-management functionalities were comparatively mature, participatory governance, collaborative decision-making, and AI explainability remained underdeveloped across most evaluated technologies. The taxonomy also enabled the development of a three-tier indicative certification model to support technology assessment and implementation. 
Conclusions: The Co-Tech Taxonomy provides a structured evaluative framework for assessing existing technologies, identifying implementation and innovation gaps, and guiding the development of more inclusive, transparent, interoperable, and AI-ready participatory digital infrastructures. The framework offers a practical foundation for strengthening digitally supported co-creation and participatory collaboration within health-related contexts.

  • Background: Generative artificial intelligence (GenAI) is increasingly used to produce patient-friendly clinical documentation, yet evaluation of these outputs remains inconsistent and difficult to scale. Patient-friendliness is commonly reduced to narrow readability metrics, such as Flesch-Kincaid grade level, without accounting for clinical accuracy, completeness, or the patient perspective. No standardized framework exists to evaluate the quality and safety of AI-generated patient-friendly documentation across document types or the full documentation lifecycle. Objective: To develop and preliminarily validate CLEAR (Clinical Language Evaluation and AI Documentation Review), a theoretically grounded evaluation framework for AI-generated patient-friendly clinical documentation across the generation, review, and monitoring stages of the AI documentation lifecycle. Methods: CLEAR was developed using Messick's validity framework across four stages: content validation, response process, internal structure, and consequences. Domains were identified through a targeted literature review and reviewed by a panel of six clinical and operational experts. An iterative, consensus-based process involving four board-certified internists across 10 rounds refined domain definitions and scoring instructions. Inter-rater reliability was assessed on 50 AI-generated patient-friendly discharge summaries using Cohen's kappa and Gwet's AC1 for binary domains and intraclass correlation coefficients (ICC) and Gwet's AC2 for continuous domains. Additionally, 19 semi-structured stakeholder interviews with clinicians, informaticists, institutional leaders, and patient education experts explored operational needs and implementation contexts. Results: CLEAR comprises five domains for evaluating patient-friendly AI documentation: readability, understandability, patient-centeredness, accuracy, and completeness. 
Inter-rater reliability was good to almost perfect across all subjectively scored domains per Gwet's agreement coefficients. Stakeholder interviews independently identified three operational gaps aligned with the CLEAR lifecycle: lack of structured guidance for prompt engineering, subjectivity in human review, and absence of scalable monitoring infrastructure, directly validating the framework's real-world relevance. CLEAR was applied across three illustrative implementation contexts: prompt engineering for patient-friendly echocardiogram reports, structured human review of discharge summaries, and development of LLM-as-judge automated monitoring tools. Conclusions: CLEAR provides a preliminarily validated evaluation framework designed to span the full AI documentation lifecycle, from prompt engineering through human review to automated monitoring. By conceptualizing patient-friendliness as a multidimensional construct that integrates communication quality with patient safety, CLEAR offers practical infrastructure for consistent and scalable governance of patient-facing AI documentation in healthcare systems.
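
For readers unfamiliar with the two agreement statistics named for the binary domains, the following sketch computes Cohen's kappa and Gwet's AC1 for two raters. It is an illustration only, not the study's analysis code; the function names and the example ratings are hypothetical.

```python
def observed_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance agreement from each rater's own marginals."""
    n = len(rater_a)
    p_o = observed_agreement(rater_a, rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if p_e == 1:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)

def gwet_ac1_binary(rater_a, rater_b, positive=1):
    """Gwet's AC1 for two raters and a binary label.

    Chance agreement is 2 * pi * (1 - pi), where pi is the mean of the two
    raters' proportions of positive ratings.
    """
    n = len(rater_a)
    p_o = observed_agreement(rater_a, rater_b)
    pi = (rater_a.count(positive) / n + rater_b.count(positive) / n) / 2
    p_e = 2 * pi * (1 - pi)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary ratings from two reviewers on four summaries.
reviewer_1 = [1, 1, 0, 0]
reviewer_2 = [1, 0, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
ac1 = gwet_ac1_binary(reviewer_1, reviewer_2)
```

Both statistics correct the same observed agreement for chance, but they estimate chance agreement differently, which is one reason a study may report them side by side.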

  • Background: The severe shortage of dermatologists in Ethiopia creates significant barriers to specialized skincare, particularly for rural populations. Teledermatology (TD) offers a promising solution to bridge this gap. Objective: This study aimed to explore the pre-implementation perceptions of healthcare professionals in Hawassa city, Ethiopia, to identify the key facilitators and critical barriers influencing the potential adoption of a TD service. Methods: A qualitative study was conducted via in-depth and key informant interviews with 22 participants, including physicians, health officers, facility administrators, and allied health professionals from nine public health centers in Hawassa city. A convenience sampling approach was used. Data were collected through semi-structured interviews, transcribed, and analyzed via a rigorous thematic analysis approach to identify recurring themes related to facilitators, barriers, and implementation strategies. Results: Healthcare professionals demonstrated positive attitudes and recognized the potential of TD to improve patient access and enhance their clinical knowledge. Key facilitators included the widespread availability of personal smartphones and high intrinsic motivation among staff. However, significant barriers were also identified, with subthemes including technological infrastructure gaps, systemic and policy vacuums, patient-related concerns about privacy and trust, and provider-related issues such as the need for targeted training. Conclusions: Healthcare professionals expressed willingness to adopt TD, identifying key facilitators (e.g., smartphone access, personal motivation) and notable barriers (e.g., infrastructure challenges, unclear policies, privacy concerns). Successful implementation will require targeted investments, clear guidelines, staff training, and community buy-in.

  • Background: As the phenomenon of population aging continues, increased pressure is placed on healthcare services to treat widespread movement disorders which disproportionately affect older adults. Sit-to-stand and sit-to-walk movements are common, universally understood daily activities which provide deep insight into an individual’s health status. Developing comprehensive analysis metrics for sit-to-stand and sit-to-walk movements based on ear-worn inertial measurement units, which are found in widely used devices such as hearing aids and headphones, unlocks a prime opportunity to improve the monitoring of elderly individuals with movement disorders with minimal burden. Objective: This paper aims to introduce comprehensive, clinically useful analysis metrics for sit-to-stand and sit-to-walk movements based on data from ear-worn inertial measurement units, and to evaluate them against a gold standard optical motion capture system. Methods: Data were collected from 61 participants, who were organised into a typically functioning group (16M/34F, median age 27 (IQR 15)) and a group comprising those suffering from an underlying movement condition (7M/9F, median age 62 (IQR 23)). The participants wore a bespoke 3D-printed headset which contained 6-axis inertial measurement units (IMUs) placed beside their left and right ears. A reflective marker was placed on each participant’s 7th cervical vertebra (C7) to provide comparison with a gold standard optical motion capture system. Each participant performed 3 sit-to-stand movements and 3 sit-to-walk movements. After calculating the correlation between the IMUs and C7 marker data, the total movement duration, duration of each subphase, peak acceleration and velocity, power, movement smoothness and hesitation were analysed offline using the IMU and C7 marker data. 
Results: Very strong correlation scores (r≥0.99) were found between the ear-worn IMUs and C7 marker for forwards and upwards acceleration during sit-to-stand and sit-to-walk. The mean overall movement durations measured using the ear-worn IMUs were within 0.10 seconds of the optical system, while very strong linear relationships were observed between the power calculated by the ear-worn IMUs and the optical system for sit-to-stand (r=0.97) and sit-to-walk (r=0.94). Strong correlations were found between the IMU and optical systems when measuring hesitation in sit-to-stand (r=0.80) and sit-to-walk (r=0.88), though correlations were comparatively poor for movement smoothness (r=0.47 for sit-to-stand and r=0.54 for sit-to-walk). Conclusions: The proposed sit-to-stand and sit-to-walk movement metrics, including movement duration, power and peak acceleration and velocity, can be accurately analysed using ear-worn IMUs. These metrics provide a basis for comprehensively analysing these movements using devices which are already integrated into users’ daily routines.
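
The agreement figures reported above are correlation coefficients between two measurement systems. As a minimal sketch of how such a coefficient is obtained, assuming it is the Pearson r computed on time-aligned samples, the function below is illustrative only; the variable names and sample values are hypothetical and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical, already time-aligned acceleration samples (arbitrary units).
imu_acceleration = [0.10, 0.40, 0.90, 0.50, 0.20]
marker_acceleration = [0.12, 0.38, 0.93, 0.47, 0.22]
r = pearson_r(imu_acceleration, marker_acceleration)
```

In practice the two signals would first need to be resampled to a common rate and synchronised; that preprocessing is outside the scope of this sketch.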

  • Background: Digital interventions for addiction have demonstrated effectiveness and scalability, yet their implementation remains uneven, particularly within social services, where responsibility for non-emergency addiction care often resides. In addition, limited research has examined how social workers interpret and engage with these technologies in practice. Objective: This study aimed to examine how social workers’ technological frames shape their evaluations of digital interventions, with particular attention to domain-level incongruence and contextual inconsistency across practice situations. Methods: An embedded mixed-methods design was used, combining survey data (N=169) with qualitative open-ended responses and 10 semi-structured interviews. Participants completed a validated questionnaire assessing attitudes toward digital interventions and evaluated internet-based interventions across three case vignettes and intervention scenarios. Quantitative analyses included typology construction, repeated-measures ANOVA, and gap analysis (value–use discrepancy). Qualitative data were analyzed using deductive thematic analysis. The study was guided by Technological Frames theory. Results: Practitioners reported moderately positive attitudes (mean 3.81/5) and rated both value (mean 6.18/10) and appropriateness (mean 6.07/10) above the scale midpoints. Four practitioner typologies emerged: Holistic Adopters (37.9%), System Skeptics (31.4%), Client-Centric Advocates (16.6%), and Efficiency Supporters (14.2%). A consistent value–use gap indicated that digital interventions were perceived as more valuable in principle than appropriate in practice (mean difference 0.12, P<.001), with no significant variation across typologies. Appropriateness ratings varied significantly across intervention scenarios, indicating frame inconsistency, with greater acceptance in later-stage scenarios. 
Qualitative findings suggested that digital interventions were viewed as valuable in principle but context-dependent in practice and were therefore typically positioned as complements rather than substitutes for face-to-face care. Conclusions: Social workers’ evaluations of digital interventions are shaped by both structural misalignment across technological frame domains and situational variation across contexts. The consistent gap between perceived strategic value and practical appropriateness highlights the importance of implementation conditions and contextual fit, rather than attitudinal resistance. These findings suggest that successful integration of digital interventions in social services requires alignment with professional practices, relational care values, and context-sensitive implementation.


  • Background: Diabetes-related depression refers to a depressive state triggered or exacerbated by factors such as diabetes diagnosis, the burden of treatment, and the risk of complications. The differences in efficacy among various telemedicine intervention models remain unclear, and the lack of direct and indirect comparisons across multiple methods makes it impossible to establish a clear hierarchy of relative merits. Objective: This study aims to conduct a systematic review and network meta-analysis of randomized controlled trials evaluating telemedicine interventions for diabetes-related depression. The study will synthesize efficacy data from various telemedicine intervention models, compare the effectiveness and acceptability of different intervention strategies, and identify the optimal intervention approach. Methods: We conducted a literature search in the PubMed and Web of Science (WOS) databases, covering the period from the inception of each database through January 17, 2026. We included peer-reviewed English-language studies examining the association between telemedicine interventions and psychological outcomes in patients with diabetes. The risk of bias and methodological quality of the studies was assessed using the Cochrane Risk of Bias Assessment Tool version 2.0. Data analysis was performed using the meta and metafor packages in R version 4.1.0 (R Statistical Computing Project). The standardized mean difference (SMD) was used to pool the effect sizes. We conducted a series of sensitivity analyses to assess the robustness of the results. Results: A total of 16 RCT studies involving 246 studies were included. The overall pooled effect size, estimated using a random-effects model, was SMD = -0.33 (95% CI: -0.43, -0.24), indicating a small-to-moderate beneficial effect of interventions compared to control conditions (p < 0.001). The overall pooled effect size, estimated using a random-effects model, was OR = 1.07 (95% CI: 0.71, 1.63). 
Moderate heterogeneity was observed across studies (I² = 49.4%, τ² = 0.3292, p = 0.0133). Subgroup analyses were conducted based on intervention type. For telephone-based interventions, the pooled odds ratio was 1.38 (95% CI: 0.98, 1.94; I² = 0.3%). For app-based interventions, the pooled odds ratio (OR) was 1.07 (95% CI: 0.49, 2.34; I² = 0.7%). For online training interventions, the pooled odds ratio (OR) was 0.33 (95% CI: 0.07, 1.64; I² = 0.0%). For video interventions, the odds ratio (OR) was 0.50 (95% CI: 0.09, 2.73). Overall, the odds ratio derived from the random-effects model was 1.07 (95% CI: 0.71, 1.63; I² = 0.5%), indicating no significant difference in all-cause dropout rates between the intervention and control groups. Conclusions: Digital psychological interventions are an effective means of alleviating depressive symptoms, with structured online training proving particularly effective. Clinical Trial: The study protocol has been registered in the PROSPERO database (registration number: CRD420251233353).
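
The pooled estimate, τ², and I² quoted above are standard outputs of a random-effects meta-analysis. The sketch below shows one common way to compute them, the DerSimonian-Laird estimator. The abstract states only that the R packages meta and metafor were used and does not name the estimator, so the choice of DerSimonian-Laird, the function name, and the input values are assumptions for illustration.

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of study effect sizes.

    effects:   per-study effect sizes (e.g. SMDs or log odds ratios)
    variances: per-study sampling variances
    """
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Re-weight each study by its total (within + between) variance.
    star = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(star, effects)) / sum(star)
    se = math.sqrt(1 / sum(star))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical study-level inputs, invented for illustration only.
result = random_effects_pool([0.0, 2.0], [0.5, 0.5])
```

For odds ratios, pooling is done on the log scale and the pooled value and interval bounds are exponentiated afterwards.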

  • Digital augmentation of glucagon-like peptide-1 (GLP-1) for obesity management: A retrospective study

    Date Submitted: May 8, 2026
    Open Peer Review Period: May 10, 2026 - Jul 5, 2026

    Background: Glucagon-like peptide-1 receptor agonists (GLP-1 RAs), including semaglutide and tirzepatide, are highly effective for weight management; however, long-term outcomes depend on sustained behavioral engagement. Digital health tools may enhance these behavioral components, but evidence regarding their real-world benefit alongside pharmacotherapy remains limited. Objective: To evaluate whether augmenting a direct-to-consumer GLP-1 RA program with a behavioral support smartphone app improves weight loss outcomes compared with medication alone, and to identify which application features are associated with weight loss over time. Methods: A retrospective observational cohort study was conducted using data from 14,599 adults enrolled in an Australian direct-to-consumer weight-loss program, of whom 6,753 met inclusion criteria after data cleaning. Participants received semaglutide or tirzepatide, with optional access to a smartphone app and clinician support. Logistic regression models examined achievement of ≥5%, ≥10%, and ≥15% weight-loss thresholds drawn from established clinical guidelines and trial conventions at 60, 120, and 180 days. Linear mixed-effects models with linear and quadratic time terms assessed which application features impacted percentage weight loss trajectories, adjusting for medication type, gender, ethnicity, and baseline BMI. Results: Participants using the program plus medication were more likely to achieve clinically significant weight loss, with odds ratios ranging from 1.46 to 2.57 for ≥5% weight loss across timepoints (P< .001), 1.73 to 2.15 for ≥10% weight loss at later timepoints (P< .001), and 2.13 for ≥15% weight loss at 180 days (P=.001). 
Mixed-effects modelling demonstrated that progress logging was associated with greater weight loss over time (β = 0.28, 95% CI 0.05 to 0.52, P = .019), with no evidence of a quadratic effect. Recipe viewing was not associated with linear change over time, although the linear interaction approached significance (β = −0.18, 95% CI −0.37 to 0.02, P = .078), but demonstrated a small positive quadratic effect (β₂ = 0.05, 95% CI 0.01 to 0.08, P = .004), suggesting a delayed or increasing benefit over time. Goal setting and calorie target setting were not associated with changes in weight-loss trajectories. Conclusions: Augmenting GLP-1 RA treatment with a behavioral smartphone app was associated with higher likelihood of achieving clinically significant weight loss, although differences in overall weight-loss trajectories were modest. Medication type was the strongest determinant of weight loss over time, with tirzepatide producing greater early reductions. Among app features, progress logging emerged as the primary behavioral predictor of improved outcomes. These findings support the value of integrating digital tools with pharmacotherapy and highlight the importance of self-monitoring in weight management.

  • A Remote Monitoring System to Enhance Adverse Event Surveillance in Patients with Multiple Chronic Conditions

    Date Submitted: May 8, 2026
    Open Peer Review Period: May 10, 2026 - Jul 5, 2026

    Background: Adverse events (AEs) after hospitalization are common and disproportionately affect adults with multiple chronic conditions (MCC). Patient-reported symptoms and self-assessed health may enable earlier detection of post-discharge AEs, but scalable, workflow-integrated approaches are limited. Objective: To identify user requirements for, and field test, an automated remote monitoring system to enhance AE surveillance during transitions. Methods: We conducted a mixed-methods study using an iterative, user-centered design approach. Semi-structured interviews with patients and clinicians informed system requirements, followed by real-world field testing. The prototype leveraged interoperable electronic health record data services, delivered automated post-discharge check-ins using symptom questionnaires and patient-reported outcomes (PROs), provided risk-stratified health advice, and escalated high-risk symptoms to clinicians. Descriptive statistics assessed feasibility and utilization; conventional content analysis identified user needs and implementation considerations. Results: Thirty-seven patients with MCC and 23 clinicians participated. Key requirements included clear communication of personalized risk based on red-flag symptoms, actionable guidance aligned with discharge instructions, explicit delineation of responsibility between inpatient and outpatient clinicians, and selective escalation to minimize burden. In field testing with 20 patients, 60% of automated questionnaires were completed. Seven patients received risk-stratified advice for new or worsening symptoms; among those with moderate- or high-risk alerts, emergency department visits occurred within one week of discharge. Patients found the system understandable and helpful, while clinicians noted challenges interpreting PRO trends. 
Conclusions: A user-informed, automated remote monitoring system was feasible and acceptable for AE surveillance during transitions but should prioritize clear risk communication, role clarity, and interpretable patient-reported data to support safer transitions in this population.

  • Background: House dust mite (HDM) sensitization commonly begins in early life and contributes to persistent allergic airway inflammation and asthma chronicity. Primary prevention via early-life environmental control is a key pathway to reduce HDM sensitization and asthma risk. Objective: To characterize child caregivers’ knowledge, attitudes, and practices (KAP) regarding pediatric HDM control using a hybrid literature/expert-driven and social media-driven approach, and examine associations between KAP levels, child age and caregiver social media activity. Methods: This cross-sectional study comprised two interconnected components: (1) mining of content published between August 2023 and July 2025 from five major Chinese social media platforms, analyzed via Latent Dirichlet Allocation (LDA); and (2) a social media-enhanced web-based KAP survey administered in November 2025 to child caregivers in Chongqing, a warm-humid region where HDMs dominate indoor allergens, with participants recruited via local child health facilities. In total, 132,341 social media documents and 2,275 caregivers of children <18 years were included in the analysis. The main outcomes included social media discourse patterns and domain-specific KAP levels across five dimensions: foundational knowledge (K1), recommended control knowledge (K2), attitude toward social media topics (A1), attitude toward recommended methods (A2), and control practices (P). Stratified analysis was conducted by two exposure variables: child age (≤3 years vs >3 years) and caregiver social media activity (active vs. inactive). Results: LDA topic modeling identified five distinct topic clusters in the social media content. Commercial, emotional, and misleading content collectively dominated the information landscape, accounting for 83.3% of included documents, with commercial content often systematically conflating the concepts of “disinfection” and “mite elimination”. 
Only 16.7% was classified as health educational content focusing on HDM allergy prevention. The average KAP levels of K1, K2, A1, A2, and P domains were 62.9%, 84.7%, 57.0%, 37.8%, and 25.8%, respectively. Social media emerged as the primary knowledge source (80.7%), with methodological knowledge gaps (47.5%) being the top implementation barrier. Caregivers of children ≤3 years had significantly lower self-rated knowledge (23.5% vs. 28.3%, P=.01), stronger endorsement of recommended methods, but also greater information overload (OR 1.39, 95% CI 1.15-1.67, P<.001) and decision difficulties (OR 1.23, 95% CI 1.01-1.52, P<.001). Socially active caregivers showed better performance across multiple items in five domains, but also increased non-recommended practices (ultraviolet irradiation: OR 1.85, 95% CI 1.35-2.53, P<.001) and misconception acceptance (allergy impact exaggeration: OR 1.39, 95% CI 1.04-1.87, P=.03). Conclusions: Complex and suboptimal KAP levels exist, particularly among caregivers of young children (≤3 years). Social media activity associates with both enhanced implementation of control practices and elevated misconception endorsement. These findings reveal critical educational gaps and the necessity of social media intervention. Clinical Trial: Not applicable.

  • Examining inequities in the use of Continuous Glucose Monitors Among People

    Date Submitted: May 6, 2026
    Open Peer Review Period: May 6, 2026 - Jul 1, 2026

    Background: Continuous glucose monitoring (CGM) offers clinical and behavioural benefits for people with type 2 diabetes (T2D), including improved glycaemic control and enhanced self-management. However, important evidence gaps remain regarding whether CGM use is equitably distributed across patient groups and whether Objective: To examine the relationship between CGM use among individuals with T2D and a range of patient characteristics, including socio-demographic factors linked to health inequities, digital health literacy, clinical characteristics, and service utilisation. Methods: A cross-sectional online survey was conducted in November 2024 among adults in the UK with self-reported T2D, recruited via the YouGov panel. The primary outcome was self-reported CGM use. Predictor variables included PROGRESS-Plus characteristics (age, gender, ethnicity, religion, education, occupation, household income, disability, and social engagement), digital health literacy (eHEALS scale), clinical characteristics (disease duration, current treatment, and complications), overall health status (number of long-term conditions), and healthcare utilisation (frequency of visits). Descriptive statistics and multivariable logistic regression were used to examine associations between CGM use and patient characteristics. Results: Among 403 participants, 12.7% reported CGM use. Nearly half of participants were aged 65 years or older, and 56.80% were male. Most participants were White (83.90%) and lived in urban areas. Higher odds of CGM use were observed among insulin users (OR=3.80, 95% CI: 1.6–9.22, p<0.001). No other demographic, clinical, or service utilisation variables were statistically significantly associated with CGM use. Conclusions: CGM use was primarily driven by insulin therapy, consistent with established clinical pathways within the National Health Service that prioritise access for this group. 
No significant variation was observed across demographic, socioeconomic, or health literacy-related characteristics, suggesting no clear evidence of inequalities in this sample. These findings indicate potentially equitable access, although further research in larger and more diverse populations is needed to confirm these patterns.

  • Perceptions, Responsibility, and Implementation of AI-CDSS in VTE Prevention: A Qualitative Study

    Date Submitted: May 6, 2026
    Open Peer Review Period: May 6, 2026 - Jul 1, 2026

    Background: Artificial intelligence–enabled clinical decision support systems (AI-CDSSs) are increasingly deployed for venous thromboembolism (VTE) prevention. However, healthcare professionals’ perceptions and experiences of these systems across diverse regional, occupational, and specialty contexts remain poorly understood, with limited evidence on how AI integration influences clinical workflows, responsibility allocation, and professional trust within multi‑tiered healthcare systems. Objective: This study aimed to systematically investigate healthcare professionals’ perceptions and experiences of using AI-CDSS for VTE prevention across different institutional levels and clinical roles in China. Methods: A nationwide qualitative study was conducted using semi‑structured interviews with 23 healthcare professionals from diverse institutional levels and clinical roles. Data collection proceeded until thematic saturation was reached. All interviews were transcribed verbatim and analyzed using inductive thematic analysis. Results: Five core themes were identified: (1) AI reduces workload but complicates clinical responsibility; (2) patient involvement is perceived as beneficial yet problematic; (3) digital readiness shapes implementation feasibility; (4) trust in AI varies by professional role; and (5) responsibility and risk remain ambiguous after AI introduction. Facilitating factors included clearly defined responsibility assignment, comprehensive training, incentive mechanisms, and institutional oversight. Key barriers comprised economic costs, additional workload burden, and complex hospital approval processes. Conclusions: Our findings reveal structural tensions arising from the interaction between professional roles, institutional readiness, and responsibility distribution during AI integration. 
These results underscore the need for tiered, role‑specific implementation strategies and provide practical insights for the sustainable deployment of AI in VTE prevention.

  • Background: Complex digital interventions that integrate electronic patient-reported outcome measures (ePROM) into clinical practice in cancer have the potential to improve quality of life, increase survival, and reduce health resource use and costs. Such systems can help oncology patients self-manage chemotherapy symptoms, reduce workloads for clinicians through automated decision support, and resolve problems earlier. However, there is a need for more research on the cost-effectiveness of such interventions. Objective: This review aims to (1) summarize and evaluate the quantitative and qualitative evidence related to the cost-effectiveness and economic evaluation methods of ePROM-integrated interventions, and (2) extract data and validate assumptions useful for health economic modelling of ePROM-based treatment strategies. Methods: We searched for original English-language papers published on or before March 2025 on Ovid (including MEDLINE and Embase), Scopus, and the International Health Technology Assessment Database (INAHTA) using search strings that combined terms related to ePROMs, health economics, and cancer/oncology. We included papers reporting health economic-related outcomes for ePROM interventions designed for adult cancer populations and excluded screening tools and conference abstracts. Results: We included 34 publications from 27 unique studies, and identified and analyzed 26 ePROM-integrated interventions within these. Most (23/26) of the included interventions explicitly described some form of alert handling and automated decision support based on remote ePROM monitoring. 
5/34 publications presented full cost-utility analysis results, of which 3 were characterized by high uncertainty and a lack of clear differences in costs and health outcomes between ePROMs and standard care, while 2 presented strong evidence of cost-effectiveness due to quality-of-life improvements, reduced hospitalizations, and potentially more autonomy in health-related travel (e.g., ePROM-monitored patients can drive or walk to the hospital instead of using taxis or ambulances). A further 5/34 publications reported partial health economic results (e.g., cost-consequence, budget impact), of which 1 detected no difference in strategies, while 4 reported lower health resource use and costs of ePROMs, mainly due to hospitalization reductions. 12/27 studies included a qualitative component but mostly focused on user experience and design-related themes; only 2/12 of these addressed economic-specific themes (e.g., changes in workflow and resource use due to ePROM implementation and integration), indicating some potential for time saving due to ePROM monitoring. Conclusions: There is some evidence that ePROM-integrated interventions can be cost-effective in cancer care, but the evidence base remains limited. Where evidence does exist, cost-effectiveness appears driven by reduced hospitalization and improved quality of life. Qualitative research within the included studies rarely addressed economic questions. We provide a detailed parameter extraction for use in future economic modelling and recommend research priorities, including quantitative mapping of ePROM symptom data onto health resource use patterns, and qualitative work exploring how ePROM implementation affects clinical workloads and patient-perspective costs.

  • Background: Depressive disorders are among the most prevalent psychiatric disorders globally and impose considerable individual and societal burdens. Psychotherapy, including cognitive behavioral therapy, is recommended as a first-line treatment, especially for mild to moderate depressive disorders. However, face-to-face psychotherapy is often limited by issues of accessibility and cost. Digital therapeutics (DTx) have gained increasing attention as alternatives for overcoming these hurdles. With advances in digital technology, digital placebos have been increasingly adopted as comparators in clinical trials of DTx. However, the characteristics of these clinical trials, the magnitude of digital placebo effects, and their moderators remain poorly understood. Objective: The objectives of this study were to investigate the characteristics of clinical trials using digital placebos as comparators, and to assess the magnitude of the digital placebo effects and their moderators on depressive symptoms measured by the Patient Health Questionnaire-9 (PHQ-9). Methods: Blinded randomized clinical trials (RCTs) that assessed the PHQ-9 and used digital placebos as comparators were identified by searching MEDLINE, Scopus, Web of Science, PsycINFO, CINAHL, Cochrane Central Register of Controlled Trials, ClinicalTrials.gov, and ISRCTN in November 2025. The characteristics of the RCTs and of the digital placebos were reviewed systematically. A meta-analysis, including subgroup analyses and meta-regressions, was conducted to investigate the magnitude and the moderators of the digital placebo effects. Results: 29 articles and 30 studies with 5680 participants were included in this systematic review and meta-analysis. The most common trial design was a 2-arm, parallel-group study conducted in a single country, adopting “Replaced” and “Mobile” as the placebo approach and delivery type, respectively. 
The pooled effect size for all the included studies was Hedges’ g = 0.44 (95% CI 0.29 to 0.59) with an overall I² = 93.2%. Subgroup analyses showed a moderate-to-large and statistically significant placebo effect in the group of primary psychiatric disorders (Hedges’ g = 0.69; 95% CI 0.40 to 0.99). Meta-regressions indicated that the group of primary psychiatric disorders and baseline PHQ-9 score were the independent moderators of the digital placebo effects and the major contributing factors of the high heterogeneity (R² = 51.5%). Conclusions: Statistically significant digital placebo effects were observed on depressive symptoms, and target population and baseline PHQ-9 score were identified as the independent moderators. These findings would have implications for the planning of future DTx clinical trials using digital placebos for depressive symptoms.

  • Scoping Review of Interventions Used to Improve Self-Management Among Adults with Hypertension

    Date Submitted: May 5, 2026
    Open Peer Review Period: May 6, 2026 - Jul 1, 2026

    Background: Hypertension remains a major contributor to cardiovascular morbidity and mortality worldwide, and long-term blood pressure control depends greatly on patients’ ability to engage in daily self-management. Interventions such as mobile health, web-based education, SMS reminders, telemonitoring, nurse-led support, and structured patient education are increasingly used to improve hypertension self-management. However, the evidence is diverse in terms of intervention type, setting, delivery mode, and outcomes measured. Objective: This scoping review aimed to map the range of interventions used to improve self-management among adults with hypertension and summarize the outcomes targeted by these interventions. Methods: A scoping review was conducted using evidence identified from PubMed, CINAHL, Cochrane Library, and Web of Science. Eligible studies included adults with hypertension and evaluated interventions designed to improve self-management or related outcomes, including medication adherence, blood pressure monitoring, lifestyle modification, self-efficacy, health literacy, quality of life, and blood pressure control. Data were charted according to author, year, country, study design, population, intervention, outcomes, and key findings. A descriptive numerical summary and thematic synthesis were used to map the evidence. Results: A total of 76 studies were included. Most studies were published between 2022 and 2025, indicating growing interest in hypertension self-management interventions. Randomized trials accounted for the largest proportion of included studies. Mobile app and SMS-based interventions were the most common intervention category, followed by education or self-management training, family/community support, digital monitoring or telehealth, web-based programs, and nurse- or clinician-led support. 
Blood pressure control was the most frequently assessed outcome, followed by self-management behaviors, medication adherence, self-efficacy, health literacy, knowledge, and quality of life. Many interventions improved systolic blood pressure, medication adherence, self-monitoring, and self-management behaviors, although some studies reported stronger behavioral than clinical effects. Thematic analysis showed that effective interventions commonly combined structured education, digital delivery, self-monitoring, feedback, reminders, and human support. Conclusions: The evidence shows that hypertension self-management interventions are expanding rapidly, particularly through digital and mobile health approaches. Interventions appear most promising when they combine education, monitoring, feedback, reminders, and professional or family support rather than relying on technology alone. More context-sensitive research is needed in low-resource settings, especially in Africa, to determine which intervention components are most feasible, scalable, and effective for improving blood pressure control and long-term self-management.

  • Background: Vestibular disorders are common, burdensome, and frequently misdiagnosed, particularly in non-specialist settings where history-taking is often incomplete or inconsistently structured. Digital health tools that standardize symptom elicitation could improve diagnostic triage, but most existing systems rely on static questionnaires or rule-based logic. Large language models (LLMs) offer a more flexible alternative through adaptive, natural-language consultations, but prospective evidence from real clinical workflows remains scarce. Objective: This study aimed to benchmark LLM diagnostic performance using static vestibular histories and to prospectively evaluate a locally deployed conversational LLM agent embedded in outpatient vertigo clinics. Methods: We conducted a two-phase diagnostic accuracy study. In the history-based evaluation (HBE), 10 LLMs and 5 senior otolaryngologists independently reviewed 227 structured vertigo histories, including 138 real patient questionnaires and 89 expert-simulated cases. In the prospective clinical evaluation (PCE), 176 outpatients with vertigo or dizziness were included in the analytic cohort across five centers in China. A nurse-assisted tablet-based agent powered by DeepSeek-R1, deployed locally within institutional infrastructure, conducted multi-turn symptom-history dialogues. The agent received history information only and did not receive physical examination or ancillary test findings. Attending clinicians were blinded to the agent output and recorded final clinical diagnoses after routine assessment. The primary outcome was Top-1 diagnostic concordance with the reference diagnosis; secondary outcomes included HBE Top-3 accuracy and disorder-specific performance. 
Results: In the HBE, Gemini-2.5-pro and o1 achieved the highest Top-1 accuracy (69.6%; 95% CI 63.2%-75.5% for each), and no LLM significantly outperformed the specialist panel majority vote (63.9%; 95% CI 57.3%-70.1%; McNemar test, P>.10 for all models). In the PCE, the median age was 51 years (IQR 38-61), and 128 of 176 participants (72.7%) were women. The conversational agent matched the reference diagnosis in 140 of 176 patients (79.55%; 95% CI 72.8%-85.2%). Concordance was highest for benign paroxysmal positional vertigo (43/44, 97.73%; 95% CI 88.0%-99.9%) and Ménière disease (27/30, 90.00%; 95% CI 73.5%-97.9%), and lower for vestibular migraine (32/45, 71.11%; 95% CI 55.7%-83.6%) and persistent postural-perceptual dizziness (19/28, 67.86%; 95% CI 47.6%-84.1%). Conclusions: A locally deployed, history-only conversational LLM agent achieved approximately 80% concordance with specialist final diagnoses in a prospective multicenter outpatient workflow, with particularly high performance for benign paroxysmal positional vertigo. These findings support the development of conversational LLMs as clinician-facing tools for structured history-taking and diagnostic support, especially in settings with limited vestibular expertise. Future studies should test whether such systems improve clinical decisions, reduce unnecessary resource use, and maintain safety across languages and health care settings.
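Prospective reviewers who wish to check the concordance intervals above could recompute them from the reported counts. The sketch below assumes an exact (Clopper-Pearson) binomial interval, which the abstract does not state; the function names are illustrative and only the Python standard library is used.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(x + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""

    def solve(f):
        # Bisection on [0, 1] for the root of f, which must be increasing in p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower limit: the p at which P(X >= x) equals alpha / 2 (increasing in p).
    lower = 0.0 if x == 0 else solve(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper limit: the p at which P(X <= x) equals alpha / 2 (negated so it increases).
    upper = 1.0 if x == n else solve(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# Counts taken from the abstract: 140 of 176 patients matched the reference diagnosis.
low, high = clopper_pearson(140, 176)
print(f"{140 / 176:.4f} ({low:.3f} to {high:.3f})")
```

If the authors used a different interval (for example Wilson or Wald), small differences from the reported limits would be expected.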

  • Socio-Cultural Challenges and Design Implications for Ethical AI in Healthcare: A Systematic Review

    Date Submitted: May 4, 2026
    Open Peer Review Period: May 5, 2026 - Jun 30, 2026

    Background: Artificial intelligence (AI) is increasingly embedded in healthcare, yet its benefits remain unevenly distributed due to persistent concerns regarding bias, inequity, and socio-cultural misalignment. Although existing Ethical AI frameworks typically emphasize universal principles, they often insufficiently address the socio-cultural contexts in which AI systems are developed, implemented, and used. Objective: This systematic review aimed to examine how socio-cultural factors shape ethical challenges in healthcare AI, influence the interpretation of ethical principles, and inform context-sensitive design and governance strategies. Methods: Following PRISMA 2020 guidelines, we conducted a systematic search of PubMed, IEEE Xplore, and Web of Science for studies published between 2018 and 2025. Eligible studies addressed ethical issues related to AI in healthcare through a socio-cultural lens. A thematic synthesis combining inductive and deductive coding was used to analyze reported challenges, context-dependent ethical interpretations, and proposed mitigation approaches. Results: A total of 49 studies were included. The findings show that ethical challenges in healthcare AI are deeply embedded in structural inequalities, data collection, curation, and documentation practices, institutional conditions, and cultural norms rather than being purely technical problems. Key challenges included algorithmic bias, underrepresentation of minorities in datasets, cultural and linguistic mismatches, limited transparency and trust, and systemic disparities in access to AI technologies. The reviewed literature proposed a broad range of technical, design-related, and governance-oriented strategies, but these remained fragmented and were rarely integrated systematically across the AI lifecycle. 
Based on this synthesis, the study proposes the Inclusive Ethical AI Framework (IEAF), a socio-technical framework that systematically translates socio-cultural context into context-sensitive ethical interpretations and actionable design and governance decisions across the AI lifecycle. Conclusions: The findings highlight that ethical challenges in healthcare AI are fundamentally shaped by socio-cultural context and cannot be addressed through technical solutions or universal ethical principles alone. Instead, effective and equitable AI systems require the systematic integration of socio-cultural considerations into data practices, system design, and governance across the AI lifecycle. Clinical Trial: PROSPERO CRD420251058607; prospectively registered.

  • Quality Criteria for Cancer Patient Portal Content: Framework Development and Pilot Audit Study

    Date Submitted: May 1, 2026
    Open Peer Review Period: May 1, 2026 - Jun 26, 2026

    Background: Patient-facing cancer portals are increasingly used to provide education, support interpretation of results, navigate services, and guide self-management across the cancer journey. However, variation in content quality, transparency, readability, accessibility, and governance can undermine equity, safety, and trust. Objective: To develop and present EU-CiP20 as a first-phase, evidence-informed, operational, and auditable framework of quality criteria for cancer patient portal content. Methods: We synthesised established instruments and authoritative guidance on online health information quality, health literacy and plain-language communication, transparency and conflicts of interest, patient engagement, privacy and data protection, digital governance, accessibility, and AI-related safety. Candidate criteria were harmonised from a broader evidence-mapped set (EU-CiP30) into a streamlined taxonomy (EU-CiP20) using explicit consolidation rules and an auditable mapping trail. Each category was operationalised into four observable sub-criteria and scored using a pragmatic 0-2 scale. EU-CiP20 is presented as an initial comprehensive framework to be refined in the next phase through stakeholder focus groups, an online survey with affected cancer patients, expert inquiry, and a Delphi expert panel, with the aim of reducing the 20 criteria to a final operational core of approximately 10 criteria. Results: EU-CiP20 comprises five domains and 20 categories spanning accessibility and comprehensibility; evidence and content governance; relevance and personalisation; human-centred design and empowerment; and ethics, safety, and trust. In the pilot, adjusted EU-CiP20 totals ranged from 19.5% to 40.6%. The most consistent gaps were governance signals required for portal readiness, including named clinical ownership, explicit review cycles, evidence traceability, and accessibility auditability. 
Comparator tools characterised content-level strengths but did not fully capture these governance risks. Conclusions: EU-CiP20 offers a practical and auditable first-phase approach to strengthen governance of patient-facing cancer portal content. It complements existing information-quality instruments by linking readability, evidence governance, relevance, empowerment, transparency, safety, and digital trust within a single operational taxonomy. The work is not yet complete: the current 20-criteria framework will be refined through stakeholder focus groups, an online survey with affected cancer patients, expert inquiry, and Delphi expert panel consensus to produce a shorter final set of approximately 10 criteria, followed by assessment of inter-rater reliability, feasibility, sensitivity to change, and real-world implementation impact.

  • Background: Diffusion of innovations theory posits that inequalities arising from the early adoption of new technologies, such as telemedicine, are likely to decrease over time. However, evidence is scarce on the evolution of inequalities related to individual telemedicine adoption over time. Objective: This study aims to assess changes in age and socioeconomic inequalities in telemedicine adoption in Japan from 2020 to 2024. Methods: We used data from a nationwide, internet-based panel survey of the general population in Japan. Participants aged 18–75 years who completed both the 2020 baseline and 2024 follow-up surveys were included. The primary outcome was self-reported telemedicine adoption (ever use at each survey). Using multivariable logistic regression models, we regressed telemedicine adoption on (1) indicators of age and socioeconomic status at baseline, (2) survey year, and (3) their interaction, adjusting for other demographic, socioeconomic, and health-related characteristics. We then estimated the adjusted prevalence of telemedicine adoption in 2020 and 2024 for each age and socioeconomic group. Results: We included 10,818 participants (mean [SD] age, 49.7 [16.8] years; 50.7% women). In 2020, 271 participants (2.5%) reported telemedicine adoption; by the 2024 follow-up survey, this increased to 840 participants (7.8%). The prevalence of telemedicine adoption was lower among older individuals, those with lower educational attainment, those with medium income (vs high income), and unemployed individuals (vs upper non-manual workers) in 2020. While the prevalence increased across groups from 2020 to 2024, the increases were smaller among older age groups (70–75 years: +1.0 percentage points [pp] vs 18–29 years: +13.2 pp; difference-in-differences, −12.1 pp; 95% CI, −18.3 to −6.0 pp). 
Similarly, increases were smaller among unemployed individuals than among upper non-manual workers (+2.8 vs +5.8 pp; difference-in-differences, −3.0 pp; 95% CI, −4.7 to −1.2 pp). Changes in the prevalence of telemedicine adoption did not vary significantly by educational attainment, urban vs rural residence, or income level. Conclusions: Despite growth in telemedicine adoption from 2020 to 2024, age-related and occupational inequalities widened, and educational inequalities persisted, underscoring the need for strategies to reduce age-related and socioeconomic barriers to telemedicine adoption.
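The difference-in-differences contrasts quoted in the Results can be read as the change in one group minus the change in its reference group. The sketch below shows only that arithmetic on the rounded percentage-point changes reported in the abstract; the function name is illustrative, and the authors' estimates come from adjusted regression models, so small rounding differences (for example −12.2 here versus the reported −12.1) are to be expected.

```python
def difference_in_differences(change_group_pp, change_reference_pp):
    """Change in the group of interest minus change in the reference group (pp)."""
    return change_group_pp - change_reference_pp


# Rounded 2020-to-2024 changes reported in the abstract, in percentage points
age_did = difference_in_differences(1.0, 13.2)         # 70-75 years vs 18-29 years
employment_did = difference_in_differences(2.8, 5.8)   # unemployed vs upper non-manual

print(round(age_did, 1), round(employment_did, 1))
```

A negative value means the group of interest gained less than its reference group, which is the sense in which the abstract describes inequalities as widening.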

  • Longitudinal Modeling or Monitoring of Depression in Speech: A Systematic Review

    Date Submitted: Apr 30, 2026
    Open Peer Review Period: Apr 30, 2026 - Jun 25, 2026

    Background: Depressive disorders are a leading cause of disability worldwide, and more than 40% of people who experience a single depressive episode will experience recurrence. It is, therefore, essential that people living with a depressive disorder are able to access appropriate means of monitoring, to identify recurrences and enable timely interventions. Existing monitoring methods are burdensome for both clinicians and patients, but previous research into automated depression diagnosis has demonstrated links between participants’ depression severity and speech features. Longitudinal depression modeling through speech aims to build on these links and provide automated methods of long-term depression monitoring. Objective: This systematic review collates existing research into the monitoring or modeling of changes in depression severity, through its impact on speech. Methods: We searched the ProQuest, Scopus, Web of Science, PubMed, and IEEE Xplore databases for studies relating to the longitudinal modeling of depression in speech. Publications of any age were acceptable, but only English-language studies were included. All studies underwent quality appraisal using the CASP cohort study checklist. Results: We retrieved 22 relevant documents from the database searches, and a further 40 documents through citation chasing and manual searching. The observational periods employed by these studies varied from 7 days to 18 months, and sample sizes ranged from 16 to 954. Speech features such as speaking rate and pause duration show promising sensitivity to changes in depression severity. However, other features, such as average energy velocity, exhibit conflicting trends across different studies, and findings on the generalizability of prosodic and acoustic features between languages are likewise inconsistent. Conclusions: We identified significant methodological variation within the data collection, feature extraction, and modeling stages of the studies.
While there is evidence to suggest that speech features are sensitive to changes in depression severity, some findings are inconsistent between studies. We advocate for greater clarity and consistency in the reporting of methods to support comparisons of findings between studies and generalizability testing. Future work could explore the predictive capacity of speech to identify oncoming depressive episodes. Clinical Trial: PROSPERO CRD420251003661; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251003661.

  • Liability and Standard of Care in AI-Driven Psychiatric Practice: A European Viewpoint

    Date Submitted: Apr 30, 2026
    Open Peer Review Period: Apr 30, 2026 - Jun 25, 2026

    Artificial intelligence is increasingly entering psychiatric care through decision-support systems, digital phenotyping tools, suicide-risk prediction models, documentation assistants, and conversational agents. These technologies may improve access, consistency, and personalised care, yet they also redistribute clinical authority and complicate liability when harm occurs. This article examines how European law and psychiatric ethics should respond to this shift. It argues that liability in AI-driven psychiatry cannot be understood only as a product-defect issue or only as a malpractice problem. Because psychiatric practice depends on interpretation, testimony, contextual judgment, and therapeutic alliance, the relevant standard of care must remain human, even when technologically augmented. The article advocates an augmented-clinician model in which AI informs but does not replace psychiatric reasoning. After outlining the European regulatory framework, including the AI Act, the Medical Device Regulation, the General Data Protection Regulation, the revised Product Liability Directive, and the European Health Data Space Regulation, the article analyses the implications of the withdrawal of the proposed AI Liability Directive and the persistence of divergent national tort regimes. It then examines psychiatric risk vectors, including automation bias, testimonial injustice, bias in mental health datasets, therapeutic chatbots, suicide prediction tools, passive monitoring, and large language model documentation. The discussion proposes a layered accountability model that links developers, deployers, and clinicians while preserving therapeutic integrity, patient rights, and legal clarity.

  • Background: Electronic nicotine delivery systems (ENDS) are at the center of global public health debate. China is the largest producer of e-cigarettes, while the United States (U.S.) has the largest consumer market, yet analyses of news coverage of ENDS comparing China and the U.S. remain limited. Objective: The primary objective of this study is to identify and compare dominant themes in ENDS-related news coverage across leading broadcast-branded digital outlets in China and the United States, and to assess how these themes and coverage volume changed over time. Methods: We conducted a thematic analysis of 470 ENDS-related stories from January 1, 2020, to July 30, 2025, from four leading broadcast news digital media platforms: CNN.com and FoxNews.com in the U.S.; CCTV.com and ifeng.com in China. Using a single-theme approach, coders identified core themes for each article based on prespecified rules and a hierarchical decision structure. Frequencies and proportions of each core theme were summarized for the overall sample and stratified by country. Pearson chi-square tests and binary logistic regression models were conducted to examine cross-national differences with false discovery rate (FDR) adjusted p-values. Temporal changes in themes were examined and visualized. Results: In U.S. coverage, the most prevalent themes were policy and regulatory governance (32.1%), youth appeal, flavors, and school responses (22.4%), and health risks, harms, symptoms, and dependence (13.9%). In Chinese coverage, the most prevalent themes were commercial practices and market dynamics of ENDS (26.0%), policy and regulatory governance (23.4%), and enforcement and compliance (15.7%). Cross-national differences in themes were consistently observed between the two countries. Between 2020 and 2025, coverage in China transitioned away from commercial and market themes toward greater focus on illicit substances and enforcement, while U.S.
coverage showed a relatively stable focus on commercial and market themes, with a gradual increase in enforcement-related reporting. Conclusions: Broadcast news in China and the U.S. may actively shape how ENDS are defined as a public issue and what policy responses appear legitimate. Chinese coverage tends to stress commercial activity and enforcement, whereas U.S. coverage more often foregrounds youth risks and regulatory debates. These distinct thematic patterns may influence risk perceptions and policies in each country and are important to consider in comparative media and public health research.
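
The abstract mentions FDR-adjusted p-values without naming the procedure. As a rough illustration, the sketch below implements the Benjamini-Hochberg step-up adjustment, which is one common choice; that it matches the authors' method is an assumption, and the raw p-values are hypothetical placeholders rather than study results.

```python
# Benjamini-Hochberg FDR adjustment (assumed procedure; the abstract only
# says "FDR adjusted"). The input p-values below are made up for illustration.

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # stay monotone non-decreasing in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, one per theme-level comparison.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)

for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f} -> adjusted p = {q:.5f}")
```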

  • Digital health interventions to prevent post-traumatic arthritis after traumatic knee injury: a scoping review

    Date Submitted: Apr 28, 2026
    Open Peer Review Period: Apr 28, 2026 - Jun 23, 2026

    Background: Traumatic knee injuries (TKI) are common, associated with a 4-6 times increased risk of post-traumatic knee osteoarthritis (PTOAK) over the subsequent 15–20 year period. There is clear evidence that risk can be reduced, but long-term care availability is limited, prompting the development of DHIs (digital health interventions) such as wearable devices, telehealth innovations and mobile apps. Objective: To evaluate existing DHIs against the OPTIKNEE consensus guidelines for PTOAK prevention and investigate adoption into practice. Methods: A search of 7 online databases and the grey literature was completed from inception to 03/06/2025, complemented by hand searching government, charity and university websites for reports and technical prototype papers concerning DHIs to support care after TKI. DHI features were mapped to the OPTIKNEE recommendations, evaluated against the health-technology pathway to identify development stage, and implementation analysed using NPT (Normalisation Process Theory). Results: 81 reports, 53 peer-reviewed and 28 other, concerning 49 distinct DHIs were found. They were designed for injuries of the anterior cruciate ligament (ACL, n=12); ACL meniscus (n=15); meniscus (n=3); ACL or meniscus (n=2), bone (n=2), patella dislocation (n=1), and 14 were non-specific. No DHIs addressed all OPTIKNEE recommendations; however, the eight most complete reported 4/7 components, including exercise, information provision, patient-reported outcome measures, goal setting and overall patient outcome. A remote, self-assessed strength evaluation was not reported in any DHI. NPT analysis typically demonstrated low DHI adoption levels, and no clear correlation with health technology pathway stage. The DHI with the highest adoption into routine practice, according to NPT, was ‘getUbetter’ with 56% positive scores.
Conclusions: There are many available, or developing, DHIs but none include the content recommended by OPTIKNEE to reduce the risk of PTOAK. Further, there is negligible evidence of DHIs being adopted into usual care. There is a clear need to develop guideline-compliant DHIs to support effective prevention.

  • Providing consultation recordings to patients in German routine cancer care: A mixed-methods pilot study

    Date Submitted: Apr 28, 2026
    Open Peer Review Period: Apr 28, 2026 - Jun 23, 2026

    Background: The provision of audio recordings of medical encounters to patients, referred to as consultation recordings, is a well-established intervention to address information needs like recall and comprehension in cancer care. Despite these benefits, consultation recordings are not routine practice. Furthermore, research on consultation recordings in Germany is lacking. Objective: This study aims to pilot test consultation recordings in routine cancer care in Germany and assess feasibility of implementation and perceived effects from patients’ perspective. Methods: Using a sequential mixed methods approach, we assessed consultation recordings’ use, usability, acceptability, appropriateness, influencing factors, and perceived effects. Consultation recordings were piloted in an outpatient setting. Adult cancer patients were eligible to participate. Four weeks after the recorded consultation, participants received a quantitative questionnaire. In addition, a selection of participants were qualitatively interviewed. Quantitative data was analyzed using descriptive statistics, qualitative data using a combination of Practical Thematic Analysis and qualitative content analysis. Results: Ninety-seven consultations were audio-recorded and provided to patients. Seventy participants returned the quantitative survey (response rate 72.2%) and 16 participated in qualitative interviews. Most participants listened to the consultation recording and experienced improvements in recall, comprehension, and feeling informed. Routine implementation of consultation recordings was desired by many. The results suggest that patients perceive consultation recordings as feasible. However, we encountered organizational implementation challenges. Conclusions: This study provides initial evidence on the patient-perceived feasibility of consultation recordings in German routine cancer care. Consultation recordings have the potential to help patients navigate complex medical information. 
However, organizational implementation challenges hinder their uptake. Future research could investigate technically easier solutions suited to the German healthcare context.

  • From Measurement Failure to Privacy Infrastructure: Reframing Contact Tracing Governance for the Next Pandemic

    Date Submitted: Apr 27, 2026
    Open Peer Review Period: Apr 28, 2026 - Jun 23, 2026

    Effective infectious disease control rests on a foundational principle: no measurement, no understanding; no understanding, no control. The COVID-19 pandemic exposed, with devastating clarity, how thoroughly this principle can fail in public health practice. Transmission chains spread invisibly; contact histories, mobility patterns, and biosignals essential for control were never systematically collected. The necessary sensors and digital technologies existed — the fundamental reason measurement failed was not the absence of technology, but the absence of privacy infrastructure that would allow people to share data with confidence. This failure has structural roots. The objects of measurement in infectious disease control are not physical phenomena but human beings, and measurement therefore inevitably engages the core of privacy: contact histories, social relationships, and bodily states. This asymmetry — whereby greater measurement precision deepens privacy intrusion — manifested acutely in COVID-19 contact tracing apps. Designs that prioritized privacy lost epidemiological utility; designs that prioritized utility were rejected through public distrust. Neither direction achieved sufficient measurement. This Viewpoint reframes the problem. Privacy protection is not a constraint that impedes infectious disease control; it is the enabling condition upon which effective measurement depends. Existing regulations and technical approaches were not designed from this premise, and have therefore been unable to break the cycle of structural distrust. As one institutional approach to filling this gap, we present VRAIO (Verifiable Record of AI Output), which integrates democratic rule-setting, metadata declaration, independent third-party verification, tamper-proof ledgers, and violation-deterrence incentives. 
When privacy infrastructure is established, the foundational scientific principle "no measurement, no understanding; no understanding, no control" will begin to operate freely in infectious disease control for the first time. This opens the path toward high-resolution epidemiology and precision intervention: a new public health paradigm that simultaneously pursues strengthened disease control and the preservation of individual autonomy and social freedom, without dependence on blanket social restrictions.

  • Background: With the rapid development and widespread application of artificial intelligence (AI) technology, AI has demonstrated high accuracy and reliability in medical practice, and patients' trust in algorithms has gradually increased. However, in clinical practice, disagreements may still arise between algorithmic recommendations and clinical expert experience, and such disagreements can affect patients' trust. To date, however, the impact of these disagreements on patients’ medical trust and the strategies for addressing them have not been systematically reviewed. Objective: To systematically map the impact of disagreements between AI recommendations and clinical expert judgment on patients’ medical trust, identify influencing factors based on Mayer’s integrative model of organizational trust, and summarize strategies to enhance trust. Methods: Following Joanna Briggs Institute (JBI) scoping review methodology and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guideline, we systematically searched Web of Science, PubMed, Embase, Scopus, and EBSCO up to March 2026, limited to English-language literature. Studies focusing on patients' trust in the context of disagreements between AI and expert opinions were included. Data were charted using the Population, Concept, Context (PCC) framework. Guided by Mayer’s integrative model of organizational trust, influencing factors were analyzed through a framework synthesis approach across the dimensions of ability, benevolence, integrity, and trustor propensity. The protocol was pre-registered on OSF (Registration DOI: 10.17605/OSF.IO/AHSGD). Results: A total of 2,630 records were identified, and 26 studies were ultimately included after screening, including six qualitative studies, seven quantitative studies, three mixed-methods studies, five theoretical studies, and five review articles.
These studies were conducted across 10 countries and were published mainly between 2022 and 2026. Disagreements were concentrated in clinical diagnosis and risk assessment, treatment planning and medication decision-making, clinician–patient communication and intelligent interaction, as well as emerging application scenarios. In situations of disagreement, patients commonly expressed skepticism toward both algorithms and experts; overall, however, patients tended to trust experts more than algorithms. Data security and privacy risks, insufficient communication, AI accuracy and reliability, demographic and socioeconomic characteristics, and patients’ disease and health status were identified as high-frequency factors influencing patients’ medical trust. Six trust-enhancing strategies were extracted: transparency and explainability, patient participation and shared decision-making, clinician–patient communication and role positioning, institutional regulation and governance, education and capacity building, and privacy protection and data security. Conclusions: In situations of disagreement between AI and clinical experts, patients’ medical trust is dynamically shaped by multiple interacting factors spanning ability, benevolence, integrity, and individual and contextual characteristics. Strengthening transparency, communication, and governance is essential for fostering trust in human–AI collaborative healthcare.

  • Deep Learning Algorithms for Predicting Intraoperative Hypotension: A Systematic Review and Meta-Analysis

    Date Submitted: Apr 25, 2026
    Open Peer Review Period: Apr 25, 2026 - Jun 20, 2026

    Background: Intraoperative hypotension (IOH) is associated with myocardial injury, acute kidney injury, perioperative stroke, and 30-day mortality, yet conventional blood pressure monitoring remains reactive rather than anticipatory. Deep learning (DL) algorithms applied to continuous physiological waveforms represent a rapidly expanding paradigm for early IOH prediction, but the comparative performance of distinct DL architectures and the influence of prediction-window length, input data modality, IOH reference standard, and analysis unit on diagnostic accuracy have not been systematically synthesised. Objective: To quantify the pooled diagnostic accuracy of DL-based IOH prediction models and to identify methodological and clinical factors that modify their performance. Methods: PubMed, Embase, Web of Science, and the Cochrane Library were searched through March 2026. Methodological quality was appraised with the PROBAST+AI tool and overall certainty of evidence with the GRADE framework. A bivariate random-effects model generated pooled sensitivity, specificity, and the area under the summary receiver operating characteristic (SROC) curve, with heterogeneity quantified by τ²(Se), τ²(Sp), and the inter-study correlation ρ. Threshold effect was tested with Spearman’s correlation, publication bias with Deeks’ test, and clinical utility with Fagan’s nomogram. Prespecified subgroup analyses (prediction window, DL architecture, input modality, IOH reference standard, analysis unit) and Bayesian random-effects meta-regression explored heterogeneity sources. Results: Twelve studies were included; nine contributed 22 validation datasets to the quantitative synthesis. The pooled sensitivity was 0.78 (95% CI 0.73–0.81), specificity 0.88 (0.82–0.92), and SROC-AUC 0.87 (0.83–0.90); the diagnostic odds ratio was 24.7 (16.1–37.9), positive likelihood ratio 6.31, and negative likelihood ratio 0.26. 
Heterogeneity was τ²(Se) = 0.25, τ²(Sp) = 1.04, and ρ = −0.28; no significant threshold effect was detected (Spearman ρ = 0.29, P = 0.20). The 5-minute window achieved the highest performance (sensitivity 0.81, 95% CI 0.77–0.85; specificity 0.91, 0.84–0.95). Meta-regression identified DL architecture as the only significant moderator of specificity (P = 0.02), with hybrid CNN-RNN exceeding pure CNN (β = 1.77, 95% CI 0.45–3.09); no covariate significantly moderated sensitivity. Deeks’ test showed no statistically significant publication bias (P = 0.06). At a 10% pre-test probability, post-test probabilities were 41% (positive) and 3% (negative). GRADE certainty was Low. Conclusions: Deep learning models for IOH prediction achieve moderate diagnostic accuracy, with hybrid CNN-RNN architectures and 5-minute prediction windows showing the most favourable performance. The universal absence of formal calibration assessment, scarce external validation, and geographic concentration of the evidence base constrain immediate clinical translation. Prospective multinational validation with mandatory calibration reporting and patient-level evaluation is required before DL-based IOH alerts can be safely integrated into perioperative decision support. Clinical Trial: PROSPERO CRD420261377604.
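
The post-test probabilities reported above follow from the pooled likelihood ratios via the odds form of Bayes' theorem, which is what Fagan's nomogram represents graphically. The short sketch below reproduces that arithmetic from the figures in the abstract; the function name is illustrative.

```python
# Post-test probability from a pre-test probability and a likelihood ratio,
# i.e. the calculation that Fagan's nomogram performs graphically.

def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Values taken from the abstract: 10% pre-test probability,
# positive likelihood ratio 6.31, negative likelihood ratio 0.26.
positive = post_test_probability(0.10, 6.31)
negative = post_test_probability(0.10, 0.26)

print(f"after a positive alert: {positive:.0%}")
print(f"after a negative alert: {negative:.0%}")
```

With these inputs the results round to 41% and 3%, consistent with the values stated in the abstract.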

  • Background: Providing care to a family member or friend with a serious illness like cancer increases risk for poor physical, psychological, and functional health outcomes. Despite their critical role, family caregivers (FCGs) are rarely screened in clinical settings for the wide range of factors that may put them and the person they care for at risk for poor outcomes. Mobile health (mHealth) applications can efficiently facilitate access to high-quality health information for FCGs; however, few are clinically integrated. Objective: This study aimed to evaluate the usability of CareCheck, an mHealth-based digital risk screening tool designed to enable family caregivers' self-awareness of potential caregiving-related risks for adverse health and psychosocial outcomes and to support health care professionals in personalizing interventions that address FCGs' specific risk factors. Methods: We conducted a usability testing study of CareCheck using two evaluation methods: quantitative measurement with a modified 5-item Mobile Health App Usability Questionnaire (MAUQ) and exploratory qualitative thematic analysis based on feedback from FCGs and trained staff. FCGs of individuals with gynecologic cancer were recruited through the inpatient unit and the outpatient gynecologic oncology clinic of a Comprehensive Cancer Center. Participants completed CareCheck and the usability questionnaire via the mHealth app installed on tablets. Staff observed the assessment process and provided feedback. Results: A total of 56 FCGs and 2 trained staff participated in the usability study. The mean MAUQ score was 6.49 (SD = 1.06) out of 7, indicating high usability. Qualitative analysis identified recommendations in three categories: 1) Improvements to CareCheck; 2) Perceptions of CareCheck’s Usability and Functionality; and 3) Clinical Implementation Considerations for CareCheck. Conclusions: FCGs and staff found CareCheck to be user-friendly and easy to navigate.
While further iterations are needed to refine content and optimize integration with clinical workflows, CareCheck demonstrated potential as a clinically integrated tool for identifying and addressing FCG risk for poor social, psychological, or health outcomes in gynecologic oncology care settings.

  • Background: Hypertension remains a predominant global risk factor for cardiovascular disease. Conventional follow-up models frequently fail to address the requirements for real-time monitoring and sustained intervention, whereas mobile health (mHealth) offers a transformative trajectory for chronic disease management. Despite a surge in relevant literature, the diversity of intervention modalities and the fragmented nature of existing evidence necessitate a systematic synthesis. Objective: This study aimed to comprehensively evaluate the efficacy of mHealth in hypertension management through a systematic review combined with evidence mapping, identifying research gaps to provide evidence-based insights for precision nursing and future research directions. Methods: A systematic search was conducted across PubMed, Web of Science, Cochrane Library, and Embase for randomized controlled trials (RCTs) involving mHealth interventions for hypertension, with the search period extending through February 2026. Literature was screened according to PICOS criteria, and methodological quality was appraised using the Cochrane Risk of Bias tool (RoB 1.0). Visual analytics, including Sankey diagrams and bubble plots, were employed to characterize the associations between intervention modalities and clinical outcomes. The study protocol was prospectively registered on the Open Science Framework (URL: https://osf.io/2vkwu). Results: A total of 106 publications (comprising 108 RCTs) were included. Publication volume has increased significantly since 2018, with the United States (31 papers) and China (19 papers) being the primary contributors. The intervention paradigm has evolved from rudimentary SMS reminders to a "closed-loop" management model centered on "App + Remote Monitoring," which demonstrates the most robust and consistent positive evidence for blood pressure (SBP/DBP) control and goal attainment rates.
Blood pressure parameters occupied the "core evidence layer," while therapeutic adherence and disease knowledge formed the "behavioral evidence layer". Conversely, BMI, mental health, and quality of life remained in the "peripheral evidence layer," characterized by a notably higher proportion of non-significant results. Methodological quality was generally moderate-to-high with robust randomization; however, the implementation of blinding faced prevalent high risks due to the inherent nature of the interventions. Conclusions: mHealth significantly enhances hypertension management efficacy through a digital "monitoring-feedback-adjustment" loop, yet it encounters bottlenecks in achieving profound lifestyle modifications (e.g., weight management) and psychological interventions. Clinical decision-making should prioritize multicomponent interventions featuring real-time interaction. Future research should focus on long-term (>1 year) follow-up and cost-effectiveness transformation in resource-limited settings.

  • Background: Molecular tumor boards (MTBs) generate highly technical recommendations. The language used in their protocols is rarely accessible to patients. Lay-language patient protocols could support patient-clinician communication, yet manual production is difficult to sustain in high-volume oncology settings. Large language models (LLMs) may offer scalable drafting assistance, yet clinical usability remains largely uninvestigated under real-world deployment constraints. Existing evaluations rely predominantly on synthetic data or closed-source models that are incompatible with strict data protection requirements. Objective: This study evaluated whether open-weight LLMs can provide clinically usable drafting support for German MTB patient protocols under real-world deployment constraints and developed a transferable evaluation framework for patient-facing text generation. Methods: Eight open-weight LLMs were evaluated under zero-shot (A1) and one-shot (A2) prompting with constrained decoding, which ensures section-schema compliance. Automatic evaluation used ROUGE-1, BERTScore-F1, WSTF4, and DistilBERT-based complexity using a corpus of 316 MTB protocols and 47 expert-written patient protocols. For expert evaluation, seven medical oncologists evaluated 50 protocols from the best-performing model across three ISO 9241-11 usability dimensions using fine-grained error annotation, perceived post-editing effort (PPEE), and net promoter score (NPS). Critical errors were defined as bearing the risk of patient harm. Results: Llama-3.3-70B-Instruct achieved the strongest automatic performance. Across models, A2 significantly improved most automatic metrics compared to A1. However, expert usability evaluation showed the opposite picture: the proportion of protocols containing at least one critical error doubled under A2 compared with A1 (40% vs. 20%), and the dominant error type shifted from language errors (37%) to factual errors (48%).
Overall, 6.1% of the annotated paragraphs contained errors. Median PPEE was 2 (low) and median NPS was 7. Detractors (46%) outweighed promoters (29%), which signals clinical hesitation toward routine adoption. Conclusions: Prompting strategies that improve automatic metrics can simultaneously increase the number of critical errors. Surface-level metric gains were, therefore, insufficient proxies for clinical safety. Nonetheless, the low paragraph-level error rate and favorable PPEE suggest that structured open-weight LLM generation may be a useful drafting aid in a clinician-supervised setting. The proposed evaluation framework establishes a text-quality-focused basis for future assessment of patient-facing LLM applications in real-world clinical settings.
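
The abstract reports the shares of detractors and promoters but not the net score itself. As a hedged illustration, the sketch below applies the conventional net promoter score definition (0-10 ratings; 9-10 promoters, 0-6 detractors, 7-8 passives). Whether the study used exactly these cutoffs is an assumption, and the example ratings are hypothetical.

```python
# Net promoter score under the conventional cutoffs (an assumption here;
# the abstract does not state the cutoffs that were used).

def classify(rating: int) -> str:
    """Map a 0-10 rating to promoter, passive, or detractor."""
    if rating >= 9:
        return "promoter"
    if rating <= 6:
        return "detractor"
    return "passive"

def net_promoter_score(ratings) -> float:
    """Percentage of promoters minus percentage of detractors."""
    promoters = sum(classify(r) == "promoter" for r in ratings)
    detractors = sum(classify(r) == "detractor" for r in ratings)
    return 100.0 * (promoters - detractors) / len(ratings)

# Net score implied by the percentages in the abstract (29% vs 46%).
implied_net = 29 - 46

# Hypothetical ratings, only to exercise the function.
example = net_promoter_score([9, 10, 7, 8, 3, 6])

print(f"implied net score: {implied_net}")
print(f"example net score: {example:.1f}")
```

Under this definition, the reported shares imply a negative net score, which is consistent with the abstract's reading of clinical hesitation.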

  • Background: Adults may experience subjective cognitive decline (SCD). However, it is unclear whether SCD is related to measurable cognitive impairment, particularly in women aged 40 to 60 years and in early dementia. Further, Medicare has mandated assessment of cognitive and memory function in individuals over 65 as part of the Medicare Annual Wellness Visit. In order to assess possible impairment and change over time, efficient, objective measures of SCD are needed. Objective: To assess the relationship between performance on an online continuous recognition task (CRT, MemTrax) and age, sex, and memory concern. Methods: This study evaluated CRT performance in participants aged 21-99 who enrolled in an online program (HAPPYneuron) to measure mental functions, including those who reported concerns about them. This program asked participants if they had complaints about their memory, and then the program offered them the opportunity to assess cognition using the CRT. This CRT instructs individuals to attend to visual stimuli (50 images) and respond as quickly as possible to repeated images (25 images). The CRT components were used to measure learning and memory (as related to HITs, response to a repeated image), executive function (as related to CRs, correctly not responding to an initial image presentation), and processing speed (HIT-RTs, average response time to HITs). Results: The analysis included 18,178 participants (5,795 males, 32%; 12,383 females, 68%), restricted to those who answered the sex, age, and memory questions. Of these, 11,786 (65%) were between 40 and 70 years of age. Females outnumbered males by over two-fold, beginning about 35 years of age, peaking at 55 years of age at over three-fold, and falling below two-fold at about 65 years of age. Approximately 30% more men complained of memory problems than those who did not, primarily among those 30-60 years old. About 80% more women complained of memory problems, over two-fold more than women who did not, 30-50 years old.
The number of HITs, number of CRs, and HIT-RTs varied little between men and women. While those without memory complaints generally performed better than those with memory complaints, there was little difference in performance levels for each group between males and females. For all groups, there was a gradual reduction of performance over age for HITs and CRs and a slowing of HIT-RTs. Conclusions: Most subjects were 40-65, more than twice as many females, suggesting that these demographics have a relationship to concern about SCD. However, there was little difference between males and females for the various CRT components, though SCD was associated with impairment. Age-related declines were progressive, the largest being in slower processing speed, presumably to compensate for age-related changes in cognitive function. Present results suggest clinicians may use these metrics to quantify patient concerns expressed in the primary care setting. Clinical Trial: none

  • Momentary Mood State Detection using Smartwatches: Algorithm Development and Validation

    Date Submitted: Apr 20, 2026
    Open Peer Review Period: Apr 21, 2026 - Jun 16, 2026

    Background: Mental health encompasses not only chronic conditions such as depression or anxiety, but also acute fluctuations in mood that unfold over minutes to hours and can disrupt daily functioning. These transient states, such as sudden fatigue, irritability, or low energy, remain largely invisible to current digital health approaches, which typically aggregate behavioral and physiological data over days or weeks to detect trait-level conditions. The ability to detect momentary mood shifts in real time carries significant clinical promise: continuous affective monitoring could enable early detection of mental health crises, support clinical decisions and clinical trials with continuous mood measurements, and improve occupational safety by detecting states like fatigue or confusion. However, affective computing research has demonstrated that while physiological signals carry information relevant to mood, most prior work relies on controlled laboratory settings where performance degrades substantially in naturalistic environments, or employs research-grade devices with proprietary sensors unavailable on consumer hardware. Bridging this gap between laboratory-validated sensing and real-world momentary mood detection is essential for translating these clinical possibilities into practice through just-in-time adaptive interventions. Objective: This study investigates whether continuous sensing from a low-cost, open-source smartwatch can support detection of multi-dimensional momentary mood states in naturalistic settings, using personalized models with on-device computation. 
Methods: We conducted a 7-day field study in which participants (N=10) wore Bangle.js 2 smartwatches that continuously collected physiological and contextual data, including heart rate, accelerometry, barometric pressure, temperature, and GPS, while prompting hourly mood self-reports using the Brunel Mood Scale (BRUMS) across six mood dimensions (tension, depression, anger, vigor, fatigue, confusion) and additional affective and physical states. All feature extraction was performed on-device. We developed personalized mood detection models using best-subset regression across multiple feature combinations. Results: Personalized models decoded momentary states with mean R² values ranging from 0.09 (pain) to 0.31 (vigor). Fatigue, happiness, vigor, and depression were the most reliably decoded dimensions (mean R² = 0.26–0.31). Cross-subject decoding was substantially lower, confirming that personalization is essential for accurate mood inference. Including privacy-preserving location features did not significantly improve prediction accuracy beyond physiological and contextual sensors alone. Conclusions: This work demonstrates that a broad range of momentary mood states can be decoded from low-cost, open-source wearable sensors as people go about their daily lives, bridging the gap between controlled laboratory studies and real-world momentary assessment. The finding that personalized models substantially outperform generalized approaches underscores the need for individual calibration in affective computing systems. The on-device, privacy-preserving architecture establishes a foundation for future closed-loop adaptive interventions in clinical and occupational contexts, including continuous monitoring of high-risk psychiatric populations, early warning systems for substance use relapse, and real-time assessment of cognitive and emotional fitness in safety-critical work environments. Clinical Trial: N/A

  • Background: Patients with vestibular schwannoma often experience postoperative vestibular dysfunction, including vertigo, dizziness, and imbalance, which severely impair daily functioning and quality of life. Effective and accessible rehabilitation strategies are therefore essential. Objective: To evaluate the effects of a mobile health-based vestibular rehabilitation program in patients following unilateral vestibular schwannoma surgery. Methods: A prospective randomized controlled trial was conducted. A total of 60 patients who underwent unilateral vestibular schwannoma surgery at the Otology Center, Eye & ENT Hospital, Fudan University, from October 2023 to May 2025 were enrolled and randomly assigned to either the control group or the intervention group (n = 30 each). Both groups underwent a 90-day vestibular rehabilitation program comprising gaze stability exercises, balance training, and gait training. The intervention group received self-assessment, video-based guidance, symptom recording, and automated adherence monitoring via a customized mobile app, whereas the control group received face-to-face guidance and maintained their records using paper diaries. The primary outcome was the between-group difference in the change in Dizziness Handicap Inventory (DHI) score from baseline (preoperative) to 90 days postoperatively. Secondary outcomes included DHI change at 30 days, visual analog scale (VAS) scores, and incidence of vestibular symptoms. Demographic, baseline, and outcome data were collected at admission and on postoperative days 7, 30, and 90. Intention-to-treat analysis was performed; missing continuous data were handled using multiple imputation, and dichotomous variables were imputed with the last observation carried forward. Independent t‑tests or Mann-Whitney U tests were used for continuous variables, and chi‑square tests for categorical variables. 
Results: The 90‑day follow-up primary outcome assessment was completed by 52 of 60 patients (86.7%), with 8 non-responders in the intervention group and 6 in the control group. No significant differences in baseline demographic or clinical data were observed between the two groups, whereas tumor size distribution differed significantly (χ² = –2.513, P=.012), with larger tumors in the intervention group. For the primary endpoint, no significant between‑group difference was observed in the change in DHI score from baseline to 90 days (P>.05). For secondary outcomes, no significant differences were found in DHI change at 30 days or VAS scores at any time point (all P>.05). However, on postoperative day 7, the incidence of postural symptoms was significantly lower in the intervention group than in the control group (53.33% vs 83.33%, χ² = 6.239, P = .012). Conclusions: The mobile health–based vestibular rehabilitation program demonstrated comparable efficacy to conventional face‑to‑face rehabilitation in improving vestibular function and may accelerate recovery during the early phase of unilateral vestibular loss. These findings support the feasibility of mHealth as an alternative approach for postoperative vestibular rehabilitation in patients with vestibular schwannoma. Clinical Trial: Chinese Clinical Trial Registry ChiCTR2200056123; https://www.chictr.org.cn/showproj.html?proj=150939
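
The day-7 comparison of postural symptoms (53.33% vs 83.33%, χ² = 6.239) can be reproduced with a short worked calculation. The sketch below is illustrative only: it assumes the percentages correspond to 16 of 30 and 25 of 30 patients (inferred from n = 30 per group under intention-to-treat, not stated in the abstract) and applies the uncorrected Pearson chi-square formula for a 2×2 table. The function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Assumed counts (inferred from the reported percentages and n = 30 per group):
# intervention: 16 with postural symptoms, 14 without
# control:      25 with postural symptoms,  5 without
chi2 = chi_square_2x2(16, 14, 25, 5)
p = p_value_df1(chi2)

print(f"chi-square = {chi2:.3f}, P = {p:.4f}")
```

Under these assumed counts the statistic rounds to the reported 6.239, which suggests that no continuity correction was applied; the abstract itself does not say which variant was used.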

  • Virtual Reality for Cognitive Mastery in Airway Trauma Management: A Prospective Randomized Controlled Trial

    Date Submitted: Apr 17, 2026
    Open Peer Review Period: Apr 18, 2026 - Jun 13, 2026

    Background: Innovation in teaching methods is essential for advancing medical education, particularly for trainees developing crisis management skills. Virtual reality (VR) offers access to immersive, scalable, and accessible learning environments, but its effectiveness compared to traditional mannequin-based simulation remains underexplored. Objective: This prospective randomized controlled trial evaluates the efficacy of VR-based simulation versus traditional gold-standard mannequin-based training in enhancing medical trainees’ knowledge acquisition and application of decision-making concepts for airway trauma management. Methods: Forty medical students were randomized to either the VR (intervention) group or the Mannequin (control) group. Participants engaged in airway trauma management training using their assigned modality. Both groups completed a pre- and post-intervention test to evaluate knowledge acquisition, and undertook a mannequin-based crisis scenario one week after training to evaluate knowledge application. Results: Both groups demonstrated significant knowledge acquisition (VR: mean improvement +2.0/15, P=0.006; Mannequin: mean improvement +3.2/15, P<0.001), though no statistically significant differences were observed between groups (P=0.15). The VR group achieved self-assessed readiness and knowledge saturation faster, on average, than the Mannequin group. Both groups, on average, were successful in the post-training knowledge application test; however, the Mannequin group outperformed the VR group (mean difference: 1.58/15, P=0.021) and recognized a potential airway injury more quickly (P=0.004). Nevertheless, students in the VR group reported greater engagement and satisfaction, expressing a preference for VR as a future learning modality. 
Conclusions: Overall, VR-based simulation is a promising and engaging method for teaching airway trauma management and demonstrates comparable knowledge acquisition to traditional mannequin-based training. However, mannequin-based simulation still confers advantages for applied performance. Further studies using larger samples, multiple scenarios, and VR-based assessments are needed. Clinical Trial: ClinicalTrials.gov NCT04451590; https://clinicaltrials.gov/study/NCT04451590

  • Wearable Eye-Tracking Metrics From Smart Glasses for Cognitive Assessment: A Prospective Digital Health Study

    Date Submitted: Apr 17, 2026
    Open Peer Review Period: Apr 18, 2026 - Jun 13, 2026

    Background: Reading performance is closely associated with cognitive function, and eye-tracking metrics have emerged as sensitive, non-invasive indicators of cognitive processes. Recent advances in wearable technologies, such as smart glasses, enable continuous and scalable measurement of eye movements in real-world settings. However, rapid, accessible, and objective tools for cognitive screening remain limited. Integrating wearable eye-tracking with multidomain cognitive assessment may provide a scalable digital approach for early detection of cognitive impairment. Objective: To evaluate the association between wearable eye-tracking metrics and cognitive performance and to assess the feasibility of a smart glasses–based reading task as a rapid digital screening tool. Methods: In this prospective observational study, Mandarin-literate adults were recruited from Taipei Veterans General Hospital between May and August 2025. Participants completed a standardized reading task while wearing J7EF Gaze smart glasses. Eight eye-tracking metrics were recorded, followed by a six-domain cognitive assessment using gaze-based interaction. Associations were analyzed via multivariable regression adjusted for age and sex. Results: A total of 134 participants were enrolled (mean age 68.2 ± 13.4 years). Age correlated with all six cognitive domains and the total score, while sex exhibited smaller, domain-specific effects. In unadjusted analyses, total reading time showed the strongest associations with all cognitive domains (p < 0.001), while fixation duration, fixation frequency, and long or ultra-long fixations showed selective associations with orientation. After adjusting for age and sex, total reading time, total fixation time, and average fixation time remained significant predictors. Conclusions: Total reading time emerged as a robust, age-independent eye-tracking marker of cognitive performance. 
Fixation-related metrics showed domain-specific associations, particularly with the puzzle game hobbies domain of the cognitive assessment. Wearable smart glasses with integrated eye tracking may provide a rapid, non-invasive, and scalable approach for digital cognitive screening in clinical and real-world settings.

  • Background: Large real-world data sources offer a unique opportunity to study the health of diverse ethnic groups. High-quality and accessible ethnicity data is needed to maximise this potential. Objective: To validate a newly developed ethnicity phenotype in the Oxford-Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC). Methods: Retrospective cross-sectional study of individuals registered at a practice within the Oxford-RCGP RSC on 4th December 2024. An updated ethnicity phenotype was implemented and validated. Ethnicity data quality was assessed by evaluating completeness, distribution, and accuracy through external validation against estimates from the 2021 UK Census. Results: Of 21,902,852 individuals, 88.63% (19,412,154) had a recorded ethnicity following the implementation of the updated ethnicity phenotype. There was a marked improvement in the recording of granular (19-point) ethnicity data, with completeness increasing from 69.06% (15,126,835) to 88.63% (19,412,154) with the updated phenotype. There was significant variation in the completeness of ethnicity data according to demographic subgroups. The proportion of individuals in each ethnicity group was within 3.56 percentage points of the 2021 Census estimates for the same ethnicity group across England. Larger relative differences were observed for non-White ethnic groups. Conclusions: The updated ethnicity phenotype provides high-quality and granular ethnicity data based on official classifications for almost 90% of individuals. The overall ethnicity breakdown in the Oxford-RCGP RSC population was broadly similar to 2021 UK Census estimates. The updated ethnicity phenotype supports secondary uses of primary care CMRs, providing high-quality and accessible ethnicity data to study the health of diverse ethnic groups.

  • Background: Chronic or persistent pain can limit an individual’s ability to work or be productive at work, creating substantial societal and economic burden. Despite this, evidence-based work‑related advice and support for people with chronic pain is inconsistent. The Pain‑at‑Work Toolkit was co‑created with people living with pain, health care professionals, and employers to increase knowledge of employee rights, improve access to workplace support, and provide guidance on lifestyle behaviors that facilitate pain self‑management. Objective: This study aimed to establish the feasibility of conducting a definitive cluster randomized controlled trial comparing access to the Pain‑at‑Work Toolkit plus optional occupational therapist telephone support (intervention) with support-as-usual (SAU) from the employer (control). Primary outcomes were feasibility, acceptability, usability, and safety of the digital intervention. We also assessed the feasibility of candidate primary and secondary outcomes and tested research processes required for a definitive trial. Methods: We conducted an open‑label, parallel, two‑arm pragmatic feasibility cluster randomized controlled trial with exploratory health‑economics analysis and a nested qualitative study. Eligible organizations were based in England, had ≥10 employees, and were recruited through professional networks and direct approach. Individual participants were working adults aged ≥18 years, with internet access and self‑reported chronic pain interfering with their ability to undertake or enjoy productive work. A restricted 1:1 cluster‑level randomization allocated organizations to the intervention or control arms. After organizational and individual consent, participants completed a web‑based baseline survey (T0) assessing work capacity, health and wellbeing, and health‑care resource use. Follow‑up occurred at 3 months (T1) and 6 months (T2). 
Feasibility outcomes included recruitment, intervention fidelity (delivery, reach, uptake, engagement), retention, and follow‑up completion. Qualitative interviews with employees and stakeholders at T2 explored acceptability and contextual factors influencing delivery and uptake. Results: A total of 380 employees from 18 organizations participated. Recruitment exceeded targets at both organizational and individual levels, demonstrating strong feasibility and engagement. Follow‑up completion met predefined feasibility criteria but showed variability, largely due to employee turnover, providing realistic attrition estimates for a future trial. Outcome measures showed acceptable completion rates and variability, supporting their suitability for use in a future definitive trial. Employees and stakeholders reported high acceptability of the Pain‑at‑Work Toolkit, and qualitative findings highlighted improved knowledge, confidence, and self‑management among employees. Stakeholders endorsed the Toolkit’s relevance and practicality within workplace settings. Conclusions: The feasibility trial demonstrated that the Pain‑at‑Work Toolkit and trial procedures are acceptable, scalable, and deliverable across diverse workplaces. Findings identify responsive outcome measures, emphasize the need for strengthened retention strategies, and support the Toolkit’s use as a standalone intervention. Overall, the study provides a strong foundation for progressing to a fully powered definitive trial. Clinical Trial: ClinicalTrials.gov NCT05838677; https://clinicaltrials.gov/study/NCT05838677 International Registered Report Identifier (IRRID): DERR1-10.2196/51474

  • Machine Learning–Enabled Interventions in Palliative Care: A Scoping Review

    Date Submitted: Apr 14, 2026
    Open Peer Review Period: Apr 14, 2026 - Jun 9, 2026

    Background: Machine learning-based prognostic models have been increasingly developed to support palliative and serious illness care, particularly in oncology. While predictive accuracy has improved substantially, less is known about how these models are translated into real-world interventions and whether they meaningfully influence clinical practice and patient care. Objective: This scoping review aimed to map and synthesize interventional studies that used machine learning-enabled interventions to support palliative and serious illness care, with a focus on model integration strategies and reported effects on communication processes, care planning, and downstream clinical outcomes. Methods: Following PRISMA-ScR guidelines, we conducted a scoping review of peer-reviewed English-language studies published since 2015. Searches were performed in PubMed, Embase, Web of Science, and the Cochrane Library. Eligible studies implemented machine learning-based predictions to trigger or guide real-world palliative care-related interventions, including serious illness conversations, advance care planning, or palliative care referral. Results: Eight interventional studies were included, encompassing cluster randomized trials, stepped wedge designs, and real-world implementation studies. Machine learning-enabled interventions were consistently associated with increased documentation of serious illness conversations and advance care planning, particularly when predictive outputs were embedded within clinical workflows through behavioral nudges, automated alerts, or facilitated outreach. In contrast, effects on treatment intensity, health care utilization, and end-of-life costs were limited, inconsistent, or not observed. 
Conclusions: Current evidence suggests that machine learning-enabled interventions in oncology palliative care are most effective when used to support prioritization and timing of communication-related processes rather than to directly alter care trajectories or resource use. Future research should focus on implementation strategies, patient-centered outcomes, and equity-sensitive evaluation to better translate predictive insights into meaningful clinical impact.

  • Background: The Emergency Intensive Care Unit (EICU) is the core setting for the treatment of critically ill patients, where the diagnostic error rate is more than twice that of general inpatient wards, which seriously affects patient prognosis. Large Language Models (LLMs) have shown potential in clinical diagnosis, but there is still very limited evidence comparing the diagnostic efficacy of critical care-specific LLMs and general-purpose LLMs in the complex diagnostic scenarios of the EICU. Objective: This study aimed to evaluate and compare the diagnostic accuracy of a critical care-specific LLM (Qiyuan 3.0.1) and three mainstream general-purpose LLMs (GPT5.1, DeepSeek V3.1, Qwen3-32B) in EICU diseases, and to provide an evidence base for the selection of intelligent auxiliary diagnostic tools in the EICU. Methods: This was a single-center retrospective paired diagnostic accuracy study, which consecutively enrolled 184 critically ill patients admitted to the EICU of Peking University Shenzhen Hospital from April 2025 to March 2026. Standardized datasets were constructed based on the patients' clinical data, including an initial diagnosis dataset (clinical data within 24 hours after admission) and a final diagnosis dataset (complete course data from admission to discharge). A unified zero-shot learning prompt strategy was adopted, and the four LLMs independently generated corresponding diagnoses in a double-blind manner. The consensus diagnosis reached by three senior intensive care physicians with more than 10 years of EICU working experience, who were blinded to the model results, was used as the gold standard. The primary endpoint was the Top-1 accuracy in the final diagnosis stage, defined as the proportion of cases where the first primary diagnosis output by the model completely matched the gold standard. 
Secondary endpoints included the Top-1 accuracy in the initial diagnosis stage and the number of correct diagnoses in the Top-3 outputs in the final diagnosis stage. Cochran's Q test was used for the overall comparison of accuracy among multiple groups, and post hoc pairwise comparisons were performed using the paired McNemar test with Bonferroni correction for type I error. The Friedman non-parametric rank sum test was used for the intergroup comparison of the number of correct Top-3 diagnoses. Results: In the final diagnosis stage, the overall difference in Top-1 accuracy among the four models was statistically significant (Cochran's Q=20.32, df=3, P=4.57×10⁻⁵). The Top-1 accuracy of Qiyuan 3.0.1 was the highest (64.13%, 95%CI 56.83%-71.00%), followed by GPT5.1 (59.24%, 95%CI 51.83%-66.35%), DeepSeek V3.1 (57.07%, 95%CI 49.64%-64.28%), and Qwen3-32B had the lowest accuracy (51.63%, 95%CI 44.26%-58.98%). Post hoc pairwise comparisons showed that the Top-1 accuracy of Qiyuan 3.0.1, GPT5.1, and DeepSeek V3.1 was significantly higher than that of Qwen3-32B (all adjusted P<0.0083), while no significant difference was found in other pairwise comparisons (all adjusted P>0.0083). A similar trend was observed in the initial diagnosis stage, where only Qiyuan 3.0.1 was significantly superior to Qwen3-32B (adjusted P=0.008). The median number of correct Top-3 diagnoses for all four models was 2.0 (IQR 1.0-2.0), with no significant intergroup difference (Friedman χ²=3.34, df=3, P=0.339). Conclusions: The critical care-specific LLM Qiyuan 3.0.1 has superior Top-1 diagnostic accuracy in EICU diseases compared with some general-purpose LLMs, but the absolute diagnostic accuracy of all included models still has considerable room for improvement. LLMs have potential application value as auxiliary diagnostic tools in the EICU, but their clinical application still requires further optimization and multi-center prospective clinical trial validation.
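
The significance threshold of 0.0083 quoted for the post hoc comparisons is consistent with a Bonferroni correction over all pairwise comparisons of the four models. The sketch below shows that arithmetic and, for illustration, an exact two-sided McNemar test on paired outcomes. It assumes a family-wise alpha of 0.05 (not stated in the abstract); the function name `mcnemar_exact` and the discordant-pair counts are hypothetical, since the abstract does not report them.

```python
from itertools import combinations
from math import comb

models = ["Qiyuan 3.0.1", "GPT5.1", "DeepSeek V3.1", "Qwen3-32B"]
alpha = 0.05  # assumed family-wise error rate; not stated in the abstract

# Every unordered pair of models is one post hoc comparison.
pairs = list(combinations(models, 2))
threshold = alpha / len(pairs)  # Bonferroni-adjusted per-comparison threshold

print(f"{len(pairs)} comparisons, per-comparison threshold = {threshold:.4f}")


def mcnemar_exact(b, c):
    """Exact two-sided McNemar test.

    b: cases where model A was correct and model B was wrong
    c: cases where model A was wrong and model B was correct
    Under the null hypothesis, each discordant pair is equally likely
    to fall in either cell, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, for illustration only.
p = mcnemar_exact(30, 9)
print("significant" if p < threshold else "not significant")
```

Only the discordant pairs enter the McNemar statistic, which is why paired accuracy differences of a few percentage points across 184 cases can fall on either side of the adjusted threshold.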

  • The Role of Incentivisation in Communities of Practice: A Systematic Review

    Date Submitted: Apr 5, 2026
    Open Peer Review Period: Apr 6, 2026 - Jun 1, 2026

    Background: Incentivisation is increasingly used to maintain engagement and support behaviour change within communities of practice (CoPs), yet its effectiveness across chronic disease contexts remains uncertain. Objective: To examine how incentives are integrated into CoPs and related peer-support models, and to assess their impact on participant activation, engagement, and health-related outcomes. Methods: PubMed/MEDLINE, Embase, Scopus, and CENTRAL were searched from inception to June 2025 using predefined terms relating to CoPs, incentivisation, and patient-centred outcomes. Peer-reviewed empirical studies involving incentivised CoPs or analogous peer-support interventions for adults with chronic conditions were eligible. Four reviewers independently screened studies, extracted data, and assessed risk of bias in line with PRISMA 2020 guidance. Heterogeneity in design and outcomes required narrative synthesis. Results: From 667 records, four randomised controlled trials met inclusion criteria. Financial incentives produced the greatest short-term gains in physical activity, while non-financial approaches such as gamification, points, badges, and structured peer support yielded modest improvements in step count, treatment adherence, or diet quality. No consistent effects were observed for patient activation, self-efficacy, mental health, or quality of life. Engagement moderated effectiveness, although attrition was common. Conclusions: Incentivisation can enhance short-term behavioural outcomes within CoPs, but evidence for sustained psychosocial benefit is limited. Larger, longer-term studies are needed to clarify which incentive strategies deliver durable improvements in engagement and self-management. Clinical Trial: This review was registered on PROSPERO, an international prospective register of systematic reviews (January 2026, reference CRD420251244276).