The leading peer-reviewed journal for digital medicine and health and health care in the internet age. 

Latest Submissions Open for Peer Review

JMIR has been a leader in applying openness, participation, collaboration and other "2.0" ideas to scholarly publishing, and since December 2009 has offered open peer review of articles, allowing JMIR users to sign themselves up as peer reviewers for specific articles currently under consideration by the journal (in addition to author- and editor-selected reviewers).

For a complete list of all submissions across all JMIR journals as well as partner journals, see JMIR Preprints.

Note that this is not a complete list of submissions, as authors can opt out. The list below shows recently submitted articles whose submitting authors have not opted out of open peer review and on which the editor has not yet made a decision. (This feature is for reviewing specific articles; if you just want to sign up as a reviewer and wait for the editor to contact you when articles match your interests, please sign up as a reviewer using your profile.)

To assign yourself to an article as a reviewer, you must have a user account on this site (if you do not have one, register for a free account) and be logged in (please verify that the email address in your profile is correct).

Add yourself as a peer reviewer to any article by clicking the '+Peer-review Me!+' link under each article. Full instructions on how to complete your review will be sent to you by email shortly afterwards. Do not sign up as a peer reviewer if you have any conflicts of interest (note that we will treat any attempt by authors to sign up as reviewers under a false identity as scientific misconduct, and we reserve the right to promptly reject the article and inform the host institution).

The standard turnaround time for reviews is currently 2 weeks, and the general aim is to give constructive feedback to the authors and/or to prevent publication of uninteresting or fatally flawed articles. Reviewers will be acknowledged by name if the article is published, but remain anonymous if the article is declined.

The abstracts on this page describe unpublished studies; please do not cite them (yet). If you wish to cite them or would like to see them published, express your opinion in the form of a peer review!

Tip: Include the RSS feed of the JMIR submissions on this page on your homepage, blog, or desktop RSS reader to stay informed about current submissions!

If you follow us on Twitter, we will also announce new submissions under open peer review there.

Titles/Abstracts of Articles Currently Open for Review:

  • Background: Substance use disorders account for a significant portion of the disease burden attributed to mental health globally, but measurement remains suboptimal. Studies assessing substance use typically rely on retrospective recall, often over long periods of time. However, the episodic, contextual, and event- or time-contingent nature of substance use calls into question the validity of these traditional retrospective measurement methods. One method to overcome these limitations is ecological momentary assessment (EMA). EMA methods repeatedly sample participants’ behaviours and experiences in real time, in the context in which they occur. Objective: This review aimed to systematically identify studies using EMA in substance use measurement, to provide a comprehensive overview of the EMA methods used, and to provide a draft framework for reporting and methodological recommendations for future EMA studies in this field. Methods: Studies published between 2018 and 2023 were sourced from PubMed, Medline, Scopus, and PsycINFO via Ovid databases on 31 January 2023 using terms related to EMA, digital phenotyping, passive sensing, daily diaries, and specific terms for each drug type. Studies that actively or passively assessed thoughts and/or behaviour in the participants’ natural environment/daily lives, in a repeated manner, at or close to the behaviour of interest (substance use), using either automatic prompts or notifications, were included. Studies of all populations, ages, settings, and study designs, including RCTs and experimental designs, were eligible. This study was preregistered on PROSPERO (CRD42023400418). Results: The search identified 7053 articles, of which 858 were reviewed in full and 273 (n = 70,831 participants) were included and extracted. Most studies were conducted in the United States (80%) and focused on alcohol (78%) and cannabis use (30%), with or without the presence of other substance use. Alcohol and cannabis were the substances most frequently measured together, in 44 (16%) studies. Psychedelics (2%) were particularly understudied using EMA methods. PCP, bath salts, and inhalants were each measured in only one study. We found limited reporting consistency with respect to compliance, completion windows, attrition rates, survey duration, and data collection technologies in EMA substance use studies. Sensing data were measured in a limited number of studies. Conclusions: While EMA is a powerful tool for capturing dynamic behaviours, inconsistencies in reporting and design transparency persist. Improving reporting practices, integrating smart sensing and wearables, monitoring compliance, and expanding EMA to underexplored substances such as psychedelics will be critical to enhancing data quality and advancing the field.

  • Background: Primary care physicians in resource-constrained settings, particularly within low-income and middle-income countries (LMICs), frequently encounter a "diagnostic gap" when managing complex, rare, or multisystemic pathologies. While large language models (LLMs) demonstrate significant potential to augment clinical reasoning, current state-of-the-art solutions rely predominantly on high-bandwidth cloud infrastructure, limiting their deployment in regions with unstable internet connectivity and strict data sovereignty regulations. Objective: The prevailing technological consensus in computer science suggests that "agentic workflows" or multi-agent systems (MAS), which orchestrate multiple models to simulate collective reasoning, inherently offer superior accuracy and safety compared with single models. However, the comparative efficacy, safety, and cost-effectiveness of complex MAS versus single localised models in offline, hardware-limited environments remain unproven. Methods: We conducted a prospective comparative benchmarking study using the DiagnosisArena dataset, comprising 915 complex clinical cases across 28 medical specialties. To simulate a secure, offline primary care environment, we evaluated five locally deployed single open-source LLMs (GPT-oss-20b, Llama3.1-70B, Qwen3-32B, DeepSeek-R1-32B, and Gemma3-27B) against two multi-agent architectures: a Standard voting ensemble and a novel hierarchical Adaptive Weighted System. All models were hosted on a local server (4×NVIDIA A100) using the Dify platform. Performance was adjudicated against a reference standard established by the consensus of three board-certified physicians, using a dual-metric system: a 10-point Diagnostic Recall Scale and a comprehensive Hallucination/Safety Index. Inference latency and computational resource utilisation were recorded to assess cost-effectiveness. Results: Contrary to the hypothesis that architectural complexity yields diagnostic precision, single high-performance models significantly outperformed complex ensembles. The single GPT-oss-20b model achieved the highest Diagnostic Recall Score (mean 4.68 [SD 3.82]), significantly surpassing the Adaptive Weighted Multi-Agent System (4.13 [SD 3.43]; p<0.001) and smaller models such as Gemma3-27B (2.89 [SD 3.89]; p<0.001). The Adaptive System, despite utilising dynamic routing, failed to outperform the median score of human physicians (4.22 [SD 3.62]; p=0.432). Furthermore, the inclusion of mid-tier models in the adaptive workflow introduced an "ensemble degradation" effect, significantly lowering the Safety Score compared with the single GPT-oss-20b model (4.99 vs 5.50; p<0.001) and reducing the rate of Top-1 correct diagnoses from 51.58% to 46.89%. Crucially, the single GPT-oss-20b model demonstrated superior efficiency, with an average inference time of 30 seconds per case compared with 200 seconds for the Standard Multi-Agent System, an 85% reduction in latency. Conclusions: In the context of clinical diagnosis, architectural complexity does not equate to clinical utility. We identified a phenomenon of "ensemble degradation," in which integrating mid-tier models into ensembles dilutes the reasoning capabilities of strong base models by introducing diagnostic noise. For global health equity, implementation strategies should prioritise "Lean AI" (localising a single, robust open-source model) rather than orchestrating computationally expensive agent swarms. This approach provides a safer, more accurate, and scientifically validated path for bridging the diagnostic gap in resource-constrained primary care.

  • Consensus-Based Recommendations for Optimizing Diversified TCM Data Collection during Clinical Work

    Date Submitted: Feb 5, 2026
    Open Peer Review Period: Feb 6, 2026 - Apr 3, 2026

    Background: An increasing amount of traditional Chinese medicine (TCM) clinical data can be collected by software and equipment, forming diversified TCM data, which is typically collected concurrently with clinical work. However, given the limited time, space, and human resources available in clinical work, collecting diversified TCM data is difficult, which may affect the quality of the collected data. Objective: To develop recommendations for optimizing diversified TCM data collection. Methods: A working group comprising 12 members was established. Based on previous survey findings regarding the burden of clinical data collection, the group developed a preliminary list of recommendations for optimizing diversified TCM data collection. A Delphi survey was conducted to investigate consensus levels on the list items (using a 5-point Likert scale for importance evaluation), and open-ended opinions were also collected. If experts in the first round proposed additions, deletions, or modifications, or if consensus was lacking on certain items, a further round of surveys was conducted to obtain the experts' agreement rate on the related items. Results: A total of 86 experts from China, the United Kingdom, and Singapore completed two rounds of surveys. Following the first Delphi round, all items achieved agreement scores above 4, with coefficients of variation (CV) below 0.2. The working group revised 12 items based on open-ended opinions and resubmitted them for agreement assessment. All revised items achieved agreement rates of over 95%. Following the two-round survey process, the final version of the recommendations comprises 5 primary domains, 11 subdomains, and 25 items. Conclusions: This study formulated recommendations for optimizing diversified TCM data collection. It is hoped that these recommendations will help clinical data collectors plan data collection in advance, during the design phase.
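The Delphi consensus thresholds reported in the TCM abstract (importance scores above 4 with a coefficient of variation below 0.2 on a 5-point Likert scale) can be checked per item in a few lines. The Python sketch below is illustrative only: the function names and ratings are invented, and the use of the sample standard deviation and the definition of "agreement rate" as the share of experts rating an item 4 or 5 are our assumptions, not the authors' stated procedure.

```python
from statistics import mean, stdev


def item_consensus(ratings, min_mean=4.0, max_cv=0.2):
    """Check one Delphi item rated on a 5-point Likert scale.

    Assumed rule: consensus if the mean exceeds min_mean AND the
    coefficient of variation (CV = SD / mean) is below max_cv.
    Uses the sample SD; the authors may have used another estimator.
    """
    m = mean(ratings)
    cv = stdev(ratings) / m
    return m, cv, (m > min_mean and cv < max_cv)


def agreement_rate(ratings, agree_from=4):
    """Hypothetical agreement rate: share of experts rating >= agree_from."""
    return sum(r >= agree_from for r in ratings) / len(ratings)


# Invented ratings from ten experts for two items
strong_item = [5, 5, 4, 5, 4, 5, 4, 5, 5, 4]
split_item = [5, 2, 4, 1, 5, 3, 4, 2, 5, 3]

for name, ratings in [("strong", strong_item), ("split", split_item)]:
    m, cv, ok = item_consensus(ratings)
    print(f"{name}: mean={m:.2f} CV={cv:.3f} consensus={ok} "
          f"agreement={agreement_rate(ratings):.0%}")
```

With these invented ratings, the first item (mean 4.6, CV about 0.11) meets both thresholds, while the second fails on its mean of 3.4.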

  • How have pandemics reshaped the respiratory virus data landscape in Europe? A scoping review

    Date Submitted: Feb 5, 2026
    Open Peer Review Period: Feb 6, 2026 - Apr 3, 2026

    Background: Acute respiratory infections caused by influenza, respiratory syncytial virus (RSV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remain a major public health challenge in Europe. Although surveillance systems for these pathogens are well established, the past two decades have seen a rapid diversification of data streams supporting surveillance and research. This expanding and increasingly complex data landscape, combined with fragmentation across institutions, sectors, and countries, may limit timely evidence synthesis and effective public health decision-making. Objective: This scoping review aimed to identify and characterize data sources used for surveillance and research on influenza, RSV, and SARS-CoV-2 across 12 European countries over the past 20 years, and to examine their evolution over time, their alignment with research objectives, and geographic variation in data availability and use. Methods: We conducted a scoping review using an objective-driven analytical framework. Empirical reports published between January 2005 and September 2025 were identified in Medline, Web of Science, and Embase. Eligible reports focused on influenza, RSV, or SARS-CoV-2 and included data from Western (France, Belgium, Germany, Netherlands), Northern (Denmark, England, Finland, Sweden), Southern (Italy, Spain), and Eastern Europe (Poland, Romania). Clinical and interventional studies were excluded. Reports were classified according to four research objectives: epidemiological monitoring; evaluation of interventions; assessment of disease burden and health outcomes; and analyses of population adherence and trust toward public health measures. Data sources were grouped into nine categories, including surveillance systems, electronic health records (EHRs), registries, claims, surveys, digital, environmental, and integrated datasets. Results: A total of 2,564 empirical reports were included. Over time, respiratory virus research relied on an increasingly diverse set of data streams. While surveillance systems remained central, particularly for epidemiological monitoring, their relative dominance declined. From 2020 onward, there was a marked expansion in the use of EHRs, registries, claims data, digital sources, and linked or integrated datasets, alongside increased use of open-access data. Data source use varied by research objective: surveillance data predominated in monitoring and intervention evaluation; EHRs in studies of risk factors and treatment effectiveness; surveys in seroprevalence and public trust analyses; and claims data in assessments of economic burden. Substantial geographic disparities were observed. Northern European countries more frequently used linked and multi-source datasets, whereas Western and Southern Europe relied more often on open-access or single-source data. Conclusions: Respiratory virus surveillance and research in Europe have expanded and diversified substantially over the past two decades, particularly after the coronavirus disease 2019 (COVID-19) pandemic. However, access to advanced and integrated data streams remains uneven across countries. Strengthening preparedness for future respiratory virus threats will require sustained investment in interoperable data infrastructures, improved data governance, and the responsible use of artificial intelligence to integrate heterogeneous data sources.

  • Background: Talaromycosis and cryptococcosis are prevalent in Southern China and Southeast Asia and are frequently misclassified due to overlapping lesion morphology and limited access to confirmatory testing. Objective: To evaluate the zero-shot diagnostic performance of multimodal large language models in identifying and differentiating cutaneous lesions of talaromycosis and cryptococcosis. Methods: Published clinical photographs of cutaneous lesions of talaromycosis and cryptococcosis were systematically retrieved up to 31 August 2025, and seven representative multimodal large language models were benchmarked in a strictly zero-shot setting using a standardized prompt template and a predefined output schema. Latency and unanswerable/invalid response rates were recorded, and diagnostic performance was evaluated using accuracy, precision, sensitivity, specificity, F1-score, and Matthews correlation coefficient. To assess explanation quality, model-generated texts were independently rated by two clinicians across five dimensions, and hallucination events were quantified. Results: In total, 214 articles (95 for talaromycosis and 119 for cryptococcosis), including 244 talaromycosis cutaneous lesion images and 236 cryptococcosis cutaneous lesion images, were collected for zero-shot evaluation. Most models achieved acceptable recognition performance; among them, ChatGPT-5 performed best. In the comprehensive performance comparison, ChatGPT-5 ranked first across six indicators but exhibited relatively lower sensitivity. Evaluation of output text quality showed that the diagnostic texts generated by GPT-5 were excellent. The EQI was 70.08, with a hallucination rate of 21.76%. Conclusions: ChatGPT-5 demonstrates feasibility in recognizing cutaneous lesions of talaromycosis and cryptococcosis under zero-shot conditions and could serve as a potential tool for assisting in the analysis of infectious skin disease images.
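All of the diagnostic metrics named in the talaromycosis/cryptococcosis abstract derive from a 2×2 confusion matrix. The Python sketch below shows the standard formulas, treating talaromycosis as the positive class (our assumption). Only the class totals (244 and 236 images) come from the abstract; the cell counts are invented, and how the authors scored unanswerable or invalid outputs is not stated, so those are left out.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics; zero denominators yield 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)   # recall, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        # Matthews correlation coefficient, ranges from -1 to 1
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Invented counts: 244 talaromycosis images split into tp + fn,
# 236 cryptococcosis images split into tn + fp.
metrics = binary_metrics(tp=200, fp=26, fn=44, tn=210)
for name, value in metrics.items():
    print(f"{name:>11}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells symmetrically, which is one reason it is often reported alongside F1 for two-class comparisons.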

  • Background: Task-oriented rehabilitation supported by exoskeletons has the potential to increase therapy intensity, personalization, and accessibility. However, to achieve fully automatic treatment, robotic systems need to analyze therapy in a more complex way than simply following reference trajectories. Objective: This study investigates the effects of an intelligent, context-aware control algorithm for an upper-limb rehabilitation exoskeleton on patients’ musculoskeletal engagement, compared with constant-admittance robot-assisted therapy and conventional physiotherapist-guided treatment. Methods: A single-session experimental study was conducted with 34 adult participants performing six activities of daily living under three therapy modes: robot-assisted therapy with constant admittance, robot-assisted therapy with an intelligent assist-as-needed algorithm, and physiotherapist-guided therapy. Muscle activity was assessed using surface electromyography (EMG) of eight upper-limb muscle groups, while joint kinematics were recorded using inertial measurement units. Metrics included EMG power, muscle activation time, joint range of motion, and burst duration similarity indices. Statistical comparisons were performed using the t test or the Mann-Whitney U test, depending on data normality. Results: The intelligent control strategy engaged the musculoskeletal system at least as effectively as constant-admittance control across all exercises, while giving the patient more control over the motion, which is preferable for neuroplasticity training. Compared with physiotherapist-guided therapy, robot-assisted treatment with intelligent control elicited significantly higher and more consistent muscular engagement. Intelligent assistance also modified joint-level motion patterns by reducing compensatory movements, particularly in shoulder–elbow coupling, while maintaining functional task execution. Muscle activation timing patterns during intelligent robot-assisted therapy were more consistent with robotic control than with manual therapy, reflecting altered movement strategies. Conclusions: These findings demonstrate that context-aware, intelligent control in rehabilitation exoskeletons can promote active patient participation, reduce compensatory behaviors, and maintain physiologically meaningful muscle engagement. The proposed approach exceeds the results of recent similar studies and is a promising step toward effective, minimally supervised, task-oriented rehabilitation. Clinical Trial: The experiments were carried out under approval KB/132/2024 of the Bioethical Committee of the Medical University of Warsaw (https://komisja-bioetyczna.wum.edu.pl/). Written informed consent was obtained from all subjects involved in this study.

  • Background: Early diagnosis, accurate severity assessment of acute pancreatitis (AP), and prediction of progression to severe acute pancreatitis (SAP) are critical. We evaluated an electronic medical record (EMR)-embedded large language model (LLM) for these tasks. Methods: The LLM reviewed the earliest AP hospitalization records of 261 adults and answered three prompts (diagnosis, severity, and risk of progression to SAP). Results: Of the 261 patients, 224 (85.8%) had mild AP (MAP), 30 (11.5%) moderately severe AP (MSAP), and 7 (2.7%) SAP. The LLM diagnosed AP with 89.3% sensitivity and 100.0% positive predictive value (PPV). Severity classification was inconsistent (sensitivity: MAP 49.1%, MSAP 66.7%, SAP 42.9%). For progression prediction from initial MAP, the LLM showed high sensitivity (87.5%) but low accuracy (26.8%); the Bedside Index for Severity in Acute Pancreatitis (BISAP) had higher accuracy (95.5%) but low sensitivity (12.5%). In MSAP, the LLM sensitivity was 85.7% versus 0% for BISAP. Conclusions: An EMR-embedded LLM can detect AP and identify many patients who progress to SAP, but its specificity and severity classification require improvement.

  • Background: The digital transformation of healthcare is reshaping how patients with breast cancer access and use information, yet little is known about how their digital information behaviours evolve across the illness trajectory. Objective: To explore stage-specific digital health information behaviours and the cognitive, emotional, and social factors shaping decision-making. Methods: Design: Descriptive qualitative study informed by Uncertainty Management Theory (UMT). Setting: A tertiary hospital in Shanghai, China. Participants: Fifteen women with breast cancer. Data collection and analysis: Semi-structured, face-to-face interviews were conducted with purposively sampled participants across the diagnostic, treatment, and recovery phases; data were analysed using directed and inductive content analysis within a UMT framework. Results: Five themes emerged, highlighting shifts from passive reception to active screening, the complementary use of search engines, social media, and AI tools, and the role of trust, emotion, and social context in accepting or rejecting information. Conclusions: Digital health information behaviours are dynamic and stage-specific, suggesting a need for phase-tailored, nurse-led digital support.

  • Background: Digital physical exercise interventions offer a scalable solution to combat age-related cognitive decline. While various modalities exist, their comparative effectiveness across different cognitive domains remains unclear, necessitating a systematic evaluation to guide clinical practice. Objective: This study aims to evaluate and rank the comparative effectiveness of different digital physical exercise interventions, including immersive virtual reality (VR) exercise (IVR_E), non-immersive exergames (NI_ExG), remote exercise (RE), and VR combined with cognitive training (VR_EC), on global cognition, executive function, and memory function in older adults. Methods: We conducted a systematic review and Bayesian network meta-analysis of randomized controlled trials (RCTs) published between January 1, 2010, and April 30, 2025. Data sources included PubMed, Embase, and Web of Science. Eligible studies involved older adults (aged ≥60 years) and compared digital physical exercise interventions against routine interventions (RI) or non-intervention (NI). The primary outcomes were global cognition, executive function, and memory function. We estimated standardized mean differences (SMDs) and ranked interventions using the surface under the cumulative ranking curve (SUCRA). Results: A total of 41 RCTs involving 2919 participants were included. For global cognition, IVR_E emerged as the most effective intervention (SUCRA=96.6%), followed by NI_ExG (SUCRA=76.4%); both modalities were significantly superior to RI. Regarding executive function, RE (SUCRA=73.8%) and NI_ExG (SUCRA=69.3%) ranked highest. Notably, NI_ExG was the only intervention to demonstrate a statistically significant improvement over RI in this domain, while IVR_E showed no significant advantage. For memory function, IVR_E was the dominant intervention (SUCRA=82.8%) and was the only modality significantly more effective than RI. Subgroup analyses further indicated that a cumulative training dose exceeding 1000 minutes is critical for observing significant improvements in memory function. Conclusions: Digital physical exercise interventions significantly enhance cognitive function in older adults, but their optimal application is domain-specific. IVR_E appears most effective for global cognition and memory, likely due to high immersion and standardization. Conversely, NI_ExG and RE are preferable for enhancing executive function, potentially offering more scalable alternatives for home-based care. Future interventions targeting memory improvement should ensure sufficient cumulative training duration. Clinical Trial: PROSPERO CRD42025103014
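SUCRA, which the network meta-analysis abstract uses to rank interventions, condenses each intervention's rank probabilities into one number: with a interventions, it is the mean, over ranks 1 to a − 1, of the cumulative probability of being ranked at that position or better, which equals (a − mean rank) / (a − 1). The Python sketch below uses three hypothetical interventions with invented rank probabilities; in a Bayesian analysis such as the one described, these probabilities would come from the model's posterior draws.

```python
def sucra(rank_probs):
    """Compute SUCRA for each treatment.

    rank_probs maps treatment -> [P(rank 1), P(rank 2), ..., P(rank a)],
    where rank 1 is best. SUCRA is the mean of the cumulative rank
    probabilities over ranks 1..a-1, giving a value between 0 and 1.
    """
    scores = {}
    for treatment, probs in rank_probs.items():
        a = len(probs)
        cumulative, total = 0.0, 0.0
        for p in probs[:-1]:      # ranks 1 .. a-1
            cumulative += p
            total += cumulative
        scores[treatment] = total / (a - 1)
    return scores


# Hypothetical rank probabilities for three interventions
# (each row and each rank position sums to 1)
example = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.2, 0.5, 0.3],
    "C": [0.1, 0.3, 0.6],
}
for name, score in sorted(sucra(example).items(), key=lambda kv: -kv[1]):
    print(f"{name}: SUCRA = {score:.1%}")
```

As a cross-check, intervention A's mean rank is 1.4, so (3 − 1.4) / 2 = 0.80, matching its SUCRA of 80.0%.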

  • Background: Assistive technologies can support independent living among older adults, but uptake is often constrained by attitudes and confidence. The COVID-19 lockdowns accelerated technology use across all age groups, offering a natural experiment to examine changes in adoption. Objective: This study aimed to examine changing patterns of technology use among older adults to provide insight into how service providers can support the use of technology to promote independence and well-being. Methods: Two cross-sectional surveys were conducted in UK retirement villages, one before the pandemic (2020) and one after the lockdowns (2023), to assess technology attitudes and use. Semi-structured interviews with eight participants in a technology trial scheme provided qualitative insights. Results: Technology adoption increased significantly between 2020 and 2023, with older adults reporting greater confidence and comfort in digital use. Self-education and informal support from family or friends were the most common pathways to adoption. Age-related differences in confidence observed in 2020 were no longer apparent in 2023, although gender disparities persisted. Interviewees emphasized usefulness and accessibility as key drivers of sustained engagement. Conclusions: Findings demonstrate that the pandemic catalyzed lasting increases in technology adoption among older adults, including increased confidence and ownership. These results provide evidence for housing providers and policymakers to embed accessible technologies and targeted support in retirement communities, thereby enhancing independence and quality of life in later life.

  • Social media influencer marketing is a digital advertising strategy that is growing in popularity. Its use has been documented in consumer purchasing behavior but has yet to be described for clinical trial recruitment. In this tutorial, we describe the steps we followed to develop and deploy a social media influencer advertisement for the recruitment of participants into the Groceries for Residents of Southeastern USA to Stop Hypertension (GoFreshSE) trial. We also provide a preparation framework for other studies that would like to use this modality for their own clinical trial recruitment. We used Cameo Business to identify potentially relevant influencers to hire by selecting influencers who were popular in the 3 geographic areas from which GoFreshSE is recruiting. We narrowed down the list of possible influencers by selecting those who had ≥100,000 followers on their respective social media platforms (for a wide reach) and who charged ≤$3,000 per video. We ultimately selected a former football coach, who 4 days later provided a high-quality video of himself reading an institutional review board-approved script. We used open-source, commercially available tools to edit the video and deployed the 44-second video on Facebook and Instagram using Meta’s advertising platform. Social media influencer marketing through the Cameo Business platform is a rapid mechanism for developing clinical trial influencer recruitment videos.

  • Background: Sample pooling is an essential strategy for optimizing polymerase chain reaction (PCR) resources during infectious disease outbreaks, especially in their early stages. While high-dimensional hypercube pooling strategies, such as those recently highlighted in Nature, offer superior efficiency in low-prevalence settings, they are difficult to implement in practice. Because humans are cognitively and physically limited to three-dimensional environments, manual execution of four- or five-dimensional sample arrays is prone to significant operational error. Objective: To develop and evaluate a novel "Ternary Card Hypercube Pooling" strategy that simplifies the implementation of multidimensional pooling, making it accessible to laboratory personnel without compromising mathematical efficiency. Methods: We integrated logic from ternary card games (based on sets of three attributes) to create a visual and physical framework for hypercube pooling. This method maps high-dimensional coordinates onto a simplified "card" system, allowing laboratory technicians to organize and track samples using intuitive pattern recognition rather than complex multidimensional mapping. Results: The Ternary Card method successfully translates the efficiency of hypercube pooling into a user-friendly workflow. It maintains the high performance of traditional hypercube algorithms, allowing rapid identification of positive samples in a single step in the majority of cases, while significantly reducing the risk of manual pipetting errors and the need for specialized automated equipment. Conclusions: The Ternary Card Hypercube Pooling strategy bridges the gap between theoretical mathematical efficiency and practical laboratory application. By reducing the complexity of sample handling, this method provides a scalable solution for increasing PCR throughput in response to future pandemics, particularly in resource-limited settings. Clinical Trial: NA
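The generic hypercube idea behind the pooling abstract can be sketched briefly. In a D-dimensional cube with three positions per axis, each of 3^D samples gets a base-3 coordinate and joins one pool per axis, so 3^D samples are covered by 3·D pools. When exactly one sample is positive, the single positive pool on each axis spells out its coordinate. This Python sketch is not the authors' Ternary Card method: the function names are hypothetical, and reading a "card" as four attributes with three values each (3^4 = 81 samples, 12 pools) is our assumption about how such a deck could map onto a four-dimensional cube.

```python
from itertools import product


def build_pools(dims, side=3):
    """Assign each of side**dims samples to one pool per axis.

    Sample i sits at the base-`side` coordinate (c_0, ..., c_{dims-1}) and is
    added to pool (axis, c_axis) for every axis: side * dims pools in total.
    """
    pools = {(axis, v): [] for axis in range(dims) for v in range(side)}
    for sample_id, coord in enumerate(product(range(side), repeat=dims)):
        for axis, v in enumerate(coord):
            pools[(axis, v)].append(sample_id)
    return pools


def positive_pools(pools, positives):
    """Simulate PCR results: a pool is positive if it holds any positive sample."""
    return {key for key, members in pools.items() if positives.intersection(members)}


def decode_single_positive(hits, dims, side=3):
    """Return the positive sample's index, or None if the results are ambiguous."""
    index = 0
    for axis in range(dims):
        values = [v for v in range(side) if (axis, v) in hits]
        if len(values) != 1:
            return None  # zero or several positive pools on this axis: retest
        index = index * side + values[0]  # rebuild index in product() order
    return index


pools = build_pools(dims=4)  # 81 samples, 12 pools
print(len(pools), "pools for", 3 ** 4, "samples")
print(decode_single_positive(positive_pools(pools, {50}), dims=4))     # 50
print(decode_single_positive(positive_pools(pools, {0, 50}), dims=4))  # None
```

Each pool here holds 27 samples; whether that pool size is compatible with PCR sensitivity is a laboratory question this sketch does not address.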

  • Background: Despite increasing technical maturity, most clinical artificial intelligence (AI) systems remain confined to pilot or experimental settings, rarely achieving sustained integration into routine healthcare delivery. The persistence of this "pilot trap" is driven primarily by structural and institutional constraints rather than algorithmic performance limitations. Objective: To develop a governance framework that enables the transition of clinical artificial intelligence (AI) from project-based experimentation to durable institutional infrastructure, informed by the establishment of a provincial-level AI platform within a policy-oriented healthcare system in China. Methods: An 18-month real-world institutionalization process of the Hebei Provincial Clinical AI Platform was examined, encompassing the formation of a dedicated Medical AI laboratory, designation as a provincial engineering center, acquisition of regulatory authorizations, and deployment of structured clinical application pathways. Framework construction was grounded in systematic analysis of governance arrangements, policy legitimacy mechanisms, and translational implementation trajectories observed throughout the institutionalization process. Results: The framework comprises six interdependent modules encompassing institutional carrier formation, data and computational infrastructure, ethical and regulatory governance, interdisciplinary operational coordination, translational scaling and regional dissemination, and continuous evaluation. Implementation evidence indicates that governance architecture functions as a prerequisite to, rather than a consequence of, technical deployment. Organizational anchoring, external legitimacy, and coordinating capacity enable AI systems to operate as enduring institutional infrastructure rather than transient technological experiments. 
The framework reframes clinical AI from an algorithmic artifact to an embedded institutional capability, redirecting implementation logic from technical performance metrics toward governance maturity. Conclusions: Sustainable clinical AI implementation is associated with governance-first rather than technology-first strategies. Effective institutionalization requires the concurrent establishment of organizational ownership, policy legitimacy, and coordinating mechanisms prior to large-scale deployment. Although derived from a policy-oriented healthcare context in China, the core governance functions demonstrate potential transferability across health systems, with institutional mechanisms varying by context while functional requirements remain comparatively stable. The framework offers an operational architecture for health systems seeking to establish AI as infrastructure rather than as episodic experimentation. Clinical Trial: NA.

  • Background: Co-design ensures cultural safety of health interventions for Aboriginal and/or Torres Strait Islander communities. However, an intervention developed with one Indigenous community may not be suitable for another geographically and culturally distinct community. Objective: This study aimed to culturally adapt content and features of a mobile health (mHealth) application co-created by communities in one Australian state to better meet the needs of mothers and caregivers of Aboriginal and/or Torres Strait Islander children aged 0-18 years and health professionals in another state. Methods: The study followed the stages of the cultural adaptation stepwise model by Barrera et al. Mothers/caregivers of Aboriginal and/or Torres Strait Islander children aged 0-5 years and their health professionals were recruited from multiple community sites. Data were collected through culturally appropriate yarning circles or interviews facilitated by Aboriginal research staff. Qualitative data were transcribed and inductively analysed to generate themes. The feedback was translated into practical changes that were applied to the mHealth application. Results: Data saturation was achieved after yarning circles with 21 women and seven health professionals. Nine themes were generated from mothers/caregivers’ data: 1) cultural relevance and sensitivity, 2) linking with culturally appropriate services, 3) use of lay language and more audio-visual content, 4) concerns with mobile data usage, 5) perceptions about the current content of the Jarjums app, 6) raising children, 7) safety, 8) health and wellbeing of mothers and caregivers, and 9) coordinating health care. Four themes were generated from data collected from health professionals: 1) favourable features of the app, 2) potential barriers to the use of the app, 3) healthcare system access issues, and 4) recommended modifications. 
Based on feedback received, the mHealth application changes included the addition of information on healthy relationships and raising children, more visual content, and localized service directories for different categories of care and support. Conclusions: A co-designed, culturally sensitive mHealth application is likely to support Aboriginal and/or Torres Strait Islander families facing health disparities due to the disruption of Indigenous culture, and this adaptation provides a foundation for a potential clinical trial of effectiveness and wider implementation.

  • Background: Quality of life (QoL) questionnaires are established instruments designed to assess the overall wellbeing and quality of life of patients. They are important for predicting disease outcomes and understanding the needs of individual patients. However, their repeated collection imposes a substantial burden on both patients and clinical professionals. Many patients seek emotional support and mutual exchange in online peer-support communities, where they frequently share detailed descriptions of symptoms and treatment experiences, addressing topics covered in QoL questionnaires. The emergence of large language models (LLMs) opens up the potential for automatic extraction of relevant QoL information from patient-generated text. Objective: The aim of this study is to evaluate and compare various open-source LLMs and optimization approaches for automated extraction of QoL information from forum posts. Methods: The dataset consisted of 2,683 English-language posts from breast cancer patients recruited on Inspire.com online communities, manually annotated with sentence-level text spans indicating whether and where posts contained information relevant to 53 QoL questions from the EORTC QLQ-C30 and QLQ-BR23 questionnaires. Eleven open-source LLMs (8B-70B parameters) were evaluated in a zero-shot setup, generating 4,452 post-question predictions per model under two input conditions: post-only and post with additional context. For the best-performing model, additional experiments assessed the impact of chain-of-thought prompting, instruction optimization, few-shot prompting and parameter-efficient fine-tuning. For correctly classified yes/no instances, the overlap between model-generated evidence and human-annotated spans was evaluated. Results: Across the 11 evaluated LLMs, GPT-OSS 20B achieved the highest macro F1-score (0.79) in the zero-shot post-only setting. Providing additional context consistently reduced the performance of all models. 
Model size did not correlate with F1-score, with several mid-sized models (14B-30B) outperforming 70B models. For GPT-OSS 20B, chain-of-thought prompting did not improve performance (0.77). Instruction optimization produced results similar to the baseline in both zero-shot and few-shot settings (0.78-0.80). Bootstrap few-shot prompting with random search achieved the highest score overall (0.81). Parameter-efficient fine-tuning decreased performance (0.71). Most classification errors involved semantically broad or ambiguous terms and the fallback question. For correctly predicted yes/no answers, model-generated evidence matched or partially matched human-annotated spans in 89% of cases. Conclusions: Open-source LLMs are a promising tool for extracting QoL information that aligns with standardized questionnaire responses from online health forums. Mid-sized models achieved the highest F1-scores, particularly in zero-shot, post-only settings, and few-shot prompting can further improve the results. Models were also able to generate evidence spans that closely matched human annotations. However, they consistently struggled with ambiguous and semantically overlapping terms. Overall, automated extraction of QoL information from patient-generated content may offer a faster, lower-cost and low-burden complement to traditional QoL questionnaires, provided that limitations such as symptom ambiguity are addressed in future work.
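Two of the evaluation steps described above can be sketched in Python:
- Macro F1 over yes/no predictions, which is the unweighted mean of the per-class F1 scores.
- A comparison of a model's evidence span with the human-annotated span.

This is a minimal sketch that assumes spans are character offsets `(start, end)` with exclusive ends. The three-way match rule is a hypothetical simplification, not the study's exact criterion.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, labels=("yes", "no")):
    """Unweighted mean of per-class F1, so a rare class counts as much as a common one."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)

def span_match(pred, gold):
    """Classify a predicted evidence span (start, end) against an annotated span."""
    if pred == gold:
        return "match"
    overlap = min(pred[1], gold[1]) - max(pred[0], gold[0])
    return "partial" if overlap > 0 else "none"
```

Macro averaging is used here because it keeps a model that always answers the majority class from scoring well.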

  • Background: Affirming care for lesbian, gay, bisexual, transgender, and queer (LGBTQ+) populations refers to culturally and clinically competent healthcare that recognizes specific health needs and provides respectful, inclusive, equitable, and non-discriminatory services that are supportive of diverse identities. LGBTQ+ populations face greater discrimination in healthcare, leading to higher levels of unmet health needs than in the general population. Very few primary care practices in the United States train staff and clinicians on LGBTQ+ healthcare needs. Despite the growing need for LGBTQ+ affirming care, there are no national standards or requirements for LGBTQ+ cultural competence training for primary care providers in the United States. Objective: This study explores the accessibility and quality of online ‘grey literature’ providing LGBTQ+ affirming and culturally competent care information for primary care providers in the United States. Grey literature is produced by government, academic, business, and industry sources in formats not controlled by commercial publishing. Methods: We conducted a Google search of grey literature to identify readily available resources and training materials. Two thousand websites were screened. Websites published in a language other than English or before January 1, 2014, as well as peer-reviewed literature and content behind a paywall, were excluded. Fifty-four websites met the inclusion criteria for a full-text review. Results: We identified six themes from the existing academic literature: (1) affirming physical and visual environments, (2) sexual orientation and gender identity (SOGI) data collection, (3) training on LGBTQ+ health needs, (4) anti-discrimination policies, (5) appropriate, relevant services for LGBTQ+ patients, and (6) use of inclusive language. 
We then applied these themes as a deductive coding framework to the web-based sources and, during analysis, two additional sub-themes emerged: (1) staff diversity and (2) health inequalities and inequities. Findings revealed that not every web-based source addressed all themes. Because coverage was unevenly distributed across themes, providers must consult multiple web-based sources to obtain a comprehensive understanding. Additionally, existing grey literature resources often lacked depth, technical detail, and practical guidance, making it difficult for primary care providers to access actionable information on LGBTQ+ affirming care. ‘Training on LGBTQ+ health needs’ was the most frequently covered theme, and ‘SOGI data collection’ was the least addressed. Study limitations included geolocation biases and embedded advertisements in the Google search results. Conclusions: The study highlights that grey literature is insufficient for self-guided training. We recommend integrating formal LGBTQ+ affirming care training into medical and nursing curricula, as well as into professional association programs and continuing education, particularly amid growing federal and state-level restrictions on LGBTQ+ healthcare.

  • Background: The consequences of medication errors are substantial, as they pose a significant threat to high-risk populations, including paediatric, neonatal and geriatric patients. Computerised Provider Order Entry (CPOE) systems and clinical decision support systems (CDSS) are increasingly implemented to reduce medication errors by automating prescribing processes and providing real-time decision support. While alerts have been shown to provide value, alert fatigue and usability problems are barriers to widespread implementation. Objective: This systematic review and meta-analysis assessed the effectiveness of CPOE and CDSS in reducing medication errors across diverse populations and clinical environments. Methods: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, with four databases searched up to February 2025 for studies evaluating the effects of CPOE and CDSS implementation on medication errors in paediatric and geriatric populations. We included only cohort and prospective studies, not restricted by language or country of publication. Single measures of continuous outcomes on medication error rates were extracted from each study. Comprehensive Meta-Analysis (CMA) software was then used to perform separate analyses comparing outcomes before and after CPOE/CDSS implementation. A random-effects meta-analysis was conducted, with subgroup analyses to assess differences by population, healthcare setting, and system design. The Newcastle–Ottawa Scale was used for quality appraisal. Forest plots and funnel plots were used to present pooled results and assess publication bias. Results: Fourteen studies met the inclusion criteria (paediatric: n = 12; geriatric: n = 2), all rated as good quality. In paediatrics, 10 of 12 studies reported significant reductions in medication errors post-implementation. 
Pooled analysis showed that error rates were almost threefold higher pre-implementation (OR = 2.97; 95% CI 2.81–3.14), with substantial heterogeneity (I² = 94%) but a consistently positive direction of effect. In geriatrics, both studies demonstrated significant reductions with no heterogeneity (I² = 0%) (OR = 2.45; 95% CI 2.29–2.62), though evidence remains limited in scope and setting due to the small number of studies. Descriptive synthesis indicated that CPOE/CDSS can intercept high-severity errors, such as overdoses of high-risk medications, before they reach patients, although most studies assessed potential rather than actual harm. Meta-regression showed study location to be a significant moderator, with greater effects in North American studies than in those conducted in Asia. No publication bias was detected, but regional variation suggests that contextual factors such as healthcare infrastructure and informatics maturity influence system effectiveness. Conclusions: CPOE/CDSS significantly reduce medication errors in special populations, with strong and consistent benefits in paediatrics and promising but limited evidence in geriatrics. Despite heterogeneity in paediatric studies, the direction of effect was uniformly positive. The systems also show potential to reduce the severity of harmful errors, although robust evidence on actual patient harm is lacking. Optimising and tailoring CPOE/CDSS to specific patient populations and healthcare settings, while addressing alert fatigue and workflow integration, are essential to maximise impact. Further research should expand the geriatric and neonatal evidence base, assess long-term outcomes and explore advanced decision support capabilities to enhance patient safety and clinical impact.
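A random-effects pooled odds ratio with I² can be sketched in Python using the DerSimonian-Laird estimator. This is one common random-effects method. The abstract does not state which estimator the authors' CMA analysis used, so treat this as an illustration rather than a reproduction. Inputs are per-study log odds ratios and their variances.

```python
import math

def random_effects_or(log_ors, variances, z=1.96):
    """Pool per-study log odds ratios with the DerSimonian-Laird random-effects model.

    Returns (pooled OR, lower CI bound, upper CI bound, I^2 as a fraction).
    """
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variability from heterogeneity
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se), i2
```

Here an OR above 1 means higher error odds before implementation, matching how the abstract reports its pooled estimates.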

  • Application of Ecological Momentary Assessment in Maternal Health Management: A Scoping Review

    Date Submitted: Feb 1, 2026
    Open Peer Review Period: Feb 2, 2026 - Mar 30, 2026

    Background: Ecological momentary assessment (EMA) enables real-time, repeated evaluation of participants' emotions, thoughts, and behavioral patterns in natural settings. It effectively mitigates the retrospective bias inherent in traditional surveys and facilitates a longitudinal understanding of health status. However, its feasibility, practicality, and methodological details for monitoring and promoting maternal health remain unclear. Objective: To conduct a scoping review of studies on the application of EMA in maternal health management, providing a reference for future research and further promotion of maternal and infant health. Methods: Using the Joanna Briggs Institute (JBI) scoping review guidelines as the methodological framework, we searched the Web of Science, PubMed, CINAHL, Embase, Cochrane Library, China National Knowledge Infrastructure (CNKI), China Biomedical Literature Database, Wanfang Database, and VIP Database. The search covered publications from the inception of each database to December 2025, and the included studies were subjected to a comprehensive analysis. Results: The search yielded 2,989 publications, of which 14 were ultimately included. The findings were summarized across three dimensions: study design characteristics (publication year, country, and study design features, such as sample size, study population, and outcome measure type); EMA data collection methods (EMA schedule characteristics, such as monitoring cycle, duration, and data sampling methods, such as fixed-time, random-time, or event-based sampling); and EMA response-related outcomes (participation rate and response rate). Conclusions: EMA effectively mitigates the recall bias inherent in traditional assessment methods, offering novel approaches to enhance the quality of maternal health management. 
This enables longitudinal monitoring of maternal experiences in natural settings, facilitating the early identification of abnormal physiological, psychological, and behavioral issues during pregnancy and postpartum. This allows timely intervention to safeguard maternal and infant health. Future research should refine EMA study designs and implementation formats to fully leverage EMA's potential for promoting maternal health and personalized interventions for maternal-infant wellness. Clinical Trial: Trial Registration: OSF Registries 10.17605/OSF.IO/GMFKZ
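Two of the EMA sampling schemes named above, and the response rate, can be sketched in Python. This is a minimal sketch with several assumptions, all hypothetical:
- The 12-hour waking window and four prompts per day are illustrative values.
- Random-time sampling is modelled as one prompt at a random minute inside each equal block of the day.
- Response rate is defined as answered prompts divided by delivered prompts. Included studies may define it differently.

Event-based sampling is initiated by the participant, so it has no schedule to generate.

```python
import random
from datetime import datetime, timedelta

def fixed_time_schedule(start, window_hours, n_prompts):
    """Fixed-time sampling: prompts evenly spaced across the waking window."""
    step = timedelta(hours=window_hours) / n_prompts
    return [start + step * k for k in range(n_prompts)]

def random_time_schedule(start, window_hours, n_prompts, rng):
    """Random-time sampling: one prompt at a random minute within each of n equal blocks."""
    block = timedelta(hours=window_hours) / n_prompts
    block_minutes = int(block.total_seconds() // 60)
    return [start + block * k + timedelta(minutes=rng.randrange(block_minutes))
            for k in range(n_prompts)]

def response_rate(n_delivered, n_completed):
    """Share of delivered prompts that were answered (0.0 if none were delivered)."""
    return n_completed / n_delivered if n_delivered else 0.0

# Example: four prompts between 08:00 and 20:00 on a hypothetical study day.
day_start = datetime(2026, 1, 1, 8, 0)
print(fixed_time_schedule(day_start, 12, 4))
print(random_time_schedule(day_start, 12, 4, random.Random(42)))
```

Block-stratified random prompts keep prompt times unpredictable while still covering the whole day.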

  • Background: Unstructured clinical text remains a major barrier to interoperable data reuse and large-scale secondary analysis in healthcare. Large language models (LLMs) have the potential to automate the extraction of structured clinical information; however, their application is limited by the scarcity of high-quality annotated training data. Objective: To address these limitations, this study aims to develop and validate a scalable, privacy-preserving framework that utilizes synthetic data generated from structured Fast Healthcare Interoperability Resources (FHIR) to fine-tune open-source LLMs for the effective extraction of interoperable clinical information from unstructured text. Methods: We evaluated an LLM-based pipeline for extracting structured clinical information from cancer-related discharge letters and mapping it to FHIR-compatible representations. To enable large-scale supervised training, we developed a random sample generator that creates synthetic discharge letters using Qwen3 235B by randomly sampling and aggregating structured FHIR data from 41,175 cancer patients. The resulting synthetic discharge letters (n=75k) were paired with their originating structured data, forming a large-scale dataset for fine-tuning MedGemma 27B. Evaluation was conducted on the synthetic test dataset (n=7,500), on real-world discharge letters (n=30) evaluated by physicians and a medical student, and in a one-shot comparison with open-source models (Qwen3, LLaMA, and GPT-OSS). Results: The fine-tuned model achieved high extraction performance across multiple clinical entities, including full ICD diagnosis codes (F1 = 0.84), tumor-related information (0.99), laboratory values (0.99), medication names and dosages (0.99), and ATC medication codes (0.94). 
Extraction of procedure-related information was more challenging but remained reliable, with F1 scores of 0.63 for OPS codes and 0.90 for procedure descriptions. In a one-shot comparison, the fine-tuned model outperformed the general-purpose LLMs in nearly all extraction categories. When applied to real-world discharge letters, performance remained robust, with F1 scores of 0.789 for ICD diagnoses, 0.861 for tumor-related information, 0.93 for medications, and 0.613 for procedures. Conclusions: These results demonstrate that synthetic text generation from structured clinical data enables effective and scalable training of LLMs for extracting interoperable, multi-entity clinical information from unstructured documentation.
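The random sample generator described above can be sketched in Python. This is a minimal sketch with several assumptions:
- It uses simplified FHIR-like dictionaries rather than full FHIR resources.
- The resource keys, the `max_items` parameter and the prompt wording are hypothetical.
- The call to the letter-writing LLM (Qwen3 235B in the study) is left out.

```python
import json
import random

def sample_record(patients, rng, max_items=3):
    """Aggregate randomly sampled items from several patients into one synthetic record."""
    record = {}
    for key in ("conditions", "medications", "procedures"):
        pool = [item for p in patients for item in p.get(key, [])]
        k = rng.randint(1, min(max_items, len(pool))) if pool else 0
        record[key] = rng.sample(pool, k)
    return record

def generation_prompt(record):
    """Prompt asking an LLM to write a discharge letter that mentions exactly this data."""
    return ("Write a hospital discharge letter that mentions all of the following "
            "structured data and nothing else:\n" + json.dumps(record, indent=2))

def training_pair(letter_text, record):
    """Fine-tuning example: the synthetic letter is the input, its source data the target."""
    return {"input": letter_text, "target": json.dumps(record, sort_keys=True)}
```

The label comes from the structured data used to write the letter, so each training pair needs no manual annotation and no real patient's letter is reproduced.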

  • Background: In recent years, the field of digital health has grown exponentially, leading to notable benefits such as easier access to health-related information, but also to content saturation and misinformation. Thus, it is crucial to identify digital health tools that provide meaningful value and to assess their real-world impact. Objective: This pre-registered study aimed to quantitatively assess the LONDI platform, a German platform designed for different user groups supporting children with learning disorders. The assessment focused on mental health professionals (i.e., learning therapists and school psychologists) and was grounded in four of the five RE-AIM framework dimensions: reach, adoption, implementation, and maintenance. Methods: Data were collected over a 10-month period, between May 1, 2024, and March 1, 2025. The reach dimension was measured via a pop-up questionnaire (N=1324) collecting demographic and professional experience data. The adoption dimension was measured via a second pop-up questionnaire (N=160) assessing user experience (UX) and reuse intention for the platform’s help system. The implementation dimension was measured via web analytics (N=37,133), capturing reading time on pages intended for mental health professionals, and by comparing chatbot engagement rates with industry benchmarks. The maintenance dimension was also measured via web analytics, comparing usage of the previous (N=20,496) and current (N=37,133) platform versions in terms of number and location of users, time spent on the platform, number of actions per visit, and devices and software used. Results: Of the users who completed the first pop-up questionnaire, 22% and 10.64% stated that they were learning therapists or school psychologists, respectively, exceeding the share of these professions in the German population (<0.01%). 
The second pop-up questionnaire revealed an overall mean UX score of 1.46, surpassing the benchmark average, and UX ratings predicted intention to reuse. Time spent on the pages intended for mental health professionals was shorter than the time needed to read them. The chatbot engagement rate of 0.18% was very low compared with industry benchmarks of 35-40%. Usage changed between the two compared time periods; most strikingly, the number of users increased by 81.2%. Conclusions: The study provides evidence of the LONDI platform’s favorable public health impact in terms of the reach, adoption, and maintenance dimensions of the RE-AIM framework. Further research and efforts are needed to better understand and improve the platform’s impact in terms of the implementation dimension.
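The reported 81.2% increase can be checked against the two web-analytics sample sizes. This small Python sketch assumes that the Ns for the previous and current platform versions count users in each period; the abstract labels them only as web-analytics Ns.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100

# Web-analytics Ns for the previous and current platform versions, as reported in the abstract.
print(round(percent_change(20_496, 37_133), 1))  # 81.2
```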

  • Balancing Value and Risk: Clinicians’ Perceptions and Adoption of AI-Enabled Clinical Decision Support Systems

    Date Submitted: Jan 29, 2026
    Open Peer Review Period: Jan 30, 2026 - Mar 27, 2026

    Background: The increasing adoption of Artificial Intelligence (AI) in healthcare, particularly within Clinical Decision Support Systems (CDSSs), is transforming clinical practice and decision-making. Although AI-CDSSs hold the potential to improve diagnostic accuracy, operational efficiency, and patient outcomes, their implementation also raises ethical, technical, and regulatory concerns that affect healthcare professionals’ willingness to adopt these systems. Objective: Building on a value-based perspective, the study combines constructs from the Unified Theory of Acceptance and Use of Technology (UTAUT), as determinants of perceived benefits, with a risk perception model, as determinants of perceived risks, into a unified model of clinicians’ behavioural intention to adopt AI-enabled CDSSs. Methods: A self-administered cross-sectional survey was distributed to licensed healthcare professionals to examine how validated factors influence perceptions of risks and benefits. Responses were collected from 215 clinicians across Italy and the United Kingdom. Recruitment was undertaken using email invitations, attendance at academic conferences, and direct approaches within healthcare settings. Results: Perceived Benefits were the strongest positive predictor of clinicians’ intentions to use AI-enabled CDSSs (β=.45, p<.001), whereas perceived risks had a significant negative effect (β=-.18, p=.002). Performance Expectancy and Facilitating Conditions significantly increased adoption intentions, whereas Effort Expectancy and Social Influence were not significant. Among the risk antecedents, Perceived Performance Anxiety, Communication Barriers, and Liability Concerns were significant predictors of Perceived Risks. The model explained 46% of the variance in the intention to use AI-enabled CDSSs. 
Conclusions: The findings offer theoretical and practical insights into human factors influencing AI adoption in clinical practice, underscoring the importance of value alignment, professional accountability and institutional readiness, and highlighting the need to foster clinician trust in AI tools beyond the boundaries of technical performance.
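How standardized coefficients (β) and explained variance (R²) relate can be sketched in Python for the simplest case of two predictors. This is a minimal sketch that treats perceived benefits and perceived risks as observed scores and fits ordinary least squares from their correlations. It is a deliberate simplification of a full structural equation model and is not the study's analysis.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def two_predictor_betas(benefits, risks, intention):
    """Standardized OLS coefficients and R^2 for intention ~ benefits + risks."""
    r_by = pearson(benefits, intention)
    r_ry = pearson(risks, intention)
    r_br = pearson(benefits, risks)
    denom = 1 - r_br ** 2  # shrinks toward 0 as the two predictors become collinear
    beta_b = (r_by - r_br * r_ry) / denom
    beta_r = (r_ry - r_br * r_by) / denom
    r_squared = beta_b * r_by + beta_r * r_ry  # share of variance in intention explained
    return beta_b, beta_r, r_squared
```

A positive β for benefits and a negative β for risks correspond to the directions reported above. An R² of 0.46 would mean the predictors explain 46% of the variance in intention.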

  • Background: The COVID-19 pandemic significantly increased adoption of virtual care, including patient-to-provider secure messaging. However, this surge has heightened physician workload and burnout and has raised concerns among physicians about message appropriateness and liability. Objective: This study characterizes secure messaging use in Canadian hospital-based specialty care and explores the experiences of healthcare providers, administrative staff, and patients. Methods: We employed a convergent mixed-methods design, analyzing aggregated electronic health record (EHR) usage data and qualitative interview data. The study was conducted at Women’s College Hospital in Toronto, Canada, across four high-messaging specialty clinics: mental health, rheumatology, dermatology, and surgery. Quantitative EHR data from October 2019 to October 2022 captured message volumes, response patterns, and timing. Semi-structured interviews explored messaging workflows, barriers, and facilitators. The two data sets were analyzed separately and then integrated to identify areas of convergence and divergence. Results: Message volumes surged after the onset of the pandemic, particularly in mental health. The monthly message rate per patient varied, with higher rates in mental health and rheumatology. Physicians reported negative experiences due to increased workload, lack of compensation, and inadequate integration into clinical workflows. High patient-to-physician ratios and limited nursing support for message triage were associated with a poor messaging experience. Patients and administrative staff valued messaging for its convenience, accessibility, and efficiency. A key finding was the poor engagement of all user groups in decisions regarding messaging implementation. Conclusions: The study highlights a disconnect between the high perceived value of secure messaging for patients and administrative staff and the negative experiences of physicians. 
Successful implementation requires thoughtful integration into care models, clear guidelines for patient use, and proper triage and "channel management" to guide patients to appropriate visit modalities. Future research should explore triaging algorithms as part of a digital front door, specialty-specific variations and the crucial role of nursing staff in message management.
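The monthly message rate per patient reported above can be computed from an EHR message log. This is a minimal sketch in Python with two hypothetical choices, since the abstract defines neither:
- The log holds one record per patient message.
- The denominator is the number of patients seen in each clinic that month.

```python
from collections import defaultdict

def monthly_rate_per_patient(messages, active_patients):
    """Messages per active patient for each (clinic, month).

    messages: iterable of (clinic, "YYYY-MM", patient_id) tuples, one per message
    active_patients: {(clinic, "YYYY-MM"): number of patients seen that month}
    """
    counts = defaultdict(int)
    for clinic, month, _patient_id in messages:
        counts[(clinic, month)] += 1
    # Clinic-months with no active patients are skipped to avoid dividing by zero.
    return {key: counts[key] / n for key, n in active_patients.items() if n > 0}
```

Normalizing by active patients makes clinics of different sizes comparable, which raw message volumes do not.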

  • Background: Emotional cognition deficits are a core feature of autism spectrum disorder (ASD) and contribute significantly to social difficulties in affected children. Digital, app-based training may offer scalable, structured practice, but evidence from randomized pilot trials remains limited. Objective: To evaluate the feasibility, acceptability, and preliminary efficacy of the Autism Emotion Cognition Training System (AECTS), a tablet-based, parent-mediated program designed to support emotional cognition in young children with ASD. Methods: We conducted a single-center, two-arm, parallel-group randomized controlled pilot trial between April and October 2025. Children aged 4–8 years with ASD were assigned to AECTS plus treatment as usual (TAU) or TAU alone for 8 weeks. Feasibility and acceptability were assessed in the intervention group using a study-specific mixed-methods questionnaire (25 Likert items and 5 open-ended questions). Preliminary efficacy was explored using the Social Responsiveness Scale (SRS) and the Clinical Global Impression (CGI), with ANCOVA adjusting for baseline SRS scores. Results: Of 20 randomized participants, 19 completed the trial (10 in the intervention group and 9 in the control group). Caregiver-rated feasibility was high across domains (mean scores 3.92–4.70 out of 5), with the highest ratings for overall acceptability and technical feasibility. Usability showed the lowest score and greatest variability. Qualitative analysis identified four themes: (1) strong but module-specific engagement, (2) smooth operation with unclear system status, (3) variable generalization to daily life, and (4) requests for smarter personalization and realistic scenarios. On secondary outcomes, SRS scores favored the intervention group but were not statistically significant. CGI outcomes were comparable between groups. 
Conclusions: This pilot trial demonstrated that AECTS is a feasible and acceptable digital intervention for children with autism, with positive caregiver feedback and preliminary signals of benefit. Although between-group differences in clinical outcomes were not statistically significant, favorable trends in social responsiveness suggest potential value. Future large-scale trials with enhanced usability, adaptive personalization, real-life social scenarios, and caregiver support are warranted to establish the intervention’s effectiveness and scalability.
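The ANCOVA-adjusted group difference used for the SRS outcome can be sketched in Python. The raw difference in post-treatment means is corrected by the pooled within-group slope of the post score on the baseline score. This is a minimal sketch that assumes a common slope in both arms and no other covariates, and it omits the standard error and P value.

```python
def ancova_adjusted_difference(groups):
    """Baseline-adjusted difference in post scores (intervention minus control).

    groups: {"intervention": (baseline_scores, post_scores),
             "control": (baseline_scores, post_scores)}
    """
    sxy = sxx = 0.0
    means = {}
    for name, (x, y) in groups.items():
        mx, my = sum(x) / len(x), sum(y) / len(y)
        means[name] = (mx, my)
        sxy += sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx += sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx  # pooled within-group regression of post on baseline
    (mx_t, my_t), (mx_c, my_c) = means["intervention"], means["control"]
    return (my_t - my_c) - slope * (mx_t - mx_c)
```

Higher SRS scores indicate greater social impairment, so a negative adjusted difference would favor the intervention.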

  • Background: Quality of life (QoL) plays a crucial role in dementia care, yet QoL and its dynamic, context-dependent nature can be difficult to capture in people living with dementia due to challenges in memory and communication and limitations of self-reported QoL instruments. Observational tools such as the Maastricht Electronic Daily Life Observation (MEDLO) provide narrative descriptions of the daily life of people living with dementia in nursing homes. However, the MEDLO tool was not developed to assess QoL specifically, and it remains unclear to what extent its narrative descriptions reflect aspects of QoL. Analysing these narrative descriptions is labour-intensive and time-consuming. Recent advances in natural language processing (NLP), including large language models, offer the potential to analyse these narrative descriptions at scale. Objective: The study aims to gain insight into the QoL of people living with dementia residing in nursing homes in the Netherlands, using NLP to interpret narratives of daily life in existing MEDLO data. Methods: This study conducted a secondary analysis of existing MEDLO observational data from 151 people living with dementia residing in Dutch long-term care. Narrative data had been documented by trained observers, describing activities, interactions, settings and emotional expressions. For analysis, a local secure pipeline was developed in which GPT-4o-mini was deployed for NLP tasks. The pipeline comprised three analytical steps: (1) N-gram frequency analysis to identify common language patterns, (2) sentiment analysis of positive and negative expressions per QoL domain, and (3) topic modelling to group semantically related terms and map them to QoL domains. Outputs were iteratively refined through prompt engineering and validated through expert review for coherence and contextual relevance. Results: A total of 5,622 narratives (50,106 words) from 151 observed people living with dementia were analysed. 
The narratives were short, averaging 8.5 words per narrative. N-gram frequency analysis identified frequent documentation of passive activity (sits at the table) in limited indoor settings (living room). Emotional well-being was often described in positive terms (smiles, laughs), whereas explicitly negative expressions (cries, distress) occurred less frequently. Weighted sentiment analysis showed that, although fewer in number, negative expressions carried a stronger intensity, resulting in an overall predominance of negative sentiment across all QoL domains. Topic modelling identified eight coherent clusters, most of which mapped onto multiple QoL domains, underscoring QoL’s multidimensionality. Conclusions: NLP identified predominantly passive activities in indoor settings with little variation, yet people living with dementia were often described with positive affect, underscoring both the complexity of QoL in dementia and the influence of documentation practices. In practice, NLP could help translate everyday care documentation into actionable information that guides more responsive, person-centred dementia care.
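The first analytical step, N-gram frequency analysis, can be sketched in Python with plain counting. This is an illustration of the technique, not the study's GPT-4o-mini pipeline. The example narratives reuse phrases quoted in the abstract but are otherwise made up.

```python
import re
from collections import Counter

def ngram_counts(narratives, n=3):
    """Count word n-grams across short observation narratives."""
    counts = Counter()
    for text in narratives:
        tokens = re.findall(r"[a-z']+", text.lower())  # lowercase words, punctuation dropped
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

narratives = ["Sits at the table.", "Sits at the table in the living room.", "Smiles and laughs."]
print(ngram_counts(narratives).most_common(3))
```

Counting within each narrative, rather than across narrative boundaries, avoids spurious n-grams that join the end of one observation to the start of the next.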

  • Background: Persistent physical inactivity among adults and adolescents poses a major global health challenge. Mobile health (mHealth) interventions, particularly Just-in-Time Adaptive Interventions (JITAIs), offer a promising avenue for scalable and personalized physical activity promotion. However, developing and evaluating such adaptive interventions at scale, while integrating robust behavioral science, presents methodological hurdles. Objective: The PEARL study aimed to assess the feasibility and effectiveness of a reinforcement learning (RL) algorithm, informed by health behavior change theory (COM-B), to personalize the content and timing of physical activity nudges via the Fitbit app compared with fixed and random nudging strategies and with a control group receiving no nudges. Methods: We conducted a large-scale, four-arm randomized controlled trial (RCT) enrolling 13,463 Fitbit users. Participants were randomized to: (1) Control (no nudges); (2) Random (random content/timing); (3) Fixed (logic based on baseline COM-B survey); and (4) RL (adaptive algorithm). The primary outcome was the change in average daily step count from baseline to 2 months. Secondary outcomes included user engagement and survey responses regarding capability, opportunity, and motivation. Results: 7,711 participants were included in the primary analysis (mean age 42.1 years; 86.3% female). At 1 month, the RL group showed a significant increase in daily steps compared with the Control (+296 steps, P<.001), Random (+218 steps, P=.005), and Fixed (+238 steps, P=.002) groups. At 2 months, the RL group sustained a significant increase compared with Control (+210 steps, P=.01). Generalized estimating equation (GEE) models confirmed a sustained significant increase in the RL group (+208 steps, P=.002). In exit surveys, the RL group gave more favorable responses regarding nudge customization (37%) than the other groups. 
Conclusions: This study demonstrates the feasibility and early efficacy of using RL to personalize digital health nudges at scale. While long-term retention remains a challenge, the adaptive approach outperformed static behavioral rules, showcasing the promise of dynamic personalization in a real-world mHealth setting. Clinical Trial: doi: 10.17605/OSF.IO/TW7UP
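The abstract does not specify the RL algorithm. A minimal sketch of one simple bandit-style approach, epsilon-greedy selection over hypothetical (COM-B component, time of day) nudge options with a running mean reward such as a change in daily steps, is shown below. It is a stand-in for illustration, not the PEARL algorithm, and it ignores the per-user context that a real personalization system would likely use. The class name, arm labels, and reward values are all assumptions.

```python
import random


class EpsilonGreedyNudger:
    """Hypothetical epsilon-greedy bandit over (content, timing) nudge options.

    The reward could be, e.g., a step-count change after a nudge; this is an
    illustrative assumption, not the PEARL study's algorithm.
    """

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward

    def select(self):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Incremental mean: v_new = v_old + (r - v_old) / n
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n


# Hypothetical arms labelled by COM-B component and time of day.
arms = [("motivation", "morning"), ("opportunity", "evening"), ("capability", "midday")]
nudger = EpsilonGreedyNudger(arms, epsilon=0.0, seed=42)
for reward in (200.0, 400.0):  # hypothetical step increases after two nudges
    nudger.update(arms[1], reward)
nudger.update(arms[0], 100.0)
print(nudger.select())
```

With `epsilon=0.0` the selector always exploits, which keeps the example deterministic; a positive epsilon preserves some exploration so that under-tried options can still be evaluated.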

  • An AI-Based Smart Nursing Ward Model for Enhanced Recovery After Thoracic Surgery: A Historical Controlled Trial

    Date Submitted: Jan 27, 2026
    Open Peer Review Period: Jan 28, 2026 - Mar 25, 2026

    Background: Due to surgical trauma and the impact of the disease, patients undergoing thoracic surgery often experience a range of postoperative symptom burdens that affect their recovery. Traditional perioperative care has drawbacks. Objective: To evaluate the impact of an AI-based personalized smart nursing ward management model on postoperative recovery outcomes in patients undergoing thoracic surgery. Methods: Based on their order of admission, patients who met the inclusion criteria were allocated to a control group (n=303) or an intervention group (n=240). The control group received routine general-ward nursing care, while the intervention group received the AI-based personalized smart nursing ward management model in addition to the same routine care. Results: Data from all 543 enrolled patients were analyzed. Compared with the control group (n=303) receiving routine care, the intervention group (n=240) had a significantly shorter median hospital stay (9.0 days vs 12.0 days) and chest tube indwelling time (5.0 days vs 7.0 days), as well as lower total hospitalization costs (¥61,032.87 vs ¥72,859.90) (all P < .001). The postoperative pulmonary complication rate was also significantly lower in the intervention group (3.8% vs 12.2%, P < .001). Furthermore, patient satisfaction was higher (98.53% vs 91.28%), and nurses' daily step count was reduced (12,359.52 vs 18,692.74 steps) in the intervention group (both P < .001). Conclusions: The AI-based smart nursing model effectively promotes postoperative recovery and offers an innovative management approach for thoracic surgery.

  • The Impact of AI-Driven Tools on Breastfeeding Outcomes: Systematic Review and Meta-Analysis

    Date Submitted: Jan 27, 2026
    Open Peer Review Period: Jan 27, 2026 - Mar 24, 2026

    Background: The current global breastfeeding landscape presents both progress and challenges. Artificial intelligence (AI) has emerged as a promising new strategy to enhance breastfeeding practices. Objective: To evaluate the impact of AI-driven tools on breastfeeding practices and outcomes. Methods: We searched PubMed, Web of Science, Cochrane Library, Embase, and CINAHL from inception to October 2025 for randomized controlled trials (RCTs) and quasi-experimental studies. The risk of bias in individual studies was assessed using the Cochrane risk of bias tool for randomized controlled trials (RoB 2) and the risk of bias in non-randomized studies of interventions tool (ROBINS-I). Data were extracted independently by two reviewers and pooled with random-effects models in Review Manager 5.4 and R 4.5.2, with subgroup analyses based on intervention type, timing of implementation, population characteristics, and country income level. Results: This review included 39 studies with 10,735 participants from 15 countries. AI-driven tools increased exclusive breastfeeding (EBF) rates (at <3 months: relative risk [RR] 1.21, 95% CI 1.13-1.29, P<.001, I²=56%; at 3-6 months: RR 1.54, 95% CI 1.29-1.85, P<.001, I²=69%; at ≥6 months: RR 1.47, 95% CI 1.22-1.77, P<.001, I²=78%), breastfeeding self-efficacy (BSE) (standardized mean difference [SMD] 0.41, 95% CI 0.04-0.78, P=.03, I²=93%), and breastfeeding knowledge (SMD 1.69, 95% CI 0.54-2.84, P=.004, I²=98%). Conclusions: AI-driven tools effectively increase exclusive breastfeeding rates, breastfeeding self-efficacy, and breastfeeding knowledge. Future studies are needed to provide stronger evidence about clinical care interventions. Clinical Trial: PROSPERO CRD420251233352; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251233352
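The abstract reports pooled relative risks and I² values from random-effects models but does not name the between-study variance estimator. The sketch below assumes the DerSimonian-Laird estimator, one common choice, applied to log relative risks with known within-study variances; it shows how the pooled RR, its 95% CI, tau², and I² = (Q - df) / Q relate. The function names and the three example studies are hypothetical.

```python
import math

Z = 1.959964  # two-sided 95% normal quantile


def log_rr_and_variance(rr, lo, hi):
    """Convert an RR and its 95% CI to a log RR and its variance."""
    se = (math.log(hi) - math.log(lo)) / (2 * Z)
    return math.log(rr), se ** 2


def dersimonian_laird(log_effects, variances):
    """Random-effects pooling of log effects (DerSimonian-Laird tau^2).

    Returns (pooled RR, (CI low, CI high), tau^2, I^2 in percent).
    """
    k = len(log_effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    ci = (math.exp(pooled - Z * se), math.exp(pooled + Z * se))
    return math.exp(pooled), ci, tau2, i2


# Hypothetical studies given as (RR, CI low, CI high); not data from the review.
studies = [(1.10, 0.95, 1.27), (1.40, 1.15, 1.70), (1.25, 1.05, 1.49)]
ys, vs = zip(*(log_rr_and_variance(*s) for s in studies))
rr, ci, tau2, i2 = dersimonian_laird(list(ys), list(vs))
print(f"RR {rr:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}, tau2 {tau2:.4f}, I2 {i2:.0f}%")
```

Ratio measures are combined on the log scale, which is why each study's RR and CI are converted to a log RR and variance before pooling and the result is exponentiated afterwards.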

  • ‘Carer-as-Sensor’ in Decentralized Trials: Passive Sensing Data Accuracy, Parkinson’s, and Observers

    Date Submitted: Jan 26, 2026
    Open Peer Review Period: Jan 27, 2026 - Mar 24, 2026

    Background: Parkinson's clinical trials depend on patient-reported outcomes, often overlooking the vital role of carers in collaboratively tracking symptom progression. This is a potential limitation for decentralized clinical trials that aim to measure real-world, free-living symptoms with sensors such as wearables and in-home cameras. Objective: The primary objective of our study was to inform the design of a multimodal sensor platform for decentralized clinical trials. Methods: A qualitative study was conducted with an inductive approach using semistructured interviews with a cohort of people with Parkinson's. Results: This study of 18 participants (14 people diagnosed with Parkinson's, 4 spouses/informal carers) found that carers, household members, and peers take a central role in helping people with Parkinson’s make sense of and manage their symptoms. Participants relied on others to help them complete tasks and to understand their symptoms through comparison with others, effectively using those around them as a 'Carer-as-Sensor'. While participants mostly viewed these relationships positively, the relationships could also have negative effects on participants themselves. Participants could prioritize household needs over their own health, for example by not taking medication, risking falls, or even avoiding being around others so that their Parkinson's was not on display, in order to reduce carer burden. Conclusions: Our results suggest that combining an 'outsider' and an 'insider' approach to reporting symptoms can identify symptoms that go unnoticed by people with Parkinson's or are withheld from carers. These findings inform broader household-centred recommendations for the design of tracking and annotation strategies in decentralized clinical trials, and for new AI innovations that support the capture of nuanced and subtle changes in symptoms.

  • Application of Immediate Adaptive Intervention in Dietary Health Management: A Systematic Review

    Date Submitted: Jan 26, 2026
    Open Peer Review Period: Jan 27, 2026 - Mar 24, 2026

    Background: With the development of mHealth technology, Just-in-Time Adaptive Interventions (JITAIs), a new type of intervention that leverages real-time data to provide personalized support, have gradually gained attention. Using devices such as smartphones and wearables, these interventions can collect individuals’ physiological indicators and environmental context data in real time and dynamically adjust the type and intensity of support. However, current JITAI research faces inconsistent definitions of core elements, high heterogeneity in intervention design, and contested evidence on effectiveness. In addition, the feasibility and generalizability of JITAIs have not been systematically synthesized, so evidence integration is needed to guide their optimization. Objective: To systematically review the effects of JITAIs in dietary health management, comprehensively analyze their impact on diet-related behaviors and physiological indicators, and assess the certainty of the existing evidence. Methods: PubMed, Embase, Scopus, CINAHL, and Web of Science were searched from inception to August 20, 2025. Search terms included dietary health, dietary behavior, JITAI, and ecological momentary assessment (EMA), among others. Two researchers independently conducted literature screening, data extraction, and quality assessment, and extracted and summarized participant characteristics, JITAI features, outcome measures, and other relevant content. Results: A total of 12 studies were included. The study populations included adults with overweight or obesity, patients with hypertension, young people with low fruit and vegetable intake, patients with type 1 diabetes, patients with kidney stones, hemodialysis patients, and individuals with binge-eating spectrum disorders. 
Intervention types mainly included smartphone app interventions, short message service (SMS) reminders, wearable device interventions, and context-aware location-triggered interventions. Most studies reported positive effects of JITAI, such as increasing fruit and vegetable intake, reducing sodium intake, improving uncontrolled eating behaviors, enhancing the automaticity of fluid intake, and decreasing the frequency of binge eating and compensatory behaviors in patients with eating disorders. Meanwhile, implementation barriers including insufficient device adaptability, differences in digital literacy, and limitations of GPS signals were also revealed. Conclusions: Existing evidence suggests that JITAI-based dietary interventions may have potential value in promoting dietary behavior change. However, due to research heterogeneity and methodological limitations, the certainty of their effectiveness remains limited. In the future, it is necessary to design high-quality studies with more rigorous methodologies, standardized outcome measures, and sufficient follow-up periods to clarify the effectiveness of JITAI in dietary health management.
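A JITAI is commonly described in terms of decision points, tailoring variables, decision rules, and intervention options. The sketch below illustrates only the decision-rule step for a dietary JITAI: at each decision point, hypothetical tailoring variables (a GPS geofence flag, hours since the last logged meal, and an ecological momentary assessment craving rating) are mapped to an intervention option or to no intervention. The variables, thresholds, and messages are invented for illustration and do not come from any of the reviewed studies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Hypothetical tailoring variables sampled at one decision point."""
    near_food_outlet: bool   # e.g. from a GPS geofence
    hours_since_meal: float  # e.g. from meal logging
    craving_score: int       # e.g. from an EMA prompt, 0-10


def decide(ctx: Context) -> Optional[str]:
    """Map tailoring variables to an intervention option, or None.

    Rules are checked in priority order; thresholds are illustrative only.
    """
    if ctx.craving_score >= 7:
        return "coping strategy prompt"
    if ctx.near_food_outlet and ctx.hours_since_meal >= 4:
        return "healthy-choice reminder"
    if ctx.hours_since_meal >= 6:
        return "meal-logging reminder"
    return None  # no support needed now, which avoids unnecessary burden


print(decide(Context(near_food_outlet=True, hours_since_meal=5.0, craving_score=2)))
```

Returning `None` when no rule fires reflects the "just-in-time" idea that support is delivered only when the tailoring variables indicate need, which is one way such designs try to limit notification fatigue.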

  • Background: Poor usability of electronic health record (EHR) systems is associated with workflow inefficiencies, patient safety risks, and burnout among health professionals. Health professionals are exposed to various work conditions, but how these conditions relate to perceived EHR usability is unknown. Objective: To examine whether medical doctors’ and nurses’ usability perceptions of an established electronic patient record (EPR) system and a newly adopted EHR system differ by work schedule, type of employment (full-time or part-time), work pace, and number of clinical settings. Results: 543 medical doctors and 1,869 nurses participated. In the established EPR system, nurses were more likely to report low ease-of-use if they worked three-shift rotations (odds ratio [OR] 2.21, 95% CI 1.34-3.65 vs daytime), part-time (OR 1.63, 95% CI 1.20-2.21 vs full-time), or faced a very high work pace (OR 1.25, 95% CI 1.42-3.58 vs low work pace). 
Following EHR adoption, medical doctors and nurses reported median (IQR) System Usability Scale (SUS) scores of 17.5 (7.5-32.5) and 32.5 (17.5-50.0), respectively. Both medical doctors and nurses reported lower SUS scores when facing a very high vs low work pace, with mean differences of -8.56 (95% CI -12.60 to -4.51) and -8.43 (95% CI -14.10 to -2.76), respectively. Part-time nurses reported SUS scores 2.72 points lower (95% CI -4.93 to -0.52) than full-time nurses, and nurses working across 3-4 clinical settings reported SUS scores 2.99 points lower (95% CI -5.52 to -0.46) than nurses working across 1-2 settings. Conclusions: These findings suggest that system usability perceptions differ by work conditions, particularly work pace. Although these results could guide tailored implementation strategies, ensuring an adequate EHR usability architecture is likely to be equally important.

  • Background: Personal Data Spaces (PDS) are increasingly promoted as digital infrastructures that enable citizen participation in health data governance by strengthening transparency and individual control over personal health data. Despite growing policy and technological attention, empirical evidence remains limited on whether citizens view PDS as acceptable and desirable governance instruments, how they evaluate different types of data and purposes of data use, and which factors shape public support. Objective: The objective of this study was to examine how citizens evaluate We Are, a proposed citizen-centered Personal Data Space model in Flanders, Belgium, and to assess overall support, reasons for endorsement, preferences for control versus transparency, acceptability of storing different types of health data, and acceptance of different purposes of data use. Methods: We conducted an online survey among adults aged 18-79 years in Flanders, Belgium (N=1,041). The sample was quota-based and representative of gender, age, education, province, and urbanization level. Participants evaluated the We Are model after reading a description. Measures included overall evaluation of the model, reasons for support, preferences for transparency and control, willingness to store medical versus lifestyle data, and willingness to share data across vignette-based scenarios varying purpose of use and recipient type. Data were analyzed using t-tests, linear regression, and mixed models with repeated measures. Results: Overall evaluations of We Are were marginally above the scale midpoint (mean 2.51 on a 1-4 scale) but did not differ significantly from it (t(1040)=0.70, P=.24). Sociodemographic characteristics explained little variance in support, whereas understanding of the We Are model and psychographic factors substantially increased explained variance (R² increased from .03 to .24). 
Higher trust in technology was positively associated with support, while stronger privacy attitudes and privacy-related fears were negatively associated. Respondents valued control more strongly than transparency for both general personal data (t(1040)=-10.37, P<.001) and health data (t(1040)=-12.47, P<.001). Medical data were considered more acceptable to store than lifestyle data (Δ=0.38, P<.001). Both personal and public benefits motivated support, but commercial data use reduced willingness to share, particularly when framed around individual gain rather than collective benefit. Conclusions: Citizens view PDS as potentially valuable instruments for health data governance, but their support is conditional and shaped by understanding and psychographic factors rather than by sociodemographic factors. PDS can contribute to meaningful citizen participation only when technological features are embedded in governance arrangements that provide real agency, credible safeguards, and demonstrable public value.

  • Secondary Use of Health Data as a Core Capability in Medical Informatics

    Date Submitted: Jan 23, 2026
    Open Peer Review Period: Jan 25, 2026 - Mar 22, 2026

    The European Health Data Space represents a landmark regulatory success in enabling the secondary use of health data for research, innovation, and policy within a trusted and interoperable framework. This Viewpoint discusses how strategic alliances—such as UNINOVIS—and translational research ecosystems, with IBIMA as a driving hub, operationalize this regulation by aligning governance, infrastructure, and applied data science. Together, they illustrate how European health data policy can be translated into real-world evidence generation and sustained clinical and societal impact.

  • Background: Musculoskeletal conditions are a leading global cause of disability, yet the factors influencing long-term musculoskeletal health, particularly following trauma, remain incompletely understood. Machine learning could be applied to identify previously unknown patterns in large-scale multimodal datasets. Objective: To test the ability of a new sparse Group Factor Analysis method to uncover hidden patterns in large-scale multimodal datasets and generate testable, clinically relevant hypotheses. Methods: This study applies sparse Group Factor Analysis, a hierarchical unsupervised machine learning method, to the ADVANCE cohort (a longitudinal dataset of 1445 UK Afghanistan War servicemen) to identify latent structures in multimodal clinical data. Study 1 validated the approach by rediscovering known group-level patterns between combat-injured and non-injured participants, including poorer outcomes in pain, mobility, and bone health among those with lower limb loss. Study 2 explored the injured non-amputee subgroup without prespecified labels to identify new hypothesis-generating clusters that could subsequently be tested using standard hypothesis-testing methods. Results: A subgroup of 125 individuals with worse musculoskeletal outcomes was uncovered. This group had greater body mass, higher injury severity, and a higher prevalence of head injury. These findings led to a novel hypothesis: that head injury, including potential traumatic brain injury, is associated with long-term musculoskeletal deterioration. This hypothesis is supported by literature in both athletic and military populations and will be tested in follow-up analyses. Conclusions: Our findings demonstrate how sparse Group Factor Analysis, combined with clinical insight, can uncover hidden patterns in large-scale datasets and generate testable, clinically relevant hypotheses that inform prevention, treatment, and rehabilitation strategies.

  • Background: Chronic kidney disease (CKD) requires sustained self-management involving complex medication regimens, dietary restrictions, and symptom monitoring. These demands pose substantial challenges to medication adherence and daily disease management. Digital therapeutics (DTx) have the potential to support CKD self-management; however, CKD-specific design requirements informed by both patient and clinician perspectives remain insufficiently explored. Objective: This study aimed to identify key design requirements for CKD-specific digital therapeutics by integrating patient-reported self-management challenges with nephrologist perspectives on clinical needs and implementation considerations. Methods: A convergent mixed-methods study was conducted at a tertiary academic hospital. Quantitative data were collected through a structured survey of 60 adults with non–dialysis-dependent CKD to assess medication adherence challenges, digital health needs, and age-related differences. Qualitative data were obtained through focus group interviews with 19 nephrologists and analyzed using thematic analysis. Quantitative and qualitative findings were integrated to identify convergent priorities and design implications for CKD-specific DTx. Results: None of the patients reported prior experience with CKD-specific digital health applications, although 70% perceived a need for such tools. Younger patients (<60 years) expressed significantly greater interest in digital therapeutics than older patients (83.9% vs 55.2%, P=.015). Common patient-reported challenges included managing multiple medications (36.7%), irregular medication schedules (30.0%), and difficulty understanding medication timing relative to meals (28.3%). Nephrologists emphasized the importance of personalized medication reminders, comprehensive medication information (including adverse effects and nephrotoxic risks), symptom-monitoring systems, and features supporting dietary and lifestyle management. 
Integration findings highlighted the need for user-friendly, age-sensitive interfaces, data security, and clinically actionable feedback mechanisms. Conclusions: By integrating patient and nephrologist perspectives, this mixed-methods study identifies key design considerations for CKD-specific digital therapeutics. These findings provide formative, design-informed evidence to guide the early development of patient-centered and clinically relevant digital therapeutics for CKD.

  • Background: Digital multidomain interventions hold promise for dementia risk reduction; however, populations at higher dementia risk, including those experiencing socioeconomic and educational disadvantage, remain underrepresented in trials, and engagement with digital interventions often declines over time. Co-production and blended models that combine digital tools with human support may improve reach, acceptability, usability, and sustained engagement. Designing interventions that are usable and acceptable for individuals facing structural, educational, or digital barriers (underserved groups) is therefore likely to produce solutions that are both accessible and scalable for the wider older adult population. Objective: To describe the co-production process used to develop ENHANCE—a coach-supported digital intervention targeting ten modifiable dementia risk factors in older adults from underserved groups—and report key outputs and lessons learned for equitable digital prevention design. Methods: We co-produced ENHANCE between July 2023 and February 2025 using a multi-stage development process guided by the Medical Research Council framework for complex interventions and the Double Diamond design model. The Person-Based Approach informed user-centred guiding principles (key design objectives), while behaviour change content was operationalised using behavioural change theories. Co-production followed four phases. The Discovery phase explored barriers to engagement with existing digital materials and identified candidate components for each dementia risk-factor module. The Define phase translated these insights into guiding principles and blueprints of each risk-factor module integrated with behavioural change components. The Design phase involved iterative co-production and usability testing of prototypes. The Delivery phase evaluated a high-fidelity prototype through a one-week usability study with coaching support. 
Contributors included 162 research participants recruited from underserved community settings, 33 patient and public involvement contributors, and 4 human–computer interaction experts. Throughout development, co-production focused on reducing barriers related to literacy, digital confidence, and culture to maximise usability across diverse older adult populations. Results: Co-production produced (1) evidence-informed module strategies for targeted dementia risk factors; (2) a set of guiding principles to ensure low-literacy, culturally relevant, and accessible content, supporting both equity of access and wider population usability; (3) a meadow-themed app integrating tailored check-ins, educational videos, cognitive training games, and in-app messaging; and (4) a structured coaching model, including onboarding, brief follow-up, and accompanying coaching manuals. Iterative testing and refinement improved navigation, simplified language, reduced text burden, and ensured the use of familiar and accessible game formats, resulting in a feasibility-ready prototype. Conclusions: ENHANCE is a co-produced, coach-supported digital intervention designed to be accessible for underserved older adults at increased dementia risk, with design features intended to support accessibility, engagement, and scalability across the wider ageing population. The development process illustrates how integrating co-production with behavioural science and usability methods can support principled intervention design for equitable digital dementia prevention. Clinical Trial: ISRCTN17060879

  • Background: Clinical natural language processing (NLP) refers to computational methods for extracting, processing, and analyzing unstructured clinical text data, and holds great potential to transform healthcare. The advancement of deep learning, augmented by the recent emergence of transformers, has been pivotal to the success of NLP across various domains. This success is largely attributed to the end-to-end training capabilities of deep learning systems. Further, advances in instruction tuning have enabled Large Language Models (LLMs) like OpenAI’s GPT to perform tasks described in natural language. While these advancements have dramatically improved capabilities in processing languages like English, these benefits are not always equally transferable to under-resourced languages. In this regard, this review aims to provide a comprehensive assessment of the state-of-the-art NLP methods for mainland Scandinavian clinical text, thereby providing an insightful overview of the landscape for clinical NLP within the region. Objective: The study aims to perform a systematic review to comprehensively assess and analyze the state-of-the-art NLP methods for the Scandinavian clinical domain, thereby providing an overview of the landscape for clinical language processing within the Scandinavian languages across Norway, Denmark, and Sweden. Generally, the review aims to provide a practical outline of various modeling options, opportunities, and challenges or limitations, thereby providing a clear overview of existing methodologies and potential avenues for future research and development. Methods: A literature search was conducted in various online databases, including PubMed, ScienceDirect, Google Scholar, ACM Digital Library, and IEEE Xplore, between December 2022 and March 2024. The search considered peer-reviewed journal articles, preprints, and conference proceedings. 
Relevant articles were initially identified by scanning titles, abstracts, and keywords, which served as a preliminary filter in conjunction with inclusion and exclusion criteria, and were further screened through a full-text eligibility assessment. Data were extracted according to predefined categories, established from prior studies and further refined through brainstorming sessions among the authors. Results: The initial search yielded 217 articles. The full-text eligibility assessment was independently carried out by five of the authors and resulted in 118 studies, which were critically analyzed. Any disagreements among the authors were resolved through discussion. Of the 118 articles, 17.8% (n=21) focused on Norwegian clinical text, 61.0% (n=72) on Swedish, 13.6% (n=16) on Danish, and 7.6% (n=9) on more than one language. Generally, the review identified positive developments across the region despite some observable gaps and disparities between the languages. There are substantial disparities in the level of adoption of transformer-based models. In essential tasks such as de-identification, there is significantly less research activity focusing on Norwegian and Danish than on Swedish text. Further, the review identified a low level of resource sharing (data, experimentation code, and pretrained models) and limited use of adaptation and transfer learning in the region. Conclusions: The review presented a comprehensive assessment of state-of-the-art clinical NLP in mainland Scandinavian languages and shed light on potential barriers and challenges. The review identified a lack of shared resources (eg, datasets and pretrained models), inadequate research infrastructure, and insufficient collaboration as the most significant barriers that require careful consideration in future research endeavors. The review highlights the need for future research in resource development, core NLP tasks, and de-identification. 
Generally, we foresee that the findings presented will help shape future research directions by shedding light on areas that require further attention for the rapid advancement of the field in the region.

  • Photoplethysmography in Healthcare: An Umbrella Review of Clinical Applications, Validation, and Evidence Gaps

    Date Submitted: Jan 20, 2026
    Open Peer Review Period: Jan 21, 2026 - Mar 18, 2026

    Background: Photoplethysmography (PPG) is widely used in consumer and clinical devices for heart rate, rhythm, sleep, respiratory, and hemodynamic monitoring. However, the rapid expansion of applications has produced a fragmented evidence base with heterogeneous methods and variable validation quality. Objective: To synthesize and critically appraise systematic reviews evaluating PPG-based applications in healthcare, map major clinical domains and methodological practices, and identify limitations and priorities for future research. Methods: A protocolized umbrella review (PROSPERO CRD420251015845) was conducted across six databases. Systematic reviews and meta-analyses involving human PPG applications were included. Screening, extraction, and AMSTAR-2 quality assessment were performed in duplicate following PRISMA-S and PRIOR guidelines. Results: Fifty-nine systematic reviews were included. PPG showed consistent accuracy for resting heart-rate monitoring and strong performance for opportunistic atrial fibrillation screening when paired with confirmatory ECG. Heart rate variability (HRV) estimation, stress monitoring, sleep assessment, neonatal and maternal monitoring, and metabolic applications showed emerging but heterogeneous evidence. Cuffless blood pressure estimation remains limited by calibration dependence, motion sensitivity, and poor generalizability. Remote PPG (rPPG) achieves good accuracy under controlled lighting but degrades with motion, light variability, and darker skin pigmentation. Across domains, performance was typically higher in controlled environments and attenuated in free-living settings. Common methodological limitations included small samples, inconsistent reporting of device and preprocessing details, lack of external validation, algorithm opacity, and underrepresentation of diverse populations. 
Conclusions: PPG is approaching clinical maturity for atrial fibrillation screening and resting heart-rate monitoring, while other applications remain earlier in development. Safe integration into practice requires confirmatory ECG for rhythm abnormalities, awareness of bias sources, and adherence to transparent reporting. Future progress depends on multicenter longitudinal studies, real-world validation, diverse benchmark datasets, standardized metrics, and improved reproducibility across devices and algorithms. PPG holds promise as a scalable component of digital health infrastructure when developed and evaluated with methodological rigor. Clinical Trial: PROSPERO Registration: CRD420251015845
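Resting heart-rate monitoring, the most mature application named above, rests on detecting pulse peaks in the PPG waveform. The sketch below is a deliberately simplified illustration on an idealised synthetic signal: it finds local maxima and converts the mean inter-peak interval to beats per minute. The function name and the synthetic 1.2 Hz signal are assumptions, and the sketch omits the filtering and motion-artefact handling that the review identifies as critical in free-living settings.

```python
import math


def heart_rate_from_ppg(signal, fs, min_height=0.0):
    """Estimate heart rate (beats/min) from a clean PPG trace via peak detection.

    A sample counts as a peak if it exceeds min_height and is a local maximum.
    Real pipelines add band-pass filtering, motion-artefact rejection, and
    refractory periods; this sketch assumes an idealised signal.
    """
    peaks = [
        i for i in range(1, len(signal) - 1)
        if signal[i] > min_height
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]
    if len(peaks) < 2:
        raise ValueError("need at least two beats to estimate heart rate")
    # Mean inter-beat interval in seconds, averaged over all detected beats.
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fs
    return 60.0 / mean_interval_s


# Synthetic, idealised pulse wave: 1.2 Hz (72 beats/min) sampled at 50 Hz for 30 s.
fs = 50
ppg = [math.sin(2 * math.pi * 1.2 * n / fs) for n in range(30 * fs)]
print(round(heart_rate_from_ppg(ppg, fs)))
```

Averaging the interval between the first and last detected peaks limits the effect of sampling-grid rounding on individual beats; with motion or poor perfusion, however, spurious or missed peaks would bias this estimate, which is one reason performance drops outside controlled conditions.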

  • Adolescents’ Perspectives and Experiences with Dietary Mobile Health Apps: A Scoping Review

    Date Submitted: Jan 19, 2026
    Open Peer Review Period: Jan 20, 2026 - Mar 17, 2026

    Background: Smartphones play a central role in adolescents’ daily lives, making dietary mobile health (mHealth) apps (tools that provide nutrition education and track eating behaviors) a promising avenue for influencing dietary habits. While numerous studies have examined the impact of mHealth apps on diet, few have investigated adolescents’ perspectives and experiences with these tools. Objective: This scoping review aimed to synthesize the evidence and map the research gaps on adolescents’ perspectives (positive or negative) and experiences (attitudes, barriers, and facilitators) of using dietary mHealth apps on their smartphones. Methods: A systematic scoping review was conducted according to the 5-stage framework by Arksey and O’Malley. PsycINFO, Embase, Medline, Web of Science, and CINAHL were searched for articles, including mixed-methods studies, published from 2012 to 2023 that focused on adolescents (10-19 years of age) and reported perspectives (positive or negative) and experiences (attitudes, barriers, and facilitators) related to dietary app use. Articles that were not specific to diet, were not research studies, or were not written in English were excluded. Results: Of the 590 abstracts screened, 17 studies met the eligibility criteria. Ten studies assessed the usability, feasibility, and acceptability of standalone or multicomponent dietary mHealth apps, while nine examined app likability and effectiveness. Thematic analysis revealed seven overarching themes: (1) Technical Functionality and Usability; (2) Appreciation of Nutritional Education and Content Depth; (3) Importance of Social Connection, Feedback and Support; (4) Values of Entertainment and Gamification; (5) Significance of Personal Goals, Motivation and Tracking; (6) Interest for Simple Design and Interface; and (7) Perceived Effectiveness of Dietary mHealth Apps. Positively perceived features included food identification, tracking, and gamification elements. 
Commonly barriers included technical difficulties, tracking inaccuracies, complex information delivery and limited social engagement. Facilitators to app use were ease of navigation, targeted information, social interaction, rewards and goal setting. Suggested improvements focused on tracking accuracy, interface design, feedback mechanisms and notification options. Overall, adolescents perceived effective apps to as those that raised awareness of eating habits and support improvements in dietary intake. Conclusions: This scoping review highlights that adolescents’ experiences with dietary mHealth apps are shaped by technical functionality, usability, social engagement, personalization, and gamification. While these features can enhance engagement, barriers such as tracking inaccuracies, technical issues, and limited social interaction reduce app effectiveness. Understanding these perspectives is critical for designing apps that are not only informative but also appealing and sustainable for adolescent users.

  • AI-Powered Health Chatbot and Plate Recognition for Weight Loss and Health Literacy in Adults With Overweight: Quasi-Experimental Case-Control Study

    Date Submitted: Jan 16, 2026
    Open Peer Review Period: Jan 18, 2026 - Mar 15, 2026

    Background: Obesity remains a pressing global health issue. Research suggests that better health literacy can support obesity management. This study tested digital interventions combining healthy eating guidelines with AI and mobile tools, including a ChatGPT-powered LINE chatbot for daily education and an AI food plate recognition system for calorie tracking and meal suggestions. Objective: This study aims to evaluate the efficacy of an integrated digital intervention, combining YOLOv5-based AI food plate recognition and a ChatGPT-powered LINE chatbot, on weight reduction (BMI) and health literacy among overweight and obese adults. Methods: The study used a quasi-experimental case-control design with an intervention group and a control group. Both the control and intervention groups received basic health education through app notifications and used an AI food plate recognition tool to estimate their nutritional intake. Only the intervention group could access an AI weight-loss chatbot for timely suggestions. Questionnaire data were collected from users at several points during the intervention. Results: Eighty participants were enrolled. The intervention group demonstrated significantly greater reductions in BMI (β = −1.32; 95% CI, −1.56 to −1.09; P < .001) and improvements in health literacy (β = 4.71; 95% CI, 3.86 to 5.56; P < .001) versus controls. Physical activity (step count β = 1,926.5; 95% CI, 1,209.3 to 2,643.7; P < .001) and weekly exercise time (β = 0.56; 95% CI, 0.21 to 0.92; P = .002) also increased, while late-night snacking decreased (β = −0.45; 95% CI, −0.81 to −0.08; P = .017). The intervention group consistently outperformed the control group across key health measures. However, the AI chatbot alone did not have significant effects on the primary outcomes. Conclusions: This integrated digital intervention effectively promotes weight loss and health literacy.
Given the strong short-term efficacy, future research should employ randomized designs, larger sample sizes, and longer follow-ups to establish long-term weight maintenance and address potential influences such as the Hawthorne effect. The study also highlights the need to further develop interactive, personalized health education tools and optimize AI food plate recognition systems to improve health literacy and weight management.

  • Background: During crises, individuals increasingly rely on digital platforms for information, communication, and emotional support. Cyber behavior, which encompasses online engagement, security practices, and information sharing, is shaped by cognitive and emotional factors such as awareness, knowledge, and anxiety. Understanding these relationships is crucial for promoting digital resilience and well-being during wartime and other large-scale emergencies. Objective: This study sought to examine how cybersecurity awareness, knowledge, and crisis-related anxiety influence cyber behavior and well-being during a national crisis. Drawing on Protection Motivation Theory (PMT), the study further explored how cognitive and affective responses interact to shape individuals’ online engagement patterns and subsequent psychological outcomes. Methods: A cross-sectional online survey was conducted among 512 Israeli adults aged 18-65 during the ongoing war period (January 2024). Standardized psychometric instruments were used, including the WHO Well-Being Index, the DASS-21 Stress subscale, and the Connor-Davidson Resilience Scale (CD-RISC-10). Media engagement was assessed across ten distinct digital activities. Data analysis employed a comprehensive approach, including cluster analysis, exploratory factor analysis (EFA), regression modeling, and path analysis. Results: Cluster analysis yielded two distinct segments: a high media engagement cluster and a low media engagement cluster. Participants in the high-engagement group reported significantly higher stress levels and greater use of digital media for news consumption, social networking, and charitable donations (p < .001). Furthermore, exploratory factor analysis revealed three salient dimensions of media usage: active, passive, and institutional. Path analysis indicated that stress was a positive predictor of all forms of media engagement.
In predicting well-being, active media use (β = .12, p = .006) and resilience (β = .30, p < .001) were positively associated, whereas passive media use demonstrated a marginally negative association (β = -.08, p = .078). Conclusions: Cyber behavior during wartime is demonstrably influenced by both cognitive awareness and emotional stress. Specifically, while anxiety and stress tend to increase online engagement, overexposure to digital media may simultaneously undermine well-being. Therefore, enhancing cyber literacy, cultivating emotional resilience, and promoting balanced media consumption are crucial strategies that can mitigate psychological distress and significantly strengthen digital resilience during crises.
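The abstract reports two media-engagement clusters but does not name the clustering algorithm. As a minimal sketch only, assuming k-means with k = 2 and hypothetical per-respondent engagement scores (one coordinate per digital activity), the segmentation step could look like this:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two engagement profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of profiles."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: List[Point], k: int = 2, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means (assumed method); returns one label (0..k-1) per respondent."""
    # Deterministic max-min initialization: start from the first profile,
    # then repeatedly add the profile farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each respondent joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


# Hypothetical two-activity profiles (e.g., weekly hours of news and social media).
profiles = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
print(kmeans(profiles))  # prints [0, 0, 0, 1, 1, 1]
```

In the study itself each profile would have ten coordinates, one per digital activity, and would usually be standardized before clustering; k = 2 here simply mirrors the two reported segments.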

  • Intelligent Identification of Pressure Injuries Using Multi-modal Deep Learning: A Scoping Review

    Date Submitted: Jan 13, 2026
    Open Peer Review Period: Jan 14, 2026 - Mar 11, 2026

    Background: The global prevalence of pressure injuries is high, and these injuries can cause severe infection or death. Accurate staging is vital for effective intervention. Deep learning streamlines pressure injury assessment, enhances efficiency, and yields practical, accurate results. This scoping review summarized research on multi-modal deep learning for intelligent pressure ulcer recognition. Objective: This review systematized models, training methods, and outcomes to identify the best systems for rapid detection and automated staging of pressure ulcers, with the goal of enhancing the timeliness, accuracy, and objectivity of diagnosis. Methods: We searched the following databases and sources: PubMed, the Cochrane Library, IEEE Xplore, and Web of Science. The scoping review was conducted in accordance with the JBI Scoping Review Methodology Group’s guidance and reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines. The study protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) on 12 December 2025 (registration number: CRD420251251573). Results: Fifteen articles were included, involving 26 models: AlexNet; VGG16; ResNet18; DenseNet121; SE-Swin Transformer; Cascade R-CNN; vision transformer (ViT); ConvNextV2; EfficientNetV2; MetaFormer; TinyViT; CCM; BCM; ResNext + wFPN; SE-Inception; Mask R-CNN; SE-ResNext101; Faster R-CNN; ResNet50; ResNet152; DenseNet201; EfficientNet-B4; YOLOv5; Inception-ResNet-v2; InceptionV3; and MobileNetV2. The training methodology for intelligent pressure ulcer recognition models involves establishing an image database, processing images, and constructing the recognition model. Different models exhibited varying accuracy in staging pressure ulcers, with overall accuracy ranging from 54.84% to 93.71%. The DenseNet121 model achieved the highest recognition accuracy (93.71%), while VGG16 was the most widely applied.
The same model showed significant variation in recognition accuracy across studies. Conclusions: Multi-modal, deep learning-based intelligent recognition models for pressure injuries demonstrate high overall accuracy, enabling rapid automated staging of these injuries. Future research may explore optimized intelligent assistance systems to enhance the accuracy, objectivity, and efficiency of pressure injury diagnosis.

  • Objective: The exponential expansion of biomedical literature has created an urgent need for efficient methods to recognize and extract PICO (Population, Intervention, Comparison, Outcome) elements, the foundational components of evidence-based medicine (EBM). This study systematically evaluates two complementary approaches for automating PICO recognition and extraction in medical literature: prompt engineering optimization and parameter-efficient fine-tuning of large language models (LLMs). Methods: We developed a dual-phase methodological framework: (1) systematic prompt optimization incorporating In-Context Learning (ICL), Chain-of-Thought (COT), and Tree-of-Thought (TOT) reasoning strategies; and (2) parameter-efficient fine-tuning (PEFT) of the LLM using Low-Rank Adaptation (LoRA), Quantized LoRA (QLoRA), and Freeze techniques. The PubMed-PICO and NICTA-PIBOSO benchmark datasets were used for recognition tasks, and EBM-NLP was used for extraction tasks. Performance metrics included precision, recall, and F1-score; F1 was adopted as the primary metric because it balances precision and recall. Results: COT prompting demonstrated superior recognition accuracy, achieving F1-scores of 77.1% (Population) and 84.5% (Outcome) on PubMed-PICO. In PEFT implementations, LoRA achieved peak classification performance (91.7% F1 for Population), while QLoRA showed the best extraction capability (79.3% F1 for Intervention). Fine-tuned models established new benchmarks across all datasets, attaining state-of-the-art (SOTA) results on NICTA-PIBOSO and EBM-NLP. PEFT demonstrated marked improvements over prompt engineering. Conclusion: Our findings indicate that LLMs can effectively automate PICO recognition and extraction through two complementary approaches. First, prompt engineering allows the model to perform tasks directly without altering its weights.
Second, PEFT further unlocks the models’ performance potential by adding fine-tuning on top of prompt engineering. This work makes significant advances and provides critical insights for optimizing methodological approaches in clinical applications that involve PICO recognition and extraction.
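Because F1 is the primary metric above, the following minimal sketch shows how precision, recall, and F1 relate, assuming exact-match scoring of predicted versus gold PICO spans. The `(start, end, label)` span format and the exact-match rule are illustrative assumptions; the abstract does not say whether scoring was token-level or span-level.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token, PICO label): hypothetical format


def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 (the harmonic mean of the two)."""
    tp = len(predicted & gold)   # spans that match a gold span exactly
    fp = len(predicted - gold)   # predicted spans absent from gold
    fn = len(gold - predicted)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {(0, 3, "P"), (5, 7, "I"), (9, 10, "C"), (12, 15, "O")}
pred = {(0, 3, "P"), (5, 7, "I"), (12, 14, "O")}  # one boundary error, one missed span
p, r, f1 = precision_recall_f1(pred, gold)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # prints P=0.667 R=0.500 F1=0.571
```

The harmonic mean is pulled toward the lower of the two scores (0.571 here, versus an arithmetic mean of about 0.583), which is why F1 penalizes a model that trades recall for precision or vice versa.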

  • Background: Vessels encapsulating tumor clusters (VETC) are a distinct vascular pattern associated with aggressive behavior and poor prognosis in hepatocellular carcinoma (HCC). Preoperative identification of VETC is crucial for treatment planning but currently relies on invasive pathological examination. Radiomics-based artificial intelligence (AI) offers a potential noninvasive solution, yet evidence regarding its diagnostic and prognostic accuracy has not been systematically synthesized. Objective: We aimed to systematically evaluate the diagnostic performance and prognostic value of radiomics-based AI models for noninvasively predicting VETC status in patients with HCC. Methods: We systematically searched PubMed, Embase, Web of Science, and the Cochrane Library for studies published up to July 11, 2025. Studies developing or validating AI models using medical imaging (contrast-enhanced MRI [CEMRI], contrast-enhanced CT [CECT], contrast-enhanced ultrasound [CEUS], or [18F]FDG PET/CT) to predict pathologically confirmed VETC status in HCC patients were included. Study quality was assessed using the PROBAST+AI tool. Diagnostic accuracy (sensitivity, specificity, AUC) and prognostic value for early recurrence (hazard ratio [HR]) were pooled using random-effects models. Results: Fourteen studies involving 729 patients in internal and 581 in external validation cohorts were analyzed. AI models based on CEMRI demonstrated the highest diagnostic accuracy, with a pooled AUC of 0.87 (95% CI 0.84-0.90), sensitivity of 0.82 (95% CI 0.75-0.88), and specificity of 0.77 (95% CI 0.71-0.82). Models using other modalities (CECT, PET/CT, CEUS) showed moderate to good performance. Prognostically, HCC patients classified as VETC-positive by AI had a significantly higher risk of early recurrence (pooled HR 2.34, 95% CI 1.93-2.84).
Conclusions: Radiomics-based AI models, particularly those using CEMRI, are promising for the noninvasive prediction of VETC and offer valuable prognostic stratification for early recurrence risk in HCC. However, significant heterogeneity and the retrospective nature of current studies limit the strength of evidence. Prospective, multicenter validation is required to confirm clinical utility. Clinical Trial: PROSPERO CRD420251167155
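The abstract pools hazard ratios with random-effects models but does not name the estimator. The sketch below assumes the DerSimonian-Laird estimator on the log-HR scale and recovers each study's standard error from its reported 95% CI; the study-level inputs are hypothetical and are not the values from the review.

```python
import math
from typing import List, Sequence, Tuple

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def log_hr_and_se(hr: float, lower: float, upper: float) -> Tuple[float, float]:
    """Convert a reported HR and its 95% CI into log(HR) and its standard error."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def dersimonian_laird(y: Sequence[float], se: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects pooling; returns (pooled log effect, its SE, tau^2)."""
    w = [1.0 / s ** 2 for s in se]                              # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw           # fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))     # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0   # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in se]                # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def pool_hazard_ratios(studies: List[Tuple[float, float, float]]) -> Tuple[float, float, float, float]:
    """studies: (HR, lower 95% CI, upper 95% CI) per study -> (HR, lower, upper, tau^2)."""
    pairs = [log_hr_and_se(*s) for s in studies]
    est, se, tau2 = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
    return math.exp(est), math.exp(est - Z95 * se), math.exp(est + Z95 * se), tau2


# Hypothetical study-level results, for illustration only.
example = [(2.1, 1.4, 3.2), (2.8, 1.9, 4.1), (1.9, 1.2, 3.0)]
hr, lo, hi, tau2 = pool_hazard_ratios(example)
print(f"pooled HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2 = {tau2:.3f}")
```

When tau² is estimated as zero, the random-effects weights collapse to the fixed-effect weights, so the two models agree; heterogeneity widens the pooled CI.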

  • Digital Transformation in Healthcare: Are we on the right track?

    Date Submitted: Dec 26, 2025
    Open Peer Review Period: Dec 29, 2025 - Feb 23, 2026

    Healthcare digital transformation is gaining increasing attention, despite the observed challenges in its implementation. The envisioned benefits, together with the growing need for better healthcare, are motivating academia, organizations, regulatory agencies, and governments to develop more effective digital healthcare solutions. Through extensive debates among the authors, supported by a narrative literature review, this paper discusses how digital transformation is being conducted in the healthcare sector. Our discussion draws on sociotechnical systems theory, categorizing digital transformation along three social (people, culture, and goals) and three technical (processes/procedures, infrastructure, and technology) dimensions. Overall, we argue that both social and technical dimensions present elements that have been either encouraging or discouraging the progress of healthcare digital transformation. Identifying current trends in these on-track and off-track elements allowed us to formulate propositions for future testing and validation. This approach can help establish better government policies, foster private initiatives, and shift regulatory guidelines to support a successful digital transformation in health systems. Lastly, from a research perspective, we outline some opportunities for further interdisciplinary investigation in the field, promoting advances in the understanding of healthcare digital transformation.

  • Background: Internet search engines serve as primary gateways for cancer information, yet the commercialization of health content within organic search results remains understudied. While covert promotional content, such as native advertising and stealth marketing, has been documented in various contexts, systematic comparisons across structurally divergent search platforms are lacking. Objective: This study examined the prevalence, distribution, and information quality characteristics of covert promotional cancer-related content across Naver and Google, South Korea's two dominant search engines, which have fundamentally different platform architectures. Methods: A two-phase cross-sectional content analysis was conducted. Phase 1 employed natural language processing to identify 33 cancer-related keywords from 1,400 preliminary posts. Phase 2 systematically collected 5,848 posts in October 2023, yielding 919 unique posts (598 from Naver and 321 from Google) covering seven major cancer types that together account for over 70% of cancer incidence in Korea. Two trained coders analyzed promotional status, intensity, institutional sources, and information quality indicators (citation practices, information depth, and source attribution), with inter-coder reliability exceeding κ=.80. Chi-square tests examined the associations between platform and cancer type. Results: Covert promotional content appeared in 48.6% (447/919) of analyzed posts, with significantly higher prevalence on Google (54.2%, 174/321) than on Naver (45.7%, 273/598; χ²₁=5.78, P=.016). Platform differences were pronounced: Naver promotional posts predominantly originated from blogs (96.0%, 262/273) and exhibited full promotional intensity (52.1%, 126/242), while Google posts primarily came from hospital websites (81.0%, 141/174) with simple institutional identification (57.8%, 52/90).
Institutional source distribution varied significantly by platform (χ²₅=215.714, P<.001): traditional medicine institutions dominated Naver (99.2%, 119/120), whereas university-affiliated hospitals predominated on Google (85.0%, 96/113). Information quality differed substantially: indirect citation was more common on Google (81.6%, 142/174) than on Naver (58.6%, 160/273; χ²₁=25.653, P<.001), while comparative informational depth was higher on Google (55.7%, 97/174) than on Naver (19.4%, 53/273; χ²₂=64.683, P<.001). Conclusions: Covert promotional cancer content is pervasive in Korean search results, with promotional patterns, institutional sources, and information quality systematically shaped by platform architecture rather than by deliberate marketing strategies. These findings underscore the need for platform-sensitive regulation and enhanced digital health literacy to protect vulnerable cancer information seekers from commercial exploitation embedded within ostensibly neutral search environments.
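The 2×2 comparisons above can be checked with Pearson's chi-square test using only the Python standard library. This is an arithmetic sketch, not the authors' analysis code: with the Yates continuity correction, the Google vs Naver prevalence comparison gives about 5.78 (about 6.12 without it), while the indirect-citation comparison gives about 25.65 only without the correction. The abstract does not state which variant was applied to each test.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False) -> Tuple[float, float]:
    """Pearson chi-square test (1 df) for the 2x2 table [[a, b], [c, d]].

    Rows are platforms; columns are posts with / without the characteristic.
    Returns (statistic, P value from the upper tail of the chi-square distribution).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates continuity correction
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Promotional vs non-promotional posts: Google 174 of 321, Naver 273 of 598.
print(chi_square_2x2(174, 321 - 174, 273, 598 - 273, yates=True))
# Indirect citation among promotional posts: Google 142 of 174, Naver 160 of 273.
print(chi_square_2x2(142, 174 - 142, 160, 273 - 160))
```

The first call returns a statistic of about 5.78 with P ≈ .016, matching the reported values; the second returns about 25.65 with P < .001.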

  • Large Language Models in Colorectal Cancer: A Systematic Review

    Date Submitted: Dec 22, 2025
    Open Peer Review Period: Dec 23, 2025 - Feb 17, 2026

    Background: The growing complexity of colorectal cancer (CRC) management requires advanced tools for integrating multimodal data and clinical knowledge. Large language models (LLMs) offer a promising approach to address these challenges through sophisticated natural language processing and reasoning capabilities. Objective: This systematic review evaluates the current applications, performance, and practical implications of LLMs across the continuum of CRC care, from screening to treatment decision support. Methods: We searched six databases (PubMed, Embase, Web of Science, Scopus, CINAHL, Cochrane) up to November 1, 2025, following PRISMA guidelines. Included studies were original research investigating LLM applications specific to CRC, with extractable outcome data. Quality was assessed using QUADAS-2, PROBAST, and ROBINS-I tools by two independent reviewers. Results: Following the screening of 1,261 records, 34 studies met the inclusion criteria, all published between 2023 and 2025. The synthesis highlighted the utility of LLMs in automating data extraction from clinical texts, supporting patient education, aiding diagnostic processes, and assisting in clinical decision-making, with growing evidence of their emerging visual interpretation and multimodal capacities. The effectiveness of these models was significantly influenced by prompt design, which varied from basic zero-shot queries to specialized fine-tuning techniques. While the overall methodological quality of the included studies was deemed adequate, assessments identified recurring concerns regarding insufficient control of biases and inadequate reporting on data security measures.
Conclusions: LLMs demonstrate tangible potential to augment CRC care, particularly in structuring unstructured data and providing clinical decision support. However, translating this potential into practice requires solutions for domain adaptation, multimodal integration, and rigorous prospective validation to ensure reliability and safety in real-world settings. Clinical Trial: PROSPERO CRD420251248261; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251248261.

  • Background: Anxiety disorders are highly prevalent among autistic adults, with 20%-65% experiencing at least one diagnosable anxiety disorder. While mindfulness-based interventions have demonstrated efficacy for anxiety reduction, treatment response varies considerably across individuals. Machine learning approaches offer potential for identifying who is most likely to benefit from smartphone-based mindfulness interventions, enabling more personalized treatment recommendations. Objective: This study aimed to develop and evaluate machine learning models to predict individual treatment response, in the form of reduced anxiety symptoms, to a smartphone-based mindfulness intervention for autistic adults. We sought to identify baseline characteristics that distinguish responders from non-responders, explore few-shot learning with large language models as a complementary approach for low-data clinical prediction, and implement a Personalized Advantage Index approach for individualized treatment recommendations. Methods: We conducted a secondary analysis of data from a randomized controlled trial comparing a 6-week smartphone-based mindfulness intervention (Healthy Minds Program) with a waitlist control condition in autistic adults. Among 73 participants who completed the intervention, we defined responders as those achieving ≥7-point reduction in State-Trait Anxiety Inventory state anxiety scores. Baseline predictors included demographic variables, autism trait measures, and self-report questionnaires assessing anxiety symptoms, perceived stress, affect, and mindfulness. We trained six machine learning models (logistic regression, Random Forest, XGBoost, TabNet, Tab-ICL, and TabPFN) using nested 10-fold cross-validation with inner 5-fold cross-validation for hyperparameter tuning. Additionally, we evaluated few-shot learning using GPT-4o models with tokenized baseline features at varying shot counts (20-70 examples). 
Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) for the machine learning models and classification accuracy for few-shot learning. We examined feature importance and implemented Personalized Advantage Index analysis to estimate individualized treatment benefit. Results: Random Forest achieved the highest predictive performance for state anxiety response (AUC 0.79, 95% CI 0.66-0.91), followed by TabPFN (AUC 0.78, 95% CI 0.64-0.94) and logistic regression (AUC 0.77, 95% CI 0.73-0.81). Higher baseline state anxiety (coefficient 1.20, P<.001) predicted better treatment response, while higher trait anxiety (coefficient -0.17, P=.001), older age (coefficient -0.18, P=.02), and lower childhood pretend play scores (coefficient -0.93, P=.007) were associated with poorer response. Few-shot learning with 7-feature tokenization achieved an accuracy of 0.867 (95% CI 0.81-0.92) at 70 shots, significantly outperforming the Random Forest baseline (0.733, P<.001). Prediction of trait anxiety changes was substantially weaker (AUCs 0.57-0.68), likely reflecting the inherent stability of this personality dimension. The Personalized Advantage Index demonstrated significant moderation of treatment group differences (adjusted R²=0.29), with 75% of participants predicted to benefit more from the mindfulness intervention than from the waitlist control. Conclusions: Machine learning models successfully identified baseline characteristics predicting treatment response to a smartphone-based mindfulness intervention in autistic adults. Few-shot learning with large language models demonstrated superior performance to traditional machine learning when provided with compact, high-signal feature representations, offering a promising approach for clinical prediction in small-sample settings. These findings demonstrate the feasibility of precision psychiatry approaches in digital mental health interventions for autistic adults.
While modest sample size and limited demographic diversity warrant cautious interpretation, the stable cross-validation performance suggests robust predictive patterns within similar populations. Future research should validate these models in larger, more diverse samples and explore whether algorithm-guided treatment recommendations improve outcomes compared to standard care, through prospective randomized trials.
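The nested cross-validation described in the Methods (outer 10-fold for evaluation, inner 5-fold for hyperparameter tuning) can be sketched with index bookkeeping alone. This is a minimal illustration under stated assumptions: plain shuffled (unstratified) folds, a fixed seed, and a hypothetical `is_responder` helper encoding the ≥7-point STAI state-anxiety criterion. The study's actual fold assignment and model-fitting code are not described in the abstract.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def is_responder(baseline_state: float, post_state: float, threshold: float = 7.0) -> bool:
    """Hypothetical helper: responder if STAI state anxiety dropped by >= threshold points."""
    return baseline_state - post_state >= threshold


def k_fold(indices: Sequence[int], k: int, seed: int) -> List[Split]:
    """Shuffle, deal indices into k disjoint test folds, and pair each with its complement."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def nested_cv(
    n: int, outer_k: int = 10, inner_k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits).

    Inner splits are built from outer_train only, so hyperparameters tuned on them
    never see the outer test fold used to estimate performance (e.g., AUC).
    """
    for i, (train, test) in enumerate(k_fold(range(n), outer_k, seed)):
        yield train, test, k_fold(train, inner_k, seed + i + 1)


for outer_train, outer_test, inner in nested_cv(73):
    # Tune hyperparameters on `inner`, refit on outer_train, then score on outer_test.
    print(len(outer_train), len(outer_test), len(inner))
```

With only 73 participants, stratifying folds by responder status would be a common refinement; whether the authors did so is not stated.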

  • Background: The integration of large language models (LLMs) into healthcare holds promise to enhance clinical decision-making, yet their susceptibility to biases remains a critical concern. Gender has long influenced physician behaviors and patient outcomes, raising concerns that LLMs assuming human-like roles, such as clinicians or medical educators, may replicate or amplify gender-related biases. Objective: To evaluate the consistency of LLM responses across different assigned genders (personas) regarding both diagnostic outputs and model judgments on the clinical relevance or necessity of patient gender. Methods: Using case studies from the New England Journal of Medicine (NEJM) Challenge, we assigned genders (female, male, or unspecified) to multiple open-source and proprietary LLMs. We evaluated their response consistency across LLM-gender assignments regarding both LLM-based diagnosis and models’ judgments on the clinical relevance or necessity of patient gender. For representative models with high diagnostic accuracy, we further evaluated consistency across question difficulty tiers and clinical specialties. Results: All models showed high diagnostic consistency across assigned LLM genders (range of consistency rates: 91.45%–97.44%), though this did not always correspond to diagnostic accuracy (e.g., GPT-4.1: 97.44% consistency, 0.943 accuracy; Gemma-2B: 97.44% consistency, 0.478 accuracy). In contrast, judgments on the clinical importance of patient gender showed marked inconsistency: consistency rates ranged from 58.97% to 90.60% for relevance judgements and from 78.63% to 98.29% for necessity judgements. When stratified by difficulty tier and specialty, the open-source model LLaMA-3.1-8B in particular showed statistically significant differences across LLM genders in both relevance and necessity judgements.
Conclusions: Despite stable diagnostic outputs, LLMs varied substantially in their assessments of patient gender’s clinical importance across gendered personas. These findings present an underexplored bias that could undermine the reliability of LLMs in clinical practice, underscoring the need for routine checks of identity-assignment consistency when interacting with LLMs to ensure reliable and equitable AI-supported clinical care. Clinical Trial: not applicable
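The abstract separates consistency across assigned LLM genders from diagnostic accuracy. A minimal sketch follows. It assumes that a case counts as consistent only when the outputs under all personas are identical, which the abstract does not spell out, and shows how the two metrics can diverge. The persona names, diagnoses, and answers are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def consistency_rate(outputs: Dict[str, Sequence[Hashable]]) -> float:
    """Share of cases where every persona produced the same answer.

    `outputs` maps persona -> per-case answers, with cases in the same order.
    """
    runs: List[Sequence[Hashable]] = list(outputs.values())
    n_cases = len(runs[0])
    if any(len(r) != n_cases for r in runs):
        raise ValueError("every persona must answer the same cases")
    agree = sum(1 for answers in zip(*runs) if len(set(answers)) == 1)
    return agree / n_cases


def accuracy(answers: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Share of cases where one persona's answer matches the reference diagnosis."""
    return sum(a == g for a, g in zip(answers, gold)) / len(gold)


gold = ["sarcoidosis", "lupus", "gout", "lyme disease"]
# A hypothetical model that gives the same (often wrong) answer regardless of persona.
outputs = {
    "female": ["sarcoidosis", "rheumatoid arthritis", "gout", "syphilis"],
    "male": ["sarcoidosis", "rheumatoid arthritis", "gout", "syphilis"],
    "unspecified": ["sarcoidosis", "rheumatoid arthritis", "gout", "syphilis"],
}
print(consistency_rate(outputs), accuracy(outputs["unspecified"], gold))  # prints 1.0 0.5
```

Perfect consistency with middling accuracy mirrors the pattern reported for some models. Applying the same function to yes/no relevance or necessity judgements gives the second set of consistency rates.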

  • Background: Despite the promising potential of artificial intelligence (AI) in the perioperative context, the rapid pace of development and diverse implementation warrant a systematic review to consolidate existing knowledge, identify gaps, and assess the utilization of trustworthiness principles in AI integration into the perioperative period for patients with serious illness. Objective: The purpose of this study was to address deficiencies in perioperative AI literature by elucidating the extent to which equity, ethics, and safety discussions are incorporated, thereby establishing a foundation for developing robust ethical guidelines for the safe and effective integration of AI in healthcare. This study also examined the utilization of AI-enabled team augmentation in perioperative serious illness care. Methods: We searched PubMed, Embase, CENTRAL, and Scopus for studies published between 2010 and July 2024. We included studies that reported patient functional outcomes, occurred in the perioperative period (30 days before and up to 90 days after surgery), included AI integration, and included patients with serious illness (defined as malignancy, advanced organ failure, frailty, dementia/neurodegenerative disease, or stroke). To ensure reliability and minimize bias, two independent reviewers screened all studies at the title/abstract and full-text stages; conflicts were resolved through team consensus. The abstraction form was developed iteratively and tested through pilot abstractions. Any discrepancies identified during data extraction were resolved through discussion and consensus among the reviewers. Quality was assessed using the ROBINS-I tool for risk of bias in non-randomized studies. Abstraction and risk assessment occurred through a blinded, independent dual review. A narrative review was compiled with the identified studies.
Results: Of the 10,980 articles identified through the database searches, 81 met the inclusion criteria. The largest number of studies were published in China (35), followed by the United States (9) and South Korea (7), and 80 of 81 (98.8%) articles focused on patients with malignancy. Analysis of AI implementation strategies revealed foundational efforts toward equitable access, with six studies providing open-access tools and 17 designing models with simple inputs suitable for low-resource settings. Seven studies mentioned their commitment to transparency (e.g., publishing code) to enhance safety and trust. However, significant ethical deficiencies persist, particularly around input data: only two studies explicitly addressed racial or ethnic disparities, and concerns about lack of sample diversity (16) and the omission of socially relevant features (5) were frequently noted as limitations. Although none of the included studies considered AI-enabled team augmentation, most articles described how AI could be used to prompt a team member to take a tangible action. Conclusions: Machine learning for predictive analytics and other types of AI tools in surgical outcomes offer significant potential but require adherence to trustworthiness and safety principles to be clinically viable. By leveraging longitudinal data and continuous performance tracking, these models have the potential to positively impact diverse patient needs and healthcare systems. Future research should prioritize adhering to guidelines for equity, ethics, and safety, conduct prospective studies, incorporate more external validation of AI models, and facilitate transparent monitoring and reporting of model performance to build clinician and patient trust and to encourage broader healthcare system adoption. Clinical Trial: PROSPERO CRD42024608387

  • Background: Over the past quarter-century, designers of digital behavior change tools have increasingly blended constructs from multiple theories, yet the extent to which such integration enhances intervention outcomes remains unclear. Objective: To clarify this relationship, this study systematically reviewed literature published between 1999 and 2025, focusing on IT-mediated interventions that explicitly combined at least two behavioral theories and reported intention or behavior outcomes. Methods: Following a registered protocol (PROSPERO CRD42022285741) and PRISMA guidelines, searches across seven databases identified 62 eligible studies. Results: Most investigations were quantitative (77%), had sample sizes ranging from 16 to 8840, and lasted under 6 months; only 9 used randomized controlled designs. Twenty-nine theories appeared, with Self-Determination Theory (35%) and the Theory of Planned Behavior (29%) being the most prevalent, often paired with the Technology Acceptance Model or Task-Technology Fit. Integrated models consistently outperformed their single-theory counterparts. Health care and fitness interventions dominated (44%), followed by online learning (23%) and mobile commerce (11%), but long-term follow-ups and explicit mappings of theory to behavior change techniques were scarce, and overall risk-of-bias ratings were moderate. Conclusions: Findings indicate that integrated theoretical frameworks deliver measurably superior behavioral outcomes in digital environments, yet the evidence remains short-term and health-centric. Future research should extend evaluation horizons beyond 6 months, diversify application domains, apply more rigorous randomized designs, and articulate more transparently how theoretical constructs guide specific intervention techniques to advance replicable, theory-driven digital solutions.

  • Background: Digital health has the potential to mitigate health inequity for priority populations who are underserved or marginalised by the health system. However, there is a lack of practical guidance on how to include priority communities in the co-production of digital health technologies, particularly across the entire lifecycle of innovation, including research, development, and evaluation. Objective: The aim of this scoping review was to systematically identify and assess published methods used during digital health innovation to promote the equitable inclusion of priority communities at every stage of the CeHRes Roadmap for Digital Health Technologies. Methods: This review followed the 6-stage Arksey and O’Malley framework for scoping reviews. To increase the trustworthiness of the findings, an expert advisory group was consulted and their feedback was incorporated into the final manuscript. The Participant, Concept and Context (PCC) framework was used to structure the inclusion criteria. Results: The review identified 106 articles describing 58 methods, 4 approaches, and 17 research adjustments used to co-produce digital health technologies with priority communities. Common methods across multiple stages included interviews, focus groups, surveys, and workshops; however, the most accessible way to make equity a practical reality during health technology innovation is to appoint a priority population community advisor, or advisory group, from project inception to project closure. Visual and creative methods such as photovoice, home tours, and body-mapping were also employed, often by priority population researchers themselves. Research adjustments that promote patient safety and comfort, enhance literacy, provide peer support, and recognise socio-cultural and demographic considerations have been employed to increase the inclusion of priority populations during digital health innovation.
Conclusions: Embedding equity is possible using the practical methods and research adjustments identified to promote inclusive co-production. Professionals working across healthcare, health informatics, research, digital health, and technology development can use these findings to centre digital health equity during technology innovation. This research also recognises that co-production must draw on epistemological frameworks, or ways of thinking, that support Indigenous and other priority population knowledge systems. A solely Western lens risks reinforcing structural barriers and overlooking essential knowledge, as this review itself demonstrated when its search strategy missed key scholarly works by priority population authors.