@Article{info:doi/10.2196/23458, author="Ikemura, Kenji and Bellin, Eran and Yagi, Yukako and Billett, Henny and Saada, Mahmoud and Simone, Katelyn and Stahl, Lindsay and Szymanski, James and Goldstein, D Y and Reyes Gil, Morayma", title="Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study", journal="J Med Internet Res", year="2021", month="Feb", day="26", volume="23", number="2", pages="e23458", keywords="automated machine learning; COVID-19; biomarker; ranking; decision support tool; machine learning; decision support; Shapley additive explanation; partial dependence plot; dimensionality reduction", abstract="Background: During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model. Objective: In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients' chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model. Methods: Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients' data were used to build 20 machine learning models with various algorithms via autoML. The performance of machine learning models was measured by analyzing the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanation and partial dependence plots to identify and rank variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained by only using these 10 variables, and the output models were evaluated against the model that used 48 variables. Results: Data from 4313 patients were used to develop the models. The best model that was generated by using autoML and 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boost machine and extreme gradient boost models, which had an AUPRC of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791). Conclusions: We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. 
In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method for generating machine learning--based clinical decision support tools. ", doi="10.2196/23458", url="https://www.jmir.org/2021/2/e23458", url="https://doi.org/10.2196/23458", url="http://www.ncbi.nlm.nih.gov/pubmed/33539308" } @Article{info:doi/10.2196/22976, author="Rosado, Eduardo and Garcia-Remesal, Miguel and Paraiso-Medina, Sergio and Pazos, Alejandro and Maojo, Victor", title="Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory", journal="JMIR Med Inform", year="2021", month="Feb", day="25", volume="9", number="2", pages="e22976", keywords="biomedical databases; natural language processing; deep learning; internet; biomedical knowledge", abstract="Background: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases. Objective: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly. Methods: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. Such a data set was used along with transfer learning techniques to train an ensemble of deep learning natural language processing models targeted at database publication detection. Results: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to ``omics'' and the other related to the COVID-19 pandemic. Conclusions: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (ie, biomedical and others). ", doi="10.2196/22976", url="https://medinform.jmir.org/2021/2/e22976", url="https://doi.org/10.2196/22976", url="http://www.ncbi.nlm.nih.gov/pubmed/33629960" } @Article{info:doi/10.2196/20298, author="Hu, Mingyue and Shu, Xinhui and Yu, Gang and Wu, Xinyin and V{\"a}lim{\"a}ki, Maritta and Feng, Hui", title="A Risk Prediction Model Based on Machine Learning for Cognitive Impairment Among Chinese Community-Dwelling Elderly People With Normal Cognition: Development and Validation Study", journal="J Med Internet Res", year="2021", month="Feb", day="24", volume="23", number="2", pages="e20298", keywords="prediction model; cognitive impairment; machine learning; nomogram", abstract="Background: Identifying cognitive impairment early enough could support timely intervention that may hinder or delay the trajectory of cognitive impairment, thus increasing the chances for successful cognitive aging. 
Objective: We aimed to build a prediction model based on machine learning for cognitive impairment among Chinese community-dwelling elderly people with normal cognition. Methods: A prospective cohort of 6718 older people from the Chinese Longitudinal Healthy Longevity Survey (CLHLS) register, followed between 2008 and 2011, was used to develop and validate the prediction model. Participants were included if they were aged 60 years or above, were community-dwelling elderly people, and had a cognitive Mini-Mental State Examination (MMSE) score ≥18. They were excluded if they were diagnosed with a severe disease (eg, cancer and dementia) or were living in institutions. Cognitive impairment was identified using the Chinese version of the MMSE. Several machine learning algorithms (random forest, XGBoost, na{\"i}ve Bayes, and logistic regression) were used to assess the 3-year risk of developing cognitive impairment. Optimal cutoffs and adjusted parameters were explored in validation data, and the model was further evaluated in test data. A nomogram was established to vividly present the prediction model. Results: The mean age of the participants was 80.4 years (SD 10.3 years), and 50.85{\%} (3416/6718) were female. During a 3-year follow-up, 991 (14.8{\%}) participants were identified with cognitive impairment. Among 45 features, the following four features were finally selected to develop the model: age, instrumental activities of daily living, marital status, and baseline cognitive function. The concordance index of the model constructed by logistic regression was 0.814 (95{\%} CI 0.781-0.846). Older people with normal cognitive functioning having a nomogram score of less than 170 were considered to have a low 3-year risk of cognitive impairment, and those with a score of 170 or greater were considered to have a high 3-year risk of cognitive impairment. Conclusions: This simple and feasible cognitive impairment prediction model could identify community-dwelling elderly people at the greatest 3-year risk for cognitive impairment, which could help community nurses in the early identification of dementia. ", doi="10.2196/20298", url="https://www.jmir.org/2021/2/e20298", url="https://doi.org/10.2196/20298", url="http://www.ncbi.nlm.nih.gov/pubmed/33625369" } @Article{info:doi/10.2196/22841, author="Liu, Taoran and Tsang, Winghei and Huang, Fengqiu and Lau, Oi Ying and Chen, Yanhui and Sheng, Jie and Guo, Yiwei and Akinwunmi, Babatunde and Zhang, Casper JP and Ming, Wai-Kit", title="Patients' Preferences for Artificial Intelligence Applications Versus Clinicians in Disease Diagnosis During the SARS-CoV-2 Pandemic in China: Discrete Choice Experiment", journal="J Med Internet Res", year="2021", month="Feb", day="23", volume="23", number="2", pages="e22841", keywords="discrete choice experiment; artificial intelligence; patient preference; multinomial logit analysis; questionnaire; latent-class conditional logit; app; human clinicians; diagnosis; COVID-19; China", abstract="Background: Misdiagnosis, arbitrary charges, annoying queues, and clinic waiting times among others are long-standing phenomena in the medical industry across the world. These factors can contribute to patient anxiety about misdiagnosis by clinicians. However, with the increasing growth in use of big data in biomedical and health care communities, the performance of artificial intelligence (AI) techniques of diagnosis is improving and can help avoid medical practice errors, including under the current circumstance of COVID-19. 
Objective: This study aims to visualize and measure patients' heterogeneous preferences from various angles of AI diagnosis versus clinicians in the context of the COVID-19 epidemic in China. We also aim to illustrate the different decision-making factors of the latent class of a discrete choice experiment (DCE) and prospects for the application of AI techniques in judgment and management during the pandemic of SARS-CoV-2 and in the future. Methods: A DCE approach was the main analysis method applied in this paper. Attributes from different dimensions were hypothesized: diagnostic method, outpatient waiting time, diagnosis time, accuracy, follow-up after diagnosis, and diagnostic expense. After that, a questionnaire is formed. With collected data from the DCE questionnaire, we apply Sawtooth software to construct a generalized multinomial logit (GMNL) model, mixed logit model, and latent class model with the data sets. Moreover, we calculate the variables' coefficients, standard error, P value, and odds ratio (OR) and form a utility report to present the importance and weighted percentage of attributes. Results: A total of 55.8{\%} of the respondents (428 out of 767) opted for AI diagnosis regardless of the description of the clinicians. In the GMNL model, we found that people prefer the 100{\%} accuracy level the most (OR 4.548, 95{\%} CI 4.048-5.110, P<.001). For the latent class model, the most acceptable model consists of 3 latent classes of respondents. The attributes with the most substantial effects and highest percentage weights are the accuracy (39.29{\%} in general) and expense of diagnosis (21.69{\%} in general), especially the preferences for the diagnosis ``accuracy'' attribute, which is constant across classes. For class 1 and class 3, people prefer the AI + clinicians method (class 1: OR 1.247, 95{\%} CI 1.036-1.463, P<.001; class 3: OR 1.958, 95{\%} CI 1.769-2.167, P<.001). For class 2, people prefer the AI method (OR 1.546, 95{\%} CI 0.883-2.707, P=.37). The OR of levels of attributes increases with the increase of accuracy across all classes. Conclusions: Latent class analysis was prominent and useful in quantifying preferences for attributes of diagnosis choice. People's preferences for the ``accuracy'' and ``diagnostic expenses'' attributes are palpable. AI will have a potential market. However, accuracy and diagnosis expenses need to be taken into consideration. ", doi="10.2196/22841", url="https://www.jmir.org/2021/2/e22841", url="https://doi.org/10.2196/22841", url="http://www.ncbi.nlm.nih.gov/pubmed/33493130" } @Article{info:doi/10.2196/23026, author="Sang, Shengtian and Sun, Ran and Coquet, Jean and Carmichael, Harris and Seto, Tina and Hernandez-Boussard, Tina", title="Learning From Past Respiratory Infections to Predict COVID-19 Outcomes: Retrospective Study", journal="J Med Internet Res", year="2021", month="Feb", day="22", volume="23", number="2", pages="e23026", keywords="COVID-19; invasive mechanical ventilation; all-cause mortality; machine learning; artificial intelligence; respiratory; infection; outcome; data; feasibility; framework", abstract="Background: For the clinical care of patients with well-established diseases, randomized trials, literature, and research are supplemented with clinical judgment to understand disease prognosis and inform treatment choices. In the void created by a lack of clinical experience with COVID-19, artificial intelligence (AI) may be an important tool to bolster clinical judgment and decision making. 
However, a lack of clinical data restricts the design and development of such AI tools, particularly in preparation for an impending crisis or pandemic. Objective: This study aimed to develop and test the feasibility of a ``patients-like-me'' framework to predict the deterioration of patients with COVID-19 using a retrospective cohort of patients with similar respiratory diseases. Methods: Our framework used COVID-19--like cohorts to design and train AI models that were then validated on the COVID-19 population. The COVID-19--like cohorts included patients diagnosed with bacterial pneumonia, viral pneumonia, unspecified pneumonia, influenza, and acute respiratory distress syndrome (ARDS) at an academic medical center from 2008 to 2019. In total, 15 training cohorts were created using different combinations of the COVID-19--like cohorts with the ARDS cohort for exploratory purposes. In this study, two machine learning models were developed: one to predict invasive mechanical ventilation (IMV) within 48 hours for each hospitalized day, and one to predict all-cause mortality at the time of admission. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value. We established model interpretability by calculating SHapley Additive exPlanations (SHAP) scores to identify important features. Results: Compared to the COVID-19--like cohorts (n=16,509), the patients hospitalized with COVID-19 (n=159) were significantly younger, with a higher proportion of patients of Hispanic ethnicity, a lower proportion of patients with smoking history, and fewer patients with comorbidities (P<.001). Patients with COVID-19 had a lower IMV rate (15.1 versus 23.2, P=.02) and shorter time to IMV (2.9 versus 4.1 days, P<.001) compared to the COVID-19--like patients. In the COVID-19--like training data, the top models achieved excellent performance (AUROC>0.90). Validating in the COVID-19 cohort, the top-performing model for predicting IMV was the XGBoost model (AUROC=0.826) trained on the viral pneumonia cohort. Similarly, the XGBoost model trained on all 4 COVID-19--like cohorts without ARDS achieved the best performance (AUROC=0.928) in predicting mortality. Important predictors included demographic information (age), vital signs (oxygen saturation), and laboratory values (white blood cell count, cardiac troponin, albumin, etc). Our models had class imbalance, which resulted in high negative predictive values and low positive predictive values. Conclusions: We provided a feasible framework for modeling patient deterioration using existing data and AI technology to address data limitations during the onset of a novel, rapidly changing pandemic. ", doi="10.2196/23026", url="https://www.jmir.org/2021/2/e23026", url="https://doi.org/10.2196/23026", url="http://www.ncbi.nlm.nih.gov/pubmed/33534724" } @Article{info:doi/10.2196/21037, author="Abrami, Avner and Gunzler, Steven and Kilbane, Camilla and Ostrand, Rachel and Ho, Bryan and Cecchi, Guillermo", title="Automated Computer Vision Assessment of Hypomimia in Parkinson Disease: Proof-of-Principle Pilot Study", journal="J Med Internet Res", year="2021", month="Feb", day="22", volume="23", number="2", pages="e21037", keywords="Parkinson disease; hypomimia; computer vision; telemedicine", abstract="Background: Facial expressions require the complex coordination of 43 different facial muscles. 
Parkinson disease (PD) affects facial musculature leading to ``hypomimia'' or ``masked facies.'' Objective: We aimed to determine whether modern computer vision techniques can be applied to detect masked facies and quantify drug states in PD. Methods: We trained a convolutional neural network on images extracted from videos of 107 self-identified people with PD, along with 1595 videos of controls, in order to detect PD hypomimia cues. This trained model was applied to clinical interviews of 35 PD patients in their on and off drug motor states, and seven journalist interviews of the actor Alan Alda obtained before and after he was diagnosed with PD. Results: The algorithm achieved a test set area under the receiver operating characteristic curve of 0.71 on 54 subjects to detect PD hypomimia, compared to a value of 0.75 for trained neurologists using the Unified Parkinson Disease Rating Scale-III Facial Expression score. Additionally, the model accuracy to classify the on and off drug states in the clinical samples was 63{\%} (22/35), in contrast to an accuracy of 46{\%} (16/35) when using clinical rater scores. Finally, each of Alan Alda's seven interviews was successfully classified as occurring before (versus after) his diagnosis, with 100{\%} accuracy (7/7). Conclusions: This proof-of-principle pilot study demonstrated that computer vision holds promise as a valuable tool for PD hypomimia and for monitoring a patient's motor state in an objective and noninvasive way, particularly given the increasing importance of telemedicine. ", doi="10.2196/21037", url="https://www.jmir.org/2021/2/e21037", url="https://doi.org/10.2196/21037", url="http://www.ncbi.nlm.nih.gov/pubmed/33616535" } @Article{info:doi/10.2196/26552, author="Lam, Kyle and Iqbal, Fahad M and Purkayastha, Sanjay and Kinross, James M", title="Investigating the Ethical and Data Governance Issues of Artificial Intelligence in Surgery: Protocol for a Delphi Study", journal="JMIR Res Protoc", year="2021", month="Feb", day="22", volume="10", number="2", pages="e26552", keywords="artificial intelligence; digital surgery; Delphi; ethics; data governance; digital technology; operating room; surgery", abstract="Background: The rapid uptake of digital technology into the operating room has the potential to improve patient outcomes, increase efficiency of the use of operating rooms, and allow surgeons to progress quickly up learning curves. These technologies are, however, dependent on huge amounts of data, and the consequences of their mismanagement are significant. While the field of artificial intelligence ethics is able to provide a broad framework for those designing and implementing these technologies into the operating room, there is a need to determine and address the ethical and data governance challenges of using digital technology in this unique environment. Objective: The objectives of this study are to define the term digital surgery and gain expert consensus on the key ethical and data governance issues, barriers, and future research goals of the use of artificial intelligence in surgery. Methods: Experts from the fields of surgery, ethics and law, policy, artificial intelligence, and industry will be invited to participate in a 4-round consensus Delphi exercise. In the first round, participants will supply free-text responses across 4 key domains: ethics, data governance, barriers, and future research goals. They will also be asked to provide their understanding of the term digital surgery. 
In subsequent rounds, statements will be grouped, and participants will be asked to rate the importance of each issue on a 9-point Likert scale ranging from 1 (not at all important) to 9 (critically important). Consensus is defined a priori as a score of 7 to 9 by 70{\%} of respondents and 1 to 3 by less than 30{\%} of respondents. A final online meeting round will be held to discuss inclusion of statements and draft a consensus document. Results: Full ethical approval has been obtained for the study by the local research ethics committee at Imperial College, London (20IC6136). We anticipate round 1 to commence in January 2021. Conclusions: The results of this study will define the term digital surgery, identify the key issues and barriers, and shape future research in this area. International Registered Report Identifier (IRRID): PRR1-10.2196/26552 ", doi="10.2196/26552", url="https://www.researchprotocols.org/2021/2/e26552", url="https://doi.org/10.2196/26552", url="http://www.ncbi.nlm.nih.gov/pubmed/33616543" } @Article{info:doi/10.2196/24221, author="Lennartz, Simon and Dratsch, Thomas and Zopfs, David and Persigehl, Thorsten and Maintz, David and Gro{\ss}e Hokamp, Nils and Pinto dos Santos, Daniel", title="Use and Control of Artificial Intelligence in Patients Across the Medical Workflow: Single-Center Questionnaire Study of Patient Perspectives", journal="J Med Internet Res", year="2021", month="Feb", day="17", volume="23", number="2", pages="e24221", keywords="artificial intelligence; clinical implementation; questionnaire; survey", abstract="Background: Artificial intelligence (AI) is gaining increasing importance in many medical specialties, yet data on patients' opinions on the use of AI in medicine are scarce. Objective: This study aimed to investigate patients' opinions on the use of AI in different aspects of the medical workflow and the level of control and supervision under which they would deem the application of AI in medicine acceptable. Methods: Patients scheduled for computed tomography or magnetic resonance imaging voluntarily participated in an anonymized questionnaire between February 10, 2020, and May 24, 2020. Patient information, confidence in physicians vs AI in different clinical tasks, opinions on the control of AI, preference in cases of disagreement between AI and physicians, and acceptance of the use of AI for diagnosing and treating diseases of different severity were recorded. Results: In total, 229 patients participated. Patients favored physicians over AI for all clinical tasks except for treatment planning based on current scientific evidence. In case of disagreement between physicians and AI regarding diagnosis and treatment planning, most patients preferred the physician's opinion to AI (96.2{\%} [153/159] vs 3.8{\%} [6/159] and 94.8{\%} [146/154] vs 5.2{\%} [8/154], respectively; P=.001). AI supervised by a physician was considered more acceptable than AI without physician supervision at diagnosis (confidence rating 3.90 [SD 1.20] vs 1.64 [SD 1.03], respectively; P=.001) and therapy (3.77 [SD 1.18] vs 1.57 [SD 0.96], respectively; P=.001). Conclusions: Patients favored physicians over AI in most clinical tasks and strongly preferred an application of AI with physician supervision. However, patients acknowledged that AI could help physicians integrate the most recent scientific evidence into medical care. Application of AI in medicine should be disclosed and controlled to protect patient interests and meet ethical standards. 
", doi="10.2196/24221", url="http://www.jmir.org/2021/2/e24221/", url="https://doi.org/10.2196/24221", url="http://www.ncbi.nlm.nih.gov/pubmed/33595451" } @Article{info:doi/10.2196/24572, author="Quiroz, Juan Carlos and Feng, You-Zhen and Cheng, Zhong-Yuan and Rezazadegan, Dana and Chen, Ping-Kang and Lin, Qi-Ting and Qian, Long and Liu, Xiao-Fang and Berkovsky, Shlomo and Coiera, Enrico and Song, Lei and Qiu, Xiaoming and Liu, Sidong and Cai, Xiang-Ran", title="Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study", journal="JMIR Med Inform", year="2021", month="Feb", day="11", volume="9", number="2", pages="e24572", keywords="algorithm; clinical data; clinical features; COVID-19; CT scans; development; imaging; imbalanced data; machine learning; oversampling; severity assessment; validation", abstract="Background: COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible, such that resources can be mobilized and treatment can be escalated. Objective: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data. Methods: Clinical data---including demographics, signs, symptoms, comorbidities, and blood test results---and chest computed tomography scans of 346 patients from 2 hospitals in the Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data from multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework. Results: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929). Conclusions: Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease. 
", doi="10.2196/24572", url="http://medinform.jmir.org/2021/2/e24572/", url="https://doi.org/10.2196/24572", url="http://www.ncbi.nlm.nih.gov/pubmed/33534723" } @Article{info:doi/10.2196/24246, author="Bolourani, Siavash and Brenner, Max and Wang, Ping and McGinn, Thomas and Hirsch, Jamie S and Barnaby, Douglas and Zanos, Theodoros P", title="A Machine Learning Prediction Model of Respiratory Failure Within 48 Hours of Patient Admission for COVID-19: Model Development and Validation", journal="J Med Internet Res", year="2021", month="Feb", day="10", volume="23", number="2", pages="e24246", keywords="artificial intelligence; prognostic; model; pandemic; severe acute respiratory syndrome coronavirus 2; modeling; development; validation; COVID-19; machine learning", abstract="Background: Predicting early respiratory failure due to COVID-19 can help triage patients to higher levels of care, allocate scarce resources, and reduce morbidity and mortality by appropriately monitoring and treating the patients at greatest risk for deterioration. Given the complexity of COVID-19, machine learning approaches may support clinical decision making for patients with this disease. Objective: Our objective is to derive a machine learning model that predicts respiratory failure within 48 hours of admission based on data from the emergency department. Methods: Data were collected from patients with COVID-19 who were admitted to Northwell Health acute care hospitals and were discharged, died, or spent a minimum of 48 hours in the hospital between March 1 and May 11, 2020. Of 11,525 patients, 933 (8.1{\%}) were placed on invasive mechanical ventilation within 48 hours of admission. Variables used by the models included clinical and laboratory data commonly collected in the emergency department. We trained and validated three predictive models (two based on XGBoost and one that used logistic regression) using cross-hospital validation. We compared model performance among all three models as well as an established early warning score (Modified Early Warning Score) using receiver operating characteristic curves, precision-recall curves, and other metrics. Results: The XGBoost model had the highest mean accuracy (0.919; area under the curve=0.77), outperforming the other two models as well as the Modified Early Warning Score. Important predictor variables included the type of oxygen delivery used in the emergency department, patient age, Emergency Severity Index level, respiratory rate, serum lactate, and demographic characteristics. Conclusions: The XGBoost model had high predictive accuracy, outperforming other early warning scores. The clinical plausibility and predictive ability of XGBoost suggest that the model could be used to predict 48-hour respiratory failure in admitted patients with COVID-19. 
", doi="10.2196/24246", url="http://www.jmir.org/2021/2/e24246/", url="https://doi.org/10.2196/24246", url="http://www.ncbi.nlm.nih.gov/pubmed/33476281" } @Article{info:doi/10.2196/23693, author="Albahli, Saleh and Yar, Ghulam Nabi Ahmad Hassan", title="Fast and Accurate Detection of COVID-19 Along With 14 Other Chest Pathologies Using a Multi-Level Classification: Algorithm Development and Validation Study", journal="J Med Internet Res", year="2021", month="Feb", day="10", volume="23", number="2", pages="e23693", keywords="COVID-19; chest x-ray; convolutional neural network; data augmentation; biomedical imaging; automatic detection", abstract="Background: COVID-19 has spread very rapidly, and it is important to build a system that can detect it in order to help an overwhelmed health care system. Many research studies on chest diseases rely on the strengths of deep learning techniques. Although some of these studies used state-of-the-art techniques and were able to deliver promising results, these techniques are not very useful if they can detect only one type of disease without detecting the others. Objective: The main objective of this study was to achieve a fast and more accurate diagnosis of COVID-19. This study proposes a diagnostic technique that classifies COVID-19 x-ray images from normal x-ray images and those specific to 14 other chest diseases. Methods: In this paper, we propose a novel, multilevel pipeline, based on deep learning models, to detect COVID-19 along with other chest diseases based on x-ray images. This pipeline reduces the burden of a single network to classify a large number of classes. The deep learning models used in this study were pretrained on the ImageNet dataset, and transfer learning was used for fast training. The lungs and heart were segmented from the whole x-ray images and passed onto the first classifier that checks whether the x-ray is normal, COVID-19 affected, or characteristic of another chest disease. If it is neither a COVID-19 x-ray image nor a normal one, then the second classifier comes into action and classifies the image as one of the other 14 diseases. Results: We show how our model uses state-of-the-art deep neural networks to achieve classification accuracy for COVID-19 along with 14 other chest diseases and normal cases based on x-ray images, which is competitive with currently used state-of-the-art models. Due to the lack of data in some classes such as COVID-19, we applied 10-fold cross-validation through the ResNet50 model. Our classification technique thus achieved an average training accuracy of 96.04{\%} and test accuracy of 92.52{\%} for the first level of classification (ie, 3 classes). For the second level of classification (ie, 14 classes), our technique achieved a maximum training accuracy of 88.52{\%} and test accuracy of 66.634{\%} by using ResNet50. We also found that when all the 16 classes were classified at once, the overall accuracy for COVID-19 detection decreased, which in the case of ResNet50 was 88.92{\%} for training data and 71.905{\%} for test data. Conclusions: Our proposed pipeline can detect COVID-19 with a higher accuracy along with detecting 14 other chest diseases based on x-ray images. This is achieved by dividing the classification task into multiple steps rather than classifying them collectively. 
", doi="10.2196/23693", url="http://www.jmir.org/2021/2/e23693/", url="https://doi.org/10.2196/23693", url="http://www.ncbi.nlm.nih.gov/pubmed/33529154" } @Article{info:doi/10.2196/22320, author="Pham, Quynh and Gamble, Anissa and Hearn, Jason and Cafazzo, Joseph A", title="The Need for Ethnoracial Equity in Artificial Intelligence for Diabetes Management: Review and Recommendations", journal="J Med Internet Res", year="2021", month="Feb", day="10", volume="23", number="2", pages="e22320", keywords="diabetes; artificial intelligence; digital health; ethnoracial equity; ethnicity; race", doi="10.2196/22320", url="http://www.jmir.org/2021/2/e22320/", url="https://doi.org/10.2196/22320", url="http://www.ncbi.nlm.nih.gov/pubmed/33565982" } @Article{info:doi/10.2196/22164, author="Bhalodiya, Jayendra Maganbhai and Palit, Arnab and Giblin, Gerard and Tiwari, Manoj Kumar and Prasad, Sanjay K and Bhudia, Sunil K and Arvanitis, Theodoros N and Williams, Mark A", title="Identifying Myocardial Infarction Using Hierarchical Template Matching--Based Myocardial Strain: Algorithm Development and Usability Study", journal="JMIR Med Inform", year="2021", month="Feb", day="10", volume="9", number="2", pages="e22164", keywords="left ventricle; myocardial infarction; myocardium; strain", abstract="Background: Myocardial infarction (MI; location and extent of infarction) can be determined by late enhancement cardiac magnetic resonance (CMR) imaging, which requires the injection of a potentially harmful gadolinium-based contrast agent (GBCA). Alternatively, emerging research in the area of myocardial strain has shown potential to identify MI using strain values. Objective: This study aims to identify the location of MI by developing an applied algorithmic method of circumferential strain (CS) values, which are derived through a novel hierarchical template matching (HTM) method. Methods: HTM-based CS H-spread from end-diastole to end-systole was used to develop an applied method. Grid-tagging magnetic resonance imaging was used to calculate strain values in the left ventricular (LV) myocardium, followed by the 16-segment American Heart Association model. The data set was used with k-fold cross-validation to estimate the percentage reduction of H-spread among infarcted and noninfarcted LV segments. A total of 43 participants (38 MI and 5 healthy) who underwent CMR imaging were retrospectively selected. Infarcted segments detected by using this method were validated by comparison with late enhancement CMR, and the diagnostic performance of the applied algorithmic method was evaluated with a receiver operating characteristic curve test. Results: The H-spread of the CS was reduced in infarcted segments compared with noninfarcted segments of the LV. The reductions were 30{\%} in basal segments, 30{\%} in midventricular segments, and 20{\%} in apical LV segments. The diagnostic accuracy of detection, using the reported method, was represented by area under the curve values, which were 0.85, 0.82, and 0.87 for basal, midventricular, and apical slices, respectively, demonstrating good agreement with the late-gadolinium enhancement--based detections. Conclusions: The proposed applied algorithmic method has the potential to accurately identify the location of infarcted LV segments without the administration of late-gadolinium enhancement. 
Such an approach adds the potential to safely identify MI, potentially reduce patient scanning time, and extend the utility of CMR in patients who are contraindicated for the use of GBCA. ", doi="10.2196/22164", url="https://medinform.jmir.org/2021/2/e22164", url="https://doi.org/10.2196/22164", url="http://www.ncbi.nlm.nih.gov/pubmed/33565992" } @Article{info:doi/10.2196/25935, author="Bernardo, Theresa and Sobkowich, Kurtis Edward and Forrest, Russell Othmer and Stewart, Luke Silva and D'Agostino, Marcelo and Perez Gutierrez, Enrique and Gillis, Daniel", title="Collaborating in the Time of COVID-19: The Scope and Scale of Innovative Responses to a Global Pandemic", journal="JMIR Public Health Surveill", year="2021", month="Feb", day="9", volume="7", number="2", pages="e25935", keywords="crowdsourcing; artificial intelligence; collaboration; personal protective equipment; big data; AI; COVID-19; innovation; information sharing; communication; teamwork; knowledge; dissemination", doi="10.2196/25935", url="http://publichealth.jmir.org/2021/2/e25935/", url="https://doi.org/10.2196/25935", url="http://www.ncbi.nlm.nih.gov/pubmed/33503001" } @Article{info:doi/10.2196/25184, author="Sato, Ann and Haneda, Eri and Suganuma, Nobuyasu and Narimatsu, Hiroto", title="Preliminary Screening for Hereditary Breast and Ovarian Cancer Using a Chatbot Augmented Intelligence Genetic Counselor: Development and Feasibility Study", journal="JMIR Form Res", year="2021", month="Feb", day="5", volume="5", number="2", pages="e25184", keywords="artificial intelligence; augmented intelligence; hereditary cancer; familial cancer; IBM Watson; preliminary screening; cancer; genetics; chatbot; screening; feasibility", abstract="Background: Breast cancer is the most common form of cancer in Japan; genetic background and hereditary breast and ovarian cancer (HBOC) are implicated. The key to HBOC diagnosis involves screening to identify high-risk individuals. However, genetic medicine is still developing; thus, many patients who may potentially benefit from genetic medicine have not yet been identified. Objective: This study's objective is to develop a chatbot system that uses augmented intelligence for HBOC screening to determine whether patients meet the National Comprehensive Cancer Network (NCCN) BRCA1/2 testing criteria. Methods: The system was evaluated by a doctor specializing in genetic medicine and certified genetic counselors. We prepared 3 scenarios and created a conversation with the chatbot to reflect each one. Then we evaluated chatbot feasibility, the required time, the medical accuracy of conversations and family history, and the final result. Results: The times required for the conversation were 7 minutes for scenario 1, 15 minutes for scenario 2, and 16 minutes for scenario 3. Scenarios 1 and 2 met the BRCA1/2 testing criteria, but scenario 3 did not, and this result was consistent with the findings of 3 experts who retrospectively reviewed conversations with the chatbot according to the 3 scenarios. A family history comparison ascertained by the chatbot with the actual scenarios revealed that each result was consistent with each scenario. From a genetic medicine perspective, no errors were noted by the 3 experts. Conclusions: This study demonstrated that chatbot systems could be applied to preliminary genetic medicine screening for HBOC. 
", doi="10.2196/25184", url="https://formative.jmir.org/2021/2/e25184", url="https://doi.org/10.2196/25184", url="http://www.ncbi.nlm.nih.gov/pubmed/33544084" } @Article{info:doi/10.2196/25187, author="Muralitharan, Sankavi and Nelson, Walter and Di, Shuang and McGillion, Michael and Devereaux, PJ and Barr, Neil Grant and Petch, Jeremy", title="Machine Learning--Based Early Warning Systems for Clinical Deterioration: Systematic Scoping Review", journal="J Med Internet Res", year="2021", month="Feb", day="4", volume="23", number="2", pages="e25187", keywords="machine learning; early warning systems; clinical deterioration; ambulatory care; acute care; remote patient monitoring; vital signs; sepsis; cardiorespiratory instability; risk prediction", abstract="Background: Timely identification of patients at a high risk of clinical deterioration is key to prioritizing care, allocating resources effectively, and preventing adverse outcomes. Vital signs--based, aggregate-weighted early warning systems are commonly used to predict the risk of outcomes related to cardiorespiratory instability and sepsis, which are strong predictors of poor outcomes and mortality. Machine learning models, which can incorporate trends and capture relationships among parameters that aggregate-weighted models cannot, have recently been showing promising results. Objective: This study aimed to identify, summarize, and evaluate the available research, current state of utility, and challenges with machine learning--based early warning systems using vital signs to predict the risk of physiological deterioration in acutely ill patients, across acute and ambulatory care settings. Methods: PubMed, CINAHL, Cochrane Library, Web of Science, Embase, and Google Scholar were searched for peer-reviewed, original studies with keywords related to ``vital signs,'' ``clinical deterioration,'' and ``machine learning.'' Included studies used patient vital signs along with demographics and described a machine learning model for predicting an outcome in acute and ambulatory care settings. Data were extracted following PRISMA, TRIPOD, and Cochrane Collaboration guidelines. Results: We identified 24 peer-reviewed studies from 417 articles for inclusion; 23 studies were retrospective, while 1 was prospective in nature. Care settings included general wards, intensive care units, emergency departments, step-down units, medical assessment units, postanesthetic wards, and home care. Machine learning models including logistic regression, tree-based methods, kernel-based methods, and neural networks were most commonly used to predict the risk of deterioration. The area under the curve for models ranged from 0.57 to 0.97. Conclusions: In studies that compared performance, reported results suggest that machine learning--based early warning systems can achieve greater accuracy than aggregate-weighted early warning systems but several areas for further research were identified. While these models have the potential to provide clinical decision support, there is a need for standardized outcome measures to allow for rigorous evaluation of performance across models. Further research needs to address the interpretability of model outputs by clinicians, clinical efficacy of these systems through prospective study design, and their potential impact in different clinical settings. 
", doi="10.2196/25187", url="https://www.jmir.org/2021/2/e25187", url="https://doi.org/10.2196/25187", url="http://www.ncbi.nlm.nih.gov/pubmed/33538696" } @Article{info:doi/10.2196/23436, author="Schmitt, Max and Maron, Roman Christoph and Hekler, Achim and Stenzinger, Albrecht and Hauschild, Axel and Weichenthal, Michael and Tiemann, Markus and Krahl, Dieter and Kutzner, Heinz and Utikal, Jochen Sven and Haferkamp, Sebastian and Kather, Jakob Nikolas and Klauschen, Frederick and Krieghoff-Henning, Eva and Fr{\"o}hling, Stefan and von Kalle, Christof and Brinker, Titus Josef", title="Hidden Variables in Deep Learning Digital Pathology and Their Potential to Cause Batch Effects: Prediction Model Study", journal="J Med Internet Res", year="2021", month="Feb", day="2", volume="23", number="2", pages="e23436", keywords="artificial intelligence; machine learning; deep learning; neural networks; convolutional neural networks; pathology; clinical pathology; digital pathology; pitfalls; artifacts", abstract="Background: An increasing number of studies within digital pathology show the potential of artificial intelligence (AI) to diagnose cancer using histological whole slide images, which requires large and diverse data sets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks are able to distinguish/learn hidden variables, these variables can introduce batch effects that compromise the accuracy of classification systems. Objective: The objective of the study was to analyze the learnability of an exemplary selection of hidden variables (patient age, slide preparation date, slide origin, and scanner type) that are commonly found in whole slide image data sets in digital pathology and could create batch effects. Methods: We trained four separate convolutional neural networks (CNNs) to learn four variables using a data set of digitized whole slide melanoma images from five different institutes. For robustness, each CNN training and evaluation run was repeated multiple times, and a variable was only considered learnable if the lower bound of the 95{\%} confidence interval of its mean balanced accuracy was above 50.0{\%}. Results: A mean balanced accuracy above 50.0{\%} was achieved for all four tasks, even when considering the lower bound of the 95{\%} confidence interval. Performance between tasks showed wide variation, ranging from 56.1{\%} (slide preparation date) to 100{\%} (slide origin). Conclusions: Because all of the analyzed hidden variables are learnable, they have the potential to create batch effects in dermatopathology data sets, which negatively affect AI-based classification systems. Practitioners should be aware of these and similar pitfalls when developing and evaluating such systems and address these and potentially other batch effect variables in their data sets through sufficient data set stratification. 
", doi="10.2196/23436", url="https://www.jmir.org/2021/2/e23436", url="https://doi.org/10.2196/23436", url="http://www.ncbi.nlm.nih.gov/pubmed/33528370" } @Article{info:doi/10.2196/23933, author="Buchanan, Christine and Howitt, M Lyndsay and Wilson, Rita and Booth, Richard G and Risling, Tracie and Bamford, Megan", title="Predicted Influences of Artificial Intelligence on Nursing Education: Scoping Review", journal="JMIR Nursing", year="2021", month="Jan", day="28", volume="4", number="1", pages="e23933", keywords="nursing; artificial intelligence; education; review", abstract="Background: It is predicted that artificial intelligence (AI) will transform nursing across all domains of nursing practice, including administration, clinical care, education, policy, and research. Increasingly, researchers are exploring the potential influences of AI health technologies (AIHTs) on nursing in general and on nursing education more specifically. However, little emphasis has been placed on synthesizing this body of literature. Objective: A scoping review was conducted to summarize the current and predicted influences of AIHTs on nursing education over the next 10 years and beyond. Methods: This scoping review followed a previously published protocol from April 2020. Using an established scoping review methodology, the databases of MEDLINE, Cumulative Index to Nursing and Allied Health Literature, Embase, PsycINFO, Cochrane Database of Systematic Reviews, Cochrane Central, Education Resources Information Centre, Scopus, Web of Science, and Proquest were searched. In addition to the use of these electronic databases, a targeted website search was performed to access relevant grey literature. Abstracts and full-text studies were independently screened by two reviewers using prespecified inclusion and exclusion criteria. Included literature focused on nursing education and digital health technologies that incorporate AI. Data were charted using a structured form and narratively summarized into categories. Results: A total of 27 articles were identified (20 expository papers, six studies with quantitative or prototyping methods, and one qualitative study). The population included nurses, nurse educators, and nursing students at the entry-to-practice, undergraduate, graduate, and doctoral levels. A variety of AIHTs were discussed, including virtual avatar apps, smart homes, predictive analytics, virtual or augmented reality, and robots. The two key categories derived from the literature were (1) influences of AI on nursing education in academic institutions and (2) influences of AI on nursing education in clinical practice. Conclusions: Curricular reform is urgently needed within nursing education programs in academic institutions and clinical practice settings to prepare nurses and nursing students to practice safely and efficiently in the age of AI. Additionally, nurse educators need to adopt new and evolving pedagogies that incorporate AI to better support students at all levels of education. Finally, nursing students and practicing nurses must be equipped with the requisite knowledge and skills to effectively assess AIHTs and safely integrate those deemed appropriate to support person-centered compassionate nursing care in practice settings. 
International Registered Report Identifier (IRRID): RR2-10.2196/17490 ", doi="10.2196/23933", url="https://nursing.jmir.org/2021/1/e23933/", url="https://doi.org/10.2196/23933" } @Article{info:doi/10.2196/24973, author="Ho, Thao Thi and Park, Jongmin and Kim, Taewoo and Park, Byunggeon and Lee, Jaehee and Kim, Jin Young and Kim, Ki Beom and Choi, Sooyoung and Kim, Young Hwan and Lim, Jae-Kwang and Choi, Sanghun", title="Deep Learning Models for Predicting Severe Progression in COVID-19-Infected Patients: Retrospective Study", journal="JMIR Med Inform", year="2021", month="Jan", day="28", volume="9", number="1", pages="e24973", keywords="COVID-19; deep learning; artificial neural network; convolutional neural network; lung CT", abstract="Background: Many COVID-19 patients rapidly progress to respiratory failure with a broad range of severities. Identification of high-risk cases is critical for early intervention. Objective: The aim of this study is to develop deep learning models that can rapidly identify high-risk COVID-19 patients based on computed tomography (CT) images and clinical data. Methods: We analyzed 297 COVID-19 patients from five hospitals in Daegu, South Korea. A mixed artificial convolutional neural network (ACNN) model, combining an artificial neural network for clinical data and a convolutional neural network for 3D CT imaging data, was developed to classify these cases as either high risk of severe progression (ie, event) or low risk (ie, event-free). Results: Using the mixed ACNN model, we were able to obtain high classification performance using novel coronavirus pneumonia lesion images (ie, 93.9{\%} accuracy, 80.8{\%} sensitivity, 96.9{\%} specificity, and 0.916 area under the curve [AUC] score) and lung segmentation images (ie, 94.3{\%} accuracy, 74.7{\%} sensitivity, 95.9{\%} specificity, and 0.928 AUC score) for event versus event-free groups. Conclusions: Our study successfully differentiated high-risk cases among COVID-19 patients using imaging and clinical features. The developed model can be used as a predictive tool for interventions in aggressive therapies. ", doi="10.2196/24973", url="http://medinform.jmir.org/2021/1/e24973/", url="https://doi.org/10.2196/24973", url="http://www.ncbi.nlm.nih.gov/pubmed/33455900" } @Article{info:doi/10.2196/24924, author="Wang, Hanxue and Cui, Wenjuan and Guo, Yunchang and Du, Yi and Zhou, Yuanchun", title="Machine Learning Prediction of Foodborne Disease Pathogens: Algorithm Development and Validation Study", journal="JMIR Med Inform", year="2021", month="Jan", day="26", volume="9", number="1", pages="e24924", keywords="foodborne disease; pathogens prediction; machine learning", abstract="Background: Foodborne diseases have a high global incidence; thus, they place a heavy burden on public health and the social economy. Foodborne pathogens, as the main factor of foodborne diseases, play an important role in the treatment and prevention of foodborne diseases; however, foodborne diseases caused by different pathogens lack specificity in their clinical features, and there is a low proportion of actual clinical pathogen detection in real life. Objective: We aimed to analyze foodborne disease case data, select appropriate features based on analysis results, and use machine learning methods to classify foodborne disease pathogens to predict foodborne disease pathogens for cases where the pathogen is not known or tested. 
Methods: We extracted features such as space, time, and exposed food from foodborne disease case data and analyzed the relationships between these features and the foodborne disease pathogens using a variety of machine learning methods to classify foodborne disease pathogens. We compared the results of four models to obtain the pathogen prediction model with the highest accuracy. Results: The gradient boost decision tree model obtained the highest accuracy, with accuracy approaching 69{\%} in identifying 4 pathogens: Salmonella, Norovirus, Escherichia coli, and Vibrio parahaemolyticus. By evaluating the importance of features such as time of illness, geographical longitude and latitude, and diarrhea frequency, we found that these features play important roles in classifying foodborne disease pathogens. Conclusions: Data analysis can reflect the distribution of some features of foodborne diseases and the relationships among the features. The classification of pathogens based on the analysis results and machine learning methods can provide beneficial support for clinical auxiliary diagnosis and treatment of foodborne diseases. ", doi="10.2196/24924", url="http://medinform.jmir.org/2021/1/e24924/", url="https://doi.org/10.2196/24924", url="http://www.ncbi.nlm.nih.gov/pubmed/33496675" } @Article{info:doi/10.2196/19739, author="Diao, Xiaolin and Huo, Yanni and Yan, Zhanzheng and Wang, Haibin and Yuan, Jing and Wang, Yuxin and Cai, Jun and Zhao, Wei", title="An Application of Machine Learning to Etiological Diagnosis of Secondary Hypertension: Retrospective Study Using Electronic Medical Records", journal="JMIR Med Inform", year="2021", month="Jan", day="25", volume="9", number="1", pages="e19739", keywords="secondary hypertension; etiological diagnosis; machine learning; prediction model", abstract="Background: Secondary hypertension is a kind of hypertension with a definite etiology and may be cured. Patients with suspected secondary hypertension can benefit from timely detection and treatment and, conversely, will have a higher risk of morbidity and mortality than those with primary hypertension. Objective: The aim of this study was to develop and validate machine learning (ML) prediction models of common etiologies in patients with suspected secondary hypertension. Methods: The analyzed data set was retrospectively extracted from electronic medical records of patients discharged from Fuwai Hospital between January 1, 2016, and June 30, 2019. A total of 7532 unique patients were included and divided into 2 data sets by time: 6302 patients in 2016-2018 as the training data set for model building and 1230 patients in 2019 as the validation data set for further evaluation. Extreme Gradient Boosting (XGBoost) was adopted to develop 5 models to predict 4 etiologies of secondary hypertension and occurrence of any of them (named as composite outcome), including renovascular hypertension (RVH), primary aldosteronism (PA), thyroid dysfunction, and aortic stenosis. Both univariate logistic analysis and Gini Impurity were used for feature selection. Grid search and 10-fold cross-validation were used to select the optimal hyperparameters for each model. 
Results: Validation of the composite outcome prediction model showed good performance with an area under the receiver-operating characteristic curve (AUC) of 0.924 in the validation data set, while the 4 prediction models of RVH, PA, thyroid dysfunction, and aortic stenosis achieved AUC of 0.938, 0.965, 0.959, and 0.946, respectively, in the validation data set. A total of 79 clinical indicators were identified in all and finally used in our prediction models. The result of subgroup analysis on the composite outcome prediction model demonstrated high discrimination with AUCs all higher than 0.890 among all age groups of adults. Conclusions: The ML prediction models in this study showed good performance in detecting 4 etiologies of patients with suspected secondary hypertension; thus, they may potentially facilitate clinical diagnosis decision making of secondary hypertension in an intelligent way. ", doi="10.2196/19739", url="http://medinform.jmir.org/2021/1/e19739/", url="https://doi.org/10.2196/19739", url="http://www.ncbi.nlm.nih.gov/pubmed/33492233" } @Article{info:doi/10.2196/20123, author="Boutilier, Justin J and Chan, Timothy C Y and Ranjan, Manish and Deo, Sarang", title="Risk Stratification for Early Detection of Diabetes and Hypertension in Resource-Limited Settings: Machine Learning Analysis", journal="J Med Internet Res", year="2021", month="Jan", day="21", volume="23", number="1", pages="e20123", keywords="machine learning; diabetes; hypertension; screening; global health", abstract="Background: The impending scale up of noncommunicable disease screening programs in low- and middle-income countries coupled with limited health resources require that such programs be as accurate as possible at identifying patients at high risk. Objective: The aim of this study was to develop machine learning--based risk stratification algorithms for diabetes and hypertension that are tailored for the at-risk population served by community-based screening programs in low-resource settings. Methods: We trained and tested our models by using data from 2278 patients collected by community health workers through door-to-door and camp-based screenings in the urban slums of Hyderabad, India between July 14, 2015 and April 21, 2018. We determined the best models for predicting short-term (2-month) risk of diabetes and hypertension (a model for diabetes and a model for hypertension) and compared these models to previously developed risk scores from the United States and the United Kingdom by using prediction accuracy as characterized by the area under the receiver operating characteristic curve (AUC) and the number of false negatives. Results: We found that models based on random forest had the highest prediction accuracy for both diseases and were able to outperform the US and UK risk scores in terms of AUC by 35.5{\%} for diabetes (improvement of 0.239 from 0.671 to 0.910) and 13.5{\%} for hypertension (improvement of 0.094 from 0.698 to 0.792). For a fixed screening specificity of 0.9, the random forest model was able to reduce the expected number of false negatives by 620 patients per 1000 screenings for diabetes and 220 patients per 1000 screenings for hypertension. This improvement reduces the cost of incorrect risk stratification by US {\$}1.99 (or 35{\%}) per screening for diabetes and US {\$}1.60 (or 21{\%}) per screening for hypertension. 
Conclusions: In the next decade, health systems in many countries are planning to spend significant resources on noncommunicable disease screening programs and our study demonstrates that machine learning models can be leveraged by these programs to effectively utilize limited resources by improving risk stratification. ", doi="10.2196/20123", url="http://www.jmir.org/2021/1/e20123/", url="https://doi.org/10.2196/20123", url="http://www.ncbi.nlm.nih.gov/pubmed/33475518" } @Article{info:doi/10.2196/24618, author="Lu, Yingjie and Luo, Shuwen and Liu, Xuan", title="Development of Social Support Networks by Patients With Depression Through Online Health Communities: Social Network Analysis", journal="JMIR Med Inform", year="2021", month="Jan", day="7", volume="9", number="1", pages="e24618", keywords="online depression community; social support network; exponential random graph model; informational support; emotional support; mental health; depression; social network", abstract="Background: In recent years, people with mental health problems are increasingly using online social networks to receive social support. For example, in online depression communities, patients can share their experiences, exchange valuable information, and receive emotional support to help them cope with their disease. Therefore, it is critical to understand how patients with depression develop online social support networks to exchange informational and emotional support. Objective: Our aim in this study was to investigate which user attributes have significant effects on the formation of informational and emotional support networks in online depression communities and to further examine whether there is an association between the two social networks. Methods: We used social network theory and constructed exponential random graph models to help understand the informational and emotional support networks in online depression communities. A total of 74,986 original posts were retrieved from 1077 members in an online depression community in China from April 2003 to September 2017 and the available data were extracted. An informational support network of 1077 participant nodes and 6557 arcs and an emotional support network of 1077 participant nodes and 6430 arcs were constructed to examine the endogenous (purely structural) effects and exogenous (actor-relation) effects on each support network separately, as well as the cross-network effects between the two networks. Results: We found significant effects of two important structural features, reciprocity and transitivity, on the formation of both the informational support network (r=3.6247, P<.001, and r=1.6232, P<.001, respectively) and the emotional support network (r=4.4111, P<.001, and r=0.0177, P<.001, respectively). The results also showed significant effects of some individual factors on the formation of the two networks. No significant effects of homophily were found for gender (r=0.0783, P=.20, and r=0.1122, P=.25, respectively) in the informational or emotional support networks. There was no tendency for users who had great influence (r=0.3253, P=.05) or wrote more posts (r=0.3896, P=.07) or newcomers (r=--0.0452, P=.66) to form informational support ties more easily. However, users who spent more time online (r=0.6680, P<.001) or provided more replies to other posts (r=0.5026, P<.001) were more likely to form informational support ties. 
Users who had a big influence (r=0.8325, P<.001), spent more time online (r=0.5839, P<.001), wrote more posts (r=2.4025, P<.001), or provided more replies to other posts (r=0.2259, P<.001) were more likely to form emotional support ties, and newcomers (r=--0.4224, P<.001) were less likely than old-timers to receive emotional support. In addition, we found that there was a significant entrainment effect (r=0.7834, P<.001) and a nonsignificant exchange effect (r=--0.2757, P=.32) between the two networks. Conclusions: This study makes several important theoretical contributions to the research on online depression communities and has important practical implications for the managers of online depression communities and the users involved in these communities. ", doi="10.2196/24618", url="http://medinform.jmir.org/2021/1/e24618/", url="https://doi.org/10.2196/24618", url="http://www.ncbi.nlm.nih.gov/pubmed/33279878" } @Article{info:doi/10.2196/21453, author="Leung, Yvonne W and Wouterloot, Elise and Adikari, Achini and Hirst, Graeme and de Silva, Daswin and Wong, Jiahui and Bender, Jacqueline L and Gancarz, Mathew and Gratzer, David and Alahakoon, Damminda and Esplen, Mary Jane", title="Natural Language Processing--Based Virtual Cofacilitator for Online Cancer Support Groups: Protocol for an Algorithm Development and Validation Study", journal="JMIR Res Protoc", year="2021", month="Jan", day="7", volume="10", number="1", pages="e21453", keywords="artificial intelligence; cancer; online support groups; emotional distress; natural language processing; participant engagement", abstract="Background: Cancer and its treatment can significantly impact the short- and long-term psychological well-being of patients and families. Emotional distress and depressive symptomatology are often associated with poor treatment adherence, reduced quality of life, and higher mortality. Cancer support groups, especially those led by health care professionals, provide a safe place for participants to discuss fear, normalize stress reactions, share solidarity, and learn about effective strategies to build resilience and enhance coping. However, in-person support groups may not always be accessible to individuals; geographic distance is one of the barriers for access, and compromised physical condition (eg, fatigue, pain) is another. Emerging evidence supports the effectiveness of online support groups in reducing access barriers. Text-based and professional-led online support groups have been offered by Cancer Chat Canada. Participants join the group discussion using text in real time. However, therapist leaders report some challenges leading text-based online support groups in the absence of visual cues, particularly in tracking participant distress. With multiple participants typing at the same time, the nuances of the text messages or red flags for distress can sometimes be missed. Recent advances in artificial intelligence such as deep learning--based natural language processing offer potential solutions. This technology can be used to analyze online support group text data to track participants' expressed emotional distress, including fear, sadness, and hopelessness. Artificial intelligence allows session activities to be monitored in real time and alerts the therapist to participant disengagement. 
Objective: We aim to develop and evaluate an artificial intelligence--based cofacilitator prototype to track and monitor online support group participants' distress through real-time analysis of text-based messages posted during synchronous sessions. Methods: An artificial intelligence--based cofacilitator will be developed to identify participants who are at-risk for increased emotional distress and track participant engagement and in-session group cohesion levels, providing real-time alerts for therapist to follow-up; generate postsession participant profiles that contain discussion content keywords and emotion profiles for each session; and automatically suggest tailored resources to participants according to their needs. The study is designed to be conducted in 4 phases consisting of (1) development based on a subset of data and an existing natural language processing framework, (2) performance evaluation using human scoring, (3) beta testing, and (4) user experience evaluation. Results: This study received ethics approval in August 2019. Phase 1, development of an artificial intelligence--based cofacilitator, was completed in January 2020. As of December 2020, phase 2 is underway. The study is expected to be completed by September 2021. Conclusions: An artificial intelligence--based cofacilitator offers a promising new mode of delivery of person-centered online support groups tailored to individual needs. International Registered Report Identifier (IRRID): DERR1-10.2196/21453 ", doi="10.2196/21453", url="https://www.researchprotocols.org/2021/1/e21453", url="https://doi.org/10.2196/21453", url="http://www.ncbi.nlm.nih.gov/pubmed/33410754" } @Article{info:doi/10.2196/19928, author="Fan, Xiangmin and Chao, Daren and Zhang, Zhan and Wang, Dakuo and Li, Xiaohua and Tian, Feng", title="Utilization of Self-Diagnosis Health Chatbots in Real-World Settings: Case Study", journal="J Med Internet Res", year="2021", month="Jan", day="6", volume="23", number="1", pages="e19928", keywords="self-diagnosis; chatbot; conversational agent; human--artificial intelligence interaction; artificial intelligence; diagnosis; case study; eHealth; real world; user experience", abstract="Background: Artificial intelligence (AI)-driven chatbots are increasingly being used in health care, but most chatbots are designed for a specific population and evaluated in controlled settings. There is little research documenting how health consumers (eg, patients and caregivers) use chatbots for self-diagnosis purposes in real-world scenarios. Objective: The aim of this research was to understand how health chatbots are used in a real-world context, what issues and barriers exist in their usage, and how the user experience of this novel technology can be improved. Methods: We employed a data-driven approach to analyze the system log of a widely deployed self-diagnosis chatbot in China. Our data set consisted of 47,684 consultation sessions initiated by 16,519 users over 6 months. The log data included a variety of information, including users' nonidentifiable demographic information, consultation details, diagnostic reports, and user feedback. We conducted both statistical analysis and content analysis on this heterogeneous data set. Results: The chatbot users spanned all age groups, including middle-aged and older adults. Users consulted the chatbot on a wide range of medical conditions, including those that often entail considerable privacy and social stigma issues. 
Furthermore, we distilled 2 prominent issues in the use of the chatbot: (1) a considerable number of users dropped out in the middle of their consultation sessions, and (2) some users pretended to have health concerns and used the chatbot for nontherapeutic purposes. Finally, we identified a set of user concerns regarding the use of the chatbot, including insufficient actionable information and perceived inaccurate diagnostic suggestions. Conclusions: Although health chatbots are considered to be convenient tools for enhancing patient-centered care, there are issues and barriers impeding the optimal use of this novel technology. Designers and developers should employ user-centered approaches to address the issues and user concerns to achieve the best uptake and utilization. We conclude the paper by discussing several design implications, including making the chatbots more informative, easy-to-use, and trustworthy, as well as improving the onboarding experience to enhance user engagement. ", doi="10.2196/19928", url="https://www.jmir.org/2021/1/e19928", url="https://doi.org/10.2196/19928", url="http://www.ncbi.nlm.nih.gov/pubmed/33404508" } @Article{info:doi/10.2196/21965, author="Luo, Gang and Johnson, Michael D and Nkoy, Flory L and He, Shan and Stone, Bryan L", title="Automatically Explaining Machine Learning Prediction Results on Asthma Hospital Visits in Patients With Asthma: Secondary Analysis", journal="JMIR Med Inform", year="2020", month="Dec", day="31", volume="8", number="12", pages="e21965", keywords="asthma; forecasting; machine learning; patient care management", abstract="Background: Asthma is a major chronic disease that poses a heavy burden on health care. To facilitate the allocation of care management resources aimed at improving outcomes for high-risk patients with asthma, we recently built a machine learning model to predict asthma hospital visits in the subsequent year in patients with asthma. Our model is more accurate than previous models. However, like most machine learning models, it offers no explanation of its prediction results. This creates a barrier for use in care management, where interpretability is desired. Objective: This study aims to develop a method to automatically explain the prediction results of the model and recommend tailored interventions without lowering the performance measures of the model. Methods: Our data were imbalanced, with only a small portion of data instances linking to future asthma hospital visits. To handle imbalanced data, we extended our previous method of automatically offering rule-formed explanations for the prediction results of any machine learning model on tabular data without lowering the model's performance measures. In a secondary analysis of the 334,564 data instances from Intermountain Healthcare between 2005 and 2018 used to form our model, we employed the extended method to automatically explain the prediction results of our model and recommend tailored interventions. The patient cohort consisted of all patients with asthma who received care at Intermountain Healthcare between 2005 and 2018, and resided in Utah or Idaho as recorded at the visit. Results: Our method explained the prediction results for 89.7{\%} (391/436) of the patients with asthma who, per our model's correct prediction, were likely to incur asthma hospital visits in the subsequent year. 
Conclusions: This study is the first to demonstrate the feasibility of automatically offering rule-formed explanations for the prediction results of any machine learning model on imbalanced tabular data without lowering the performance measures of the model. After further improvement, our asthma outcome prediction model coupled with the automatic explanation function could be used by clinicians to guide the allocation of limited asthma care management resources and the identification of appropriate interventions. ", doi="10.2196/21965", url="http://medinform.jmir.org/2020/12/e21965/", url="https://doi.org/10.2196/21965", url="http://www.ncbi.nlm.nih.gov/pubmed/33382379" } @Article{info:doi/10.2196/22422, author="Yamada, Tomohide and Yoneoka, Daisuke and Hiraike, Yuta and Hino, Kimihiro and Toyoshiba, Hiroyoshi and Shishido, Akira and Noma, Hisashi and Shojima, Nobuhiro and Yamauchi, Toshimasa", title="Deep Neural Network for Reducing the Screening Workload in Systematic Reviews for Clinical Guidelines: Algorithm Validation Study", journal="J Med Internet Res", year="2020", month="Dec", day="30", volume="22", number="12", pages="e22422", keywords="machine learning; evidence-based medicine; systematic review; meta-analysis; clinical guideline; deep learning; neural network", abstract="Background: Performing systematic reviews is a time-consuming and resource-intensive process. Objective: We investigated whether a machine learning system could perform systematic reviews more efficiently. Methods: All systematic reviews and meta-analyses of interventional randomized controlled trials cited in recent clinical guidelines from the American Diabetes Association, American College of Cardiology, American Heart Association (2 guidelines), and American Stroke Association were assessed. After reproducing the primary screening data set according to the published search strategy of each, we extracted correct articles (those actually reviewed) and incorrect articles (those not reviewed) from the data set. These 2 sets of articles were used to train a neural network--based artificial intelligence engine (Concept Encoder, Fronteo Inc). The primary endpoint was work saved over sampling at 95{\%} recall (WSS@95{\%}). Results: Among 145 candidate reviews of randomized controlled trials, 8 reviews fulfilled the inclusion criteria. For these 8 reviews, the machine learning system significantly reduced the literature screening workload by at least 6-fold versus that of manual screening based on WSS@95{\%}. When machine learning was initiated using 2 correct articles that were randomly selected by a researcher, a 10-fold reduction in workload was achieved versus that of manual screening based on the WSS@95{\%} value, with high sensitivity for eligible studies. The area under the receiver operating characteristic curve increased dramatically every time the algorithm learned a correct article. Conclusions: Concept Encoder achieved a 10-fold reduction of the screening workload for systematic review after learning from 2 randomly selected studies on the target topic. However, few meta-analyses of randomized controlled trials were included. Concept Encoder could facilitate the acquisition of evidence for clinical guidelines. 
", doi="10.2196/22422", url="https://www.jmir.org/2020/12/e22422", url="https://doi.org/10.2196/22422", url="http://www.ncbi.nlm.nih.gov/pubmed/33262102" } @Article{info:doi/10.2196/25442, author="Ko, Hoon and Chung, Heewon and Kang, Wu Seong and Park, Chul and Kim, Do Wan and Kim, Seong Eun and Chung, Chi Ryang and Ko, Ryoung Eun and Lee, Hooseok and Seo, Jae Ho and Choi, Tae-Young and Jaimes, Rafael and Kim, Kyung Won and Lee, Jinseok", title="An Artificial Intelligence Model to Predict the Mortality of COVID-19 Patients at Hospital Admission Time Using Routine Blood Samples: Development and Validation of an Ensemble Model", journal="J Med Internet Res", year="2020", month="Dec", day="23", volume="22", number="12", pages="e25442", keywords="COVID-19; artificial intelligence; blood samples; mortality prediction", abstract="Background: COVID-19, which is accompanied by acute respiratory distress, multiple organ failure, and death, has spread worldwide much faster than previously thought. However, at present, it has limited treatments. Objective: To overcome this issue, we developed an artificial intelligence (AI) model of COVID-19, named EDRnet (ensemble learning model based on deep neural network and random forest models), to predict in-hospital mortality using a routine blood sample at the time of hospital admission. Methods: We selected 28 blood biomarkers and used the age and gender information of patients as model inputs. To improve the mortality prediction, we adopted an ensemble approach combining deep neural network and random forest models. We trained our model with a database of blood samples from 361 COVID-19 patients in Wuhan, China, and applied it to 106 COVID-19 patients in three Korean medical institutions. Results: In the testing data sets, EDRnet provided high sensitivity (100{\%}), specificity (91{\%}), and accuracy (92{\%}). To extend the number of patient data points, we developed a web application (BeatCOVID19) where anyone can access the model to predict mortality and can register his or her own blood laboratory results. Conclusions: Our new AI model, EDRnet, accurately predicts the mortality rate for COVID-19. It is publicly available and aims to help health care providers fight COVID-19 and improve patients' outcomes. ", doi="10.2196/25442", url="http://www.jmir.org/2020/12/e25442/", url="https://doi.org/10.2196/25442", url="http://www.ncbi.nlm.nih.gov/pubmed/33301414" } @Article{info:doi/10.2196/23082, author="Geng, Wenye and Qin, Xuanfeng and Yang, Tao and Cong, Zhilei and Wang, Zhuo and Kong, Qing and Tang, Zihui and Jiang, Lin", title="Model-Based Reasoning of Clinical Diagnosis in Integrative Medicine: Real-World Methodological Study of Electronic Medical Records and Natural Language Processing Methods", journal="JMIR Med Inform", year="2020", month="Dec", day="21", volume="8", number="12", pages="e23082", keywords="model-based reasoning; integrative medicine; electronic medical records; natural language processing", abstract="Background: Integrative medicine is a form of medicine that combines practices and treatments from alternative medicine with conventional medicine. The diagnosis in integrative medicine involves the clinical diagnosis based on modern medicine and syndrome pattern diagnosis. Electronic medical records (EMRs) are the systematized collection of patients health information stored in a digital format that can be shared across different health care settings. 
Although syndrome and sign information or relative information can be extracted from the EMR and content texts can be mapped to computability vectors using natural language processing techniques, application of artificial intelligence techniques to support physicians in medical practices remains a major challenge. Objective: The purpose of this study was to investigate model-based reasoning (MBR) algorithms for the clinical diagnosis in integrative medicine based on EMRs and natural language processing. We also estimated the associations among the factors of sample size, number of syndrome pattern type, and diagnosis in modern medicine using the MBR algorithms. Methods: A total of 14,075 medical records of clinical cases were extracted from the EMRs as the development data set, and an external test data set consisting of 1000 medical records of clinical cases was extracted from independent EMRs. MBR methods based on word embedding, machine learning, and deep learning algorithms were developed for the automatic diagnosis of syndrome pattern in integrative medicine. MBR algorithms combining rule-based reasoning (RBR) were also developed. A standard evaluation metrics consisting of accuracy, precision, recall, and F1 score was used for the performance estimation of the methods. The association analyses were conducted on the sample size, number of syndrome pattern type, and diagnosis of lung diseases with the best algorithms. Results: The Word2Vec convolutional neural network (CNN) MBR algorithms showed high performance (accuracy of 0.9586 in the test data set) in the syndrome pattern diagnosis of lung diseases. The Word2Vec CNN MBR combined with RBR also showed high performance (accuracy of 0.9229 in the test data set). The diagnosis of lung diseases could enhance the performance of the Word2Vec CNN MBR algorithms. Each group sample size and syndrome pattern type affected the performance of these algorithms. Conclusions: The MBR methods based on Word2Vec and CNN showed high performance in the syndrome pattern diagnosis of lung diseases in integrative medicine. The parameters of each group's sample size, syndrome pattern type, and diagnosis of lung diseases were associated with the performance of the methods. Trial Registration: ClinicalTrials.gov NCT03274908; https://clinicaltrials.gov/ct2/show/NCT03274908 ", doi="10.2196/23082", url="http://medinform.jmir.org/2020/12/e23082/", url="https://doi.org/10.2196/23082", url="http://www.ncbi.nlm.nih.gov/pubmed/33346740" } @Article{info:doi/10.2196/19127, author="Safi, Zeineb and Abd-Alrazaq, Alaa and Khalifa, Mohamed and Househ, Mowafa", title="Technical Aspects of Developing Chatbots for Medical Applications: Scoping Review", journal="J Med Internet Res", year="2020", month="Dec", day="18", volume="22", number="12", pages="e19127", keywords="chatbots; conversational agents; medical applications; scoping review; technical aspects", abstract="Background: Chatbots are applications that can conduct natural language conversations with users. In the medical field, chatbots have been developed and used to serve different purposes. They provide patients with timely information that can be critical in some scenarios, such as access to mental health resources. Since the development of the first chatbot, ELIZA, in the late 1960s, much effort has followed to produce chatbots for various health purposes developed in different ways. 
Objective: This study aimed to explore the technical aspects and development methodologies associated with chatbots used in the medical field to explain the best methods of development and support chatbot development researchers on their future work. Methods: We searched for relevant articles in 8 literature databases (IEEE, ACM, Springer, ScienceDirect, Embase, MEDLINE, PsycINFO, and Google Scholar). We also performed forward and backward reference checking of the selected articles. Study selection was performed by one reviewer, and 50{\%} of the selected studies were randomly checked by a second reviewer. A narrative approach was used for result synthesis. Chatbots were classified based on the different technical aspects of their development. The main chatbot components were identified in addition to the different techniques for implementing each module. Results: The original search returned 2481 publications, of which we identified 45 studies that matched our inclusion and exclusion criteria. The most common language of communication between users and chatbots was English (n=23). We identified 4 main modules: text understanding module, dialog management module, database layer, and text generation module. The most common technique for developing text understanding and dialogue management is the pattern matching method (n=18 and n=25, respectively). The most common text generation is fixed output (n=36). Very few studies relied on generating original output. Most studies kept a medical knowledge base to be used by the chatbot for different purposes throughout the conversations. A few studies kept conversation scripts and collected user data and previous conversations. Conclusions: Many chatbots have been developed for medical use, at an increasing rate. There is a recent, apparent shift in adopting machine learning--based approaches for developing chatbot systems. Further research can be conducted to link clinical outcomes to different chatbot development techniques and technical characteristics. ", doi="10.2196/19127", url="http://www.jmir.org/2020/12/e19127/", url="https://doi.org/10.2196/19127", url="http://www.ncbi.nlm.nih.gov/pubmed/33337337" } @Article{info:doi/10.2196/22649, author="Rashidian, Sina and Abell-Hart, Kayley and Hajagos, Janos and Moffitt, Richard and Lingam, Veena and Garcia, Victor and Tsai, Chao-Wei and Wang, Fusheng and Dong, Xinyu and Sun, Siao and Deng, Jianyuan and Gupta, Rajarsi and Miller, Joshua and Saltz, Joel and Saltz, Mary", title="Detecting Miscoded Diabetes Diagnosis Codes in Electronic Health Records for Quality Improvement: Temporal Deep Learning Approach", journal="JMIR Med Inform", year="2020", month="Dec", day="17", volume="8", number="12", pages="e22649", keywords="electronic health records; diabetes; deep learning", abstract="Background: Diabetes affects more than 30 million patients across the United States. With such a large disease burden, even a small error in classification can be significant. Currently billing codes, assigned at the time of a medical encounter, are the ``gold standard'' reflecting the actual diseases present in an individual, and thus in aggregate reflect disease prevalence in the population. These codes are generated by highly trained coders and by health care providers but are not always accurate. Objective: This work provides a scalable deep learning methodology to more accurately classify individuals with diabetes across multiple health care systems. 
Methods: We leveraged a long short-term memory-dense neural network (LSTM-DNN) model to identify patients with or without diabetes using data from 5 acute care facilities with 187,187 patients and 275,407 encounters, incorporating data elements including laboratory test results, diagnostic/procedure codes, medications, demographic data, and admission information. Furthermore, a blinded physician panel reviewed discordant cases, providing an estimate of the total impact on the population. Results: When predicting the documented diagnosis of diabetes, our model achieved an 84{\%} F1 score, 96{\%} area under the curve--receiver operating characteristic curve, and 91{\%} average precision on a heterogeneous data set from 5 distinct health facilities. However, in 81{\%} of cases where the model disagreed with the documented phenotype, a blinded physician panel agreed with the model. Taken together, this suggests that 4.3{\%} of our studied population have either missing or improper diabetes diagnosis. Conclusions: This study demonstrates that deep learning methods can improve clinical phenotyping even when patient data are noisy, sparse, and heterogeneous. ", doi="10.2196/22649", url="http://medinform.jmir.org/2020/12/e22649/", url="https://doi.org/10.2196/22649", url="http://www.ncbi.nlm.nih.gov/pubmed/33331828" } @Article{info:doi/10.2196/23939, author="Buchanan, Christine and Howitt, M Lyndsay and Wilson, Rita and Booth, Richard G and Risling, Tracie and Bamford, Megan", title="Predicted Influences of Artificial Intelligence on the Domains of Nursing: Scoping Review", journal="JMIR Nursing", year="2020", month="Dec", day="17", volume="3", number="1", pages="e23939", keywords="nursing; artificial intelligence; machine learning; robotics; patient-centered care; review", abstract="Background: Artificial intelligence (AI) is set to transform the health system, yet little research to date has explored its influence on nurses---the largest group of health professionals. Furthermore, there has been little discussion on how AI will influence the experience of person-centered compassionate care for patients, families, and caregivers. Objective: This review aims to summarize the extant literature on the emerging trends in health technologies powered by AI and their implications on the following domains of nursing: administration, clinical practice, policy, and research. This review summarizes the findings from 3 research questions, examining how these emerging trends might influence the roles and functions of nurses and compassionate nursing care over the next 10 years and beyond. Methods: Using an established scoping review methodology, MEDLINE, CINAHL, EMBASE, PsycINFO, Cochrane Database of Systematic Reviews, Cochrane Central, Education Resources Information Center, Scopus, Web of Science, and ProQuest databases were searched. In addition to the electronic database searches, a targeted website search was performed to access relevant gray literature. Abstracts and full-text studies were independently screened by 2 reviewers using prespecified inclusion and exclusion criteria. Included articles focused on nursing and digital health technologies that incorporate AI. Data were charted using structured forms and narratively summarized. Results: A total of 131 articles were retrieved from the scoping review for the 3 research questions that were the focus of this manuscript (118 from database sources and 13 from targeted websites). 
Emerging AI technologies discussed in the review included predictive analytics, smart homes, virtual health care assistants, and robots. The results indicated that AI has already begun to influence nursing roles, workflows, and the nurse-patient relationship. In general, robots are not viewed as replacements for nurses. There is a consensus that health technologies powered by AI may have the potential to enhance nursing practice. Consequently, nurses must proactively define how person-centered compassionate care will be preserved in the age of AI. Conclusions: Nurses have a shared responsibility to influence decisions related to the integration of AI into the health system and to ensure that this change is introduced in a way that is ethical and aligns with core nursing values such as compassionate care. Furthermore, nurses must advocate for patient and nursing involvement in all aspects of the design, implementation, and evaluation of these technologies. International Registered Report Identifier (IRRID): RR2-10.2196/17490 ", doi="10.2196/23939", url="https://nursing.jmir.org/2020/1/e23939/", url="https://doi.org/10.2196/23939" } @Article{info:doi/10.2196/24478, author="D'Ambrosia, Christopher and Christensen, Henrik and Aronoff-Spencer, Eliah", title="Computing SARS-CoV-2 Infection Risk From Symptoms, Imaging, and Test Data: Diagnostic Model Development", journal="J Med Internet Res", year="2020", month="Dec", day="16", volume="22", number="12", pages="e24478", keywords="health; informatics; computation; COVID-19; infection; risk; symptom; imaging; diagnostic; probability; machine learning; Bayesian; model", abstract="Background: Assigning meaningful probabilities of SARS-CoV-2 infection risk presents a diagnostic challenge across the continuum of care. Objective: The aim of this study was to develop and clinically validate an adaptable, personalized diagnostic model to assist clinicians in ruling in and ruling out COVID-19 in potential patients. We compared the diagnostic performance of probabilistic, graphical, and machine learning models against a previously published benchmark model. Methods: We integrated patient symptoms and test data using machine learning and Bayesian inference to quantify individual patient risk of SARS-CoV-2 infection. We trained models with 100,000 simulated patient profiles based on 13 symptoms and estimated local prevalence, imaging, and molecular diagnostic performance from published reports. We tested these models with consecutive patients who presented with a COVID-19--compatible illness at the University of California San Diego Medical Center over the course of 14 days starting in March 2020. Results: We included 55 consecutive patients with fever (n=43, 78{\%}) or cough (n=42, 77{\%}) presenting for ambulatory (n=11, 20{\%}) or hospital care (n=44, 80{\%}). In total, 51{\%} (n=28) were female and 49{\%} (n=27) were aged <60 years. Common comorbidities included diabetes (n=12, 22{\%}), hypertension (n=15, 27{\%}), cancer (n=9, 16{\%}), and cardiovascular disease (n=7, 13{\%}). Of these, 69{\%} (n=38) were confirmed via reverse transcription-polymerase chain reaction (RT-PCR) to be positive for SARS-CoV-2 infection, and 20{\%} (n=11) had repeated negative nucleic acid testing and an alternate diagnosis. 
Bayesian inference network, distance metric learning, and ensemble models discriminated between patients with SARS-CoV-2 infection and alternate diagnoses with sensitivities of 81.6{\%}-84.2{\%}, specificities of 58.8{\%}-70.6{\%}, and accuracies of 61.4{\%}-71.8{\%}. After integrating imaging and laboratory test statistics with the predictions of the Bayesian inference network, changes in diagnostic uncertainty at each step in the simulated clinical evaluation process were highly sensitive to location, symptom, and diagnostic test choices. Conclusions: Decision support models that incorporate symptoms and available test results can help providers diagnose SARS-CoV-2 infection in real-world settings. ", doi="10.2196/24478", url="http://www.jmir.org/2020/12/e24478/", url="https://doi.org/10.2196/24478", url="http://www.ncbi.nlm.nih.gov/pubmed/33301417" } @Article{info:doi/10.2196/18418, author="Kim, Junetae and Lee, Sangwon and Hwang, Eugene and Ryu, Kwang Sun and Jeong, Hanseok and Lee, Jae Wook and Hwangbo, Yul and Choi, Kui Son and Cha, Hyo Soung", title="Limitations of Deep Learning Attention Mechanisms in Clinical Research: Empirical Case Study Based on the Korean Diabetic Disease Setting", journal="J Med Internet Res", year="2020", month="Dec", day="16", volume="22", number="12", pages="e18418", keywords="attention; deep learning; explainable artificial intelligence; uncertainty awareness; Bayesian deep learning; artificial intelligence; health data", abstract="Background: Despite excellent prediction performance, noninterpretability has undermined the value of applying deep-learning algorithms in clinical practice. To overcome this limitation, attention mechanism has been introduced to clinical research as an explanatory modeling method. However, potential limitations of using this attractive method have not been clarified to clinical researchers. Furthermore, there has been a lack of introductory information explaining attention mechanisms to clinical researchers. Objective: The aim of this study was to introduce the basic concepts and design approaches of attention mechanisms. In addition, we aimed to empirically assess the potential limitations of current attention mechanisms in terms of prediction and interpretability performance. Methods: First, the basic concepts and several key considerations regarding attention mechanisms were identified. Second, four approaches to attention mechanisms were suggested according to a two-dimensional framework based on the degrees of freedom and uncertainty awareness. Third, the prediction performance, probability reliability, concentration of variable importance, consistency of attention results, and generalizability of attention results to conventional statistics were assessed in the diabetic classification modeling setting. Fourth, the potential limitations of attention mechanisms were considered. Results: Prediction performance was very high for all models. Probability reliability was high in models with uncertainty awareness. Variable importance was concentrated in several variables when uncertainty awareness was not considered. The consistency of attention results was high when uncertainty awareness was considered. The generalizability of attention results to conventional statistics was poor regardless of the modeling approach. Conclusions: The attention mechanism is an attractive technique with potential to be very promising in the future. 
However, it may not yet be desirable to rely on this method to assess variable importance in clinical settings. Therefore, along with theoretical studies enhancing attention mechanisms, more empirical studies investigating potential limitations should be encouraged. ", doi="10.2196/18418", url="http://www.jmir.org/2020/12/e18418/", url="https://doi.org/10.2196/18418", url="http://www.ncbi.nlm.nih.gov/pubmed/33325832" } @Article{info:doi/10.2196/20756, author="Abd-Alrazaq, Alaa and Alajlani, Mohannad and Alhuwail, Dari and Schneider, Jens and Al-Kuwari, Saif and Shah, Zubair and Hamdi, Mounir and Househ, Mowafa", title="Artificial Intelligence in the Fight Against COVID-19: Scoping Review", journal="J Med Internet Res", year="2020", month="Dec", day="15", volume="22", number="12", pages="e20756", keywords="artificial intelligence; machine learning; deep learning; natural language processing; coronavirus; COVID-19; 2019-nCoV; SARS-CoV-2", abstract="Background: In December 2019, COVID-19 broke out in Wuhan, China, leading to national and international disruptions in health care, business, education, transportation, and nearly every aspect of our daily lives. Artificial intelligence (AI) has been leveraged amid the COVID-19 pandemic; however, little is known about its use for supporting public health efforts. Objective: This scoping review aims to explore how AI technology is being used during the COVID-19 pandemic, as reported in the literature. Thus, it is the first review that describes and summarizes features of the identified AI techniques and data sets used for their development and validation. Methods: A scoping review was conducted following the guidelines of PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews). We searched the most commonly used electronic databases (eg, MEDLINE, EMBASE, and PsycInfo) between April 10 and 12, 2020. These terms were selected based on the target intervention (ie, AI) and the target disease (ie, COVID-19). Two reviewers independently conducted study selection and data extraction. A narrative approach was used to synthesize the extracted data. Results: We considered 82 studies out of the 435 retrieved studies. The most common use of AI was diagnosing COVID-19 cases based on various indicators. AI was also employed in drug and vaccine discovery or repurposing and for assessing their safety. Further, the included studies used AI for forecasting the epidemic development of COVID-19 and predicting its potential hosts and reservoirs. Researchers used AI for patient outcome--related tasks such as assessing the severity of COVID-19, predicting mortality risk, its associated factors, and the length of hospital stay. AI was used for infodemiology to raise awareness to use water, sanitation, and hygiene. The most prominent AI technique used was convolutional neural network, followed by support vector machine. Conclusions: The included studies showed that AI has the potential to fight against COVID-19. However, many of the proposed methods are not yet clinically accepted. Thus, the most rewarding research will be on methods promising value beyond COVID-19. More efforts are needed for developing standardized reporting protocols or guidelines for studies on AI. 
", doi="10.2196/20756", url="http://www.jmir.org/2020/12/e20756/", url="https://doi.org/10.2196/20756", url="http://www.ncbi.nlm.nih.gov/pubmed/33284779" } @Article{info:doi/10.2196/24049, author="Duca Iliescu, Delia Monica", title="The Impact of Artificial Intelligence on the Chess World", journal="JMIR Serious Games", year="2020", month="Dec", day="10", volume="8", number="4", pages="e24049", keywords="artificial intelligence; games; chess; AlphaZero; MuZero; cheat detection; coronavirus", doi="10.2196/24049", url="http://games.jmir.org/2020/4/e24049/", url="https://doi.org/10.2196/24049", url="http://www.ncbi.nlm.nih.gov/pubmed/33300493" } @Article{info:doi/10.2196/18097, author="{\'{C}}irkovi{\'{c}}, Aleksandar", title="Evaluation of Four Artificial Intelligence--Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study", journal="J Med Internet Res", year="2020", month="Dec", day="4", volume="22", number="12", pages="e18097", keywords="artificial intelligence; machine learning; mobile apps; medical diagnosis; mHealth", abstract="Background: Consumer-oriented mobile self-diagnosis apps have been developed using undisclosed algorithms, presumably based on machine learning and other artificial intelligence (AI) technologies. The US Food and Drug Administration now discerns apps with learning AI algorithms from those with stable ones and treats the former as medical devices. To the author's knowledge, no self-diagnosis app testing has been performed in the field of ophthalmology so far. Objective: The objective of this study was to test apps that were previously mentioned in the scientific literature on a set of diagnoses in a deliberate time interval, comparing the results and looking for differences that hint at ``nonlocked'' learning algorithms. Methods: Four apps from the literature were chosen (Ada, Babylon, Buoy, and Your.MD). A set of three ophthalmology diagnoses (glaucoma, retinal tear, dry eye syndrome) representing three levels of urgency was used to simultaneously test the apps' diagnostic efficiency and treatment recommendations in this specialty. Two years was the chosen time interval between the tests (2018 and 2020). Scores were awarded by one evaluating physician using a defined scheme. Results: Two apps (Ada and Your.MD) received significantly higher scores than the other two. All apps either worsened in their results between 2018 and 2020 or remained unchanged at a low level. The variation in the results over time indicates ``nonlocked'' learning algorithms using AI technologies. None of the apps provided correct diagnoses and treatment recommendations for all three diagnoses in 2020. Two apps (Babylon and Your.MD) asked significantly fewer questions than the other two (P<.001). Conclusions: ``Nonlocked'' algorithms are used by self-diagnosis apps. The diagnostic efficiency of the tested apps seems to worsen over time, with some apps being more capable than others. Systematic studies on a wider scale are necessary for health care providers and patients to correctly assess the safety and efficacy of such apps and for correct classification by health care regulating authorities. 
", doi="10.2196/18097", url="https://www.jmir.org/2020/12/e18097", url="https://doi.org/10.2196/18097", url="http://www.ncbi.nlm.nih.gov/pubmed/33275113" } @Article{info:doi/10.2196/22996, author="Kazi, Abdul Momin and Qazi, Saad Ahmed and Khawaja, Sadori and Ahsan, Nazia and Ahmed, Rao Moueed and Sameen, Fareeha and Khan Mughal, Muhammad Ayub and Saqib, Muhammad and Ali, Sikander and Kaleemuddin, Hussain and Rauf, Yasir and Raza, Mehreen and Jamal, Saima and Abbasi, Munir and Stergioulas, Lampros K", title="An Artificial Intelligence--Based, Personalized Smartphone App to Improve Childhood Immunization Coverage and Timelines Among Children in Pakistan: Protocol for a Randomized Controlled Trial", journal="JMIR Res Protoc", year="2020", month="Dec", day="4", volume="9", number="12", pages="e22996", keywords="artificial intelligence; AI; routine childhood immunization; EPI; LMICs; mHealth; Pakistan; personalized messages; routine immunization; smartphone apps; vaccine-preventable illnesses", abstract="Background: The immunization uptake rates in Pakistan are much lower than desired. Major reasons include lack of awareness, parental forgetfulness regarding schedules, and misinformation regarding vaccines. In light of the COVID-19 pandemic and distancing measures, routine childhood immunization (RCI) coverage has been adversely affected, as caregivers avoid tertiary care hospitals or primary health centers. Innovative and cost-effective measures must be taken to understand and deal with the issue of low immunization rates. However, only a few smartphone-based interventions have been carried out in low- and middle-income countries (LMICs) to improve RCI. Objective: The primary objectives of this study are to evaluate whether a personalized mobile app can improve children's on-time visits at 10 and 14 weeks of age for RCI as compared with standard care and to determine whether an artificial intelligence model can be incorporated into the app. Secondary objectives are to determine the perceptions and attitudes of caregivers regarding childhood vaccinations and to understand the factors that might influence the effect of a mobile phone--based app on vaccination improvement. Methods: A mixed methods randomized controlled trial was designed with intervention and control arms. The study will be conducted at the Aga Khan University Hospital vaccination center. Caregivers of newborns or infants visiting the center for their children's 6-week vaccination will be recruited. The intervention arm will have access to a smartphone app with text, voice, video, and pictorial messages regarding RCI. This app will be developed based on the findings of the pretrial qualitative component of the study, in addition to no-show study findings, which will explore caregivers' perceptions about RCI and a mobile phone--based app in improving RCI coverage. Results: Pretrial qualitative in-depth interviews were conducted in February 2020. Enrollment of study participants for the randomized controlled trial is in process. Study exit interviews will be conducted at the 14-week immunization visits, provided the caregivers visit the immunization facility at that time, or over the phone when the children are 18 weeks of age. Conclusions: This study will generate useful insights into the feasibility, acceptability, and usability of an Android-based smartphone app for improving RCI in Pakistan and in LMICs. 
Trial Registration: ClinicalTrials.gov NCT04449107; https://clinicaltrials.gov/ct2/show/NCT04449107 International Registered Report Identifier (IRRID): DERR1-10.2196/22996 ", doi="10.2196/22996", url="https://www.researchprotocols.org/2020/12/e22996", url="https://doi.org/10.2196/22996", url="http://www.ncbi.nlm.nih.gov/pubmed/33274726" } @Article{info:doi/10.2196/24048, author="Plante, Timothy B and Blau, Aaron M and Berg, Adrian N and Weinberg, Aaron S and Jun, Ik C and Tapson, Victor F and Kanigan, Tanya S and Adib, Artur B", title="Development and External Validation of a Machine Learning Tool to Rule Out COVID-19 Among Adults in the Emergency Department Using Routine Blood Tests: A Large, Multicenter, Real-World Study", journal="J Med Internet Res", year="2020", month="Dec", day="2", volume="22", number="12", pages="e24048", keywords="COVID-19; SARS-CoV-2; machine learning; artificial intelligence; electronic medical records; laboratory results; development; validation; testing; model; emergency department", abstract="Background: Conventional diagnosis of COVID-19 with reverse transcription polymerase chain reaction (RT-PCR) testing (hereafter, PCR) is associated with prolonged time to diagnosis and significant costs to run the test. The SARS-CoV-2 virus might lead to characteristic patterns in the results of widely available, routine blood tests that could be identified with machine learning methodologies. Machine learning modalities integrating findings from these common laboratory test results might accelerate ruling out COVID-19 in emergency department patients. Objective: We sought to develop (ie, train and internally validate with cross-validation techniques) and externally validate a machine learning model to rule out COVID 19 using only routine blood tests among adults in emergency departments. Methods: Using clinical data from emergency departments (EDs) from 66 US hospitals before the pandemic (before the end of December 2019) or during the pandemic (March-July 2020), we included patients aged ≥20 years in the study time frame. We excluded those with missing laboratory results. Model training used 2183 PCR-confirmed cases from 43 hospitals during the pandemic; negative controls were 10,000 prepandemic patients from the same hospitals. External validation used 23 hospitals with 1020 PCR-confirmed cases and 171,734 prepandemic negative controls. The main outcome was COVID 19 status predicted using same-day routine laboratory results. Model performance was assessed with area under the receiver operating characteristic (AUROC) curve as well as sensitivity, specificity, and negative predictive value (NPV). Results: Of 192,779 patients included in the training, external validation, and sensitivity data sets (median age decile 50 [IQR 30-60] years, 40.5{\%} male [78,249/192,779]), AUROC for training and external validation was 0.91 (95{\%} CI 0.90-0.92). Using a risk score cutoff of 1.0 (out of 100) in the external validation data set, the model achieved sensitivity of 95.9{\%} and specificity of 41.7{\%}; with a cutoff of 2.0, sensitivity was 92.6{\%} and specificity was 59.9{\%}. At the cutoff of 2.0, the NPVs at a prevalence of 1{\%}, 10{\%}, and 20{\%} were 99.9{\%}, 98.6{\%}, and 97{\%}, respectively. Conclusions: A machine learning model developed with multicenter clinical data integrating commonly collected ED laboratory data demonstrated high rule-out accuracy for COVID-19 status, and might inform selective use of PCR-based testing. 
", doi="10.2196/24048", url="https://www.jmir.org/2020/12/e24048", url="https://doi.org/10.2196/24048", url="http://www.ncbi.nlm.nih.gov/pubmed/33226957" } @Article{info:doi/10.2196/23930, author="Maarseveen, Tjardo D and Meinderink, Timo and Reinders, Marcel J T and Knitza, Johannes and Huizinga, Tom W J and Kleyer, Arnd and Simon, David and van den Akker, Erik B and Knevel, Rachel", title="Machine Learning Electronic Health Record Identification of Patients with Rheumatoid Arthritis: Algorithm Pipeline Development and Validation Study", journal="JMIR Med Inform", year="2020", month="Nov", day="30", volume="8", number="11", pages="e23930", keywords="Supervised machine learning; Electronic Health Records; Natural Language Processing; Support Vector Machine; Gradient Boosting; Rheumatoid Arthritis", abstract="Background: Financial codes are often used to extract diagnoses from electronic health records. This approach is prone to false positives. Alternatively, queries are constructed, but these are highly center and language specific. A tantalizing alternative is the automatic identification of patients by employing machine learning on format-free text entries. Objective: The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records. Methods: Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771). Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a na{\"i}ve word-matching algorithm using 10-fold cross-validation. Performances were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC), and F1 score was used as the primary criterion for selecting the best method to build a classifying algorithm. We selected the optimal threshold of positive predictive value for case identification based on the output of the best method in the training data. This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293). For testing, the best performing methods were applied to remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation. Results: For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55), and 4 methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83). Applying this support vector machine classifier to the test data resulted in a similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94), and with this method, we could identify 2873 patients with rheumatoid arthritis in less than 7 seconds out of the complete collection of 23,300 patients in the Leiden electronic health record system. For the Erlangen data set, gradient boosting performed best (AUROC 0.94; AUPRC 0.85; F1 score 0.82) in the training set, and applied to the test data, resulted once again in good results (F1 score 0.67; PPV 0.97). Conclusions: We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations for limited costs. Our approach is language and center independent and could be applied to any type of diagnosis. 
We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows the creation of observational studies of unprecedented size covering different countries for low cost from already available data in electronic health record systems. ", doi="10.2196/23930", url="http://medinform.jmir.org/2020/11/e23930/", url="https://doi.org/10.2196/23930", url="http://www.ncbi.nlm.nih.gov/pubmed/33252349" } @Article{info:doi/10.2196/20549, author="Morse, Keith E and Ostberg, Nicolai P and Jones, Veena G and Chan, Albert S", title="Use Characteristics and Triage Acuity of a Digital Symptom Checker in a Large Integrated Health System: Population-Based Descriptive Study", journal="J Med Internet Res", year="2020", month="Nov", day="30", volume="22", number="11", pages="e20549", keywords="symptom checker; chatbot; computer-assisted diagnosis; diagnostic self-evaluation; artificial intelligence; self-care; COVID-19", abstract="Background: Pressure on the US health care system has been increasing due to a combination of aging populations, rising health care expenditures, and most recently, the COVID-19 pandemic. Responses to this pressure are hindered in part by reliance on a limited supply of highly trained health care professionals, creating a need for scalable technological solutions. Digital symptom checkers are artificial intelligence--supported software tools that use a conversational ``chatbot'' format to support rapid diagnosis and consistent triage. The COVID-19 pandemic has brought new attention to these tools due to the need to avoid face-to-face contact and preserve urgent care capacity. However, evidence-based deployment of these chatbots requires an understanding of user demographics and associated triage recommendations generated by a large general population. Objective: In this study, we evaluate the user demographics and levels of triage acuity provided by a symptom checker chatbot deployed in partnership with a large integrated health system in the United States. Methods: This population-based descriptive study included all web-based symptom assessments completed on the website and patient portal of the Sutter Health system (24 hospitals in Northern California) from April 24, 2019, to February 1, 2020. User demographics were compared to relevant US Census population data. Results: A total of 26,646 symptom assessments were completed during the study period. Most assessments (17,816/26,646, 66.9{\%}) were completed by female users. The mean user age was 34.3 years (SD 14.4 years), compared to a median age of 37.3 years of the general population. The most common initial symptom was abdominal pain (2060/26,646, 7.7{\%}). A substantial number of assessments (12,357/26,646, 46.4{\%}) were completed outside of typical physician office hours. Most users were advised to seek medical care on the same day (7299/26,646, 27.4{\%}) or within 2-3 days (6301/26,646, 23.6{\%}). Over a quarter of the assessments indicated a high degree of urgency (7723/26,646, 29.0{\%}). Conclusions: Users of the symptom checker chatbot were broadly representative of our patient population, although they skewed toward younger and female users. The triage recommendations were comparable to those of nurse-staffed telephone triage lines. Although the emergence of COVID-19 has increased the interest in remote medical assessment tools, it is important to take an evidence-based approach to their deployment. 
", doi="10.2196/20549", url="https://www.jmir.org/2020/11/e20549", url="https://doi.org/10.2196/20549", url="http://www.ncbi.nlm.nih.gov/pubmed/33170799" } @Article{info:doi/10.2196/19416, author="Cheng, Chi-Tung and Chen, Chih-Chi and Cheng, Fu-Jen and Chen, Huan-Wu and Su, Yi-Siang and Yeh, Chun-Nan and Chung, I-Fang and Liao, Chien-Hung", title="A Human-Algorithm Integration System for Hip Fracture Detection on Plain Radiography: System Development and Validation Study", journal="JMIR Med Inform", year="2020", month="Nov", day="27", volume="8", number="11", pages="e19416", keywords="hip fracture; neural network; computer; artificial intelligence; algorithms; human augmentation; deep learning; diagnosis", abstract="Background: Hip fracture is the most common type of fracture in elderly individuals. Numerous deep learning (DL) algorithms for plain pelvic radiographs (PXRs) have been applied to improve the accuracy of hip fracture diagnosis. However, their efficacy is still undetermined. Objective: The objective of this study is to develop and validate a human-algorithm integration (HAI) system to improve the accuracy of hip fracture diagnosis in a real clinical environment. Methods: The HAI system with hip fracture detection ability was developed using a deep learning algorithm trained on trauma registry data and 3605 PXRs from August 2008 to December 2016. To compare their diagnostic performance before and after HAI system assistance using an independent testing dataset, 34 physicians were recruited. We analyzed the physicians' accuracy, sensitivity, specificity, and agreement with the algorithm; we also performed subgroup analyses according to physician specialty and experience. Furthermore, we applied the HAI system in the emergency departments of different hospitals to validate its value in the real world. Results: With the support of the algorithm, which achieved 91{\%} accuracy, the diagnostic performance of physicians was significantly improved in the independent testing dataset, as was revealed by the sensitivity (physician alone, median 95{\%}; HAI, median 99{\%}; P<.001), specificity (physician alone, median 90{\%}; HAI, median 95{\%}; P<.001), accuracy (physician alone, median 90{\%}; HAI, median 96{\%}; P<.001), and human-algorithm agreement [physician alone $\kappa$, median 0.69 (IQR 0.63-0.74); HAI $\kappa$, median 0.80 (IQR 0.76-0.82); P<.001. With the help of the HAI system, the primary physicians showed significant improvement in their diagnostic performance to levels comparable to those of consulting physicians, and both the experienced and less-experienced physicians benefited from the HAI system. After the HAI system had been applied in 3 departments for 5 months, 587 images were examined. The sensitivity, specificity, and accuracy of the HAI system for detecting hip fractures were 97{\%}, 95.7{\%}, and 96.08{\%}, respectively. Conclusions: HAI currently impacts health care, and integrating this technology into emergency departments is feasible. The developed HAI system can enhance physicians' hip fracture diagnostic performance. 
", doi="10.2196/19416", url="http://medinform.jmir.org/2020/11/e19416/", url="https://doi.org/10.2196/19416", url="http://www.ncbi.nlm.nih.gov/pubmed/33245279" } @Article{info:doi/10.2196/23472, author="Kang, Eugene Yu-Chuan and Hsieh, Yi-Ting and Li, Chien-Hung and Huang, Yi-Jin and Kuo, Chang-Fu and Kang, Je-Ho and Chen, Kuan-Jen and Lai, Chi-Chun and Wu, Wei-Chi and Hwang, Yih-Shiou", title="Deep Learning--Based Detection of Early Renal Function Impairment Using Retinal Fundus Images: Model Development and Validation", journal="JMIR Med Inform", year="2020", month="Nov", day="26", volume="8", number="11", pages="e23472", keywords="deep learning; renal function; retinal fundus image; diabetes; renal; kidney; retinal; eye; imaging; impairment; detection; development; validation; model", abstract="Background: Retinal imaging has been applied for detecting eye diseases and cardiovascular risks using deep learning--based methods. Furthermore, retinal microvascular and structural changes were found in renal function impairments. However, a deep learning--based method using retinal images for detecting early renal function impairment has not yet been well studied. Objective: This study aimed to develop and evaluate a deep learning model for detecting early renal function impairment using retinal fundus images. Methods: This retrospective study enrolled patients who underwent renal function tests with color fundus images captured at any time between January 1, 2001, and August 31, 2019. A deep learning model was constructed to detect impaired renal function from the images. Early renal function impairment was defined as estimated glomerular filtration rate <90 mL/min/1.73 m2. Model performance was evaluated with respect to the receiver operating characteristic curve and area under the curve (AUC). Results: In total, 25,706 retinal fundus images were obtained from 6212 patients for the study period. The images were divided at an 8:1:1 ratio. The training, validation, and testing data sets respectively contained 20,787, 2189, and 2730 images from 4970, 621, and 621 patients. There were 10,686 and 15,020 images determined to indicate normal and impaired renal function, respectively. The AUC of the model was 0.81 in the overall population. In subgroups stratified by serum hemoglobin A1c (HbA1c) level, the AUCs were 0.81, 0.84, 0.85, and 0.87 for the HbA1c levels of ≤6.5{\%}, >6.5{\%}, >7.5{\%}, and >10{\%}, respectively. Conclusions: The deep learning model in this study enables the detection of early renal function impairment using retinal fundus images. The model was more accurate for patients with elevated serum HbA1c levels. 
", doi="10.2196/23472", url="http://medinform.jmir.org/2020/11/e23472/", url="https://doi.org/10.2196/23472", url="http://www.ncbi.nlm.nih.gov/pubmed/33139242" } @Article{info:doi/10.2196/18563, author="Owais, Muhammad and Arsalan, Muhammad and Mahmood, Tahir and Kang, Jin Kyu and Park, Kang Ryoung", title="Automated Diagnosis of Various Gastrointestinal Lesions Using a Deep Learning--Based Classification and Retrieval Framework With a Large Endoscopic Database: Model Development and Validation", journal="J Med Internet Res", year="2020", month="Nov", day="26", volume="22", number="11", pages="e18563", keywords="artificial intelligence; endoscopic video retrieval; content-based medical image retrieval; polyp detection; deep learning; computer-aided diagnosis", abstract="Background: The early diagnosis of various gastrointestinal diseases can lead to effective treatment and reduce the risk of many life-threatening conditions. Unfortunately, various small gastrointestinal lesions are undetectable during early-stage examination by medical experts. In previous studies, various deep learning--based computer-aided diagnosis tools have been used to make a significant contribution to the effective diagnosis and treatment of gastrointestinal diseases. However, most of these methods were designed to detect a limited number of gastrointestinal diseases, such as polyps, tumors, or cancers, in a specific part of the human gastrointestinal tract. Objective: This study aimed to develop a comprehensive computer-aided diagnosis tool to assist medical experts in diagnosing various types of gastrointestinal diseases. Methods: Our proposed framework comprises a deep learning--based classification network followed by a retrieval method. In the first step, the classification network predicts the disease type for the current medical condition. Then, the retrieval part of the framework shows the relevant cases (endoscopic images) from the previous database. These past cases help the medical expert validate the current computer prediction subjectively, which ultimately results in better diagnosis and treatment. Results: All the experiments were performed using 2 endoscopic data sets with a total of 52,471 frames and 37 different classes. The optimal performances obtained by our proposed method in accuracy, F1 score, mean average precision, and mean average recall were 96.19{\%}, 96.99{\%}, 98.18{\%}, and 95.86{\%}, respectively. The overall performance of our proposed diagnostic framework substantially outperformed state-of-the-art methods. Conclusions: This study provides a comprehensive computer-aided diagnosis framework for identifying various types of gastrointestinal diseases. The results show the superiority of our proposed method over various other recent methods and illustrate its potential for clinical diagnosis and treatment. Our proposed network can be applicable to other classification domains in medical imaging, such as computed tomography scans, magnetic resonance imaging, and ultrasound sequences. 
", doi="10.2196/18563", url="http://www.jmir.org/2020/11/e18563/", url="https://doi.org/10.2196/18563", url="http://www.ncbi.nlm.nih.gov/pubmed/33242010" } @Article{info:doi/10.2196/20031, author="Tsai, Vincent FS and Zhuang, Bin and Pong, Yuan-Hung and Hsieh, Ju-Ton and Chang, Hong-Chiang", title="Web- and Artificial Intelligence--Based Image Recognition For Sperm Motility Analysis: Verification Study", journal="JMIR Med Inform", year="2020", month="Nov", day="19", volume="8", number="11", pages="e20031", keywords="Male infertility; semen analysis; home sperm test; smartphone; artificial intelligence; cloud computing; telemedicine", abstract="Background: Human sperm quality fluctuates over time. Therefore, it is crucial for couples preparing for natural pregnancy to monitor sperm motility. Objective: This study verified the performance of an artificial intelligence--based image recognition and cloud computing sperm motility testing system (Bemaner, Createcare) composed of microscope and microfluidic modules and designed to adapt to different types of smartphones. Methods: Sperm videos were captured and uploaded to the cloud with an app. Analysis of sperm motility was performed by an artificial intelligence--based image recognition algorithm then results were displayed. According to the number of motile sperm in the vision field, 47 (deidentified) videos of sperm were scored using 6 grades (0-5) by a male-fertility expert with 10 years of experience. Pearson product-moment correlation was calculated between the grades and the results (concentration of total sperm, concentration of motile sperm, and motility percentage) computed by the system. Results: Good correlation was demonstrated between the grades and results computed by the system for concentration of total sperm (r=0.65, P<.001), concentration of motile sperm (r=0.84, P<.001), and motility percentage (r=0.90, P<.001). Conclusions: This smartphone-based sperm motility test (Bemaner) accurately measures motility-related parameters and could potentially be applied toward the following fields: male infertility detection, sperm quality test during preparation for pregnancy, and infertility treatment monitoring. With frequent at-home testing, more data can be collected to help make clinical decisions and to conduct epidemiological research. ", doi="10.2196/20031", url="http://medinform.jmir.org/2020/11/e20031/", url="https://doi.org/10.2196/20031", url="http://www.ncbi.nlm.nih.gov/pubmed/33211025" } @Article{info:doi/10.2196/24163, author="Islam, Md Mohaimenul and Yang, Hsuan-Chia and Poly, Tahmina Nasrin and Li, Yu-Chuan Jack", title="Development of an Artificial Intelligence--Based Automated Recommendation System for Clinical Laboratory Tests: Retrospective Analysis of the National Health Insurance Database", journal="JMIR Med Inform", year="2020", month="Nov", day="18", volume="8", number="11", pages="e24163", keywords="artificial intelligence; deep learning; clinical decision-support system; laboratory test; patient safety", abstract="Background: Laboratory tests are considered an essential part of patient safety as patients' screening, diagnosis, and follow-up are solely based on laboratory tests. Diagnosis of patients could be wrong, missed, or delayed if laboratory tests are performed erroneously. However, recognizing the value of correct laboratory test ordering remains underestimated by policymakers and clinicians. 
Nowadays, artificial intelligence methods such as machine learning and deep learning (DL) have been extensively used as powerful tools for pattern recognition in large data sets. Therefore, developing an automated laboratory test recommendation tool using available data from electronic health records (EHRs) could support current clinical practice. Objective: The objective of this study was to develop an artificial intelligence--based automated model that can provide laboratory test recommendations based on simple variables available in EHRs. Methods: A retrospective analysis of the National Health Insurance database between January 1, 2013, and December 31, 2013, was performed. We reviewed the records of all patients who visited the cardiology department at least once and were prescribed laboratory tests. The data set was split into training and testing sets (80:20) to develop the DL model. In the internal validation, 25{\%} of the data were randomly selected from the training set to evaluate the performance of this model. Results: We used the area under the receiver operating characteristic curve, precision, recall, and Hamming loss as comparative measures. A total of 129,938 prescriptions were used in our model. The DL-based automated recommendation system for laboratory tests achieved a significantly higher area under the receiver operating characteristic curve (AUROCmacro and AUROCmicro of 0.76 and 0.87, respectively). Using a low cutoff, the model identified appropriate laboratory tests with 99{\%} sensitivity. Conclusions: The developed artificial intelligence model based on DL exhibited good discriminative capability for predicting laboratory tests using routinely collected EHR data. Utilization of DL approaches can facilitate optimal laboratory test selection for patients, which may in turn improve patient safety. However, future studies are recommended to assess the cost-effectiveness of implementing this model in real-world clinical settings. ", doi="10.2196/24163", url="https://medinform.jmir.org/2020/11/e24163", url="https://doi.org/10.2196/24163", url="http://www.ncbi.nlm.nih.gov/pubmed/33206057" } @Article{info:doi/10.2196/23315, author="von Wedel, Philip and Hagist, Christian", title="Economic Value of Data and Analytics for Health Care Providers: Hermeneutic Systematic Literature Review", journal="J Med Internet Res", year="2020", month="Nov", day="18", volume="22", number="11", pages="e23315", keywords="digital health; health information technology; healthcare provider economics; electronic health records; data analytics; artificial intelligence", abstract="Background: The benefits of data and analytics for health care systems and single providers are an increasingly investigated field in the digital health literature. Electronic health records (EHRs), for example, can improve the quality of care. Emerging analytics tools based on artificial intelligence show the potential to assist physicians in day-to-day workflows. Yet, single health care providers also need information regarding the economic impact when deciding on the potential adoption of these tools. Objective: This paper examines the question of whether data and analytics provide economic advantages or disadvantages for health care providers. The goal is to provide a comprehensive overview including a variety of technologies beyond computer-based patient records. Ultimately, findings are also intended to determine whether economic barriers for adoption by providers could exist. 
Methods: A systematic literature search of the PubMed and Google Scholar online databases was conducted, following the hermeneutic methodology that encourages iterative search and interpretation cycles. After applying inclusion and exclusion criteria to 165 initially identified studies, 50 were included for qualitative synthesis and topic-based clustering. Results: The review identified 5 major technology categories, namely EHRs (n=30), computerized clinical decision support (n=8), advanced analytics (n=5), business analytics (n=5), and telemedicine (n=2). Overall, 62{\%} (31/50) of the reviewed studies indicated a positive economic impact for providers either via direct cost or revenue effects or via indirect efficiency or productivity improvements. When differentiating between categories, however, an ambiguous picture emerged for EHR, whereas analytics technologies like computerized clinical decision support and advanced analytics predominantly showed economic benefits. Conclusions: The research question of whether data and analytics create economic benefits for health care providers cannot be answered uniformly. The results indicate ambiguous effects for EHRs, here representing data, and mainly positive effects for the significantly less studied analytics field. The mixed results regarding EHRs can create an economic barrier for adoption by providers. This barrier can translate into a bottleneck to positive economic effects of analytics technologies relying on EHR data. Ultimately, more research on economic effects of technologies other than EHRs is needed to generate a more reliable evidence base. ", doi="10.2196/23315", url="http://www.jmir.org/2020/11/e23315/", url="https://doi.org/10.2196/23315", url="http://www.ncbi.nlm.nih.gov/pubmed/33206056" } @Article{info:doi/10.2196/19805, author="Gao, Yang and Xiao, Xiong and Han, Bangcheng and Li, Guilin and Ning, Xiaolin and Wang, Defeng and Cai, Weidong and Kikinis, Ron and Berkovsky, Shlomo and Di Ieva, Antonio and Zhang, Liwei and Ji, Nan and Liu, Sidong", title="Deep Learning Methodology for Differentiating Glioma Recurrence From Radiation Necrosis Using Multimodal Magnetic Resonance Imaging: Algorithm Development and Validation", journal="JMIR Med Inform", year="2020", month="Nov", day="17", volume="8", number="11", pages="e19805", keywords="recurrent tumor; radiation necrosis; progression; pseudoprogression; multimodal MRI; deep learning", abstract="Background: The radiological differential diagnosis between tumor recurrence and radiation-induced necrosis (ie, pseudoprogression) is of paramount importance in the management of glioma patients. Objective: This research aims to develop a deep learning methodology for automated differentiation of tumor recurrence from radiation necrosis based on routine magnetic resonance imaging (MRI) scans. Methods: In this retrospective study, 146 patients who underwent radiation therapy after glioma resection and presented with suspected recurrent lesions at the follow-up MRI examination were selected for analysis. Routine MRI scans were acquired from each patient, including T1, T2, and gadolinium-contrast-enhanced T1 sequences. Of those cases, 96 (65.8{\%}) were confirmed as glioma recurrence on postsurgical pathological examination, while 50 (34.2{\%}) were diagnosed as necrosis. A light-weighted deep neural network (DNN) (ie, efficient radionecrosis neural network [ERN-Net]) was proposed to learn radiological features of gliomas and necrosis from MRI scans. 
Sensitivity, specificity, accuracy, and area under the curve (AUC) were used to evaluate performance of the model in both image-wise and subject-wise classifications. Preoperative diagnostic performance of the model was also compared to that of the state-of-the-art DNN models and five experienced neurosurgeons. Results: DNN models based on multimodal MRI outperformed single-modal models. ERN-Net achieved the highest AUC in both image-wise (0.915) and subject-wise (0.958) classification tasks. The evaluated DNN models achieved an average sensitivity of 0.947 (SD 0.033), specificity of 0.817 (SD 0.075), and accuracy of 0.903 (SD 0.026), which were significantly better than the tested neurosurgeons (P=.02 in sensitivity and P<.001 in specificity and accuracy). Conclusions: Deep learning offers a useful computational tool for the differential diagnosis between recurrent gliomas and necrosis. The proposed ERN-Net model, a simple and effective DNN model, achieved excellent performance on routine MRI scans and showed a high clinical applicability. ", doi="10.2196/19805", url="http://medinform.jmir.org/2020/11/e19805/", url="https://doi.org/10.2196/19805", url="http://www.ncbi.nlm.nih.gov/pubmed/33200991" } @Article{info:doi/10.2196/15185, author="Koman, Jason and Fauvelle, Khristina and Schuck, St{\'e}phane and Texier, Nathalie and Mebarki, Adel", title="Physicians' Perceptions of the Use of a Chatbot for Information Seeking: Qualitative Study", journal="J Med Internet Res", year="2020", month="Nov", day="10", volume="22", number="11", pages="e15185", keywords="health; digital health; innovation; conversational agent; decision support system; qualitative research; chatbot; bot; medical drugs; prescription; risk minimization measures", abstract="Background: Seeking medical information can be an issue for physicians. In the specific context of medical practice, chatbots are hypothesized to present additional value for providing information quickly, particularly as far as drug risk minimization measures are concerned. Objective: This qualitative study aimed to elicit physicians' perceptions of a pilot version of a chatbot used in the context of drug information and risk minimization measures. Methods: General practitioners and specialists were recruited across France to participate in individual semistructured interviews. Interviews were recorded, transcribed, and analyzed using a horizontal thematic analysis approach. Results: Eight general practitioners and 2 specialists participated. The tone and ergonomics of the pilot version were appreciated by physicians. However, all participants emphasized the importance of getting exhaustive, trustworthy answers when interacting with a chatbot. Conclusions: The chatbot was perceived as a useful and innovative tool that could easily be integrated into routine medical practice and could help health professionals when seeking information on drug and risk minimization measures. 
", doi="10.2196/15185", url="http://www.jmir.org/2020/11/e15185/", url="https://doi.org/10.2196/15185", url="http://www.ncbi.nlm.nih.gov/pubmed/33170134" } @Article{info:doi/10.2196/21659, author="Roosan, Don and Chok, Jay and Karim, Mazharul and Law, Anandi V and Baskys, Andrius and Hwang, Angela and Roosan, Moom R", title="Artificial Intelligence--Powered Smartphone App to Facilitate Medication Adherence: Protocol for a Human Factors Design Study", journal="JMIR Res Protoc", year="2020", month="Nov", day="9", volume="9", number="11", pages="e21659", keywords="artificial intelligence; smartphone app; patient cognition; complex medication information; medication adherence; machine learning; mobile phone", abstract="Background: Medication Guides consisting of crucial interactions and side effects are extensive and complex. Due to the exhaustive information, patients do not retain the necessary medication information, which can result in hospitalizations and medication nonadherence. A gap exists in understanding patients' cognition of managing complex medication information. However, advancements in technology and artificial intelligence (AI) allow us to understand patient cognitive processes to design an app to better provide important medication information to patients. Objective: Our objective is to improve the design of an innovative AI- and human factor--based interface that supports patients' medication information comprehension that could potentially improve medication adherence. Methods: This study has three aims. Aim 1 has three phases: (1) an observational study to understand patient perception of fear and biases regarding medication information, (2) an eye-tracking study to understand the attention locus for medication information, and (3) a psychological refractory period (PRP) paradigm study to understand functionalities. Observational data will be collected, such as audio and video recordings, gaze mapping, and time from PRP. A total of 50 patients, aged 18-65 years, who started at least one new medication, for which we developed visualization information, and who have a cognitive status of 34 during cognitive screening using the TICS-M test and health literacy level will be included in this aim of the study. In Aim 2, we will iteratively design and evaluate an AI-powered medication information visualization interface as a smartphone app with the knowledge gained from each component of Aim 1. The interface will be assessed through two usability surveys. A total of 300 patients, aged 18-65 years, with diabetes, cardiovascular diseases, or mental health disorders, will be recruited for the surveys. Data from the surveys will be analyzed through exploratory factor analysis. In Aim 3, in order to test the prototype, there will be a two-arm study design. This aim will include 900 patients, aged 18-65 years, with internet access, without any cognitive impairment, and with at least two medications. Patients will be sequentially randomized. Three surveys will be used to assess the primary outcome of medication information comprehension and the secondary outcome of medication adherence at 12 weeks. Results: Preliminary data collection will be conducted in 2021, and results are expected to be published in 2022. Conclusions: This study will lead the future of AI-based, innovative, digital interface design and aid in improving medication comprehension, which may improve medication adherence. 
The results from this study will also open up future research opportunities in understanding how patients manage complex medication information and will inform the format and design of innovative, AI-powered digital interfaces for Medication Guides. International Registered Report Identifier (IRRID): PRR1-10.2196/21659 ", doi="10.2196/21659", url="http://www.researchprotocols.org/2020/11/e21659/", url="https://doi.org/10.2196/21659", url="http://www.ncbi.nlm.nih.gov/pubmed/33164898" } @Article{info:doi/10.2196/21252, author="Spasic, Irena and Button, Kate", title="Patient Triage by Topic Modeling of Referral Letters: Feasibility Study", journal="JMIR Med Inform", year="2020", month="Nov", day="6", volume="8", number="11", pages="e21252", keywords="natural language processing; machine learning; data science; medical informatics; computer-assisted decision making", abstract="Background: Musculoskeletal conditions are managed within primary care, but patients can be referred to secondary care if a specialist opinion is required. The ever-increasing demand for health care resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions. Objective: This study aims to explore the feasibility of using natural language processing and machine learning to automate the triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically sorted into latent topics that are clinically relevant, that is, considered relevant when prescribing treatments. Here, clinical relevance is assessed by posing 2 research questions. Can latent topics be used to automatically predict treatment? Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experiences such as medical history, demographics, and possible treatments? Methods: We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, a qualitative evaluation was performed to assess the human interpretability of topics. Results: The binary classifiers outperformed the stratified random classifier by a large margin in prediction accuracy, indicating that topic modeling could be used to predict the treatment, thus effectively supporting patient triage. The qualitative evaluation confirmed the high clinical interpretability of the topic model. Conclusions: The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee or hip pain by analyzing information from their referral letters. 
", doi="10.2196/21252", url="https://medinform.jmir.org/2020/11/e21252", url="https://doi.org/10.2196/21252", url="http://www.ncbi.nlm.nih.gov/pubmed/33155985" } @Article{info:doi/10.2196/20251, author="Almusharraf, Fahad and Rose, Jonathan and Selby, Peter", title="Engaging Unmotivated Smokers to Move Toward Quitting: Design of Motivational Interviewing--Based Chatbot Through Iterative Interactions", journal="J Med Internet Res", year="2020", month="Nov", day="3", volume="22", number="11", pages="e20251", keywords="smoking cessation; motivational interviewing; chatbot; natural language processing", abstract="Background: At any given time, most smokers in a population are ambivalent with no motivation to quit. Motivational interviewing (MI) is an evidence-based technique that aims to elicit change in ambivalent smokers. MI practitioners are scarce and expensive, and smokers are difficult to reach. Smokers are potentially reachable through the web, and if an automated chatbot could emulate an MI conversation, it could form the basis of a low-cost and scalable intervention motivating smokers to quit. Objective: The primary goal of this study is to design, train, and test an automated MI-based chatbot capable of eliciting reflection in a conversation with cigarette smokers. This study describes the process of collecting training data to improve the chatbot's ability to generate MI-oriented responses, particularly reflections and summary statements. The secondary goal of this study is to observe the effects on participants through voluntary feedback given after completing a conversation with the chatbot. Methods: An interdisciplinary collaboration between an MI expert and experts in computer engineering and natural language processing (NLP) co-designed the conversation and algorithms underlying the chatbot. A sample of 121 adult cigarette smokers in 11 successive groups were recruited from a web-based platform for a single-arm prospective iterative design study. The chatbot was designed to stimulate reflections on the pros and cons of smoking using MI's running head start technique. Participants were also asked to confirm the chatbot's classification of their free-form responses to measure the classification accuracy of the underlying NLP models. Each group provided responses that were used to train the chatbot for the next group. Results: A total of 6568 responses from 121 participants in 11 successive groups over 14 weeks were received. From these responses, we were able to isolate 21 unique reasons for and against smoking and the relative frequency of each. The gradual collection of responses as inputs and smoking reasons as labels over the 11 iterations improved the F1 score of the classification within the chatbot from 0.63 in the first group to 0.82 in the final group. The mean time spent by each participant interacting with the chatbot was 21.3 (SD 14.0) min (minimum 6.4 and maximum 89.2). We also found that 34.7{\%} (42/121) of participants enjoyed the interaction with the chatbot, and 8.3{\%} (10/121) of participants noted explicit smoking cessation benefits from the conversation in voluntary feedback that did not solicit this explicitly. Conclusions: Recruiting ambivalent smokers through the web is a viable method to train a chatbot to increase accuracy in reflection and summary statements, the building blocks of MI. A new set of 21 smoking reasons (both for and against) has been identified. 
Initial feedback from smokers on the experience shows promise toward using it in an intervention. ", doi="10.2196/20251", url="https://www.jmir.org/2020/11/e20251", url="https://doi.org/10.2196/20251", url="http://www.ncbi.nlm.nih.gov/pubmed/33141095" } @Article{info:doi/10.2196/19548, author="{\v{C}}uki{\'{c}}, Milena and L{\'o}pez, Victoria and Pav{\'o}n, Juan", title="Classification of Depression Through Resting-State Electroencephalogram as a Novel Practice in Psychiatry: Review", journal="J Med Internet Res", year="2020", month="Nov", day="3", volume="22", number="11", pages="e19548", keywords="computational psychiatry; physiological complexity; machine learning; theory-driven approach; resting-state EEG; personalized medicine; computational neuroscience; unwarranted optimism", abstract="Background: Machine learning applications in health care have increased considerably in the recent past, and this review focuses on an important application in psychiatry related to the detection of depression. Since the advent of computational psychiatry, research based on functional magnetic resonance imaging has yielded remarkable results, but these tools tend to be too expensive for everyday clinical use. Objective: This review focuses on an affordable data-driven approach based on electroencephalographic recordings. Web-based applications via public or private cloud-based platforms would be a logical next step. We aim to compare several different approaches to the detection of depression from electroencephalographic recordings using various features and machine learning models. Methods: To detect depression, we reviewed published detection studies based on resting-state electroencephalogram with final machine learning, and to predict therapy outcomes, we reviewed a set of interventional studies using some form of stimulation in their methodology. Results: We reviewed 14 detection studies and 12 interventional studies published between 2008 and 2019. As direct comparison was not possible due to the large diversity of theoretical approaches and methods used, we compared them based on the steps in analysis and accuracies yielded. In addition, we compared possible drawbacks in terms of sample size, feature extraction, feature selection, classification, internal and external validation, and possible unwarranted optimism and reproducibility. In addition, we suggested desirable practices to avoid misinterpretation of results and optimism. Conclusions: This review shows the need for larger data sets and more systematic procedures to improve the use of the solution for clinical diagnostics. Therefore, regulation of the pipeline and standard requirements for methodology used should become mandatory to increase the reliability and accuracy of the complete methodology for it to be translated to modern psychiatry. ", doi="10.2196/19548", url="https://www.jmir.org/2020/11/e19548", url="https://doi.org/10.2196/19548", url="http://www.ncbi.nlm.nih.gov/pubmed/33141088" } @Article{info:doi/10.2196/18273, author="Zhou, Sicheng and Zhao, Yunpeng and Bian, Jiang and Haynos, Ann F and Zhang, Rui", title="Exploring Eating Disorder Topics on Twitter: Machine Learning Approach", journal="JMIR Med Inform", year="2020", month="Oct", day="30", volume="8", number="10", pages="e18273", keywords="eating disorders; topic modeling; text classification; social media; public health", abstract="Background: Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. 
As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs are discussed on these platforms. Initial results suggest that such research offers a promising method for further understanding this group of diseases. Nevertheless, an efficient computational method is needed to further identify and analyze tweets relevant to EDs on a larger scale. Objective: This study aims to develop and validate a machine learning--based classifier to identify tweets related to EDs and to explore factors (ie, topics) related to EDs using a topic modeling method. Methods: We collected potential ED-relevant tweets using keywords from previous studies and annotated these tweets into different groups (ie, ED relevant vs irrelevant and then promotional information vs laypeople discussion). Several supervised machine learning methods, such as convolutional neural network (CNN), long short-term memory (LSTM), support vector machine, and na{\"i}ve Bayes, were developed and evaluated using annotated data. We used the classifier with the best performance to identify ED-relevant tweets and applied a topic modeling method---Correlation Explanation (CorEx)---to analyze the content of the identified tweets. To validate these machine learning results, we also collected a cohort of ED-relevant tweets on the basis of manually curated rules. Results: A total of 123,977 tweets were collected during the set period. We randomly annotated 2219 tweets for developing the machine learning classifiers. We developed a CNN-LSTM classifier to identify ED-relevant tweets published by laypeople in 2 steps: first relevant versus irrelevant (F1 score=0.89) and then promotional versus published by laypeople (F1 score=0.90). A total of 40,790 ED-relevant tweets were identified using the CNN-LSTM classifier. We also identified another set of tweets (ie, 17,632 ED-relevant and 83,557 ED-irrelevant tweets) posted by laypeople using manually specified rules. Using CorEx on all ED-relevant tweets, the topic model identified 162 topics. Overall, the coherence rate for topic modeling was 77.07{\%} (1264/1640), indicating a high quality of the produced topics. The topics were further reviewed and analyzed by a domain expert. Conclusions: A developed CNN-LSTM classifier could improve the efficiency of identifying ED-relevant tweets compared with the traditional manual-based method. The CorEx topic model was applied on the tweets identified by the machine learning--based classifier and the traditional manual approach separately. Highly overlapping topics were observed between the 2 cohorts of tweets. The produced topics were further reviewed by a domain expert. Some of the topics identified by the potential ED tweets may provide new avenues for understanding this serious set of disorders. 
", doi="10.2196/18273", url="http://medinform.jmir.org/2020/10/e18273/", url="https://doi.org/10.2196/18273", url="http://www.ncbi.nlm.nih.gov/pubmed/33124997" } @Article{info:doi/10.2196/21222, author="Chou, Joseph H", title="Predictive Models for Neonatal Follow-Up Serum Bilirubin: Model Development and Validation", journal="JMIR Med Inform", year="2020", month="Oct", day="29", volume="8", number="10", pages="e21222", keywords="infant, newborn; neonatology; jaundice, neonatal; hyperbilirubinemia, neonatal; machine learning; supervised machine learning; data science; medical informatics; decision support techniques; models, statistical; predictive models", abstract="Background: Hyperbilirubinemia affects many newborn infants and, if not treated appropriately, can lead to irreversible brain injury. Objective: This study aims to develop predictive models of follow-up total serum bilirubin measurement and to compare their accuracy with that of clinician predictions. Methods: Subjects were patients born between June 2015 and June 2019 at 4 hospitals in Massachusetts. The prediction target was a follow-up total serum bilirubin measurement obtained <72 hours after a previous measurement. Birth before versus after February 2019 was used to generate a training set (27,428 target measurements) and a held-out test set (3320 measurements), respectively. Multiple supervised learning models were trained. To further assess model performance, predictions on the held-out test set were also compared with corresponding predictions from clinicians. Results: The best predictive accuracy on the held-out test set was obtained with the multilayer perceptron (ie, neural network, mean absolute error [MAE] 1.05 mg/dL) and Xgboost (MAE 1.04 mg/dL) models. A limited number of predictors were sufficient for constructing models with the best performance and avoiding overfitting: current bilirubin measurement, last rate of rise, proportion of time under phototherapy, time to next measurement, gestational age at birth, current age, and fractional weight change from birth. Clinicians made a total of 210 prospective predictions. The neural network model accuracy on this subset of predictions had an MAE of 1.06 mg/dL compared with clinician predictions with an MAE of 1.38 mg/dL (P<.0001). In babies born at 35 weeks of gestation or later, this approach was also applied to predict the binary outcome of subsequently exceeding consensus guidelines for phototherapy initiation and achieved an area under the receiver operator characteristic curve of 0.94 (95{\%} CI 0.91 to 0.97). Conclusions: This study developed predictive models for neonatal follow-up total serum bilirubin measurements that outperform clinicians. This may be the first report of models that predict specific bilirubin values, are not limited to near-term patients without risk factors, and take into account the effect of phototherapy. 
", doi="10.2196/21222", url="http://medinform.jmir.org/2020/10/e21222/", url="https://doi.org/10.2196/21222", url="http://www.ncbi.nlm.nih.gov/pubmed/33118947" } @Article{info:doi/10.2196/21801, author="Izquierdo, Jose Luis and Ancochea, Julio and Soriano, Joan B", title="Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing", journal="J Med Internet Res", year="2020", month="Oct", day="28", volume="22", number="10", pages="e21801", keywords="artificial intelligence; big data; COVID-19; electronic health records; tachypnea; SARS-CoV-2; predictive model", abstract="Background: Many factors involved in the onset and clinical course of the ongoing COVID-19 pandemic are still unknown. Although big data analytics and artificial intelligence are widely used in the realms of health and medicine, researchers are only beginning to use these tools to explore the clinical characteristics and predictive factors of patients with COVID-19. Objective: Our primary objectives are to describe the clinical characteristics and determine the factors that predict intensive care unit (ICU) admission of patients with COVID-19. Determining these factors using a well-defined population can increase our understanding of the real-world epidemiology of the disease. Methods: We used a combination of classic epidemiological methods, natural language processing (NLP), and machine learning (for predictive modeling) to analyze the electronic health records (EHRs) of patients with COVID-19. We explored the unstructured free text in the EHRs within the Servicio de Salud de Castilla-La Mancha (SESCAM) Health Care Network (Castilla-La Mancha, Spain) from the entire population with available EHRs (1,364,924 patients) from January 1 to March 29, 2020. We extracted related clinical information regarding diagnosis, progression, and outcome for all COVID-19 cases. Results: A total of 10,504 patients with a clinical or polymerase chain reaction--confirmed diagnosis of COVID-19 were identified; 5519 (52.5{\%}) were male, with a mean age of 58.2 years (SD 19.7). Upon admission, the most common symptoms were cough, fever, and dyspnea; however, all three symptoms occurred in fewer than half of the cases. Overall, 6.1{\%} (83/1353) of hospitalized patients required ICU admission. Using a machine-learning, data-driven algorithm, we identified that a combination of age, fever, and tachypnea was the most parsimonious predictor of ICU admission; patients younger than 56 years, without tachypnea, and temperature <39 degrees Celsius (or >39 {\textordmasculine}C without respiratory crackles) were not admitted to the ICU. In contrast, patients with COVID-19 aged 40 to 79 years were likely to be admitted to the ICU if they had tachypnea and delayed their visit to the emergency department after being seen in primary care. Conclusions: Our results show that a combination of easily obtainable clinical variables (age, fever, and tachypnea with or without respiratory crackles) predicts whether patients with COVID-19 will require ICU admission. 
", doi="10.2196/21801", url="http://www.jmir.org/2020/10/e21801/", url="https://doi.org/10.2196/21801", url="http://www.ncbi.nlm.nih.gov/pubmed/33090964" } @Article{info:doi/10.2196/20891, author="Lee, Geun Hyeong and Shin, Soo-Yong", title="Federated Learning on Clinical Benchmark Data: Performance Assessment", journal="J Med Internet Res", year="2020", month="Oct", day="26", volume="22", number="10", pages="e20891", keywords="federated learning; medical data; privacy protection; machine learning; deep learning", abstract="Background: Federated learning (FL) is a newly proposed machine-learning method that uses a decentralized dataset. Since data transfer is not necessary for the learning process in FL, there is a significant advantage in protecting personal privacy. Therefore, many studies are being actively conducted in the applications of FL for diverse areas. Objective: The aim of this study was to evaluate the reliability and performance of FL using three benchmark datasets, including a clinical benchmark dataset. Methods: To evaluate FL in a realistic setting, we implemented FL using a client-server architecture with Python. The implemented client-server version of the FL software was deployed to Amazon Web Services. Modified National Institute of Standards and Technology (MNIST), Medical Information Mart for Intensive Care-III (MIMIC-III), and electrocardiogram (ECG) datasets were used to evaluate the performance of FL. To test FL in a realistic setting, the MNIST dataset was split into 10 different clients, with one digit for each client. In addition, we conducted four different experiments according to basic, imbalanced, skewed, and a combination of imbalanced and skewed data distributions. We also compared the performance of FL to that of the state-of-the-art method with respect to in-hospital mortality using the MIMIC-III dataset. Likewise, we conducted experiments comparing basic and imbalanced data distributions using MIMIC-III and ECG data. Results: FL on the basic MNIST dataset with 10 clients achieved an area under the receiver operating characteristic curve (AUROC) of 0.997 and an F1-score of 0.946. The experiment with the imbalanced MNIST dataset achieved an AUROC of 0.995 and an F1-score of 0.921. The experiment with the skewed MNIST dataset achieved an AUROC of 0.992 and an F1-score of 0.905. Finally, the combined imbalanced and skewed experiment achieved an AUROC of 0.990 and an F1-score of 0.891. The basic FL on in-hospital mortality using MIMIC-III data achieved an AUROC of 0.850 and an F1-score of 0.944, while the experiment with the imbalanced MIMIC-III dataset achieved an AUROC of 0.850 and an F1-score of 0.943. For ECG classification, the basic FL achieved an AUROC of 0.938 and an F1-score of 0.807, and the imbalanced ECG dataset achieved an AUROC of 0.943 and an F1-score of 0.807. Conclusions: FL demonstrated comparative performance on different benchmark datasets. In addition, FL demonstrated reliable performance in cases where the distribution was imbalanced, skewed, and extreme, reflecting the real-life scenario in which data distributions from various hospitals are different. FL can achieve high performance while maintaining privacy protection because there is no requirement to centralize the data. 
", doi="10.2196/20891", url="http://www.jmir.org/2020/10/e20891/", url="https://doi.org/10.2196/20891", url="http://www.ncbi.nlm.nih.gov/pubmed/33104011" } @Article{info:doi/10.2196/20346, author="Milne-Ives, Madison and de Cock, Caroline and Lim, Ernest and Shehadeh, Melissa Harper and de Pennington, Nick and Mole, Guy and Normando, Eduardo and Meinert, Edward", title="The Effectiveness of Artificial Intelligence Conversational Agents in Health Care: Systematic Review", journal="J Med Internet Res", year="2020", month="Oct", day="22", volume="22", number="10", pages="e20346", keywords="artificial intelligence; avatar; chatbot; conversational agent; digital health; intelligent assistant; speech recognition software; virtual assistant; virtual coach; virtual health care; virtual nursing; voice recognition software", abstract="Background: The high demand for health care services and the growing capability of artificial intelligence have led to the development of conversational agents designed to support a variety of health-related activities, including behavior change, treatment support, health monitoring, training, triage, and screening support. Automation of these tasks could free clinicians to focus on more complex work and increase the accessibility to health care services for the public. An overarching assessment of the acceptability, usability, and effectiveness of these agents in health care is needed to collate the evidence so that future development can target areas for improvement and potential for sustainable adoption. Objective: This systematic review aims to assess the effectiveness and usability of conversational agents in health care and identify the elements that users like and dislike to inform future research and development of these agents. Methods: PubMed, Medline (Ovid), EMBASE (Excerpta Medica dataBASE), CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, and the Association for Computing Machinery Digital Library were systematically searched for articles published since 2008 that evaluated unconstrained natural language processing conversational agents used in health care. EndNote (version X9, Clarivate Analytics) reference management software was used for initial screening, and full-text screening was conducted by 1 reviewer. Data were extracted, and the risk of bias was assessed by one reviewer and validated by another. Results: A total of 31 studies were selected and included a variety of conversational agents, including 14 chatbots (2 of which were voice chatbots), 6 embodied conversational agents (3 of which were interactive voice response calls, virtual patients, and speech recognition screening systems), 1 contextual question-answering agent, and 1 voice recognition triage system. Overall, the evidence reported was mostly positive or mixed. Usability and satisfaction performed well (27/30 and 26/31), and positive or mixed effectiveness was found in three-quarters of the studies (23/30). However, there were several limitations of the agents highlighted in specific qualitative feedback. Conclusions: The studies generally reported positive or mixed evidence for the effectiveness, usability, and satisfactoriness of the conversational agents investigated, but qualitative user perceptions were more mixed. The quality of many of the studies was limited, and improved study design and reporting are necessary to more accurately evaluate the usefulness of the agents in health care and identify key areas for improvement. 
Further research should also analyze the cost-effectiveness, privacy, and security of the agents. International Registered Report Identifier (IRRID): RR2-10.2196/16934 ", doi="10.2196/20346", url="http://www.jmir.org/2020/10/e20346/", url="https://doi.org/10.2196/20346", url="http://www.ncbi.nlm.nih.gov/pubmed/33090118" } @Article{info:doi/10.2196/22550, author="Almog, Yasmeen Adar and Rai, Angshu and Zhang, Patrick and Moulaison, Amanda and Powell, Ross and Mishra, Anirban and Weinberg, Kerry and Hamilton, Celeste and Oates, Mary and McCloskey, Eugene and Cummings, Steven R", title="Deep Learning With Electronic Health Records for Short-Term Fracture Risk Identification: Crystal Bone Algorithm Development and Validation", journal="J Med Internet Res", year="2020", month="Oct", day="16", volume="22", number="10", pages="e22550", keywords="fracture; bone; osteoporosis; low bone mass; prediction; natural language processing; NLP; machine learning; deep learning; artificial intelligence; AI; electronic health record; EHR", abstract="Background: Fractures as a result of osteoporosis and low bone mass are common and give rise to significant clinical, personal, and economic burden. Even after a fracture occurs, high fracture risk remains widely underdiagnosed and undertreated. Common fracture risk assessment tools utilize a subset of clinical risk factors for prediction, and often require manual data entry. Furthermore, these tools predict risk over the long term and do not explicitly provide short-term risk estimates necessary to identify patients likely to experience a fracture in the next 1-2 years. Objective: The goal of this study was to develop and evaluate an algorithm for the identification of patients at risk of fracture in a subsequent 1- to 2-year period. In order to address the aforementioned limitations of current prediction tools, this approach focused on a short-term timeframe, automated data entry, and the use of longitudinal data to inform the predictions. Methods: Using retrospective electronic health record data from over 1,000,000 patients, we developed Crystal Bone, an algorithm that applies machine learning techniques from natural language processing to the temporal nature of patient histories to generate short-term fracture risk predictions. Similar to how language models predict the next word in a given sentence or the topic of a document, Crystal Bone predicts whether a patient's future trajectory might contain a fracture event, or whether the signature of the patient's journey is similar to that of a typical future fracture patient. A holdout set with 192,590 patients was used to validate accuracy. Experimental baseline models and human-level performance were used for comparison. Results: The model accurately predicted 1- to 2-year fracture risk for patients aged over 50 years (area under the receiver operating characteristics curve [AUROC] 0.81). These algorithms outperformed the experimental baselines (AUROC 0.67) and showed meaningful improvements when compared to retrospective approximation of human-level performance by correctly identifying 9649 of 13,765 (70{\%}) at-risk patients who did not receive any preventative bone-health-related medical interventions from their physicians. Conclusions: These findings indicate that it is possible to use a patient's unique medical history as it changes over time to predict the risk of short-term fracture. 
Validating and applying such a tool within the health care system could enable automated and widespread prediction of this risk and may help with identification of patients at very high risk of fracture. ", doi="10.2196/22550", url="http://www.jmir.org/2020/10/e22550/", url="https://doi.org/10.2196/22550", url="http://www.ncbi.nlm.nih.gov/pubmed/32956069" } @Article{info:doi/10.2196/19878, author="Liu, Ping-Yen and Tsai, Yi-Shan and Chen, Po-Lin and Tsai, Huey-Pin and Hsu, Ling-Wei and Wang, Chi-Shiang and Lee, Nan-Yao and Huang, Mu-Shiang and Wu, Yun-Chiao and Ko, Wen-Chien and Yang, Yi-Ching and Chiang, Jung-Hsien and Shen, Meng-Ru", title="Application of an Artificial Intelligence Trilogy to Accelerate Processing of Suspected Patients With SARS-CoV-2 at a Smart Quarantine Station: Observational Study", journal="J Med Internet Res", year="2020", month="Oct", day="14", volume="22", number="10", pages="e19878", keywords="SARS-CoV-2; COVID-19; artificial intelligence; smart device assisted decision making; quarantine station", abstract="Background: As the COVID-19 epidemic increases in severity, the burden of quarantine stations outside emergency departments (EDs) at hospitals is increasing daily. To address the high screening workload at quarantine stations, all staff members with medical licenses are required to work shifts in these stations. Therefore, it is necessary to simplify the workflow and decision-making process for physicians and surgeons from all subspecialties. Objective: The aim of this paper is to demonstrate how the National Cheng Kung University Hospital artificial intelligence (AI) trilogy of diversion to a smart quarantine station, AI-assisted image interpretation, and a built-in clinical decision-making algorithm improves medical care and reduces quarantine processing times. Methods: This observational study on the emerging COVID-19 pandemic included 643 patients. An ``AI trilogy'' of diversion to a smart quarantine station, AI-assisted image interpretation, and a built-in clinical decision-making algorithm on a tablet computer was applied to shorten the quarantine survey process and reduce processing time during the COVID-19 pandemic. Results: The use of the AI trilogy facilitated the processing of suspected cases of COVID-19 with or without symptoms; also, travel, occupation, contact, and clustering histories were obtained with the tablet computer device. A separate AI-mode function that could quickly recognize pulmonary infiltrates on chest x-rays was merged into the smart clinical assisting system (SCAS), and this model was subsequently trained with COVID-19 pneumonia cases from the GitHub open source data set. The detection rates for posteroanterior and anteroposterior chest x-rays were 55/59 (93{\%}) and 5/11 (45{\%}), respectively. The SCAS algorithm was continuously adjusted based on updates to the Taiwan Centers for Disease Control public safety guidelines for faster clinical decision making. Our ex vivo study demonstrated the efficiency of disinfecting the tablet computer surface by wiping it twice with 75{\%} alcohol sanitizer. To further analyze the impact of the AI application in the quarantine station, we subdivided the station group into groups with or without AI. Compared with the conventional ED (n=281), the survey time at the quarantine station (n=1520) was significantly shortened; the median survey time at the ED was 153 minutes (95{\%} CI 108.5-205.0), vs 35 minutes at the quarantine station (95{\%} CI 24-56; P<.001). 
Furthermore, the use of the AI application in the quarantine station reduced the survey time in the quarantine station; the median survey time without AI was 101 minutes (95{\%} CI 40-153), vs 34 minutes (95{\%} CI 24-53) with AI in the quarantine station (P<.001). Conclusions: The AI trilogy improved our medical care workflow by shortening the quarantine survey process and reducing the processing time, which is especially important during an emerging infectious disease epidemic. ", doi="10.2196/19878", url="http://www.jmir.org/2020/10/e19878/", url="https://doi.org/10.2196/19878", url="http://www.ncbi.nlm.nih.gov/pubmed/33001832" } @Article{info:doi/10.2196/18287, author="Xiu, Xiaolei and Qian, Qing and Wu, Sizhu", title="Construction of a Digestive System Tumor Knowledge Graph Based on Chinese Electronic Medical Records: Development and Usability Study", journal="JMIR Med Inform", year="2020", month="Oct", day="7", volume="8", number="10", pages="e18287", keywords="Chinese electronic medical records; knowledge graph; digestive system tumor; graph evaluation", abstract="Background: With the increasing incidences and mortality of digestive system tumor diseases in China, ways to use clinical experience data in Chinese electronic medical records (CEMRs) to determine potentially effective relationships between diagnosis and treatment have become a priority. As an important part of artificial intelligence, a knowledge graph is a powerful tool for information processing and knowledge organization that provides an ideal means to solve this problem. Objective: This study aimed to construct a semantic-driven digestive system tumor knowledge graph (DSTKG) to represent the knowledge in CEMRs with fine granularity and semantics. Methods: This paper focuses on the knowledge graph schema and semantic relationships that were the main challenges for constructing a Chinese tumor knowledge graph. The DSTKG was developed through a multistep procedure. As an initial step, a complete DSTKG construction framework based on CEMRs was proposed. Then, this research built a knowledge graph schema containing 7 classes and 16 kinds of semantic relationships and accomplished the DSTKG by knowledge extraction, named entity linking, and drawing the knowledge graph. Finally, the quality of the DSTKG was evaluated from 3 aspects: data layer, schema layer, and application layer. Results: Experts agreed that the DSTKG was good overall (mean score 4.20). Especially for the aspects of ``rationality of schema structure,'' ``scalability,'' and ``readability of results,'' the DSTKG performed well, with scores of 4.72, 4.67, and 4.69, respectively, which were much higher than the average. However, the small amount of data in the DSTKG negatively affected its ``practicability'' score. Compared with other Chinese tumor knowledge graphs, the DSTKG can represent more granular entities, properties, and semantic relationships. In addition, the DSTKG was flexible, allowing personalized customization to meet the designer's focus on specific interests in the digestive system tumor. Conclusions: We constructed a granular semantic DSTKG. It could provide guidance for the construction of a tumor knowledge graph and provide a preliminary step for the intelligent application of knowledge graphs based on CEMRs. Additional data sources and stronger research on assertion classification are needed to gain insight into the DSTKG's potential. 
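Editor's illustrative aside (not part of the cited abstract): the DSTKG entry above describes representing entities and semantic relations from medical records as a knowledge graph. A minimal sketch of that general idea in Python with networkx, using entirely invented entities and relation names rather than the actual DSTKG schema, might look like this.

```python
# Minimal, hypothetical knowledge-graph sketch: entities are nodes,
# semantic relations are labeled directed edges (not the actual DSTKG schema).
import networkx as nx

kg = nx.MultiDiGraph()

# (subject, relation, object) triples, invented for illustration
triples = [
    ("gastric adenocarcinoma", "is_a", "digestive system tumor"),
    ("gastric adenocarcinoma", "has_symptom", "epigastric pain"),
    ("gastric adenocarcinoma", "treated_with", "gastrectomy"),
    ("gastrectomy", "is_a", "surgical procedure"),
]
for subj, rel, obj in triples:
    kg.add_edge(subj, obj, relation=rel)

# Simple query: every relation asserted for one entity
for subj, obj, data in kg.out_edges("gastric adenocarcinoma", data=True):
    print(f"{subj} --{data['relation']}--> {obj}")
```

A production knowledge graph would of course use a proper triple store and a curated schema; this sketch only shows the triple-based structure the abstract refers to.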
", doi="10.2196/18287", url="http://medinform.jmir.org/2020/10/e18287/", url="https://doi.org/10.2196/18287", url="http://www.ncbi.nlm.nih.gov/pubmed/33026359" } @Article{info:doi/10.2196/22845, author="Zhang, Jingwen and Oh, Yoo Jung and Lange, Patrick and Yu, Zhou and Fukuoka, Yoshimi", title="Artificial Intelligence Chatbot Behavior Change Model for Designing Artificial Intelligence Chatbots to Promote Physical Activity and a Healthy Diet: Viewpoint", journal="J Med Internet Res", year="2020", month="Sep", day="30", volume="22", number="9", pages="e22845", keywords="chatbot; conversational agent; artificial intelligence; physical activity; diet; intervention; behavior change; natural language processing; communication", abstract="Background: Chatbots empowered by artificial intelligence (AI) can increasingly engage in natural conversations and build relationships with users. Applying AI chatbots to lifestyle modification programs is one of the promising areas to develop cost-effective and feasible behavior interventions to promote physical activity and a healthy diet. Objective: The purposes of this perspective paper are to present a brief literature review of chatbot use in promoting physical activity and a healthy diet, describe the AI chatbot behavior change model our research team developed based on extensive interdisciplinary research, and discuss ethical principles and considerations. Methods: We conducted a preliminary search of studies reporting chatbots for improving physical activity and/or diet in four databases in July 2020. We summarized the characteristics of the chatbot studies and reviewed recent developments in human-AI communication research and innovations in natural language processing. Based on the identified gaps and opportunities, as well as our own clinical and research experience and findings, we propose an AI chatbot behavior change model. Results: Our review found a lack of understanding around theoretical guidance and practical recommendations on designing AI chatbots for lifestyle modification programs. The proposed AI chatbot behavior change model consists of the following four components to provide such guidance: (1) designing chatbot characteristics and understanding user background; (2) building relational capacity; (3) building persuasive conversational capacity; and (4) evaluating mechanisms and outcomes. The rationale and evidence supporting the design and evaluation choices for this model are presented in this paper. Conclusions: As AI chatbots become increasingly integrated into various digital communications, our proposed theoretical framework is the first step to conceptualize the scope of utilization in health behavior change domains and to synthesize all possible dimensions of chatbot features to inform intervention design and evaluation. There is a need for more interdisciplinary work to continue developing AI techniques to improve a chatbot's relational and persuasive capacities to change physical activity and diet behaviors with strong ethical principles. 
", doi="10.2196/22845", url="https://www.jmir.org/2020/9/e22845", url="https://doi.org/10.2196/22845", url="http://www.ncbi.nlm.nih.gov/pubmed/32996892" } @Article{info:doi/10.2196/21849, author="K{\"u}hnle, Lara and M{\"u}cke, Urs and Lechner, Werner M and Klawonn, Frank and Grigull, Lorenz", title="Development of a Social Network for People Without a Diagnosis (RarePairs): Evaluation Study", journal="J Med Internet Res", year="2020", month="Sep", day="29", volume="22", number="9", pages="e21849", keywords="rare disease; diagnostic support tool; prototype; social network; machine learning; artificial intelligence", abstract="Background: Diagnostic delay in rare disease (RD) is common, occasionally lasting up to more than 20 years. In attempting to reduce it, diagnostic support tools have been studied extensively. However, social platforms have not yet been used for systematic diagnostic support. This paper illustrates the development and prototypic application of a social network using scientifically developed questions to match individuals without a diagnosis. Objective: The study aimed to outline, create, and evaluate a prototype tool (a social network platform named RarePairs), helping patients with undiagnosed RDs to find individuals with similar symptoms. The prototype includes a matching algorithm, bringing together individuals with similar disease burden in the lead-up to diagnosis. Methods: We divided our project into 4 phases. In phase 1, we used known data and findings in the literature to understand and specify the context of use. In phase 2, we specified the user requirements. In phase 3, we designed a prototype based on the results of phases 1 and 2, as well as incorporating a state-of-the-art questionnaire with 53 items for recognizing an RD. Lastly, we evaluated this prototype with a data set of 973 questionnaires from individuals suffering from different RDs using 24 distance calculating methods. Results: Based on a step-by-step construction process, the digital patient platform prototype, RarePairs, was developed. In order to match individuals with similar experiences, it uses answer patterns generated by a specifically designed questionnaire (Q53). A total of 973 questionnaires answered by patients with RDs were used to construct and test an artificial intelligence (AI) algorithm like the k-nearest neighbor search. With this, we found matches for every single one of the 973 records. The cross-validation of those matches showed that the algorithm outperforms random matching significantly. Statistically, for every data set the algorithm found at least one other record (match) with the same diagnosis. Conclusions: Diagnostic delay is torturous for patients without a diagnosis. Shortening the delay is important for both doctors and patients. Diagnostic support using AI can be promoted differently. The prototype of the social media platform RarePairs might be a low-threshold patient platform, and proved suitable to match and connect different individuals with comparable symptoms. This exchange promoted through RarePairs might be used to speed up the diagnostic process. Further studies include its evaluation in a prospective setting and implementation of RarePairs as a mobile phone app. 
", doi="10.2196/21849", url="http://www.jmir.org/2020/9/e21849/", url="https://doi.org/10.2196/21849", url="http://www.ncbi.nlm.nih.gov/pubmed/32990634" } @Article{info:doi/10.2196/20645, author="Li, Rui and Yin, Changchang and Yang, Samuel and Qian, Buyue and Zhang, Ping", title="Marrying Medical Domain Knowledge With Deep Learning on Electronic Health Records: A Deep Visual Analytics Approach", journal="J Med Internet Res", year="2020", month="Sep", day="28", volume="22", number="9", pages="e20645", keywords="electronic health records; interpretable deep learning; knowledge graph; visual analytics", abstract="Background: Deep learning models have attracted significant interest from health care researchers during the last few decades. There have been many studies that apply deep learning to medical applications and achieve promising results. However, there are three limitations to the existing models: (1) most clinicians are unable to interpret the results from the existing models, (2) existing models cannot incorporate complicated medical domain knowledge (eg, a disease causes another disease), and (3) most existing models lack visual exploration and interaction. Both the electronic health record (EHR) data set and the deep model results are complex and abstract, which impedes clinicians from exploring and communicating with the model directly. Objective: The objective of this study is to develop an interpretable and accurate risk prediction model as well as an interactive clinical prediction system to support EHR data exploration, knowledge graph demonstration, and model interpretation. Methods: A domain-knowledge--guided recurrent neural network (DG-RNN) model is proposed to predict clinical risks. The model takes medical event sequences as input and incorporates medical domain knowledge by attending to a subgraph of the whole medical knowledge graph. A global pooling operation and a fully connected layer are used to output the clinical outcomes. The middle results and the parameters of the fully connected layer are helpful in identifying which medical events cause clinical risks. DG-Viz is also designed to support EHR data exploration, knowledge graph demonstration, and model interpretation. Results: We conducted both risk prediction experiments and a case study on a real-world data set. A total of 554 patients with heart failure and 1662 control patients without heart failure were selected from the data set. The experimental results show that the proposed DG-RNN outperforms the state-of-the-art approaches by approximately 1.5{\%}. The case study demonstrates how our medical physician collaborator can effectively explore the data and interpret the prediction results using DG-Viz. Conclusions: In this study, we present DG-Viz, an interactive clinical prediction system, which brings together the power of deep learning (ie, a DG-RNN--based model) and visual analytics to predict clinical risks and visually interpret the EHR prediction results. Experimental results and a case study on heart failure risk prediction tasks demonstrate the effectiveness and usefulness of the DG-Viz system. This study will pave the way for interactive, interpretable, and accurate clinical risk predictions. 
", doi="10.2196/20645", url="http://www.jmir.org/2020/9/e20645/", url="https://doi.org/10.2196/20645", url="http://www.ncbi.nlm.nih.gov/pubmed/32985996" } @Article{info:doi/10.2196/18660, author="Kriventsov, Stan and Lindsey, Alexander and Hayeri, Amir", title="The Diabits App for Smartphone-Assisted Predictive Monitoring of Glycemia in Patients With Diabetes: Retrospective Observational Study", journal="JMIR Diabetes", year="2020", month="Sep", day="22", volume="5", number="3", pages="e18660", keywords="blood glucose predictions; type 1 diabetes; artificial intelligence; machine learning; digital health; mobile phone", abstract="Background: Diabetes mellitus, which causes dysregulation of blood glucose in humans, is a major public health challenge. Patients with diabetes must monitor their glycemic levels to keep them in a healthy range. This task is made easier by using continuous glucose monitoring (CGM) devices and relaying their output to smartphone apps, thus providing users with real-time information on their glycemic fluctuations and possibly predicting future trends. Objective: This study aims to discuss various challenges of predictive monitoring of glycemia and examines the accuracy and blood glucose control effects of Diabits, a smartphone app that helps patients with diabetes monitor and manage their blood glucose levels in real time. Methods: Using data from CGM devices and user input, Diabits applies machine learning techniques to create personalized patient models and predict blood glucose fluctuations up to 60 min in advance. These predictions give patients an opportunity to take pre-emptive action to maintain their blood glucose values within the reference range. In this retrospective observational cohort study, the predictive accuracy of Diabits and the correlation between daily use of the app and blood glucose control metrics were examined based on real app users' data. Moreover, the accuracy of predictions on the 2018 Ohio T1DM (type 1 diabetes mellitus) data set was calculated and compared against other published results. Results: On the basis of more than 6.8 million data points, 30-min Diabits predictions evaluated using Parkes Error Grid were found to be 86.89{\%} (5,963,930/6,864,130) clinically accurate (zone A) and 99.56{\%} (6,833,625/6,864,130) clinically acceptable (zones A and B), whereas 60-min predictions were 70.56{\%} (4,843,605/6,864,130) clinically accurate and 97.49{\%} (6,692,165/6,864,130) clinically acceptable. By analyzing daily use statistics and CGM data for the 280 most long-standing users of Diabits, it was established that under free-living conditions, many common blood glucose control metrics improved with increased frequency of app use. For instance, the average blood glucose for the days these users did not interact with the app was 154.0 (SD 47.2) mg/dL, with 67.52{\%} of the time spent in the healthy 70 to 180 mg/dL range. For days with 10 or more Diabits sessions, the average blood glucose decreased to 141.6 (SD 42.0) mg/dL (P<.001), whereas the time in euglycemic range increased to 74.28{\%} (P<.001). On the Ohio T1DM data set of 6 patients with type 1 diabetes, 30-min predictions of the base Diabits model had an average root mean square error of 18.68 (SD 2.19) mg/dL, which is an improvement over the published state-of-the-art results for this data set. Conclusions: Diabits accurately predicts future glycemic fluctuations, potentially making it easier for patients with diabetes to maintain their blood glucose in the reference range. 
Furthermore, an improvement in glucose control was observed on days with more frequent Diabits use. ", doi="10.2196/18660", url="http://diabetes.jmir.org/2020/3/e18660/", url="https://doi.org/10.2196/18660", url="http://www.ncbi.nlm.nih.gov/pubmed/32960180" } @Article{info:doi/10.2196/19897, author="Li, Juan and Maharjan, Bikesh and Xie, Bo and Tao, Cui", title="A Personalized Voice-Based Diet Assistant for Caregivers of Alzheimer Disease and Related Dementias: System Development and Validation", journal="J Med Internet Res", year="2020", month="Sep", day="21", volume="22", number="9", pages="e19897", keywords="Alzheimer disease; dementia; diet; knowledge; ontology; voice assistant", abstract="Background: The world's aging population is increasing, with an expected increase in the prevalence of Alzheimer disease and related dementias (ADRD). Proper nutrition and good eating behavior show promise for preventing and slowing the progression of ADRD and consequently improving patients with ADRD's health status and quality of life. Most ADRD care is provided by informal caregivers, so assisting caregivers to manage patients with ADRD's diet is important. Objective: This study aims to design, develop, and test an artificial intelligence--powered voice assistant to help informal caregivers manage the daily diet of patients with ADRD and learn food and nutrition-related knowledge. Methods: The voice assistant is being implemented in several steps: construction of a comprehensive knowledge base with ontologies that define ADRD diet care and user profiles, and is extended with external knowledge graphs; management of conversation between users and the voice assistant; personalized ADRD diet services provided through a semantics-based knowledge graph search and reasoning engine; and system evaluation in use cases with additional qualitative evaluations. Results: A prototype voice assistant was evaluated in the lab using various use cases. Preliminary qualitative test results demonstrate reasonable rates of dialogue success and recommendation correctness. Conclusions: The voice assistant provides a natural, interactive interface for users, and it does not require the user to have a technical background, which may facilitate senior caregivers' use in their daily care tasks. This study suggests the feasibility of using the intelligent voice assistant to help caregivers manage patients with ADRD's diet. ", doi="10.2196/19897", url="http://www.jmir.org/2020/9/e19897/", url="https://doi.org/10.2196/19897", url="http://www.ncbi.nlm.nih.gov/pubmed/32955452" } @Article{info:doi/10.2196/21983, author="Bang, Chang Seok and Lee, Jae Jun and Baik, Gwang Ho", title="Artificial Intelligence for the Prediction of Helicobacter Pylori Infection in Endoscopic Images: Systematic Review and Meta-Analysis Of Diagnostic Test Accuracy", journal="J Med Internet Res", year="2020", month="Sep", day="16", volume="22", number="9", pages="e21983", keywords="artificial intelligence; convolutional neural network; deep learning; machine learning; endoscopy; Helicobacter pylori", abstract="Background: Helicobacter pylori plays a central role in the development of gastric cancer, and prediction of H pylori infection by visual inspection of the gastric mucosa is an important function of endoscopy. However, there are currently no established methods of optical diagnosis of H pylori infection using endoscopic images. Definitive diagnosis requires endoscopic biopsy. 
Artificial intelligence (AI) has been increasingly adopted in clinical practice, especially for image recognition and classification. Objective: This study aimed to evaluate the diagnostic test accuracy of AI for the prediction of H pylori infection using endoscopic images. Methods: Two independent evaluators searched core databases. The inclusion criteria included studies with endoscopic images of H pylori infection and with application of AI for the prediction of H pylori infection presenting diagnostic performance. Systematic review and diagnostic test accuracy meta-analysis were performed. Results: Ultimately, 8 studies were identified. Pooled sensitivity, specificity, diagnostic odds ratio, and area under the curve of AI for the prediction of H pylori infection were 0.87 (95{\%} CI 0.72-0.94), 0.86 (95{\%} CI 0.77-0.92), 40 (95{\%} CI 15-112), and 0.92 (95{\%} CI 0.90-0.94), respectively, in the 1719 patients (385 patients with H pylori infection vs 1334 controls). Meta-regression included methodological quality and the number of patients in each study as potential sources of heterogeneity. There was no evidence of publication bias. The accuracy of the AI algorithm reached 82{\%} for discrimination between noninfected images and posteradication images. Conclusions: An AI algorithm is a reliable tool for endoscopic diagnosis of H pylori infection. The limitations of lacking external validation performance and being conducted only in Asia should be overcome. Trial Registration: PROSPERO CRD42020175957; https://www.crd.york.ac.uk/prospero/display{\_}record.php?RecordID=175957 ", doi="10.2196/21983", url="http://www.jmir.org/2020/9/e21983/", url="https://doi.org/10.2196/21983", url="http://www.ncbi.nlm.nih.gov/pubmed/32936088" } @Article{info:doi/10.2196/18689, author="Zhang, Liang and Qu, Yue and Jin, Bo and Jing, Lu and Gao, Zhan and Liang, Zhanhua", title="An Intelligent Mobile-Enabled System for Diagnosing Parkinson Disease: Development and Validation of a Speech Impairment Detection System", journal="JMIR Med Inform", year="2020", month="Sep", day="16", volume="8", number="9", pages="e18689", keywords="Parkinson disease; speech disorder; remote diagnosis; artificial intelligence; mobile phone app; mobile health", abstract="Background: Parkinson disease (PD) is one of the most common neurological diseases. At present, because the exact cause is still unclear, accurate diagnosis and progression monitoring remain challenging. In recent years, exploring the relationship between PD and speech impairment has attracted widespread attention in the academic world. Most of the studies successfully validated the effectiveness of some vocal features. Moreover, the noninvasive nature of speech signal--based testing has pioneered a new way for telediagnosis and telemonitoring. In particular, there is an increasing demand for artificial intelligence--powered tools in the digital health era. Objective: This study aimed to build a real-time speech signal analysis tool for PD diagnosis and severity assessment. Further, the underlying system should be flexible enough to integrate any machine learning or deep learning algorithm. 
Methods: At its core, the system we built consists of two parts: (1) speech signal processing: both traditional and novel speech signal processing technologies have been employed for feature engineering, which can automatically extract a few linear and nonlinear dysphonia features, and (2) application of machine learning algorithms: some classical regression and classification algorithms from the machine learning field have been tested; we then chose the most efficient algorithms and relevant features. Results: Experimental results showed that our system had an outstanding ability to both diagnose and assess severity of PD. By using both linear and nonlinear dysphonia features, the accuracy reached 88.74{\%} and recall reached 97.03{\%} in the diagnosis task. Meanwhile, mean absolute error was 3.7699 in the assessment task. The system has already been deployed within a mobile app called No Pa. Conclusions: This study performed diagnosis and severity assessment of PD from the perspective of speech disorder detection. The efficiency and effectiveness of the algorithms indirectly validated the practicality of the system. In particular, the system reflects the necessity of a publicly accessible PD diagnosis and assessment system that can perform telediagnosis and telemonitoring of PD. This system can also optimize doctors' decision-making processes regarding treatments. ", doi="10.2196/18689", url="http://medinform.jmir.org/2020/9/e18689/", url="https://doi.org/10.2196/18689", url="http://www.ncbi.nlm.nih.gov/pubmed/32936086" } @Article{info:doi/10.2196/20641, author="Park, Eunjeong and Lee, Kijeong and Han, Taehwa and Nam, Hyo Suk", title="Automatic Grading of Stroke Symptoms for Rapid Assessment Using Optimized Machine Learning and 4-Limb Kinematics: Clinical Validation Study", journal="J Med Internet Res", year="2020", month="Sep", day="16", volume="22", number="9", pages="e20641", keywords="machine learning; artificial intelligence; sensors; kinematics; stroke; telemedicine", abstract="Background: Subtle abnormal motor signs are indications of serious neurological diseases. Although neurological deficits require fast initiation of treatment in a restricted time, it is difficult for nonspecialists to detect and objectively assess the symptoms. In the clinical environment, diagnoses and decisions are based on clinical grading methods, including the National Institutes of Health Stroke Scale (NIHSS) score or the Medical Research Council (MRC) score, which have been used to measure motor weakness. Objective grading in various environments is necessitated for consistent agreement among patients, caregivers, paramedics, and medical staff to facilitate rapid diagnoses and dispatches to appropriate medical centers. Objective: In this study, we aimed to develop an autonomous grading system for stroke patients. We investigated the feasibility of our new system to assess motor weakness and grade NIHSS and MRC scores of 4 limbs, similar to the clinical examinations performed by medical staff. Methods: We implemented an automatic grading system composed of a measuring unit with wearable sensors and a grading unit with optimized machine learning. Inertial sensors were attached to measure subtle weaknesses caused by paralysis of upper and lower limbs. We collected 60 instances of data with kinematic features of motor disorders from neurological examination and demographic information of stroke patients with NIHSS 0 or 1 and MRC 7, 8, or 9 grades in a stroke unit. 
Training data with 240 instances were generated using a synthetic minority oversampling technique to complement the imbalanced number of data between classes and low number of training data. We trained 2 representative machine learning algorithms, an ensemble and a support vector machine (SVM), to implement auto-NIHSS and auto-MRC grading. The optimized algorithms performed a 5-fold cross-validation and were searched by Bayes optimization in 30 trials. The trained model was tested with the 60 original hold-out instances for performance evaluation in accuracy, sensitivity, specificity, and area under the receiver operating characteristics curve (AUC). Results: The proposed system can grade NIHSS scores with an accuracy of 83.3{\%} and an AUC of 0.912 using an optimized ensemble algorithm, and it can grade with an accuracy of 80.0{\%} and an AUC of 0.860 using an optimized SVM algorithm. The auto-MRC grading achieved an accuracy of 76.7{\%} and a mean AUC of 0.870 in SVM classification and an accuracy of 78.3{\%} and a mean AUC of 0.877 in ensemble classification. Conclusions: The automatic grading system quantifies proximal weakness in real time and assesses symptoms through automatic grading. The pilot outcomes demonstrated the feasibility of remote monitoring of motor weakness caused by stroke. The system can facilitate consistent grading with instant assessment and expedite dispatches to appropriate hospitals and treatment initiation by sharing auto-MRC and auto-NIHSS scores between prehospital and hospital responses as an objective observation. ", doi="10.2196/20641", url="http://www.jmir.org/2020/9/e20641/", url="https://doi.org/10.2196/20641", url="http://www.ncbi.nlm.nih.gov/pubmed/32936079" } @Article{info:doi/10.2196/19133, author="Ferrario, Andrea and Demiray, Burcu and Yordanova, Kristina and Luo, Minxia and Martin, Mike", title="Social Reminiscence in Older Adults' Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning", journal="J Med Internet Res", year="2020", month="Sep", day="15", volume="22", number="9", pages="e19133", keywords="aging; dementia; reminiscence; real-life conversations; electronically activated recorder (EAR); natural language processing; machine learning; imbalanced learning", abstract="Background: Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations. Objective: The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts. Methods: The methods in this study comprise (1) collecting and coding of transcripts of older adults' conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. 
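Editor's illustrative aside (not part of the cited abstract): the methods above combine bag-of-words features with classifiers such as support vector machines, and the results reported below found a class-weighted SVM on bag-of-words to perform best. A minimal, hypothetical sketch of that combination with scikit-learn, using two invented toy transcripts rather than the study corpus, might look like this.

```python
# Illustrative sketch only: a class-weighted linear SVM on bag-of-words
# features for flagging reminiscence-like utterances in transcripts.
# The toy transcripts and labels are invented, not the study's data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

transcripts = [
    "damals im krieg haben wir oft stundenlang am radio gesessen",  # reminiscence
    "kannst du mir bitte morgen beim einkaufen helfen",             # not reminiscence
]
labels = [1, 0]

# class_weight='balanced' reweights the rare positive class, one common
# way to address the class imbalance described in the abstract
model = make_pipeline(
    CountVectorizer(),                                   # bag-of-words features
    SVC(kernel="linear", class_weight="balanced"),
)
model.fit(transcripts, labels)
print(model.predict(["frueher sind wir jeden sommer ans meer gefahren"]))
```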
The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies. Results: Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs. Conclusions: This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults' everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults' well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health. ", doi="10.2196/19133", url="http://www.jmir.org/2020/9/e19133/", url="https://doi.org/10.2196/19133", url="http://www.ncbi.nlm.nih.gov/pubmed/32866108" } @Article{info:doi/10.2196/21573, author="Shen, Jiayi and Chen, Jiebin and Zheng, Zequan and Zheng, Jiabin and Liu, Zherui and Song, Jian and Wong, Sum Yi and Wang, Xiaoling and Huang, Mengqi and Fang, Po-Han and Jiang, Bangsheng and Tsang, Winghei and He, Zonglin and Liu, Taoran and Akinwunmi, Babatunde and Wang, Chi Chiu and Zhang, Casper J P and Huang, Jian and Ming, Wai-Kit", title="An Innovative Artificial Intelligence--Based App for the Diagnosis of Gestational Diabetes Mellitus (GDM-AI): Development Study", journal="J Med Internet Res", year="2020", month="Sep", day="15", volume="22", number="9", pages="e21573", keywords="AI; application; disease diagnosis; maternal health care; artificial intelligence; app; women; rural; innovation; diabetes; gestational diabetes; diagnosis", abstract="Background: Gestational diabetes mellitus (GDM) can cause adverse consequences to both mothers and their newborns. However, pregnant women living in low- and middle-income areas or countries often fail to receive early clinical interventions at local medical facilities due to restricted availability of GDM diagnosis. The outstanding performance of artificial intelligence (AI) in disease diagnosis in previous studies demonstrates its promising applications in GDM diagnosis. 
Objective: This study aims to investigate the implementation of a well-performing AI algorithm in GDM diagnosis in a setting that requires less medical equipment and fewer staff, and to establish an app based on the AI algorithm. This study also explores possible progress if our app is widely used. Methods: An AI model that included 9 algorithms was trained on 12,304 pregnant outpatients with their consent who received a test for GDM in the obstetrics and gynecology department of the First Affiliated Hospital of Jinan University, a local hospital in South China, between November 2010 and October 2017. GDM was diagnosed according to American Diabetes Association (ADA) 2011 diagnostic criteria. Age and fasting blood glucose were chosen as critical parameters. For validation, we performed k-fold cross-validation (k=5) for the internal dataset and an external validation dataset that included 1655 cases from the Prince of Wales Hospital, the affiliated teaching hospital of the Chinese University of Hong Kong, a non-local hospital. Accuracy, sensitivity, and other criteria were calculated for each algorithm. Results: The areas under the receiver operating characteristic curve (AUROC) of the external validation dataset for support vector machine (SVM), random forest, AdaBoost, k-nearest neighbors (kNN), naive Bayes (NB), decision tree, logistic regression (LR), eXtreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT) were 0.780, 0.657, 0.736, 0.669, 0.774, 0.614, 0.769, 0.742, and 0.757, respectively. SVM also retained high performance in other criteria. The specificity for SVM retained 100{\%} in the external validation set with an accuracy of 88.7{\%}. Conclusions: Our prospective and multicenter study is the first clinical study that supports the GDM diagnosis for pregnant women in resource-limited areas, using only fasting blood glucose value, patients' age, and a smartphone connected to the internet. Our study proved that SVM can achieve accurate diagnosis with less operation cost and higher efficacy. Our study (referred to as GDM-AI study, ie, the study of AI-based diagnosis of GDM) also shows our app has a promising future in improving the quality of maternal health for pregnant women, precision medicine, and long-distance medical care. We recommend that future work expand the dataset scope and replicate the process to validate the performance of the AI algorithms. ", doi="10.2196/21573", url="https://www.jmir.org/2020/9/e21573", url="https://doi.org/10.2196/21573", url="http://www.ncbi.nlm.nih.gov/pubmed/32930674" } @Article{info:doi/10.2196/20701, author="Schachner, Theresa and Keller, Roman and v Wangenheim, Florian", title="Artificial Intelligence-Based Conversational Agents for Chronic Conditions: Systematic Literature Review", journal="J Med Internet Res", year="2020", month="Sep", day="14", volume="22", number="9", pages="e20701", keywords="artificial intelligence; conversational agents; chatbots; healthcare; chronic diseases; systematic literature review", abstract="Background: A rising number of conversational agents or chatbots are equipped with artificial intelligence (AI) architecture. They are increasingly prevalent in health care applications such as those providing education and support to patients with chronic diseases, one of the leading causes of death in the 21st century. AI-based chatbots enable more effective and frequent interactions with such patients. 
Objective: The goal of this systematic literature review is to examine the characteristics, health care conditions, and AI architectures of AI-based conversational agents designed specifically for chronic diseases. Methods: We conducted a systematic literature review using PubMed MEDLINE, EMBASE, PsycInfo, CINAHL, ACM Digital Library, ScienceDirect, and Web of Science. We applied a predefined search strategy using the terms ``conversational agent,'' ``healthcare,'' ``artificial intelligence,'' and their synonyms. We updated the search results using Google alerts, and screened reference lists for other relevant articles. We included primary research studies that involved the prevention, treatment, or rehabilitation of chronic diseases, involved a conversational agent, and included any kind of AI architecture. Two independent reviewers conducted screening and data extraction, and Cohen kappa was used to measure interrater agreement. A narrative approach was applied for data synthesis. Results: The literature search found 2052 articles, out of which 10 papers met the inclusion criteria. The small number of identified studies together with the prevalence of quasi-experimental studies (n=7) and prevailing prototype nature of the chatbots (n=7) revealed the immaturity of the field. The reported chatbots addressed a broad variety of chronic diseases (n=6), showcasing a tendency to develop specialized conversational agents for individual chronic conditions. However, comparisons of these chatbots within and between chronic diseases are lacking. In addition, the reported evaluation measures were not standardized, and the addressed health goals showed a large range. Together, these study characteristics complicate comparability and leave room for future research. While natural language processing represented the most used AI technique (n=7) and the majority of conversational agents allowed for multimodal interaction (n=6), the identified studies demonstrated broad heterogeneity, lack of depth of reported AI techniques and systems, and inconsistent usage of taxonomy of the underlying AI software, further aggravating comparability and generalizability of study results. Conclusions: The literature on AI-based conversational agents for chronic conditions is scarce and mostly consists of quasi-experimental studies with chatbots in the prototype stage that use natural language processing and allow for multimodal user interaction. Future research could profit from evidence-based evaluation of the AI-based conversational agents and comparison thereof within and between different chronic health conditions. Besides increased comparability, the quality of chatbots developed for specific chronic conditions and their subsequent impact on the target patients could be enhanced by more structured development and standardized evaluation processes. 
", doi="10.2196/20701", url="http://www.jmir.org/2020/9/e20701/", url="https://doi.org/10.2196/20701", url="http://www.ncbi.nlm.nih.gov/pubmed/32924957" } @Article{info:doi/10.2196/18091, author="Maron, Roman C and Utikal, Jochen S and Hekler, Achim and Hauschild, Axel and Sattler, Elke and Sondermann, Wiebke and Haferkamp, Sebastian and Schilling, Bastian and Heppt, Markus V and Jansen, Philipp and Reinholz, Markus and Franklin, Cindy and Schmitt, Laurenz and Hartmann, Daniela and Krieghoff-Henning, Eva and Schmitt, Max and Weichenthal, Michael and von Kalle, Christof and Fr{\"o}hling, Stefan and Brinker, Titus J", title="Artificial Intelligence and Its Effect on Dermatologists' Accuracy in Dermoscopic Melanoma Image Classification: Web-Based Survey Study", journal="J Med Internet Res", year="2020", month="Sep", day="11", volume="22", number="9", pages="e18091", keywords="artificial intelligence; machine learning; deep learning; neural network; dermatology; diagnosis; nevi; melanoma; skin neoplasm", abstract="Background: Early detection of melanoma can be lifesaving but this remains a challenge. Recent diagnostic studies have revealed the superiority of artificial intelligence (AI) in classifying dermoscopic images of melanoma and nevi, concluding that these algorithms should assist a dermatologist's diagnoses. Objective: The aim of this study was to investigate whether AI support improves the accuracy and overall diagnostic performance of dermatologists in the dichotomous image--based discrimination between melanoma and nevus. Methods: Twelve board-certified dermatologists were presented disjoint sets of 100 unique dermoscopic images of melanomas and nevi (total of 1200 unique images), and they had to classify the images based on personal experience alone (part I) and with the support of a trained convolutional neural network (CNN, part II). Additionally, dermatologists were asked to rate their confidence in their final decision for each image. Results: While the mean specificity of the dermatologists based on personal experience alone remained almost unchanged (70.6{\%} vs 72.4{\%}; P=.54) with AI support, the mean sensitivity and mean accuracy increased significantly (59.4{\%} vs 74.6{\%}; P=.003 and 65.0{\%} vs 73.6{\%}; P=.002, respectively) with AI support. Out of the 10{\%} (10/94; 95{\%} CI 8.4{\%}-11.8{\%}) of cases where dermatologists were correct and AI was incorrect, dermatologists on average changed to the incorrect answer for 39{\%} (4/10; 95{\%} CI 23.2{\%}-55.6{\%}) of cases. When dermatologists were incorrect and AI was correct (25/94, 27{\%}; 95{\%} CI 24.0{\%}-30.1{\%}), dermatologists changed their answers to the correct answer for 46{\%} (11/25; 95{\%} CI 33.1{\%}-58.4{\%}) of cases. Additionally, the dermatologists' average confidence in their decisions increased when the CNN confirmed their decision and decreased when the CNN disagreed, even when the dermatologists were correct. Reported values are based on the mean of all participants. Whenever absolute values are shown, the denominator and numerator are approximations as every dermatologist ended up rating a varying number of images due to a quality control step. Conclusions: The findings of our study show that AI support can improve the overall accuracy of the dermatologists in the dichotomous image--based discrimination between melanoma and nevus. 
This supports the argument for AI-based tools to aid clinicians in skin lesion classification and provides a rationale for studies of such classifiers in real-life settings, wherein clinicians can integrate additional information such as patient age and medical history into their decisions. ", doi="10.2196/18091", url="https://www.jmir.org/2020/9/e18091", url="https://doi.org/10.2196/18091", url="http://www.ncbi.nlm.nih.gov/pubmed/32915161" } @Article{info:doi/10.2196/19554, author="Wilmink, Gerald and Dupey, Katherine and Alkire, Schon and Grote, Jeffrey and Zobel, Gregory and Fillit, Howard M and Movva, Satish", title="Artificial Intelligence--Powered Digital Health Platform and Wearable Devices Improve Outcomes for Older Adults in Assisted Living Communities: Pilot Intervention Study", journal="JMIR Aging", year="2020", month="Sep", day="10", volume="3", number="2", pages="e19554", keywords="health technology; artificial intelligence; AI; preventive; senior technology; assisted living; long-term services; long-term care providers", abstract="Background: Wearables and artificial intelligence (AI)--powered digital health platforms that utilize machine learning algorithms can autonomously measure a senior's change in activity and behavior and may be useful tools for proactive interventions that target modifiable risk factors. Objective: The goal of this study was to analyze how a wearable device and AI-powered digital health platform could provide improved health outcomes for older adults in assisted living communities. Methods: Data from 490 residents from six assisted living communities were analyzed retrospectively over 24 months. The intervention group (+CP) consisted of 3 communities that utilized CarePredict (n=256), and the control group (--CP) consisted of 3 communities (n=234) that did not utilize CarePredict. The following outcomes were measured and compared to baseline: hospitalization rate, fall rate, length of stay (LOS), and staff response time. Results: The residents of the +CP and --CP communities exhibit no statistical difference in age (P=.64), sex (P=.63), and staff service hours per resident (P=.94). The data show that the +CP communities exhibited a 39{\%} lower hospitalization rate (P=.02), a 69{\%} lower fall rate (P=.01), and a 67{\%} greater length of stay (P=.03) than the --CP communities. The staff alert acknowledgment and reach resident times also improved in the +CP communities by 37{\%} (P=.02) and 40{\%} (P=.02), respectively. Conclusions: The AI-powered digital health platform provides the community staff with actionable information regarding each resident's activities and behavior, which can be used to identify older adults that are at an increased risk for a health decline. Staff can use this data to intervene much earlier, protecting seniors from conditions that left untreated could result in hospitalization. In summary, the use of wearables and AI-powered digital health platform can contribute to improved health outcomes for seniors in assisted living communities. The accuracy of the system will be further validated in a larger trial. 
", doi="10.2196/19554", url="http://aging.jmir.org/2020/2/e19554/", url="https://doi.org/10.2196/19554", url="http://www.ncbi.nlm.nih.gov/pubmed/32723711" } @Article{info:doi/10.2196/18142, author="Mohammadi, Ramin and Atif, Mursal and Centi, Amanda Jayne and Agboola, Stephen and Jethwani, Kamal and Kvedar, Joseph and Kamarthi, Sagar", title="Neural Network--Based Algorithm for Adjusting Activity Targets to Sustain Exercise Engagement Among People Using Activity Trackers: Retrospective Observation and Algorithm Development Study", journal="JMIR Mhealth Uhealth", year="2020", month="Sep", day="8", volume="8", number="9", pages="e18142", keywords="activity tracker; exercise engagement; dynamic activity target; neural network; activity target prediction; machine learning", abstract="Background: It is well established that lack of physical activity is detrimental to the overall health of an individual. Modern-day activity trackers enable individuals to monitor their daily activities to meet and maintain targets. This is expected to promote activity encouraging behavior, but the benefits of activity trackers attenuate over time due to waning adherence. One of the key approaches to improving adherence to goals is to motivate individuals to improve on their historic performance metrics. Objective: The aim of this work was to build a machine learning model to predict an achievable weekly activity target by considering (1) patterns in the user's activity tracker data in the previous week and (2) behavior and environment characteristics. By setting realistic goals, ones that are neither too easy nor too difficult to achieve, activity tracker users can be encouraged to continue to meet these goals, and at the same time, to find utility in their activity tracker. Methods: We built a neural network model that prescribes a weekly activity target for an individual that can be realistically achieved. The inputs to the model were user-specific personal, social, and environmental factors, daily step count from the previous 7 days, and an entropy measure that characterized the pattern of daily step count. Data for training and evaluating the machine learning model were collected over a duration of 9 weeks. Results: Of 30 individuals who were enrolled, data from 20 participants were used. The model predicted target daily count with a mean absolute error of 1545 (95{\%} CI 1383-1706) steps for an 8-week period. Conclusions: Artificial intelligence applied to physical activity data combined with behavioral data can be used to set personalized goals in accordance with the individual's level of activity and thereby improve adherence to a fitness tracker; this could be used to increase engagement with activity trackers. A follow-up prospective study is ongoing to determine the performance of the engagement algorithm. 
", doi="10.2196/18142", url="https://mhealth.jmir.org/2020/9/e18142", url="https://doi.org/10.2196/18142", url="http://www.ncbi.nlm.nih.gov/pubmed/32897235" } @Article{info:doi/10.2196/18930, author="Entezarjou, Artin and Bonamy, Anna-Karin Edstedt and Benjaminsson, Simon and Herman, Pawel and Midl{\"o}v, Patrik", title="Human- Versus Machine Learning--Based Triage Using Digitalized Patient Histories in Primary Care: Comparative Study", journal="JMIR Med Inform", year="2020", month="Sep", day="3", volume="8", number="9", pages="e18930", keywords="machine learning; artificial intelligence; decision support; primary care; triage", abstract="Background: Smartphones have made it possible for patients to digitally report symptoms before physical primary care visits. Using machine learning (ML), these data offer an opportunity to support decisions about the appropriate level of care (triage). Objective: The purpose of this study was to explore the interrater reliability between human physicians and an automated ML-based triage method. Methods: After testing several models, a na{\"i}ve Bayes triage model was created using data from digital medical histories, capable of classifying digital medical history reports as either in need of urgent physical examination or not in need of urgent physical examination. The model was tested on 300 digital medical history reports and classification was compared with the majority vote of an expert panel of 5 primary care physicians (PCPs). Reliability between raters was measured using both Cohen $\kappa$ (adjusted for chance agreement) and percentage agreement (not adjusted for chance agreement). Results: Interrater reliability as measured by Cohen $\kappa$ was 0.17 when comparing the majority vote of the reference group with the model. Agreement was 74{\%} (138/186) for cases judged not in need of urgent physical examination and 42{\%} (38/90) for cases judged to be in need of urgent physical examination. No specific features linked to the model's triage decision could be identified. Between physicians within the panel, Cohen $\kappa$ was 0.2. Intrarater reliability when 1 physician retriaged 50 reports resulted in Cohen $\kappa$ of 0.55. Conclusions: Low interrater and intrarater agreement in triage decisions among PCPs limits the possibility to use human decisions as a reference for ML to automate triage in primary care. ", doi="10.2196/18930", url="https://medinform.jmir.org/2020/9/e18930", url="https://doi.org/10.2196/18930", url="http://www.ncbi.nlm.nih.gov/pubmed/32880578" } @Article{info:doi/10.2196/19348, author="Birnbaum, Michael Leo and Kulkarni, Prathamesh ``Param'' and Van Meter, Anna and Chen, Victor and Rizvi, Asra F and Arenare, Elizabeth and De Choudhury, Munmun and Kane, John M", title="Utilizing Machine Learning on Internet Search Activity to Support the Diagnostic Process and Relapse Detection in Young Individuals With Early Psychosis: Feasibility Study", journal="JMIR Ment Health", year="2020", month="Sep", day="1", volume="7", number="9", pages="e19348", keywords="schizophrenia spectrum disorders; internet search activity; Google; diagnostic prediction; relapse prediction; machine learning; digital data; digital phenotyping; digital biomarkers", abstract="Background: Psychiatry is nearly entirely reliant on patient self-reporting, and there are few objective and reliable tests or sources of collateral information available to help diagnostic and assessment procedures. 
Technology offers opportunities to collect objective digital data to complement patient experience and facilitate more informed treatment decisions. Objective: We aimed to develop computational algorithms based on internet search activity designed to support diagnostic procedures and relapse identification in individuals with schizophrenia spectrum disorders. Methods: We extracted 32,733 time-stamped search queries across 42 participants with schizophrenia spectrum disorders and 74 healthy volunteers between the ages of 15 and 35 (mean 24.4 years, 44.0{\%} male), and built machine-learning diagnostic and relapse classifiers utilizing the timing, frequency, and content of online search activity. Results: Classifiers predicted a diagnosis of schizophrenia spectrum disorders with an area under the curve value of 0.74 and predicted a psychotic relapse in individuals with schizophrenia spectrum disorders with an area under the curve of 0.71. Compared with healthy participants, those with schizophrenia spectrum disorders made fewer searches and their searches consisted of fewer words. Prior to a relapse hospitalization, participants with schizophrenia spectrum disorders were more likely to use words related to hearing, perception, and anger, and were less likely to use words related to health. Conclusions: Online search activity holds promise for gathering objective and easily accessed indicators of psychiatric symptoms. Utilizing search activity as collateral behavioral health information would represent a major advancement in efforts to capitalize on objective digital data to improve mental health monitoring. ", doi="10.2196/19348", url="https://mental.jmir.org/2020/9/e19348", url="https://doi.org/10.2196/19348", url="http://www.ncbi.nlm.nih.gov/pubmed/32870161" } @Article{info:doi/10.2196/21056, author="Harada, Yukinori and Shimizu, Taro", title="Impact of a Commercial Artificial Intelligence--Driven Patient Self-Assessment Solution on Waiting Times at General Internal Medicine Outpatient Departments: Retrospective Study", journal="JMIR Med Inform", year="2020", month="Aug", day="31", volume="8", number="8", pages="e21056", keywords="artificial intelligence; automated medical history taking system; eHealth; interrupted time-series analysis; waiting time", abstract="Background: Patient waiting time at outpatient departments is directly related to patient satisfaction and quality of care, particularly in patients visiting the general internal medicine outpatient departments for the first time. Moreover, reducing wait time from arrival in the clinic to the initiation of an examination is key to reducing patients' anxiety. The use of automated medical history--taking systems in general internal medicine outpatient departments is a promising strategy to reduce waiting times. Recently, Ubie Inc in Japan developed AI Monshin, an artificial intelligence--based, automated medical history--taking system for general internal medicine outpatient departments. Objective: We hypothesized that replacing the use of handwritten self-administered questionnaires with the use of AI Monshin would reduce waiting times in general internal medicine outpatient departments. Therefore, we conducted this study to examine whether the use of AI Monshin reduced patient waiting times. Methods: We retrospectively analyzed the waiting times of patients visiting the general internal medicine outpatient department at a Japanese community hospital without an appointment from April 2017 to April 2020. 
AI Monshin was implemented in April 2019. We compared the median waiting time before and after implementation by conducting an interrupted time-series analysis of the median waiting time per month. We also conducted supplementary analyses to explain the main results. Results: We analyzed 21,615 visits. The median waiting time after AI Monshin implementation (74.4 minutes, IQR 57.1) was not significantly different from that before AI Monshin implementation (74.3 minutes, IQR 63.7) (P=.12). In the interrupted time-series analysis, the underlying linear time trend (--0.4 minutes per month; P=.06; 95{\%} CI --0.9 to 0.02), level change (40.6 minutes; P=.09; 95{\%} CI --5.8 to 87.0), and slope change (--1.1 minutes per month; P=.16; 95{\%} CI --2.7 to 0.4) were not statistically significant. In a supplemental analysis of data from 9054 of 21,615 visits (41.9{\%}), the median examination time after AI Monshin implementation (6.0 minutes, IQR 5.2) was slightly but significantly longer than that before AI Monshin implementation (5.7 minutes, IQR 5.0) (P=.003). Conclusions: The implementation of an artificial intelligence--based, automated medical history--taking system did not reduce waiting time for patients visiting the general internal medicine outpatient department without an appointment, and there was a slight increase in the examination time after implementation; however, the system may have enhanced the quality of care by supporting the optimization of staff assignments. ", doi="10.2196/21056", url="http://medinform.jmir.org/2020/8/e21056/", url="https://doi.org/10.2196/21056", url="http://www.ncbi.nlm.nih.gov/pubmed/32865504" } @Article{info:doi/10.2196/19962, author="Adler, Daniel A and Ben-Zeev, Dror and Tseng, Vincent W-S and Kane, John M and Brian, Rachel and Campbell, Andrew T and Hauser, Marta and Scherer, Emily A and Choudhury, Tanzeem", title="Predicting Early Warning Signs of Psychotic Relapse From Passive Sensing Data: An Approach Using Encoder-Decoder Neural Networks", journal="JMIR Mhealth Uhealth", year="2020", month="Aug", day="31", volume="8", number="8", pages="e19962", keywords="psychotic disorders; schizophrenia; mHealth; mental health; mobile health; smartphone applications; machine learning; passive sensing; digital biomarkers; digital phenotyping; artificial intelligence; deep learning; mobile phone", abstract="Background: Schizophrenia spectrum disorders (SSDs) are chronic conditions, but the severity of symptomatic experiences and functional impairments vacillate over the course of illness. Developing unobtrusive remote monitoring systems to detect early warning signs of impending symptomatic relapses would allow clinicians to intervene before the patient's condition worsens. Objective: In this study, we aim to create the first models, exclusively using passive sensing data from a smartphone, to predict behavioral anomalies that could indicate early warning signs of a psychotic relapse. Methods: Data used to train and test the models were collected during the CrossCheck study. Hourly features derived from smartphone passive sensing data were extracted from 60 patients with SSDs (42 nonrelapse and 18 relapse >1 time throughout the study) and used to train models and test performance. We trained 2 types of encoder-decoder neural network models and a clustering-based local outlier factor model to predict behavioral anomalies that occurred within the 30-day period before a participant's date of relapse (the near relapse period). 
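Editor's illustrative aside (not part of the cited abstract): the encoder-decoder approach described above, and detailed in the next sentences of the abstract, amounts to learning to reconstruct behavior from days of relative health and then flagging days whose reconstruction error exceeds a threshold. A hedged sketch of that general pattern, using a small fully connected autoencoder-style network from scikit-learn and synthetic sensor features rather than the CrossCheck data, could look as follows.

```python
# Hedged sketch of reconstruction-error anomaly detection (not the CrossCheck
# models): train a network to reproduce "healthy-period" behavior vectors,
# then flag days whose reconstruction error exceeds a threshold.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
healthy_days = rng.normal(0.0, 1.0, size=(500, 12))        # synthetic daily features
new_days = np.vstack([rng.normal(0.0, 1.0, size=(5, 12)),
                      rng.normal(3.0, 1.0, size=(2, 12))])  # last 2 rows shifted

scaler = StandardScaler().fit(healthy_days)
X = scaler.transform(healthy_days)

# A fully connected network trained to reproduce its own input acts as an autoencoder
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0).fit(X, X)

def reconstruction_error(model, data):
    return np.mean((model.predict(data) - data) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(ae, X), 95)   # from healthy days only
errors = reconstruction_error(ae, scaler.transform(new_days))
print((errors > threshold).astype(int))   # ideally flags the two shifted days
```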
Models were trained to recreate participant behavior on days of relative health (DRH, outside of the near relapse period), following which a threshold to the recreation error was applied to predict anomalies. The neural network model architecture and the percentage of relapse participant data used to train all models were varied. Results: A total of 20,137 days of collected data were analyzed, with 726 days of data (3.6{\%}) within any 30-day near relapse period. The best performing model used a fully connected neural network autoencoder architecture and achieved a median sensitivity of 0.25 (IQR 0.15-1.00) and specificity of 0.88 (IQR 0.14-0.96; a median 108{\%} increase in behavioral anomalies near relapse). We conducted a post hoc analysis using the best performing model to identify behavioral features that had a medium-to-large effect (Cohen d>0.5) in distinguishing anomalies near relapse from DRH among 4 participants who relapsed multiple times throughout the study. Qualitative validation using clinical notes collected during the original CrossCheck study showed that the identified features from our analysis were presented to clinicians during relapse events. Conclusions: Our proposed method predicted a higher rate of anomalies in patients with SSDs within the 30-day near relapse period and can be used to uncover individual-level behaviors that change before relapse. This approach will enable technologists and clinicians to build unobtrusive digital mental health tools that can predict incipient relapse in SSDs. ", doi="10.2196/19962", url="https://mhealth.jmir.org/2020/8/e19962", url="https://doi.org/10.2196/19962", url="http://www.ncbi.nlm.nih.gov/pubmed/32865506" } @Article{info:doi/10.2196/19870, author="Shen, Xiao and Wang, Guanjin and Kwan, Rick Yiu-Cho and Choi, Kup-Sze", title="Using Dual Neural Network Architecture to Detect the Risk of Dementia With Community Health Data: Algorithm Development and Validation Study", journal="JMIR Med Inform", year="2020", month="Aug", day="31", volume="8", number="8", pages="e19870", keywords="cognitive screening; dementia risk; dual neural network; predictive models; primary care", abstract="Background: Recent studies have revealed lifestyle behavioral risk factors that can be modified to reduce the risk of dementia. As modification of lifestyle takes time, early identification of people with high dementia risk is important for timely intervention and support. As cognitive impairment is a diagnostic criterion of dementia, cognitive assessment tools are used in primary care to screen for clinically unevaluated cases. Among them, Mini-Mental State Examination (MMSE) is a very common instrument. However, MMSE is a questionnaire that is administered when symptoms of memory decline have occurred. Early administration at the asymptomatic stage and repeated measurements would lead to a practice effect that degrades the effectiveness of MMSE when it is used at later stages. Objective: The aim of this study was to exploit machine learning techniques to assist health care professionals in detecting high-risk individuals by predicting the results of MMSE using elderly health data collected from community-based primary care services. Methods: A health data set of 2299 samples was adopted in the study. The input data were divided into two groups of different characteristics (ie, client profile data and health assessment data). 
The predictive output was the result of two-class classification of the normal and high-risk cases that were defined based on MMSE. A dual neural network (DNN) model was proposed to obtain the latent representations of the two groups of input data separately, which were then concatenated for the two-class classification. Mean and k-nearest neighbor were used separately to tackle missing data, whereas a cost-sensitive learning (CSL) algorithm was proposed to deal with class imbalance. The performance of the DNN was evaluated by comparing it with that of conventional machine learning methods. Results: A total of 16 predictive models were built using the elderly health data set. Among them, the proposed DNN with CSL outperformed in the detection of high-risk cases. The area under the receiver operating characteristic curve, average precision, sensitivity, and specificity reached 0.84, 0.88, 0.73, and 0.80, respectively. Conclusions: The proposed method has the potential to serve as a tool to screen for elderly people with cognitive impairment and predict high-risk cases of dementia at the asymptomatic stage, providing health care professionals with early signals that can prompt suggestions for a follow-up or a detailed diagnosis. ", doi="10.2196/19870", url="https://medinform.jmir.org/2020/8/e19870", url="https://doi.org/10.2196/19870", url="http://www.ncbi.nlm.nih.gov/pubmed/32865498" } @Article{info:doi/10.2196/19918, author="Lee, Joon", title="Is Artificial Intelligence Better Than Human Clinicians in Predicting Patient Outcomes?", journal="J Med Internet Res", year="2020", month="Aug", day="26", volume="22", number="8", pages="e19918", keywords="patient outcome prediction; artificial intelligence; machine learning; human-generated predictions; human-AI symbiosis", doi="10.2196/19918", url="http://www.jmir.org/2020/8/e19918/", url="https://doi.org/10.2196/19918", url="http://www.ncbi.nlm.nih.gov/pubmed/32845249" } @Article{info:doi/10.2196/20794, author="Mackey, Tim Ken and Li, Jiawei and Purushothaman, Vidya and Nali, Matthew and Shah, Neal and Bardier, Cortni and Cai, Mingxiang and Liang, Bryan", title="Big Data, Natural Language Processing, and Deep Learning to Detect and Characterize Illicit COVID-19 Product Sales: Infoveillance Study on Twitter and Instagram", journal="JMIR Public Health Surveill", year="2020", month="Aug", day="25", volume="6", number="3", pages="e20794", keywords="COVID-19; coronavirus; infectious disease; social media; surveillance; infoveillance; infodemiology; infodemic; fraud; cybercrime", abstract="Background: The coronavirus disease (COVID-19) pandemic is perhaps the greatest global health challenge of the last century. Accompanying this pandemic is a parallel ``infodemic,'' including the online marketing and sale of unapproved, illegal, and counterfeit COVID-19 health products including testing kits, treatments, and other questionable ``cures.'' Enabling the proliferation of this content is the growing ubiquity of internet-based technologies, including popular social media platforms that now have billions of global users. Objective: This study aims to collect, analyze, identify, and enable reporting of suspected fake, counterfeit, and unapproved COVID-19--related health care products from Twitter and Instagram. 
Methods: This study is conducted in two phases beginning with the collection of COVID-19--related Twitter and Instagram posts using a combination of web scraping on Instagram and filtering the public streaming Twitter application programming interface for keywords associated with suspect marketing and sale of COVID-19 products. The second phase involved data analysis using natural language processing (NLP) and deep learning to identify potential sellers that were then manually annotated for characteristics of interest. We also visualized illegal selling posts on a customized data dashboard to enable public health intelligence. Results: We collected a total of 6,029,323 tweets and 204,597 Instagram posts filtered for terms associated with suspect marketing and sale of COVID-19 health products from March to April for Twitter and February to May for Instagram. After applying our NLP and deep learning approaches, we identified 1271 tweets and 596 Instagram posts associated with questionable sales of COVID-19--related products. Generally, product introduction came in two waves, with the first consisting of questionable immunity-boosting treatments and a second involving suspect testing kits. We also detected a low volume of pharmaceuticals that have not been approved for COVID-19 treatment. Other major themes detected included products offered in different languages, various claims of product credibility, completely unsubstantiated products, unapproved testing modalities, and different payment and seller contact methods. Conclusions: Results from this study provide initial insight into one front of the ``infodemic'' fight against COVID-19 by characterizing what types of health products, selling claims, and types of sellers were active on two popular social media platforms at earlier stages of the pandemic. This cybercrime challenge is likely to continue as the pandemic progresses and more people seek access to COVID-19 testing and treatment. This data intelligence can help public health agencies, regulatory authorities, legitimate manufacturers, and technology platforms better remove and prevent this content from harming the public. ", doi="10.2196/20794", url="http://publichealth.jmir.org/2020/3/e20794/", url="https://doi.org/10.2196/20794", url="http://www.ncbi.nlm.nih.gov/pubmed/32750006" } @Article{info:doi/10.2196/18189, author="Xie, Bo and Tao, Cui and Li, Juan and Hilsabeck, Robin C and Aguirre, Alyssa", title="Artificial Intelligence for Caregivers of Persons With Alzheimer's Disease and Related Dementias: Systematic Literature Review", journal="JMIR Med Inform", year="2020", month="Aug", day="20", volume="8", number="8", pages="e18189", keywords="Alzheimer disease; dementia; caregiving; technology; artificial intelligence", abstract="Background: Artificial intelligence (AI) has great potential for improving the care of persons with Alzheimer's disease and related dementias (ADRD) and the quality of life of their family caregivers. To date, however, systematic review of the literature on the impact of AI on ADRD management has been lacking. Objective: This paper aims to (1) identify and examine literature on AI that provides information to facilitate ADRD management by caregivers of individuals diagnosed with ADRD and (2) identify gaps in the literature that suggest future directions for research. 
Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for conducting systematic literature reviews, during August and September 2019, we performed 3 rounds of selection. First, we searched predetermined keywords in PubMed, Cumulative Index to Nursing and Allied Health Literature Plus with Full Text, PsycINFO, IEEE Xplore Digital Library, and the ACM Digital Library. This step generated 113 nonduplicate results. Next, we screened the titles and abstracts of the 113 papers according to inclusion and exclusion criteria, after which 52 papers were excluded and 61 remained. Finally, we screened the full text of the remaining papers to ensure that they met the inclusion or exclusion criteria; 31 papers were excluded, leaving a final sample of 30 papers for analysis. Results: Of the 30 papers, 20 reported studies that focused on using AI to assist in activities of daily living. A limited number of specific daily activities were targeted. The studies' aims suggested three major purposes: (1) to test the feasibility, usability, or perceptions of prototype AI technology; (2) to generate preliminary data on the technology's performance (primarily accuracy in detecting target events, such as falls); and (3) to understand user needs and preferences for the design and functionality of to-be-developed technology. The majority of the studies were qualitative, with interviews, focus groups, and observation being their most common methods. Cross-sectional surveys were also common, but with small convenience samples. Sample sizes ranged from 6 to 106, with the vast majority on the low end. The majority of the studies were descriptive, exploratory, and lacking theoretical guidance. Many studies reported positive outcomes in favor of their AI technology's feasibility and satisfaction; some studies reported mixed results on these measures. Performance of the technology varied widely across tasks. Conclusions: These findings call for more systematic designs and evaluations of the feasibility and efficacy of AI-based interventions for caregivers of people with ADRD. These gaps in the research would be best addressed through interdisciplinary collaboration, incorporating complementary expertise from the health sciences and computer science/engineering--related fields. ", doi="10.2196/18189", url="http://medinform.jmir.org/2020/8/e18189/", url="https://doi.org/10.2196/18189", url="http://www.ncbi.nlm.nih.gov/pubmed/32663146" } @Article{info:doi/10.2196/22590, author="Hung, Man and Lauren, Evelyn and Hon, Eric S and Birmingham, Wendy C and Xu, Julie and Su, Sharon and Hon, Shirley D and Park, Jungweon and Dang, Peter and Lipsky, Martin S", title="Social Network Analysis of COVID-19 Sentiments: Application of Artificial Intelligence", journal="J Med Internet Res", year="2020", month="Aug", day="18", volume="22", number="8", pages="e22590", keywords="COVID-19; coronavirus; sentiment; social network; Twitter; infodemiology; infodemic; pandemic; crisis; public health; business economy; artificial intelligence", abstract="Background: The coronavirus disease (COVID-19) pandemic led to substantial public discussion. Understanding these discussions can help institutions, governments, and individuals navigate the pandemic. Objective: The aim of this study is to analyze discussions on Twitter related to COVID-19 and to investigate the sentiments toward COVID-19. 
Methods: This study applied machine learning methods in the field of artificial intelligence to analyze data collected from Twitter. Using tweets originating exclusively in the United States and written in English during the 1-month period from March 20 to April 19, 2020, the study examined COVID-19--related discussions. Social network and sentiment analyses were also conducted to determine the social network of dominant topics and whether the tweets expressed positive, neutral, or negative sentiments. Geographic analysis of the tweets was also conducted. Results: There were a total of 14,180,603 likes, 863,411 replies, 3,087,812 retweets, and 641,381 mentions in tweets during the study timeframe. Out of 902,138 tweets analyzed, sentiment analysis classified 434,254 (48.2{\%}) tweets as having a positive sentiment, 187,042 (20.7{\%}) as neutral, and 280,842 (31.1{\%}) as negative. The study identified 5 dominant themes among COVID-19--related tweets: health care environment, emotional support, business economy, social change, and psychological stress. Alaska, Wyoming, New Mexico, Pennsylvania, and Florida were the states expressing the most negative sentiment while Vermont, North Dakota, Utah, Colorado, Tennessee, and North Carolina conveyed the most positive sentiment. Conclusions: This study identified 5 prevalent themes of COVID-19 discussion with sentiments ranging from positive to negative. These themes and sentiments can clarify the public's response to COVID-19 and help officials navigate the pandemic. ", doi="10.2196/22590", url="http://www.jmir.org/2020/8/e22590/", url="https://doi.org/10.2196/22590", url="http://www.ncbi.nlm.nih.gov/pubmed/32750001" } @Article{info:doi/10.2196/20007, author="Michelson, Matthew and Chow, Tiffany and Martin, Neil A and Ross, Mike and Tee Qiao Ying, Amelia and Minton, Steven", title="Artificial Intelligence for Rapid Meta-Analysis: Case Study on Ocular Toxicity of Hydroxychloroquine", journal="J Med Internet Res", year="2020", month="Aug", day="17", volume="22", number="8", pages="e20007", keywords="meta-analysis; rapid meta-analysis; artificial intelligence; drug; analysis; hydroxychloroquine; toxic; COVID-19; treatment; side effect; ocular; eye", abstract="Background: Rapid access to evidence is crucial in times of an evolving clinical crisis. To that end, we propose a novel approach to answer clinical queries, termed rapid meta-analysis (RMA). Unlike traditional meta-analysis, RMA balances a quick time to production with reasonable data quality assurances, leveraging artificial intelligence (AI) to strike this balance. Objective: We aimed to evaluate whether RMA can generate meaningful clinical insights, but crucially, in a much faster processing time than traditional meta-analysis, using a relevant, real-world example. Methods: The development of our RMA approach was motivated by a currently relevant clinical question: is ocular toxicity and vision compromise a side effect of hydroxychloroquine therapy? At the time of designing this study, hydroxychloroquine was a leading candidate in the treatment of coronavirus disease (COVID-19). We then leveraged AI to pull and screen articles, automatically extract their results, review the studies, and analyze the data with standard statistical methods. Results: By combining AI with human analysis in our RMA, we generated a meaningful, clinical result in less than 30 minutes. 
The RMA identified 11 studies considering ocular toxicity as a side effect of hydroxychloroquine and estimated the incidence to be 3.4{\%} (95{\%} CI 1.11{\%}-9.96{\%}). The heterogeneity across individual study findings was high, which should be taken into account in interpretation of the result. Conclusions: We demonstrate that a novel approach to meta-analysis using AI can generate meaningful clinical insights in a much shorter time period than traditional meta-analysis. ", doi="10.2196/20007", url="http://www.jmir.org/2020/8/e20007/", url="https://doi.org/10.2196/20007", url="http://www.ncbi.nlm.nih.gov/pubmed/32804086" } @Article{info:doi/10.2196/17211, author="Iqbal, Usman and Celi, Leo Anthony and Li, Yu-Chuan Jack", title="How Can Artificial Intelligence Make Medicine More Preemptive?", journal="J Med Internet Res", year="2020", month="Aug", day="11", volume="22", number="8", pages="e17211", keywords="artificial intelligence; digital health; eHealth; health care technology; medical innovations; health information technology; advanced care systems", doi="10.2196/17211", url="https://www.jmir.org/2020/8/e17211", url="https://doi.org/10.2196/17211", url="http://www.ncbi.nlm.nih.gov/pubmed/32780024" } @Article{info:doi/10.2196/19104, author="Adly, Aya Sedky and Adly, Afnan Sedky and Adly, Mahmoud Sedky", title="Approaches Based on Artificial Intelligence and the Internet of Intelligent Things to Prevent the Spread of COVID-19: Scoping Review", journal="J Med Internet Res", year="2020", month="Aug", day="10", volume="22", number="8", pages="e19104", keywords="SARS-CoV-2; COVID-19; novel coronavirus; artificial intelligence; internet of things; telemedicine; machine learning; modeling; simulation; robotics", abstract="Background: Artificial intelligence (AI) and the Internet of Intelligent Things (IIoT) are promising technologies to prevent the concerningly rapid spread of coronavirus disease (COVID-19) and to maximize safety during the pandemic. With the exponential increase in the number of COVID-19 patients, it is highly possible that physicians and health care workers will not be able to treat all cases. Thus, computer scientists can contribute to the fight against COVID-19 by introducing more intelligent solutions to achieve rapid control of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes the disease. Objective: The objectives of this review were to analyze the current literature, discuss the applicability of reported ideas for using AI to prevent and control COVID-19, and build a comprehensive view of how current systems may be useful in particular areas. This may be of great help to many health care administrators, computer scientists, and policy makers worldwide. Methods: We conducted an electronic search of articles in the MEDLINE, Google Scholar, Embase, and Web of Knowledge databases to formulate a comprehensive review that summarizes different categories of the most recently reported AI-based approaches to prevent and control the spread of COVID-19. Results: Our search identified the 10 most recent AI approaches that were suggested to provide the best solutions for maximizing safety and preventing the spread of COVID-19. These approaches included detection of suspected cases, large-scale screening, monitoring, interactions with experimental therapies, pneumonia screening, use of the IIoT for data and information gathering and integration, resource allocation, predictions, modeling and simulation, and robotics for medical quarantine. 
Conclusions: We found few or almost no studies regarding the use of AI to examine COVID-19 interactions with experimental therapies, the use of AI for resource allocation to COVID-19 patients, or the use of AI and the IIoT for COVID-19 data and information gathering/integration. Moreover, the adoption of other approaches, including use of AI for COVID-19 prediction, use of AI for COVID-19 modeling and simulation, and use of AI robotics for medical quarantine, should be further emphasized by researchers because these important approaches lack sufficient numbers of studies. Therefore, we recommend that computer scientists focus on these approaches, which are still not being adequately addressed. ", doi="10.2196/19104", url="https://www.jmir.org/2020/8/e19104", url="https://doi.org/10.2196/19104", url="http://www.ncbi.nlm.nih.gov/pubmed/32584780" } @Article{info:doi/10.2196/17158, author="Tudor Car, Lorainne and Dhinagaran, Dhakshenya Ardhithy and Kyaw, Bhone Myint and Kowatsch, Tobias and Joty, Shafiq and Theng, Yin-Leng and Atun, Rifat", title="Conversational Agents in Health Care: Scoping Review and Conceptual Analysis", journal="J Med Internet Res", year="2020", month="Aug", day="7", volume="22", number="8", pages="e17158", keywords="conversational agents; chatbots; artificial intelligence; machine learning; mobile phone; health care; scoping review", abstract="Background: Conversational agents, also known as chatbots, are computer programs designed to simulate human text or verbal conversations. They are increasingly used in a range of fields, including health care. By enabling better accessibility, personalization, and efficiency, conversational agents have the potential to improve patient care. Objective: This study aimed to review the current applications, gaps, and challenges in the literature on conversational agents in health care and provide recommendations for their future research, design, and application. Methods: We performed a scoping review. A broad literature search was performed in MEDLINE (Medical Literature Analysis and Retrieval System Online; Ovid), EMBASE (Excerpta Medica database; Ovid), PubMed, Scopus, and Cochrane Central with the search terms ``conversational agents,'' ``conversational AI,'' ``chatbots,'' and associated synonyms. We also searched the gray literature using sources such as the OCLC (Online Computer Library Center) WorldCat database and ResearchGate in April 2019. Reference lists of relevant articles were checked for further articles. Screening and data extraction were performed in parallel by 2 reviewers. The included evidence was analyzed narratively by employing the principles of thematic analysis. Results: The literature search yielded 47 study reports (45 articles and 2 ongoing clinical trials) that matched the inclusion criteria. The identified conversational agents were largely delivered via smartphone apps (n=23) and used free text only as the main input (n=19) and output (n=30) modality. Case studies describing chatbot development (n=18) were the most prevalent, and only 11 randomized controlled trials were identified. The 3 most commonly reported conversational agent applications in the literature were treatment and monitoring, health care service support, and patient education. Conclusions: The literature on conversational agents in health care is largely descriptive and aimed at treatment and monitoring and health service support. 
It mostly reports on text-based, artificial intelligence--driven, and smartphone app--delivered conversational agents. There is an urgent need for a robust evaluation of diverse health care conversational agents' formats, focusing on their acceptability, safety, and effectiveness. ", doi="10.2196/17158", url="http://www.jmir.org/2020/8/e17158/", url="https://doi.org/10.2196/17158", url="http://www.ncbi.nlm.nih.gov/pubmed/32763886" } @Article{info:doi/10.2196/15394, author="Cheng, Hao-Yuan and Wu, Yu-Chun and Lin, Min-Hau and Liu, Yu-Lun and Tsai, Yue-Yang and Wu, Jo-Hua and Pan, Ke-Han and Ke, Chih-Jung and Chen, Chiu-Mei and Liu, Ding-Ping and Lin, I-Feng and Chuang, Jen-Hsiang", title="Applying Machine Learning Models with An Ensemble Approach for Accurate Real-Time Influenza Forecasting in Taiwan: Development and Validation Study", journal="J Med Internet Res", year="2020", month="Aug", day="5", volume="22", number="8", pages="e15394", keywords="influenza; Influenza-like illness; forecasting; machine learning; artificial intelligence; epidemic forecasting; surveillance", abstract="Background: Changeful seasonal influenza activity in subtropical areas such as Taiwan causes problems in epidemic preparedness. The Taiwan Centers for Disease Control has maintained real-time national influenza surveillance systems since 2004. Except for timely monitoring, epidemic forecasting using the national influenza surveillance data can provide pivotal information for public health response. Objective: We aimed to develop predictive models using machine learning to provide real-time influenza-like illness forecasts. Methods: Using surveillance data of influenza-like illness visits from emergency departments (from the Real-Time Outbreak and Disease Surveillance System), outpatient departments (from the National Health Insurance database), and the records of patients with severe influenza with complications (from the National Notifiable Disease Surveillance System), we developed 4 machine learning models (autoregressive integrated moving average, random forest, support vector regression, and extreme gradient boosting) to produce weekly influenza-like illness predictions for a given week and 3 subsequent weeks. We established a framework of the machine learning models and used an ensemble approach called stacking to integrate these predictions. We trained the models using historical data from 2008-2014. We evaluated their predictive ability during 2015-2017 for each of the 4-week time periods using Pearson correlation, mean absolute percentage error (MAPE), and hit rate of trend prediction. A dashboard website was built to visualize the forecasts, and the results of real-world implementation of this forecasting framework in 2018 were evaluated using the same metrics. Results: All models could accurately predict the timing and magnitudes of the seasonal peaks in the then-current week (nowcast) ($\rho$=0.802-0.965; MAPE: 5.2{\%}-9.2{\%}; hit rate: 0.577-0.756), 1-week ($\rho$=0.803-0.918; MAPE: 8.3{\%}-11.8{\%}; hit rate: 0.643-0.747), 2-week ($\rho$=0.783-0.867; MAPE: 10.1{\%}-15.3{\%}; hit rate: 0.669-0.734), and 3-week forecasts ($\rho$=0.676-0.801; MAPE: 12.0{\%}-18.9{\%}; hit rate: 0.643-0.786), especially the ensemble model. In real-world implementation in 2018, the forecasting performance was still accurate in nowcasts ($\rho$=0.875-0.969; MAPE: 5.3{\%}-8.0{\%}; hit rate: 0.582-0.782) and remained satisfactory in 3-week forecasts ($\rho$=0.721-0.908; MAPE: 7.6{\%}-13.5{\%}; hit rate: 0.596-0.904). 
Conclusions: This machine learning and ensemble approach can make accurate, real-time influenza-like illness forecasts for a 4-week period, and thus, facilitate decision making. ", doi="10.2196/15394", url="https://www.jmir.org/2020/8/e15394", url="https://doi.org/10.2196/15394", url="http://www.ncbi.nlm.nih.gov/pubmed/32755888" } @Article{info:doi/10.2196/18228, author="Guo, Yuqi and Hao, Zhichao and Zhao, Shichong and Gong, Jiaqi and Yang, Fan", title="Artificial Intelligence in Health Care: Bibliometric Analysis", journal="J Med Internet Res", year="2020", month="Jul", day="29", volume="22", number="7", pages="e18228", keywords="health care; artificial intelligence; bibliometric analysis; telehealth; neural networks; machine learning", abstract="Background: As a critical driving power to promote health care, the health care--related artificial intelligence (AI) literature is growing rapidly. Objective: The purpose of this analysis is to provide a dynamic and longitudinal bibliometric analysis of health care--related AI publications. Methods: The Web of Science (Clarivate PLC) was searched to retrieve all existing and highly cited AI-related health care research papers published in English up to December 2019. Based on bibliometric indicators, a search strategy was developed to screen the title for eligibility, using the abstract and full text where needed. The growth rate of publications, characteristics of research activities, publication patterns, and research hotspot tendencies were computed using the HistCite software. Results: The search identified 5235 hits, of which 1473 publications were included in the analyses. Publication output increased an average of 17.02{\%} per year since 1995, but the growth rate of research papers significantly increased to 45.15{\%} from 2014 to 2019. The major health problems studied in AI research are cancer, depression, Alzheimer disease, heart failure, and diabetes. Artificial neural networks, support vector machines, and convolutional neural networks have the highest impact on health care. Nucleosides, convolutional neural networks, and tumor markers have remained research hotspots through 2019. Conclusions: This analysis provides a comprehensive overview of the AI-related research conducted in the field of health care, which helps researchers, policy makers, and practitioners better understand the development of health care--related AI research and possible practice implications. Future AI research should be dedicated to filling in the gaps between AI health care research and clinical applications. ", doi="10.2196/18228", url="http://www.jmir.org/2020/7/e18228/", url="https://doi.org/10.2196/18228", url="http://www.ncbi.nlm.nih.gov/pubmed/32723713" } @Article{info:doi/10.2196/18082, author="Barata, Filipe and Tinschert, Peter and Rassouli, Frank and Steurer-Stey, Claudia and Fleisch, Elgar and Puhan, Milo Alan and Brutsche, Martin and Kotz, David and Kowatsch, Tobias", title="Automatic Recognition, Segmentation, and Sex Assignment of Nocturnal Asthmatic Coughs and Cough Epochs in Smartphone Audio Recordings: Observational Field Study", journal="J Med Internet Res", year="2020", month="Jul", day="14", volume="22", number="7", pages="e18082", keywords="asthma; cough recognition; cough segmentation; sex assignment; deep learning; smartphone; mobile phone", abstract="Background: Asthma is one of the most prevalent chronic respiratory diseases. 
Despite increased investment in treatment, little progress has been made in the early recognition and treatment of asthma exacerbations over the last decade. Nocturnal cough monitoring may provide an opportunity to identify patients at risk for imminent exacerbations. Recently developed approaches enable smartphone-based cough monitoring. These approaches, however, have not undergone longitudinal overnight testing nor have they been specifically evaluated in the context of asthma. Also, the problem of distinguishing partner coughs from patient coughs when two or more people are sleeping in the same room using contact-free audio recordings remains unsolved. Objective: The objective of this study was to evaluate the automatic recognition and segmentation of nocturnal asthmatic coughs and cough epochs in smartphone-based audio recordings that were collected in the field. We also aimed to distinguish partner coughs from patient coughs in contact-free audio recordings by classifying coughs based on sex. Methods: We used a convolutional neural network model that we had developed in previous work for automated cough recognition. We further used techniques (such as ensemble learning, minibatch balancing, and thresholding) to address the imbalance in the data set. We evaluated the classifier in a classification task and a segmentation task. The cough-recognition classifier served as the basis for the cough-segmentation classifier from continuous audio recordings. We compared automated cough and cough-epoch counts to human-annotated cough and cough-epoch counts. We employed Gaussian mixture models to build a classifier for cough and cough-epoch signals based on sex. Results: We recorded audio data from 94 adults with asthma (overall: mean 43 years; SD 16 years; female: 54/94, 57{\%}; male 40/94, 43{\%}). Audio data were recorded by each participant in their everyday environment using a smartphone placed next to their bed; recordings were made over a period of 28 nights. Out of 704,697 sounds, we identified 30,304 sounds as coughs. A total of 26,166 coughs occurred without a 2-second pause between coughs, yielding 8238 cough epochs. The ensemble classifier performed well with a Matthews correlation coefficient of 92{\%} in a pure classification task and achieved comparable cough counts to that of human annotators in the segmentation of coughing. The count difference between automated and human-annotated coughs was a mean --0.1 (95{\%} CI --12.11, 11.91) coughs. The count difference between automated and human-annotated cough epochs was a mean 0.24 (95{\%} CI --3.67, 4.15) cough epochs. The Gaussian mixture model cough epoch--based sex classification performed best yielding an accuracy of 83{\%}. Conclusions: Our study showed longitudinal nocturnal cough and cough-epoch recognition from nightly recorded smartphone-based audio from adults with asthma. The model distinguishes partner cough from patient cough in contact-free recordings by identifying cough and cough-epoch signals that correspond to the sex of the patient. This research represents a step towards enabling passive and scalable cough monitoring for adults with asthma. 
", doi="10.2196/18082", url="https://www.jmir.org/2020/7/e18082", url="https://doi.org/10.2196/18082", url="http://www.ncbi.nlm.nih.gov/pubmed/32459641" } @Article{info:doi/10.2196/16649, author="Gao, Shuqing and He, Lingnan and Chen, Yue and Li, Dan and Lai, Kaisheng", title="Public Perception of Artificial Intelligence in Medical Care: Content Analysis of Social Media", journal="J Med Internet Res", year="2020", month="Jul", day="13", volume="22", number="7", pages="e16649", keywords="artificial intelligence; public perception; social media; content analysis; medical care", abstract="Background: High-quality medical resources are in high demand worldwide, and the application of artificial intelligence (AI) in medical care may help alleviate the crisis related to this shortage. The development of the medical AI industry depends to a certain extent on whether industry experts have a comprehensive understanding of the public's views on medical AI. Currently, the opinions of the general public on this matter remain unclear. Objective: The purpose of this study is to explore the public perception of AI in medical care through a content analysis of social media data, including specific topics that the public is concerned about; public attitudes toward AI in medical care and the reasons for them; and public opinion on whether AI can replace human doctors. Methods: Through an application programming interface, we collected a data set from the Sina Weibo platform comprising more than 16 million users throughout China by crawling all public posts from January to December 2017. Based on this data set, we identified 2315 posts related to AI in medical care and classified them through content analysis. Results: Among the 2315 identified posts, we found three types of AI topics discussed on the platform: (1) technology and application (n=987, 42.63{\%}), (2) industry development (n=706, 30.50{\%}), and (3) impact on society (n=622, 26.87{\%}). Out of 956 posts where public attitudes were expressed, 59.4{\%} (n=568), 34.4{\%} (n=329), and 6.2{\%} (n=59) of the posts expressed positive, neutral, and negative attitudes, respectively. The immaturity of AI technology (27/59, 46{\%}) and a distrust of related companies (n=15, 25{\%}) were the two main reasons for the negative attitudes. Across 200 posts that mentioned public attitudes toward replacing human doctors with AI, 47.5{\%} (n=95) and 32.5{\%} (n=65) of the posts expressed that AI would completely or partially replace human doctors, respectively. In comparison, 20.0{\%} (n=40) of the posts expressed that AI would not replace human doctors. Conclusions: Our findings indicate that people are most concerned about AI technology and applications. Generally, the majority of people held positive attitudes and believed that AI doctors would completely or partially replace human ones. Compared with previous studies on medical doctors, the general public has a more positive attitude toward medical AI. Lack of trust in AI and the absence of the humanistic care factor are essential reasons why some people still have a negative attitude toward medical AI. We suggest that practitioners may need to pay more attention to promoting the credibility of technology companies and meeting patients' emotional needs instead of focusing merely on technical issues. 
", doi="10.2196/16649", url="http://www.jmir.org/2020/7/e16649/", url="https://doi.org/10.2196/16649", url="http://www.ncbi.nlm.nih.gov/pubmed/32673231" } @Article{info:doi/10.2196/16021, author="Abd-Alrazaq, Alaa Ali and Rababeh, Asma and Alajlani, Mohannad and Bewick, Bridgette M and Househ, Mowafa", title="Effectiveness and Safety of Using Chatbots to Improve Mental Health: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2020", month="Jul", day="13", volume="22", number="7", pages="e16021", keywords="chatbots; conversational agents; mental health; mental disorders; depression; anxiety; effectiveness; safety", abstract="Background: The global shortage of mental health workers has prompted the utilization of technological advancements, such as chatbots, to meet the needs of people with mental health conditions. Chatbots are systems that are able to converse and interact with human users using spoken, written, and visual language. While numerous studies have assessed the effectiveness and safety of using chatbots in mental health, no reviews have pooled the results of those studies. Objective: This study aimed to assess the effectiveness and safety of using chatbots to improve mental health through summarizing and pooling the results of previous studies. Methods: A systematic review was carried out to achieve this objective. The search sources were 7 bibliographic databases (eg, MEDLINE, EMBASE, PsycINFO), the search engine ``Google Scholar,'' and backward and forward reference list checking of the included studies and relevant reviews. Two reviewers independently selected the studies, extracted data from the included studies, and assessed the risk of bias. Data extracted from studies were synthesized using narrative and statistical methods, as appropriate. Results: Of 1048 citations retrieved, we identified 12 studies examining the effect of using chatbots on 8 outcomes. Weak evidence demonstrated that chatbots were effective in improving depression, distress, stress, and acrophobia. In contrast, according to similar evidence, there was no statistically significant effect of using chatbots on subjective psychological wellbeing. Results were conflicting regarding the effect of chatbots on the severity of anxiety and positive and negative affect. Only two studies assessed the safety of chatbots and concluded that they are safe in mental health, as no adverse events or harms were reported. Conclusions: Chatbots have the potential to improve mental health. However, the evidence in this review was not sufficient to definitely conclude this due to lack of evidence that their effect is clinically important, a lack of studies assessing each outcome, high risk of bias in those studies, and conflicting results for some outcomes. Further studies are required to draw solid conclusions about the effectiveness and safety of chatbots. 
Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42019141219; https://www.crd.york.ac.uk/prospero/display{\_}record.php?ID=CRD42019141219 ", doi="10.2196/16021", url="http://www.jmir.org/2020/7/e16021/", url="https://doi.org/10.2196/16021", url="http://www.ncbi.nlm.nih.gov/pubmed/32673216" } @Article{info:doi/10.2196/18697, author="Jin, Bo and Qu, Yue and Zhang, Liang and Gao, Zhan", title="Diagnosing Parkinson Disease Through Facial Expression Recognition: Video Analysis", journal="J Med Internet Res", year="2020", month="Jul", day="10", volume="22", number="7", pages="e18697", keywords="Parkinson disease; face landmarks; machine learning; artificial intelligence", abstract="Background: The number of patients with neurological diseases is currently increasing annually, which presents tremendous challenges for both patients and doctors. With the advent of advanced information technology, digital medical care is gradually changing the medical ecology. Numerous people are exploring new ways to receive a consultation, track their diseases, and receive rehabilitation training in more convenient and efficient ways. In this paper, we explore the use of facial expression recognition via artificial intelligence to diagnose a typical neurological system disease, Parkinson disease (PD). Objective: This study proposes methods to diagnose PD through facial expression recognition. Methods: We collected videos of facial expressions of people with PD and matched controls. We used relative coordinates and positional jitter to extract facial expression features (facial expression amplitude and shaking of small facial muscle groups) from the key points returned by Face++. Algorithms from traditional machine learning and advanced deep learning were utilized to diagnose PD. Results: The experimental results showed our models can achieve outstanding facial expression recognition ability for PD diagnosis. Applying a long short-term model neural network to the positions of the key features, precision and F1 values of 86{\%} and 75{\%}, respectively, can be reached. Further, utilizing a support vector machine algorithm for the facial expression amplitude features and shaking of the small facial muscle groups, an F1 value of 99{\%} can be achieved. Conclusions: This study contributes to the digital diagnosis of PD based on facial expression recognition. The disease diagnosis model was validated through our experiment. The results can help doctors understand the real-time dynamics of the disease and even conduct remote diagnosis. ", doi="10.2196/18697", url="https://www.jmir.org/2020/7/e18697", url="https://doi.org/10.2196/18697", url="http://www.ncbi.nlm.nih.gov/pubmed/32673247" } @Article{info:doi/10.2196/17558, author="Maher, Carol Ann and Davis, Courtney Rose and Curtis, Rachel Grace and Short, Camille Elizabeth and Murphy, Karen Joy", title="A Physical Activity and Diet Program Delivered by Artificially Intelligent Virtual Health Coach: Proof-of-Concept Study", journal="JMIR Mhealth Uhealth", year="2020", month="Jul", day="10", volume="8", number="7", pages="e17558", keywords="virtual assistant; chatbot; Mediterranean diet; physical activity; lifestyle", abstract="Background: Poor diet and physical inactivity are leading modifiable causes of death and disease. Advances in artificial intelligence technology present tantalizing opportunities for creating virtual health coaches capable of providing personalized support at scale. 
Objective: This proof-of-concept study aimed to test the feasibility (recruitment and retention) and preliminary efficacy of a physical activity and Mediterranean-style dietary intervention (MedLiPal) delivered via an artificially intelligent virtual health coach. Methods: This 12-week single-arm pre-post study took place in Adelaide, Australia, from March to August 2019. Participants were inactive community-dwelling adults aged 45 to 75 years, recruited through news stories, social media posts, and flyers. The program included access to an artificially intelligent chatbot, Paola, who guided participants through a computer-based individualized introductory session, weekly check-ins, and goal setting, and was available 24/7 to answer questions. Participants used a Garmin Vivofit4 tracker to monitor daily steps, a website with educational materials and recipes, and a printed diet and activity log sheet. Primary outcomes included feasibility (based on recruitment and retention) and preliminary efficacy for changing physical activity and diet. Secondary outcomes were body composition (based on height, weight, and waist circumference) and blood pressure. Results: Over 4 weeks, 99 potential participants registered expressions of interest, with 81 of those screened meeting eligibility criteria. Participants completed a mean of 109.8 (95{\%} CI 1.9-217.7) more minutes of physical activity at week 12 compared with baseline. Mediterranean diet scores increased from a mean of 3.8 out of 14 at baseline to 9.6 at 12 weeks (mean improvement 5.7 points, 95{\%} CI 4.2-7.3). After 12 weeks, participants lost an average of 1.3 kg (95{\%} CI --0.1 to --2.5 kg) and 2.1 cm from their waist circumference (95{\%} CI --3.5 to --0.7 cm). There were no significant changes in blood pressure. Feasibility was excellent in terms of recruitment, retention (90{\%} at 12 weeks), and safety (no adverse events). Conclusions: An artificially intelligent virtual assistant-led lifestyle-modification intervention was feasible and achieved measurable improvements in physical activity, diet, and body composition at 12 weeks. Future research examining artificially intelligent interventions at scale, and for other health purposes, is warranted. ", doi="10.2196/17558", url="https://mhealth.jmir.org/2020/7/e17558", url="https://doi.org/10.2196/17558", url="http://www.ncbi.nlm.nih.gov/pubmed/32673246" } @Article{info:doi/10.2196/17216, author="Chae, Sang Hoon and Kim, Yushin and Lee, Kyoung-Soub and Park, Hyung-Soon", title="Development and Clinical Evaluation of a Web-Based Upper Limb Home Rehabilitation System Using a Smartwatch and Machine Learning Model for Chronic Stroke Survivors: Prospective Comparative Study", journal="JMIR Mhealth Uhealth", year="2020", month="Jul", day="9", volume="8", number="7", pages="e17216", keywords="home-based rehabilitation; artificial intelligence; machine learning; wearable device; smartwatch; chronic stroke", abstract="Background: Recent advancements in wearable sensor technology have shown the feasibility of remote physical therapy at home. In particular, the current COVID-19 pandemic has revealed the need for and the opportunity of internet-based wearable technology in future health care systems. Previous research has shown the feasibility of human activity recognition technologies for monitoring rehabilitation activities in home environments; however, few comprehensive studies ranging from development to clinical evaluation exist. 
Objective: This study aimed to (1) develop a home-based rehabilitation (HBR) system that can recognize and record the type and frequency of rehabilitation exercises conducted by the user using a smartwatch and smartphone app equipped with a machine learning (ML) algorithm and (2) evaluate the efficacy of the HBR system through a prospective comparative study with chronic stroke survivors. Methods: The HBR system involves an off-the-shelf smartwatch, a smartphone, and custom-developed apps. A convolutional neural network was used to train the ML algorithm for detecting home exercises. To determine the most accurate way of detecting the type of home exercise, we compared accuracy results with the data sets of personal or total data and accelerometer, gyroscope, or accelerometer combined with gyroscope data. From March 2018 to February 2019, we conducted a clinical study with two groups of stroke survivors. In total, 17 and 6 participants were enrolled for statistical analysis in the HBR group and control group, respectively. To measure clinical outcomes, we performed the Wolf Motor Function Test (WMFT), Fugl-Meyer Assessment of Upper Extremity, grip power test, Beck Depression Inventory, and range of motion (ROM) assessment of the shoulder joint at 0, 6, and 12 weeks, and at a follow-up assessment 6 weeks after retrieving the HBR system. Results: The ML model created with personal data involving accelerometer combined with gyroscope data (5590/5601, 99.80{\%}) was the most accurate compared with accelerometer (5496/5601, 98.13{\%}) or gyroscope data (5381/5601, 96.07{\%}). In the comparative study, the drop-out rates in the control and HBR groups were 40{\%} (4/10) and 22{\%} (5/22) at 12 weeks and 100{\%} (10/10) and 45{\%} (10/22) at 18 weeks, respectively. The HBR group (n=17) showed a significant improvement in the mean WMFT score (P=.02) and ROM of flexion (P=.004) and internal rotation (P=.001). The control group (n=6) showed a significant change only in shoulder internal rotation (P=.03). Conclusions: This study found that a home care system using a commercial smartwatch and ML model can facilitate participation in home training and improve the functional score of the WMFT and shoulder ROM of flexion and internal rotation in the treatment of patients with chronic stroke. This strategy can possibly be a cost-effective tool for the home care treatment of stroke survivors in the future. Trial Registration: Clinical Research Information Service KCT0004818; https://tinyurl.com/y92w978t ", doi="10.2196/17216", url="http://mhealth.jmir.org/2020/7/e17216/", url="https://doi.org/10.2196/17216", url="http://www.ncbi.nlm.nih.gov/pubmed/32480361" } @Article{info:doi/10.2196/14500, author="Kim, Bora and Kim, Younghoon and Park, C Hyung Keun and Rhee, Sang Jin and Kim, Young Shin and Leventhal, Bennett L and Ahn, Yong Min and Paik, Hyojung", title="Identifying the Medical Lethality of Suicide Attempts Using Network Analysis and Deep Learning: Nationwide Study", journal="JMIR Med Inform", year="2020", month="Jul", day="9", volume="8", number="7", pages="e14500", keywords="suicide; deep learning; network; antecedent behaviors", abstract="Background: Suicide is one of the leading causes of death among young and middle-aged people. However, little is understood about the behaviors leading up to actual suicide attempts and whether these behaviors are specific to the nature of suicide attempts. 
Objective: The goal of this study was to examine the clusters of behaviors antecedent to suicide attempts to determine if they could be used to assess the potential lethality of the attempt. To accomplish this goal, we developed a deep learning model using the relationships among behaviors antecedent to suicide attempts and the attempts themselves. Methods: This study used data from the Korea National Suicide Survey. We identified 1112 individuals who attempted suicide and completed a psychiatric evaluation in the emergency room. The 15-item Beck Suicide Intent Scale (SIS) was used for assessing antecedent behaviors, and the medical outcomes of the suicide attempts were measured by assessing lethality with the Columbia Suicide Severity Rating Scale (C-SSRS; lethal suicide attempt >3 and nonlethal attempt ≤3). Results: Using scores from the SIS, individuals who had lethal and nonlethal attempts comprised two different network nodes with the edges representing the relationships among nodes. Among the antecedent behaviors, the conception of a method's lethality predicted suicidal behaviors with severe medical outcomes. The vectorized relationship values among the elements of antecedent behaviors in our deep learning model (E-GONet) increased performances, such as F1 and area under the precision-recall gain curve (AUPRG), for identifying lethal attempts (up to 3{\%} for F1 and 32{\%} for AUPRG), as compared with other models (mean F1: 0.81 for E-GONet, 0.78 for linear regression, and 0.80 for random forest; mean AUPRG: 0.73 for E-GONet, 0.41 for linear regression, and 0.69 for random forest). Conclusions: The relationships among behaviors antecedent to suicide attempts can be used to understand the suicidal intent of individuals and help identify the lethality of potential suicide attempts. Such a model may be useful in prioritizing cases for preventive intervention. ", doi="10.2196/14500", url="http://medinform.jmir.org/2020/7/e14500/", url="https://doi.org/10.2196/14500", url="http://www.ncbi.nlm.nih.gov/pubmed/32673253" } @Article{info:doi/10.2196/17707, author="Alami, Hassane and Lehoux, Pascale and Auclair, Yannick and de Guise, Mich{\`e}le and Gagnon, Marie-Pierre and Shaw, James and Roy, Denis and Fleet, Richard and Ag Ahmed, Mohamed Ali and Fortin, Jean-Paul", title="Artificial Intelligence and Health Technology Assessment: Anticipating a New Level of Complexity", journal="J Med Internet Res", year="2020", month="Jul", day="7", volume="22", number="7", pages="e17707", keywords="artificial intelligence; health technology assessment; eHealth; health care; medical device; patient; health services", doi="10.2196/17707", url="https://www.jmir.org/2020/7/e17707", url="https://doi.org/10.2196/17707", url="http://www.ncbi.nlm.nih.gov/pubmed/32406850" } @Article{info:doi/10.2196/19285, author="Sapci, A Hasan and Sapci, H Aylin", title="Artificial Intelligence Education and Tools for Medical and Health Informatics Students: Systematic Review", journal="JMIR Med Educ", year="2020", month="Jun", day="30", volume="6", number="1", pages="e19285", keywords="artificial intelligence; education; machine learning; deep learning; medical education; health informatics; systematic review", abstract="Background: The use of artificial intelligence (AI) in medicine will generate numerous application possibilities to improve patient care, provide real-time data analytics, and enable continuous patient monitoring. 
Clinicians and health informaticians should become familiar with machine learning and deep learning. Additionally, they should have a strong background in data analytics and data visualization to use, evaluate, and develop AI applications in clinical practice. Objective: The main objective of this study was to evaluate the current state of AI training and the use of AI tools to enhance the learning experience. Methods: A comprehensive systematic review was conducted to analyze the use of AI in medical and health informatics education, and to evaluate existing AI training practices. PRISMA-P (Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols) guidelines were followed. The studies that focused on the use of AI tools to enhance medical education and the studies that investigated teaching AI as a new competency were categorized separately to evaluate recent developments. Results: This systematic review revealed that recent publications recommend the integration of AI training into medical and health informatics curricula. Conclusions: To the best of our knowledge, this is the first systematic review exploring the current state of AI education in both medicine and health informatics. Since AI curricula have not been standardized and competencies have not been determined, a framework for specialized AI training in medical and health informatics education is proposed. ", doi="10.2196/19285", url="http://mededu.jmir.org/2020/1/e19285/", url="https://doi.org/10.2196/19285", url="http://www.ncbi.nlm.nih.gov/pubmed/32602844" } @Article{info:doi/10.2196/19202, author="Du, Lin", title="Medical Emergency Resource Allocation Model in Large-Scale Emergencies Based on Artificial Intelligence: Algorithm Development", journal="JMIR Med Inform", year="2020", month="Jun", day="25", volume="8", number="6", pages="e19202", keywords="medical emergency; resource allocation model; distribution model; large-scale emergencies; artificial intelligence", abstract="Background: Before major emergencies occur, the government needs to prepare various emergency supplies in advance. To do this, it should consider the coordinated storage of different types of materials while ensuring that emergency materials are not missed or superfluous. Objective: This paper aims to improve the dispatch and transportation efficiency of emergency materials under a model in which the government makes full use of Internet of Things technology and artificial intelligence technology. Methods: The paper established a model for emergency material preparation and dispatch based on queueing theory and further established a workflow system for emergency material preparation, dispatch, and transportation based on a Petri net, resulting in a highly efficient emergency material preparation and dispatch simulation system framework. Results: A decision support platform was designed to integrate all the algorithms and principles proposed. Conclusions: The resulting framework can effectively coordinate the workflow of emergency material preparation and dispatch, helping to shorten the total time of emergency material preparation, dispatch, and transportation. 
", doi="10.2196/19202", url="http://medinform.jmir.org/2020/6/e19202/", url="https://doi.org/10.2196/19202", url="http://www.ncbi.nlm.nih.gov/pubmed/32584262" } @Article{info:doi/10.2196/18890, author="Linden, Brooke and Tam-Seto, Linna and Stuart, Heather", title="Adherence of the {\#}Here4U App -- Military Version to Criteria for the Development of Rigorous Mental Health Apps", journal="JMIR Form Res", year="2020", month="Jun", day="17", volume="4", number="6", pages="e18890", keywords="mental health services; telemedicine; mHealth; chatbot; e-solutions; Canadian Armed Forces; military health; mobile phone", abstract="Background: Over the past several years, the emergence of mobile mental health apps has increased as a potential solution for populations who may face logistical and social barriers to traditional service delivery, including individuals connected to the military. Objective: The goal of the {\#}Here4U App -- Military Version is to provide evidence-informed mental health support to members of Canada's military community, leveraging artificial intelligence in the form of IBM Canada's Watson Assistant to carry on unique text-based conversations with users, identify presenting mental health concerns, and refer users to self-help resources or recommend professional health care where appropriate. Methods: As the availability and use of mental health apps has increased, so too has the list of recommendations and guidelines for efficacious development. We describe the development and testing conducted between 2018 and 2020 and assess the quality of the {\#}Here4U App against 16 criteria for rigorous mental health app development, as identified by Bakker and colleagues in 2016. Results: The {\#}Here4U App -- Military Version met the majority of Bakker and colleagues' criteria, with those unmet considered not applicable to this particular product or out of scope for research conducted to date. Notably, a formal evaluation of the efficacy of the app is a major priority moving forward. Conclusions: The {\#}Here4U App -- Military Version is a promising new mental health e-solution for members of the Canadian Armed Forces community, filling many of the gaps left by traditional service delivery. ", doi="10.2196/18890", url="https://formative.jmir.org/2020/6/e18890", url="https://doi.org/10.2196/18890", url="http://www.ncbi.nlm.nih.gov/pubmed/32554374" } @Article{info:doi/10.2196/18301, author="Abd-Alrazaq, Alaa and Safi, Zeineb and Alajlani, Mohannad and Warren, Jim and Househ, Mowafa and Denecke, Kerstin", title="Technical Metrics Used to Evaluate Health Care Chatbots: Scoping Review", journal="J Med Internet Res", year="2020", month="Jun", day="5", volume="22", number="6", pages="e18301", keywords="chatbots; conversational agents; health care; evaluation; metrics", abstract="Background: Dialog agents (chatbots) have a long history of application in health care, where they have been used for tasks such as supporting patient self-management and providing counseling. Their use is expected to grow with increasing demands on health systems and improving artificial intelligence (AI) capability. Approaches to the evaluation of health care chatbots, however, appear to be diverse and haphazard, resulting in a potential barrier to the advancement of the field. Objective: This study aims to identify the technical (nonclinical) metrics used by previous studies to evaluate health care chatbots. 
Methods: Studies were identified by searching 7 bibliographic databases (eg, MEDLINE and PsycINFO) in addition to conducting backward and forward reference list checking of the included studies and relevant reviews. The studies were independently selected by two reviewers who then extracted data from the included studies. Extracted data were synthesized narratively by grouping the identified metrics into categories based on the aspect of chatbots that the metrics evaluated. Results: Of the 1498 citations retrieved, 65 studies were included in this review. Chatbots were evaluated using 27 technical metrics, which were related to chatbots as a whole (eg, usability, classifier performance, speed), response generation (eg, comprehensibility, realism, repetitiveness), response understanding (eg, chatbot understanding as assessed by users, word error rate, concept error rate), and esthetics (eg, appearance of the virtual agent, background color, and content). Conclusions: The technical metrics of health chatbot studies were diverse, with survey designs and global usability metrics dominating. The lack of standardization and paucity of objective measures make it difficult to compare the performance of health chatbots and could inhibit advancement of the field. We suggest that researchers more frequently include metrics computed from conversation logs. In addition, we recommend the development of a framework of technical metrics with recommendations for specific circumstances for their inclusion in chatbot studies. ", doi="10.2196/18301", url="http://www.jmir.org/2020/6/e18301/", url="https://doi.org/10.2196/18301", url="http://www.ncbi.nlm.nih.gov/pubmed/32442157" } @Article{info:doi/10.2196/16670, author="Bala, Sandeep and Keniston, Angela and Burden, Marisha", title="Patient Perception of Plain-Language Medical Notes Generated Using Artificial Intelligence Software: Pilot Mixed-Methods Study", journal="JMIR Form Res", year="2020", month="Jun", day="5", volume="4", number="6", pages="e16670", keywords="artificial intelligence; patient education; natural language processing; OpenNotes; Open Notes; patient-physician relationship; simplified notes; plain-language notes", abstract="Background: Clinicians' time with patients has become increasingly limited due to regulatory burden, documentation and billing, administrative responsibilities, and market forces. These factors limit clinicians' time to deliver thorough explanations to patients. OpenNotes began as a research initiative exploring the ability of sharing medical notes with patients to help patients understand their health care. Providing patients access to their medical notes has been shown to have many benefits, including improved patient satisfaction and clinical outcomes. OpenNotes has since evolved into a national movement that helps clinicians share notes with patients. However, a significant barrier to the widespread adoption of OpenNotes has been clinicians' concerns that OpenNotes may cost additional time to correct patient confusion over medical language. Recent advances in artificial intelligence (AI) technology may help resolve this concern by converting medical notes to plain language with minimal time required of clinicians. Objective: This pilot study assesses patient comprehension and perceived benefits, concerns, and insights regarding an AI-simplified note through comprehension questions and guided interview. 
Methods: Synthea, a synthetic patient generator, was used to generate a standardized medical-language patient note which was then simplified using AI software. A multiple-choice comprehension assessment questionnaire was drafted with physician input. Study participants were recruited from inpatients at the University of Colorado Hospital. Participants were randomly assigned to be tested for their comprehension of the standardized medical-language version or AI-generated plain-language version of the patient note. Following this, participants reviewed the opposite version of the note and participated in a guided interview. A Student t test was performed to assess for differences in comprehension assessment scores between plain-language and medical-language note groups. Multivariate modeling was performed to assess the impact of demographic variables on comprehension. Interview responses were thematically analyzed. Results: Twenty patients agreed to participate. The mean number of comprehension assessment questions answered correctly was found to be higher in the plain-language group compared with the medical-language group; however, the Student t test was found to be underpowered to determine if this was significant. Age, ethnicity, and health literacy were found to have a significant impact on comprehension scores by multivariate modeling. Thematic analysis of guided interviews highlighted patients' perceived benefits, concerns, and suggestions regarding such notes. Major themes of benefits were that simplified plain-language notes may (1) be more useable than unsimplified medical-language notes, (2) improve the patient-clinician relationship, and (3) empower patients through an enhanced understanding of their health care. Conclusions: AI software may translate medical notes into plain-language notes that are perceived as beneficial by patients. Limitations included sample size, inpatient-only setting, and possible confounding factors. Larger studies are needed to assess comprehension. Insight from patient responses to guided interviews can guide the future study and development of this technology. ", doi="10.2196/16670", url="https://formative.jmir.org/2020/6/e16670", url="https://doi.org/10.2196/16670", url="http://www.ncbi.nlm.nih.gov/pubmed/32442148" } @Article{info:doi/10.2196/18677, author="Fu, Weifeng", title="Application of an Isolated Word Speech Recognition System in the Field of Mental Health Consultation: Development and Usability Study", journal="JMIR Med Inform", year="2020", month="Jun", day="3", volume="8", number="6", pages="e18677", keywords="speech recognition; isolated words; mental health; small vocabulary; HMM; hidden Markov model; programming", abstract="Background: Speech recognition is a technology that enables machines to understand human language. Objective: In this study, speech recognition of isolated words from a small vocabulary was applied to the field of mental health counseling. Methods: A software platform was used to establish a human-machine chat for psychological counselling. The software uses voice recognition technology to decode the user's voice information. The software system analyzes and processes the user's voice information according to many internal related databases, and then gives the user accurate feedback. For users who need psychological treatment, the system provides them with psychological education. 
Results: The speech recognition system included features such as speech extraction, endpoint detection, feature value extraction, training data, and speech recognition. Conclusions: The Hidden Markov Model was adopted, based on multithread programming under a VC2005 compilation environment, to realize the parallel operation of the algorithm and improve the efficiency of speech recognition. After the design was completed, simulation debugging was performed in the laboratory. The experimental results showed that the designed program met the basic requirements of a speech recognition system. ", doi="10.2196/18677", url="https://medinform.jmir.org/2020/6/e18677", url="https://doi.org/10.2196/18677", url="http://www.ncbi.nlm.nih.gov/pubmed/32384054" } @Article{info:doi/10.2196/16896, author="Bian, Yanyan and Xiang, Yongbo and Tong, Bingdu and Feng, Bin and Weng, Xisheng", title="Artificial Intelligence--Assisted System in Postoperative Follow-up of Orthopedic Patients: Exploratory Quantitative and Qualitative Study", journal="J Med Internet Res", year="2020", month="May", day="26", volume="22", number="5", pages="e16896", keywords="artificial intelligence; conversational agent; follow-up; cost-effectiveness", abstract="Background: Patient follow-up is an essential part of hospital ward management. With the development of deep learning algorithms, individual follow-up assignments might be completed by artificial intelligence (AI). We developed an AI-assisted follow-up conversational agent that can simulate the human voice and select an appropriate follow-up time for quantitative, automatic, and personalized patient follow-up. Patient feedback and voice information could be collected and converted into text data automatically. Objective: The primary objective of this study was to compare the cost-effectiveness of AI-assisted follow-up to manual follow-up of patients after surgery. The secondary objective was to compare the feedback from AI-assisted follow-up to feedback from manual follow-up. Methods: The AI-assisted follow-up system was adopted in the Orthopedic Department of Peking Union Medical College Hospital in April 2019. A total of 270 patients were followed up through this system. Prior to that, 2656 patients were followed up by phone calls manually. Patient characteristics, telephone connection rate, follow-up rate, feedback collection rate, time spent, and feedback composition were compared between the two groups of patients. Results: There was no statistically significant difference in age, gender, or disease between the two groups. There was no significant difference in telephone connection rate (manual: 2478/2656, 93.3{\%}; AI-assisted: 249/270, 92.2{\%}; P=.50) or successful follow-up rate (manual: 2301/2478, 92.9{\%}; AI-assisted: 231/249, 92.8{\%}; P=.96) between the two groups. The time spent on 100 patients in the manual follow-up group was about 9.3 hours. In contrast, the time spent on the AI-assisted follow-up was close to 0 hours. The feedback rate in the AI-assisted follow-up group was higher than that in the manual follow-up group (manual: 68/2656, 2.5{\%}; AI-assisted: 28/270, 10.3{\%}; P<.001). The composition of feedback was different in the two groups. Feedback from the AI-assisted follow-up group mainly included nursing, health education, and hospital environment content, while feedback from the manual follow-up group mostly included medical consultation content. Conclusions: The effectiveness of AI-assisted follow-up was not inferior to that of manual follow-up. 
Human resource costs are saved by AI. AI can help obtain comprehensive feedback from patients, although its depth and pertinence of communication need to be improved. ", doi="10.2196/16896", url="http://www.jmir.org/2020/5/e16896/", url="https://doi.org/10.2196/16896", url="http://www.ncbi.nlm.nih.gov/pubmed/32452807" } @Article{info:doi/10.2196/15859, author="Arem, Hannah and Scott, Remle and Greenberg, Daniel and Kaltman, Rebecca and Lieberman, Daniel and Lewin, Daniel", title="Assessing Breast Cancer Survivors' Perceptions of Using Voice-Activated Technology to Address Insomnia: Feasibility Study Featuring Focus Groups and In-Depth Interviews", journal="JMIR Cancer", year="2020", month="May", day="26", volume="6", number="1", pages="e15859", keywords="artificial intelligence; breast neoplasms; survivors; insomnia; cognitive behavioral therapy; mobile phones", abstract="Background: Breast cancer survivors (BCSs) are a growing population with a higher prevalence of insomnia than women of the same age without a history of cancer. Cognitive behavioral therapy for insomnia (CBT-I) has been shown to be effective in this population, but it is not widely available to those who need it. Objective: This study aimed to better understand BCSs' experiences with insomnia and to explore the feasibility and acceptability of delivering CBT-I using a virtual assistant (Amazon Alexa). Methods: We first conducted a formative phase with 2 focus groups and 3 in-depth interviews to understand BCSs' perceptions of insomnia as well as their interest in and comfort with using a virtual assistant to learn about CBT-I. We then developed a prototype incorporating participant preferences and CBT-I components and demonstrated it in group and individual settings to BCSs to evaluate acceptability, interest, perceived feasibility, educational potential, and usability of the prototype. We also collected open-ended feedback on the content and used frequencies to describe the quantitative data. Results: We recruited 11 BCSs with insomnia in the formative phase and 14 BCSs in the prototype demonstration. In formative work, anxiety, fear, and hot flashes were identified as causes of insomnia. After prototype demonstration, nearly 79{\%} (11/14) of participants reported an interest in and perceived feasibility of using the virtual assistant to record sleep patterns. Approximately two-thirds of the participants thought lifestyle modification (9/14, 64{\%}) and sleep restriction (9/14, 64{\%}) would be feasible and were interested in this feature of the program (10/14, 71{\%} and 9/14, 64{\%}, respectively). Relaxation exercises were rated as interesting and feasible using the virtual assistant by 71{\%} (10/14) of the participants. Usability was rated as better than average, and all women reported that they would recommend the program to friends and family. Conclusions: This virtual assistant prototype delivering CBT-I components by using a smart speaker was rated as feasible and acceptable, suggesting that this prototype should be fully developed and tested for efficacy in the BCS population. If efficacy is shown in this population, the prototype should also be adapted for other high-risk populations. 
", doi="10.2196/15859", url="http://cancer.jmir.org/2020/1/e15859/", url="https://doi.org/10.2196/15859", url="http://www.ncbi.nlm.nih.gov/pubmed/32348274" } @Article{info:doi/10.2196/17647, author="Park, Soo Jin and Lee, Eun Ji and Kim, Se Ik and Kong, Seong-Ho and Jeong, Chang Wook and Kim, Hee Seung", title="Clinical Desire for an Artificial Intelligence--Based Surgical Assistant System: Electronic Survey--Based Study", journal="JMIR Med Inform", year="2020", month="May", day="15", volume="8", number="5", pages="e17647", keywords="artificial intelligence; solo surgery; laparoscopic surgery", abstract="Background: Techniques utilizing artificial intelligence (AI) are rapidly growing in medical research and development, especially in the operating room. However, the application of AI in the operating room has been limited to small tasks or software, such as clinical decision systems. It still largely depends on human resources and technology involving the surgeons' hands. Therefore, we conceptualized AI-based solo surgery (AISS) defined as laparoscopic surgery conducted by only one surgeon with support from an AI-based surgical assistant system, and we performed an electronic survey on the clinical desire for such a system. Objective: This study aimed to evaluate the experiences of surgeons who have performed laparoscopic surgery, the limitations of conventional laparoscopic surgical systems, and the desire for an AI-based surgical assistant system for AISS. Methods: We performed an online survey for gynecologists, urologists, and general surgeons from June to August 2017. The questionnaire consisted of six items about experience, two about limitations, and five about the clinical desire for an AI-based surgical assistant system for AISS. Results: A total of 508 surgeons who have performed laparoscopic surgery responded to the survey. Most of the surgeons needed two or more assistants during laparoscopic surgery, and the rate was higher among gynecologists (251/278, 90.3{\%}) than among general surgeons (123/173, 71.1{\%}) and urologists (35/57, 61.4{\%}). The majority of responders answered that the skillfulness of surgical assistants was ``very important'' or ``important.'' The most uncomfortable aspect of laparoscopic surgery was unskilled movement of the camera (431/508, 84.8{\%}) and instruments (303/508, 59.6{\%}). About 40{\%} (199/508, 39.1{\%}) of responders answered that the AI-based surgical assistant system could substitute 41{\%}-60{\%} of the current workforce, and 83.3{\%} (423/508) showed willingness to buy the system. Furthermore, the most reasonable price was US {\$}30,000-50,000. Conclusions: Surgeons who perform laparoscopic surgery may feel discomfort with the conventional laparoscopic surgical system in terms of assistant skillfulness, and they may think that the skillfulness of surgical assistants is essential. They desire to alleviate present inconveniences with the conventional laparoscopic surgical system and to perform a safe and comfortable operation by using an AI-based surgical assistant system for AISS. 
", doi="10.2196/17647", url="http://medinform.jmir.org/2020/5/e17647/", url="https://doi.org/10.2196/17647", url="http://www.ncbi.nlm.nih.gov/pubmed/32412421" } @Article{info:doi/10.2196/17620, author="Abdullah, Rana and Fakieh, Bahjat", title="Health Care Employees' Perceptions of the Use of Artificial Intelligence Applications: Survey Study", journal="J Med Internet Res", year="2020", month="May", day="14", volume="22", number="5", pages="e17620", keywords="artificial intelligence; employees; healthcare sector; perception; Saudi Arabia", abstract="Background: The advancement of health care information technology and the emergence of artificial intelligence has yielded tools to improve the quality of various health care processes. Few studies have investigated employee perceptions of artificial intelligence implementation in Saudi Arabia and the Arabian world. In addition, limited studies investigated the effect of employee knowledge and job title on the perception of artificial intelligence implementation in the workplace. Objective: The aim of this study was to explore health care employee perceptions and attitudes toward the implementation of artificial intelligence technologies in health care institutions in Saudi Arabia. Methods: An online questionnaire was published, and responses were collected from 250 employees, including doctors, nurses, and technicians at 4 of the largest hospitals in Riyadh, Saudi Arabia. Results: The results of this study showed that 3.11 of 4 respondents feared artificial intelligence would replace employees and had a general lack of knowledge regarding artificial intelligence. In addition, most respondents were unaware of the advantages and most common challenges to artificial intelligence applications in the health sector, indicating a need for training. The results also showed that technicians were the most frequently impacted by artificial intelligence applications due to the nature of their jobs, which do not require much direct human interaction. Conclusions: The Saudi health care sector presents an advantageous market potential that should be attractive to researchers and developers of artificial intelligence solutions. ", doi="10.2196/17620", url="http://www.jmir.org/2020/5/e17620/", url="https://doi.org/10.2196/17620", url="http://www.ncbi.nlm.nih.gov/pubmed/32406857" } @Article{info:doi/10.2196/17234, author="Liang, Bin and Yang, Na and He, Guosheng and Huang, Peng and Yang, Yong", title="Identification of the Facial Features of Patients With Cancer: A Deep Learning--Based Pilot Study", journal="J Med Internet Res", year="2020", month="Apr", day="29", volume="22", number="4", pages="e17234", keywords="convolutional neural network; facial features; cancer patient; deep learning; cancer", abstract="Background: Cancer has become the second leading cause of death globally. Most cancer cases are due to genetic mutations, which affect metabolism and result in facial changes. Objective: In this study, we aimed to identify the facial features of patients with cancer using the deep learning technique. Methods: Images of faces of patients with cancer were collected to build the cancer face image data set. A face image data set of people without cancer was built by randomly selecting images from the publicly available MegaAge data set according to the sex and age distribution of the cancer face image data set. 
Each face image was preprocessed to obtain an upright centered face chip, following which the background was filtered out to exclude the effects of nonrelative factors. A residual neural network was constructed to classify cancer and noncancer cases. Transfer learning, minibatches, few epochs, L2 regularization, and random dropout training strategies were used to prevent overfitting. Moreover, guided gradient-weighted class activation mapping was used to reveal the relevant features. Results: A total of 8124 face images of patients with cancer (men: n=3851, 47.4{\%}; women: n=4273, 52.6{\%}) were collected from January 2018 to January 2019. The ages of the patients ranged from 1 year to 70 years (median age 52 years). The average faces of both male and female patients with cancer displayed more obvious facial adiposity than the average faces of people without cancer, which was supported by a landmark comparison. When testing the data set, the training process was terminated after 5 epochs. The area under the receiver operating characteristic curve was 0.94, and the accuracy rate was 0.82. The main relative feature of cancer cases was facial skin, while the relative features of noncancer cases were extracted from the complementary face region. Conclusions: In this study, we built a face data set of patients with cancer and constructed a deep learning model to classify the faces of people with and those without cancer. We found that facial skin and adiposity were closely related to the presence of cancer. ", doi="10.2196/17234", url="http://www.jmir.org/2020/4/e17234/", url="https://doi.org/10.2196/17234", url="http://www.ncbi.nlm.nih.gov/pubmed/32347802" } @Article{info:doi/10.2196/17125, author="Falissard, Louis and Morgand, Claire and Roussel, Sylvie and Imbaud, Claire and Ghosn, Walid and Bounebache, Karim and Rey, Gr{\'e}goire", title="A Deep Artificial Neural Network--Based Model for Prediction of Underlying Cause of Death From Death Certificates: Algorithm Development and Validation", journal="JMIR Med Inform", year="2020", month="Apr", day="28", volume="8", number="4", pages="e17125", keywords="machine learning; deep learning; mortality statistics; underlying cause of death", abstract="Background: Coding of underlying causes of death from death certificates is a process that is nowadays undertaken mostly by humans with potential assistance from expert systems, such as the Iris software. It is, consequently, an expensive process that can, in addition, suffer from geospatial discrepancies, thus severely impairing the comparability of death statistics at the international level. The recent advances in artificial intelligence, specifically the rise of deep learning methods, have enabled computers to make efficient decisions on a number of complex problems that were typically considered out of reach without human assistance; they require a considerable amount of data to learn from, which is typically their main limiting factor. However, the C{\'e}piDc (Centre d'{\'e}pid{\'e}miologie sur les causes m{\'e}dicales de D{\'e}c{\`e}s) stores an exhaustive database of death certificates at the French national scale, amounting to several millions of training examples available for the machine learning practitioner. Objective: This article investigates the application of deep neural network methods to coding underlying causes of death. 
Methods: The investigated dataset was based on data from every French death certificate from 2000 to 2015, containing information such as the subject's age and gender, as well as the chain of events leading to his or her death, for a total of around 8 million observations. The task of automatically coding the subject's underlying cause of death was then formulated as a predictive modelling problem. A deep neural network--based model was then designed and fit to the dataset. Its error rate was then assessed on an exterior test dataset and compared to the current state-of-the-art (ie, the Iris software). Statistical significance of the proposed approach's superiority was assessed via bootstrap. Results: The proposed approach resulted in a test accuracy of 97.8{\%} (95{\%} CI 97.7-97.9), which constitutes a significant improvement over the current state-of-the-art and its accuracy of 74.5{\%} (95{\%} CI 74.0-75.0) assessed on the same test dataset. Such an improvement opens up a whole field of new applications, from nosologist-level batch-automated coding to international and temporal harmonization of cause of death statistics. A typical example of such an application is demonstrated by recoding French overdose-related deaths from 2000 to 2010. Conclusions: This article shows that deep artificial neural networks are perfectly suited to the analysis of electronic health records and can learn a complex set of medical rules directly from voluminous datasets, without any explicit prior knowledge. Although not entirely free from mistakes, the derived algorithm constitutes a powerful decision-making tool that is able to handle structured medical data with an unprecedented performance. We strongly believe that the methods developed in this article are highly reusable in a variety of settings related to epidemiology, biostatistics, and the medical sciences in general. ", doi="10.2196/17125", url="http://medinform.jmir.org/2020/4/e17125/", url="https://doi.org/10.2196/17125", url="http://www.ncbi.nlm.nih.gov/pubmed/32343252" } @Article{info:doi/10.2196/17490, author="Buchanan, Christine and Howitt, M Lyndsay and Wilson, Rita and Booth, Richard G and Risling, Tracie and Bamford, Megan", title="Nursing in the Age of Artificial Intelligence: Protocol for a Scoping Review", journal="JMIR Res Protoc", year="2020", month="Apr", day="16", volume="9", number="4", pages="e17490", keywords="nursing; artificial intelligence; machine learning; robotics; compassionate care; scoping review", abstract="Background: It is predicted that digital health technologies that incorporate artificial intelligence will transform health care delivery in the next decade. Little research has explored how emerging trends in artificial intelligence--driven digital health technologies may influence the relationship between nurses and patients. Objective: The purpose of this scoping review is to summarize the findings from 4 research questions regarding emerging trends in artificial intelligence--driven digital health technologies and their influence on nursing practice across the 5 domains outlined by the Canadian Nurses Association framework: administration, clinical care, education, policy, and research. Specifically, this scoping review will examine how emerging trends will transform the roles and functions of nurses over the next 10 years and beyond. 
Methods: Using an established scoping review methodology, MEDLINE, Cumulative Index to Nursing and Allied Health Literature, Embase, PsycINFO, Cochrane Database of Systematic Reviews, Cochrane Central, Education Resources Information Centre, Scopus, Web of Science, and Proquest databases were searched. In addition to the electronic database searches, a targeted website search will be performed to access relevant grey literature. Abstracts and full-text studies will be independently screened by 2 reviewers using prespecified inclusion and exclusion criteria. Included literature will focus on nursing and digital health technologies that incorporate artificial intelligence. Data will be charted using a structured form and narratively summarized. Results: Electronic database searches have retrieved 10,318 results. The scoping review and subsequent briefing paper will be completed by the fall of 2020. Conclusions: A symposium will be held to share insights gained from this scoping review with key thought leaders and a cross section of stakeholders from administration, clinical care, education, policy, and research as well as patient advocates. The symposium will provide a forum to explore opportunities for action to advance the future of nursing in a technological world and, more specifically, nurses' delivery of compassionate care in the age of artificial intelligence. Results from the symposium will be summarized in the form of a briefing paper and widely disseminated to relevant stakeholders. International Registered Report Identifier (IRRID): DERR1-10.2196/17490 ", doi="10.2196/17490", url="http://www.researchprotocols.org/2020/4/e17490/", url="https://doi.org/10.2196/17490", url="http://www.ncbi.nlm.nih.gov/pubmed/32297873" } @Article{info:doi/10.2196/15876, author="King, Andrew J and Cooper, Gregory F and Clermont, Gilles and Hochheiser, Harry and Hauskrecht, Milos and Sittig, Dean F and Visweswaran, Shyam", title="Leveraging Eye Tracking to Prioritize Relevant Medical Record Data: Comparative Machine Learning Study", journal="J Med Internet Res", year="2020", month="Apr", day="2", volume="22", number="4", pages="e15876", keywords="electronic medical record system; eye tracking; machine learning; intensive care unit; information-seeking behavior", abstract="Background: Electronic medical record (EMR) systems capture large amounts of data per patient and present that data to physicians with little prioritization. Without prioritization, physicians must mentally identify and collate relevant data, an activity that can lead to cognitive overload. To mitigate cognitive overload, a Learning EMR (LEMR) system prioritizes the display of relevant medical record data. Relevant data are those that are pertinent to a context---defined as the combination of the user, clinical task, and patient case. To determine which data are relevant in a specific context, a LEMR system uses supervised machine learning models of physician information-seeking behavior. Since obtaining information-seeking behavior data via manual annotation is slow and expensive, automatic methods for capturing such data are needed. Objective: The goal of the research was to propose and evaluate eye tracking as a high-throughput method to automatically acquire physician information-seeking behavior useful for training models for a LEMR system. Methods: Critical care medicine physicians reviewed intensive care unit patient cases in an EMR interface developed for the study. 
Participants manually identified patient data that were relevant in the context of a clinical task: preparing a patient summary to present at morning rounds. We used eye tracking to capture each physician's gaze dwell time on each data item (eg, blood glucose measurements). Manual annotations and gaze dwell times were used to define target variables for developing supervised machine learning models of physician information-seeking behavior. We compared the performance of manual selection and gaze-derived models on an independent set of patient cases. Results: A total of 68 pairs of manual selection and gaze-derived machine learning models were developed from training data and evaluated on an independent evaluation data set. A paired Wilcoxon signed-rank test showed similar performance of manual selection and gaze-derived models on area under the receiver operating characteristic curve (P=.40). Conclusions: We used eye tracking to automatically capture physician information-seeking behavior and used it to train models for a LEMR system. The models that were trained using eye tracking performed comparably to models that were trained using manual annotations. These results support further development of eye tracking as a high-throughput method for training clinical decision support systems that prioritize the display of relevant medical record data. ", doi="10.2196/15876", url="https://www.jmir.org/2020/4/e15876", url="https://doi.org/10.2196/15876", url="http://www.ncbi.nlm.nih.gov/pubmed/32238342" } @Article{info:doi/10.2196/16606, author="Schoeb, Dominik and Suarez-Ibarrola, Rodrigo and Hein, Simon and Dressler, Franz Friedrich and Adams, Fabian and Schlager, Daniel and Miernik, Arkadiusz", title="Use of Artificial Intelligence for Medical Literature Search: Randomized Controlled Trial Using the Hackathon Format", journal="Interact J Med Res", year="2020", month="Mar", day="30", volume="9", number="1", pages="e16606", keywords="artificial intelligence; literature review; medical information technology", abstract="Background: Mapping out the research landscape around a project is often time-consuming and difficult. Objective: This study evaluates a commercial artificial intelligence (AI) search engine (IRIS.AI) for its applicability in an automated literature search on a specific medical topic. Methods: To evaluate the AI search engine in a standardized manner, the concept of a science hackathon was applied. Three groups of researchers were tasked with performing a literature search on a clearly defined scientific project. All participants had a high level of expertise for this specific field of research. Two groups were given access to the AI search engine IRIS.AI. All groups were given the same amount of time for their search and were instructed to document their results. Search results were summarized and ranked according to a predetermined scoring system. Results: The final scoring awarded 49 and 39 points out of 60 to AI groups 1 and 2, respectively, and the control group received 46 points. A total of 20 scientific studies with high relevance were identified, and 5 highly relevant studies (``spot on'') were reported by each group. Conclusions: AI technology is a promising approach to facilitate literature searches and the management of medical libraries. In this study, however, the application of AI technology led to a more focused literature search without a significant improvement in the number of results. 
", doi="10.2196/16606", url="http://www.i-jmr.org/2020/1/e16606/", url="https://doi.org/10.2196/16606", url="http://www.ncbi.nlm.nih.gov/pubmed/32224481" } @Article{info:doi/10.2196/16235, author="Ta, Vivian and Griffith, Caroline and Boatfield, Carolynn and Wang, Xinyu and Civitello, Maria and Bader, Haley and DeCero, Esther and Loggarakis, Alexia", title="User Experiences of Social Support From Companion Chatbots in Everyday Contexts: Thematic Analysis", journal="J Med Internet Res", year="2020", month="Mar", day="6", volume="22", number="3", pages="e16235", keywords="artificial intelligence; social support; artificial agents; chatbots; interpersonal relations", abstract="Background: Previous research suggests that artificial agents may be a promising source of social support for humans. However, the bulk of this research has been conducted in the context of social support interventions that specifically address stressful situations or health improvements. Little research has examined social support received from artificial agents in everyday contexts. Objective: Considering that social support manifests in not only crises but also everyday situations and that everyday social support forms the basis of support received during more stressful events, we aimed to investigate the types of everyday social support that can be received from artificial agents. Methods: In Study 1, we examined publicly available user reviews (N=1854) of Replika, a popular companion chatbot. In Study 2, a sample (n=66) of Replika users provided detailed open-ended responses regarding their experiences of using Replika. We conducted thematic analysis on both datasets to gain insight into the kind of everyday social support that users receive through interactions with Replika. Results: Replika provides some level of companionship that can help curtail loneliness, provide a ``safe space'' in which users can discuss any topic without the fear of judgment or retaliation, increase positive affect through uplifting and nurturing messages, and provide helpful information/advice when normal sources of informational support are not available. Conclusions: Artificial agents may be a promising source of everyday social support, particularly companionship, emotional, informational, and appraisal support, but not as tangible support. Future studies are needed to determine who might benefit from these types of everyday social support the most and why. These results could potentially be used to help address global health issues or other crises early on in everyday situations before they potentially manifest into larger issues. ", doi="10.2196/16235", url="http://www.jmir.org/2020/2/e16235/", url="https://doi.org/10.2196/16235", url="http://www.ncbi.nlm.nih.gov/pubmed/32141837" } @Article{info:doi/10.2196/16866, author="Wolff, Justus and Pauling, Josch and Keck, Andreas and Baumbach, Jan", title="The Economic Impact of Artificial Intelligence in Health Care: Systematic Review", journal="J Med Internet Res", year="2020", month="Feb", day="20", volume="22", number="2", pages="e16866", keywords="telemedicine; artificial intelligence; machine learning; cost-benefit analysis", abstract="Background: Positive economic impact is a key decision factor in making the case for or against investing in an artificial intelligence (AI) solution in the health care industry. It is most relevant for the care provider and insurer as well as for the pharmaceutical and medical technology sectors. 
Although the broad economic impact of digital health solutions in general has been assessed many times in literature and the benefit for patients and society has also been analyzed, the specific economic impact of AI in health care has been addressed only sporadically. Objective: This study aimed to systematically review and summarize the cost-effectiveness studies dedicated to AI in health care and to assess whether they meet the established quality criteria. Methods: In a first step, the quality criteria for economic impact studies were defined based on the established and adapted criteria schemes for cost impact assessments. In a second step, a systematic literature review based on qualitative and quantitative inclusion and exclusion criteria was conducted to identify relevant publications for an in-depth analysis of the economic impact assessment. In a final step, the quality of the identified economic impact studies was evaluated based on the defined quality criteria for cost-effectiveness studies. Results: Very few publications have thoroughly addressed the economic impact assessment, and the economic assessment quality of the reviewed publications on AI shows severe methodological deficits. Only 6 out of 66 publications could be included in the second step of the analysis based on the inclusion criteria. Out of these 6 studies, none comprised a methodologically complete cost impact analysis. There are two areas for improvement in future studies. First, the initial investment and operational costs for the AI infrastructure and service need to be included. Second, alternatives to achieve similar impact must be evaluated to provide a comprehensive comparison. Conclusions: This systematic literature analysis proved that the existing impact assessments show methodological deficits and that upcoming evaluations require more comprehensive economic analyses to enable economic decisions for or against implementing AI technology in health care. ", doi="10.2196/16866", url="http://www.jmir.org/2020/2/e16866/", url="https://doi.org/10.2196/16866", url="http://www.ncbi.nlm.nih.gov/pubmed/32130134" } @Article{info:doi/10.2196/17061, author="Li, Xiaojin and Tao, Shiqiang and Jamal-Omidi, Shirin and Huang, Yan and Lhatoo, Samden D and Zhang, Guo-Qiang and Cui, Licong", title="Detection of Postictal Generalized Electroencephalogram Suppression: Random Forest Approach", journal="JMIR Med Inform", year="2020", month="Feb", day="14", volume="8", number="2", pages="e17061", keywords="epilepsy; generalized tonic-clonic seizure; postictal generalized EEG suppression; EEG; random forest", abstract="Background: Sudden unexpected death in epilepsy (SUDEP) is second only to stroke in neurological events resulting in years of potential life lost. Postictal generalized electroencephalogram (EEG) suppression (PGES) is a period of suppressed brain activity often occurring after generalized tonic-clonic seizure, a most significant risk factor for SUDEP. Therefore, PGES has been considered as a potential biomarker for SUDEP risk. Automatic PGES detection tools can address the limitations of labor-intensive, and sometimes inconsistent, visual analysis. A successful approach to automatic PGES detection must overcome computational challenges involved in the detection of subtle amplitude changes in EEG recordings, which may contain physiological and acquisition artifacts. 
Objective: This study aimed to present a random forest approach for automatic PGES detection using multichannel human EEG recordings acquired in epilepsy monitoring units. Methods: We used a combination of temporal, frequency, wavelet, and interchannel correlation features derived from EEG signals to train a random forest classifier. We also constructed and applied confidence-based correction rules based on PGES state changes. Motivated by practical utility, we introduced a new, time distance--based evaluation method for assessing the performance of PGES detection algorithms. Results: The time distance--based evaluation showed that our approach achieved a 5-second tolerance-based positive prediction rate of 0.95 for artifact-free signals. For signals with different artifact levels, our prediction rates varied from 0.68 to 0.81. Conclusions: We introduced a feature-based, random forest approach for automatic PGES detection using multichannel EEG recordings. Our approach achieved increasingly better time distance--based performance with reduced signal artifact levels. Further study is needed for PGES detection algorithms to perform well irrespective of the levels of signal artifacts. ", doi="10.2196/17061", url="https://medinform.jmir.org/2020/2/e17061", url="https://doi.org/10.2196/17061", url="http://www.ncbi.nlm.nih.gov/pubmed/32130173" } @Article{info:doi/10.2196/15510, author="Song, Xing and Waitman, Lemuel R and Yu, Alan SL and Robbins, David C and Hu, Yong and Liu, Mei", title="Longitudinal Risk Prediction of Chronic Kidney Disease in Diabetic Patients Using a Temporal-Enhanced Gradient Boosting Machine: Retrospective Cohort Study", journal="JMIR Med Inform", year="2020", month="Jan", day="31", volume="8", number="1", pages="e15510", keywords="diabetic kidney disease; diabetic nephropathy; chronic kidney disease; machine learning", abstract="Background: Artificial intelligence--enabled electronic health record (EHR) analysis can revolutionize medical practice from the diagnosis and prediction of complex diseases to making recommendations in patient care, especially for chronic conditions such as chronic kidney disease (CKD), which is one of the most frequent complications in patients with diabetes and is associated with substantial morbidity and mortality. Objective: The longitudinal prediction of health outcomes requires effective representation of temporal data in the EHR. In this study, we proposed a novel temporal-enhanced gradient boosting machine (GBM) model that dynamically updates and ensembles learners based on new events in patient timelines to improve the prediction accuracy of CKD among patients with diabetes. Methods: Using a broad spectrum of deidentified EHR data on a retrospective cohort of 14,039 adult patients with type 2 diabetes and GBM as the base learner, we validated our proposed Landmark-Boosting model against three state-of-the-art temporal models for rolling predictions of 1-year CKD risk. Results: The proposed model uniformly outperformed other models, achieving an area under receiver operating curve of 0.83 (95{\%} CI 0.76-0.85), 0.78 (95{\%} CI 0.75-0.82), and 0.82 (95{\%} CI 0.78-0.86) in predicting CKD risk with automatic accumulation of new data in later years (years 2, 3, and 4 since diabetes mellitus onset, respectively). The Landmark-Boosting model also maintained the best calibration across moderate- and high-risk groups and over time. 
The experimental results demonstrated that the proposed temporal model can not only accurately predict 1-year CKD risk but also improve performance over time with additionally accumulated data, which is essential for clinical use to improve renal management of patients with diabetes. Conclusions: Incorporation of temporal information in EHR data can significantly improve predictive model performance and will particularly benefit patients who follow-up with their physicians as recommended. ", doi="10.2196/15510", url="http://medinform.jmir.org/2020/1/e15510/", url="https://doi.org/10.2196/15510", url="http://www.ncbi.nlm.nih.gov/pubmed/32012067" } @Article{info:doi/10.2196/14679, author="Meyer, Ashley N D and Giardina, Traber D and Spitzmueller, Christiane and Shahid, Umber and Scott, Taylor M T and Singh, Hardeep", title="Patient Perspectives on the Usefulness of an Artificial Intelligence--Assisted Symptom Checker: Cross-Sectional Survey Study", journal="J Med Internet Res", year="2020", month="Jan", day="30", volume="22", number="1", pages="e14679", keywords="clinical decision support systems; technology; diagnosis; patient safety; symptom checker; computer-assisted diagnosis", abstract="Background: Patients are increasingly seeking Web-based symptom checkers to obtain diagnoses. However, little is known about the characteristics of the patients who use these resources, their rationale for use, and whether they find them accurate and useful. Objective: The study aimed to examine patients' experiences using an artificial intelligence (AI)--assisted online symptom checker. Methods: An online survey was administered between March 2, 2018, through March 15, 2018, to US users of the Isabel Symptom Checker within 6 months of their use. User characteristics, experiences of symptom checker use, experiences discussing results with physicians, and prior personal history of experiencing a diagnostic error were collected. Results: A total of 329 usable responses was obtained. The mean respondent age was 48.0 (SD 16.7) years; most were women (230/304, 75.7{\%}) and white (271/304, 89.1{\%}). Patients most commonly used the symptom checker to better understand the causes of their symptoms (232/304, 76.3{\%}), followed by for deciding whether to seek care (101/304, 33.2{\%}) or where (eg, primary or urgent care: 63/304, 20.7{\%}), obtaining medical advice without going to a doctor (48/304, 15.8{\%}), and understanding their diagnoses better (39/304, 12.8{\%}). Most patients reported receiving useful information for their health problems (274/304, 90.1{\%}), with half reporting positive health effects (154/302, 51.0{\%}). Most patients perceived it to be useful as a diagnostic tool (253/301, 84.1{\%}), as a tool providing insights leading them closer to correct diagnoses (231/303, 76.2{\%}), and reported they would use it again (278/304, 91.4{\%}). Patients who discussed findings with their physicians (103/213, 48.4{\%}) more often felt physicians were interested (42/103, 40.8{\%}) than not interested in learning about the tool's results (24/103, 23.3{\%}) and more often felt physicians were open (62/103, 60.2{\%}) than not open (21/103, 20.4{\%}) to discussing the results. 
Compared with patients who had not previously experienced diagnostic errors (missed or delayed diagnoses: 123/304, 40.5{\%}), patients who had previously experienced diagnostic errors (181/304, 59.5{\%}) were more likely to use the symptom checker to determine where they should seek care (15/123, 12.2{\%} vs 48/181, 26.5{\%}; P=.002), but they less often felt that physicians were interested in discussing the tool's results (20/34, 59{\%} vs 22/69, 32{\%}; P=.04). Conclusions: Despite ongoing concerns about symptom checker accuracy, a large patient-user group perceived an AI-assisted symptom checker as useful for diagnosis. Formal validation studies evaluating symptom checker accuracy and effectiveness in real-world practice could provide additional useful information about their benefit. ", doi="10.2196/14679", url="http://www.jmir.org/2020/1/e14679/", url="https://doi.org/10.2196/14679", url="http://www.ncbi.nlm.nih.gov/pubmed/32012052" } @Article{info:doi/10.2196/15645, author="Prieto, Jos{\'e} Tom{\'a}s and Scott, Kenneth and McEwen, Dean and Podewils, Laura J and Al-Tayyib, Alia and Robinson, James and Edwards, David and Foldy, Seth and Shlay, Judith C and Davidson, Arthur J", title="The Detection of Opioid Misuse and Heroin Use From Paramedic Response Documentation: Machine Learning for Improved Surveillance", journal="J Med Internet Res", year="2020", month="Jan", day="3", volume="22", number="1", pages="e15645", keywords="naloxone; emergency medical services; natural language processing; heroin; substance-related disorders; opioid crisis; artificial intelligence", abstract="Background: Timely, precise, and localized surveillance of nonfatal events is needed to improve response and prevention of opioid-related problems in an evolving opioid crisis in the United States. Records of naloxone administration found in prehospital emergency medical services (EMS) data have helped estimate opioid overdose incidence, including nonhospital, field-treated cases. However, as naloxone is often used by EMS personnel in unconsciousness of unknown cause, attributing naloxone administration to opioid misuse and heroin use (OM) may misclassify events. Better methods are needed to identify OM. Objective: This study aimed to develop and test a natural language processing method that would improve identification of potential OM from paramedic documentation. Methods: First, we searched Denver Health paramedic trip reports from August 2017 to April 2018 for keywords naloxone, heroin, and both combined, and we reviewed narratives of identified reports to determine whether they constituted true cases of OM. Then, we used this human classification as reference standard and trained 4 machine learning models (random forest, k-nearest neighbors, support vector machines, and L1-regularized logistic regression). We selected the algorithm that produced the highest area under the receiver operating curve (AUC) for model assessment. Finally, we compared positive predictive value (PPV) of the highest performing machine learning algorithm with PPV of searches of keywords naloxone, heroin, and combination of both in the binary classification of OM in unseen September 2018 data. Results: In total, 54,359 trip reports were filed from August 2017 to April 2018. Approximately 1.09{\%} (594/54,359) indicated naloxone administration. Among trip reports with reviewer agreement regarding OM in the narrative, 57.6{\%} (292/516) were considered to include information revealing OM. 
Approximately 1.63{\%} (884/54,359) of all trip reports mentioned heroin in the narrative. Among trip reports with reviewer agreement, 95.5{\%} (784/821) were considered to include information revealing OM. Combined results accounted for 2.39{\%} (1298/54,359) of trip reports. Among trip reports with reviewer agreement, 77.79{\%} (907/1166) were considered to include information consistent with OM. The reference standard used to train and test machine learning models included details of 1166 trip reports. L1-regularized logistic regression was the highest performing algorithm (AUC=0.94; 95{\%} CI 0.91-0.97) in identifying OM. Tested on 5983 unseen reports from September 2018, the keyword naloxone inaccurately identified and underestimated probable OM trip report cases (63 cases; PPV=0.68). The keyword heroin yielded more cases with improved performance (129 cases; PPV=0.99). Combined keyword and L1-regularized logistic regression classifier further improved performance (146 cases; PPV=0.99). Conclusions: A machine learning application enhanced the effectiveness of finding OM among documented paramedic field responses. This approach to refining OM surveillance may lead to improved first-responder and public health responses toward prevention of overdoses and other opioid-related problems in US communities. ", doi="10.2196/15645", url="https://www.jmir.org/2020/1/e15645", url="https://doi.org/10.2196/15645", url="http://www.ncbi.nlm.nih.gov/pubmed/31899451" } @Article{info:doi/10.2196/13244, author="Holdener, Marianne and Gut, Alain and Angerer, Alfred", title="Applicability of the User Engagement Scale to Mobile Health: A Survey-Based Quantitative Study", journal="JMIR Mhealth Uhealth", year="2020", month="Jan", day="3", volume="8", number="1", pages="e13244", keywords="mobile health; mhealth; mobile apps; user engagement; measurement; user engagement scale; chatbot", abstract="Background: There has recently been exponential growth in the development and use of health apps on mobile phones. As with most mobile apps, however, the majority of users abandon them quickly and after minimal use. One of the most critical factors for the success of a health app is how to support users' commitment to their health. Despite increased interest from researchers in mobile health, few studies have examined the measurement of user engagement with health apps. Objective: User engagement is a multidimensional, complex phenomenon. The aim of this study was to understand the concept of user engagement and, in particular, to demonstrate the applicability of a user engagement scale (UES) to mobile health apps. Methods: To determine the measurability of user engagement in a mobile health context, a UES was employed, which is a psychometric tool to measure user engagement with a digital system. This was adapted to Ada, developed by Ada Health, an artificial intelligence--powered personalized health guide that helps people understand their health. A principal component analysis (PCA) with varimax rotation was conducted on 30 items. In addition, sum scores as means of each subscale were calculated. Results: Survey data from 73 Ada users were analyzed. PCA was determined to be suitable, as verified by the sampling adequacy of Kaiser-Meyer-Olkin=0.858, a significant Bartlett test of sphericity ($\chi^2_{300}$=1127.1; P<.001), and communalities mostly within the 0.7 range. 
Although 5 items had to be removed because of low factor loadings, the results of the remaining 25 items revealed 4 attributes: perceived usability, aesthetic appeal, reward, and focused attention. Ada users showed the highest engagement level with perceived usability, with a value of 294, followed by aesthetic appeal, reward, and focused attention. Conclusions: Although the UES was deployed in German and adapted to another digital domain, PCA yielded consistent subscales and a 4-factor structure. This indicates that user engagement with health apps can be assessed with the German version of the UES. These results can benefit related mobile health app engagement research and may be of importance to marketers and app developers. ", doi="10.2196/13244", url="https://mhealth.jmir.org/2020/1/e13244", url="https://doi.org/10.2196/13244", url="http://www.ncbi.nlm.nih.gov/pubmed/31899454" } @Article{info:doi/10.2196/15381, author="Martin-Hammond, Aqueasha and Vemireddy, Sravani and Rao, Kartik", title="Exploring Older Adults' Beliefs About the Use of Intelligent Assistants for Consumer Health Information Management: A Participatory Design Study", journal="JMIR Aging", year="2019", month="Dec", day="11", volume="2", number="2", pages="e15381", keywords="intelligent assistants; artificial intelligence; chatbots; conversational agents; digital health; elderly; aging in place; participatory design; co-design; health information seeking", abstract="Background: Intelligent assistants (IAs), also known as intelligent agents, use artificial intelligence to help users achieve a goal or complete a task. IAs represent a potential solution for providing older adults with individualized assistance at home, for example, to reduce social isolation, serve as memory aids, or help with disease management. However, to design IAs for health that are beneficial and accepted by older adults, it is important to understand their beliefs about IAs, how they would like to interact with IAs for consumer health, and how they desire to integrate IAs into their homes. Objective: We explore older adults' mental models and beliefs about IAs, the tasks they want IAs to support, and how they would like to interact with IAs for consumer health. For the purpose of this study, we focus on IAs in the context of consumer health information management and search. Methods: We present findings from an exploratory, qualitative study that investigated older adults' perspectives of IAs that aid with consumer health information search and management tasks. Eighteen older adults participated in a multiphase, participatory design workshop in which we engaged them in discussion, brainstorming, and design activities that helped us identify their current challenges managing and finding health information at home. We also explored their beliefs and ideas for an IA to assist them with consumer health tasks. We used participatory design activities to identify areas in which they felt IAs might be useful, but also to uncover the reasoning behind the ideas they presented. Discussions were audio-recorded and later transcribed. We compiled design artifacts collected during the study to supplement researcher transcripts and notes. Thematic analysis was used to analyze data. Results: We found that participants saw IAs as potentially useful for providing recommendations, facilitating collaboration between themselves and other caregivers, and for alerts of serious illness. 
However, they also desired familiar and natural interactions with IAs (eg, using voice) that could, if need be, provide fluid and unconstrained interactions, reason about their symptoms, and provide information or advice. Other participants discussed the need for flexible IAs that could be used by those with low technical resources or skills. Conclusions: From our findings, we present a discussion of three key components of participants' mental models, including the people, behaviors, and interactions they described that were important for IAs for consumer health information management and seeking. We then discuss the role of access, transparency, caregivers, and autonomy in design for addressing participants' concerns about privacy and trust as well as its role in assisting others that may interact with an IA on the older adults' behalf. International Registered Report Identifier (IRRID): RR2-10.1145/3240925.3240972 ", doi="10.2196/15381", url="http://aging.jmir.org/2019/2/e15381/", url="https://doi.org/10.2196/15381", url="http://www.ncbi.nlm.nih.gov/pubmed/31825322" } @Article{info:doi/10.2196/13430, author="Afzal, Muhammad and Hussain, Maqbool and Malik, Khalid Mahmood and Lee, Sungyoung", title="Impact of Automatic Query Generation and Quality Recognition Using Deep Learning to Curate Evidence From Biomedical Literature: Empirical Study", journal="JMIR Med Inform", year="2019", month="Dec", day="9", volume="7", number="4", pages="e13430", keywords="data curation; evidence-based medicine; clinical decision support systems; precision medicine; biomedical research; machine learning; deep learning", abstract="Background: The quality of health care is continuously improving and is expected to improve further because of the advancement of machine learning and knowledge-based techniques along with innovation and availability of wearable sensors. With these advancements, health care professionals are now becoming more interested and involved in seeking scientific research evidence from external sources for decision making relevant to medical diagnosis, treatments, and prognosis. Not much work has been done to develop methods for unobtrusive and seamless curation of data from the biomedical literature. Objective: This study aimed to design a framework that can enable bringing quality publications intelligently to the users' desk to assist medical practitioners in answering clinical questions and fulfilling their informational needs. Methods: The proposed framework consists of methods for efficient biomedical literature curation, including the automatic construction of a well-built question, the recognition of evidence quality by proposing extended quality recognition model (E-QRM), and the ranking and summarization of the extracted evidence. Results: Unlike previous works, the proposed framework systematically integrates the echelons of biomedical literature curation by including methods for searching queries, content quality assessments, and ranking and summarization. Using an ensemble approach, our high-impact classifier E-QRM obtained significantly improved accuracy than the existing quality recognition model (1723/1894, 90.97{\%} vs 1462/1894, 77.21{\%}). Conclusions: Our proposed methods and evaluation demonstrate the validity and rigorousness of the results, which can be used in different applications, including evidence-based medicine, precision medicine, and medical education. 
", doi="10.2196/13430", url="http://medinform.jmir.org/2019/4/e13430/", url="https://doi.org/10.2196/13430", url="http://www.ncbi.nlm.nih.gov/pubmed/31815673" } @Article{info:doi/10.2196/16048, author="Paranjape, Ketan and Schinkel, Michiel and Nannan Panday, Rishi and Car, Josip and Nanayakkara, Prabath", title="Introducing Artificial Intelligence Training in Medical Education", journal="JMIR Med Educ", year="2019", month="Dec", day="3", volume="5", number="2", pages="e16048", keywords="algorithm; artificial intelligence; black box; deep learning; machine learning; medical education; continuing education; data sciences; curriculum", doi="10.2196/16048", url="http://mededu.jmir.org/2019/2/e16048/", url="https://doi.org/10.2196/16048", url="http://www.ncbi.nlm.nih.gov/pubmed/31793895" } @Article{info:doi/10.2196/15406, author="Fernandes, Chrystinne Oliveira and Miles, Simon and Lucena, Carlos Jos{\'e} Pereira De and Cowan, Donald", title="Artificial Intelligence Technologies for Coping with Alarm Fatigue in Hospital Environments Because of Sensory Overload: Algorithm Development and Validation", journal="J Med Internet Res", year="2019", month="Nov", day="26", volume="21", number="11", pages="e15406", keywords="alert fatigue health personnel; health information systems; patient monitoring; alert systems; artificial intelligence", abstract="Background: Informed estimates claim that 80{\%} to 99{\%} of alarms set off in hospital units are false or clinically insignificant, representing a cacophony of sounds that do not present a real danger to patients. These false alarms can lead to an alert overload that causes a health care provider to miss important events that could be harmful or even life-threatening. As health care units become more dependent on monitoring devices for patient care purposes, the alarm fatigue issue has to be addressed as a major concern for the health care team as well as to enhance patient safety. Objective: The main goal of this paper was to propose a feasible solution for the alarm fatigue problem by using an automatic reasoning mechanism to decide how to notify members of the health care team. The aim was to reduce the number of notifications sent by determining whether or not to group a set of alarms that occur over a short period of time to deliver them together, without compromising patient safety. Methods: This paper describes: (1) a model for supporting reasoning algorithms that decide how to notify caregivers to avoid alarm fatigue; (2) an architecture for health systems that support patient monitoring and notification capabilities; and (3) a reasoning algorithm that specifies how to notify caregivers by deciding whether to aggregate a group of alarms to avoid alarm fatigue. Results: Experiments were used to demonstrate that providing a reasoning system can reduce the notifications received by the caregivers by up to 99.3{\%} (582/586) of the total alarms generated. Our experiments were evaluated through the use of a dataset comprising patient monitoring data and vital signs recorded during 32 surgical cases where patients underwent anesthesia at the Royal Adelaide Hospital. We present the results of our algorithm by using graphs we generated using the R language, where we show whether the algorithm decided to deliver an alarm immediately or after a delay. Conclusions: The experimental results strongly suggest that this reasoning algorithm is a useful strategy for avoiding alarm fatigue. 
Although we evaluated our algorithm in an experimental environment, we tried to reproduce the context of a clinical environment by using real-world patient data. Our future work is to reproduce the evaluation study based on more realistic clinical conditions by increasing the number of patients, monitoring parameters, and types of alarm. ", doi="10.2196/15406", url="http://www.jmir.org/2019/11/e15406/", url="https://doi.org/10.2196/15406", url="http://www.ncbi.nlm.nih.gov/pubmed/31769762" } @Article{info:doi/10.2196/16295, author="Mesk{\'o}, Bertalan", title="The Real Era of the Art of Medicine Begins with Artificial Intelligence", journal="J Med Internet Res", year="2019", month="Nov", day="18", volume="21", number="11", pages="e16295", keywords="future; artificial intelligence; digital health; technology; art of medicine", doi="10.2196/16295", url="http://www.jmir.org/2019/11/e16295/", url="https://doi.org/10.2196/16295", url="http://www.ncbi.nlm.nih.gov/pubmed/31738169" } @Article{info:doi/10.2196/14245, author="Piau, Antoine and Lepage, Benoit and Bernon, Carole and Gleizes, Marie-Pierre and Nourhashemi, Fati", title="Real-Time Detection of Behavioral Anomalies of Older People Using Artificial Intelligence (The 3-PEGASE Study): Protocol for a Real-Life Prospective Trial", journal="JMIR Res Protoc", year="2019", month="Nov", day="18", volume="8", number="11", pages="e14245", keywords="frailty; monitoring; sensors; artificial intelligence; older adults; participatory design", abstract="Background: Most frail older persons are living at home, and we face difficulties in achieving seamless monitoring to detect adverse health changes. Even more important, this lack of follow-up could have a negative impact on the living choices made by older individuals and their care partners. People could give up their homes for the more reassuring environment of a medicalized living facility. We have developed a low-cost unobtrusive sensor-based solution to trigger automatic alerts in case of an acute event or subtle changes over time. It could facilitate older adults' follow-up in their own homes, and thus support independent living. Objective: The primary objective of this prospective open-label study is to evaluate the relevance of the automatic alerts generated by our artificial intelligence--driven monitoring solution as judged by the recipients: older adults, caregivers, and professional support workers. The secondary objective is to evaluate its ability to detect subtle functional and cognitive decline and major medical events. Methods: The primary outcome will be evaluated for each successive 2-month follow-up period to estimate the progression of our learning algorithm performance over time. In total, 25 frail or disabled participants, aged 75 years and above and living alone in their own homes, will be enrolled for a 6-month follow-up period. Results: The first phase with 5 participants for a 4-month feasibility period has been completed and the expected completion date for the second phase of the study (20 participants for 6 months) is July 2020. Conclusions: The originality of our real-life project lies in the choice of the primary outcome and in our user-centered evaluation. We will evaluate the relevance of the alerts and the algorithm performance over time according to the end users. The first-line recipients of the information are the older adults and their care partners rather than health care professionals. 
Despite the fast pace of electronic health devices development, few studies have addressed the specific everyday needs of older adults and their families. Trial Registration: ClinicalTrials.gov NCT03484156; https://clinicaltrials.gov/ct2/show/NCT03484156 International Registered Report Identifier (IRRID): PRR1-10.2196/14245 ", doi="10.2196/14245", url="http://www.researchprotocols.org/2019/11/e14245/", url="https://doi.org/10.2196/14245", url="http://www.ncbi.nlm.nih.gov/pubmed/31738180" } @Article{info:doi/10.2196/16607, author="Lovis, Christian", title="Unlocking the Power of Artificial Intelligence and Big Data in Medicine", journal="J Med Internet Res", year="2019", month="Nov", day="8", volume="21", number="11", pages="e16607", keywords="medical informatics; artificial intelligence; big data", doi="10.2196/16607", url="https://www.jmir.org/2019/11/e16607", url="https://doi.org/10.2196/16607", url="http://www.ncbi.nlm.nih.gov/pubmed/31702565" } @Article{info:doi/10.2196/15360, author="Kocaballi, Ahmet Baki and Berkovsky, Shlomo and Quiroz, Juan C and Laranjo, Liliana and Tong, Huong Ly and Rezazadegan, Dana and Briatore, Agustina and Coiera, Enrico", title="The Personalization of Conversational Agents in Health Care: Systematic Review", journal="J Med Internet Res", year="2019", month="Nov", day="7", volume="21", number="11", pages="e15360", keywords="conversational interfaces; conversational agents; dialogue systems; personalization; customization; adaptive systems; health care", abstract="Background: The personalization of conversational agents with natural language user interfaces is seeing increasing use in health care applications, shaping the content, structure, or purpose of the dialogue between humans and conversational agents. Objective: The goal of this systematic review was to understand the ways in which personalization has been used with conversational agents in health care and characterize the methods of its implementation. Methods: We searched on PubMed, Embase, CINAHL, PsycInfo, and ACM Digital Library using a predefined search strategy. The studies were included if they: (1) were primary research studies that focused on consumers, caregivers, or health care professionals; (2) involved a conversational agent with an unconstrained natural language interface; (3) tested the system with human subjects; and (4) implemented personalization features. Results: The search found 1958 publications. After abstract and full-text screening, 13 studies were included in the review. Common examples of personalized content included feedback, daily health reports, alerts, warnings, and recommendations. The personalization features were implemented without a theoretical framework of customization and with limited evaluation of its impact. While conversational agents with personalization features were reported to improve user satisfaction, user engagement and dialogue quality, the role of personalization in improving health outcomes was not assessed directly. Conclusions: Most of the studies in our review implemented the personalization features without theoretical or evidence-based support for them and did not leverage the recent developments in other domains of personalization. Future research could incorporate personalization as a distinct design factor with a more careful consideration of its impact on health outcomes and its implications on patient safety, privacy, and decision-making. 
", doi="10.2196/15360", url="https://www.jmir.org/2019/11/e15360", url="https://doi.org/10.2196/15360", url="http://www.ncbi.nlm.nih.gov/pubmed/31697237" } @Article{info:doi/10.2196/15511, author="Tran, Bach Xuan and Nghiem, Son and Sahin, Oz and Vu, Tuan Manh and Ha, Giang Hai and Vu, Giang Thu and Pham, Hai Quang and Do, Hoa Thi and Latkin, Carl A and Tam, Wilson and Ho, Cyrus S H and Ho, Roger C M", title="Modeling Research Topics for Artificial Intelligence Applications in Medicine: Latent Dirichlet Allocation Application Study", journal="J Med Internet Res", year="2019", month="Nov", day="1", volume="21", number="11", pages="e15511", keywords="artificial intelligence; applications; medicine; scientometric; bibliometric; latent Dirichlet allocation", abstract="Background: Artificial intelligence (AI)--based technologies develop rapidly and have myriad applications in medicine and health care. However, there is a lack of comprehensive reporting on the productivity, workflow, topics, and research landscape of AI in this field. Objective: This study aimed to evaluate the global development of scientific publications and constructed interdisciplinary research topics on the theory and practice of AI in medicine from 1977 to 2018. Methods: We obtained bibliographic data and abstract contents of publications published between 1977 and 2018 from the Web of Science database. A total of 27,451 eligible articles were analyzed. Research topics were classified by latent Dirichlet allocation, and principal component analysis was used to identify the construct of the research landscape. Results: The applications of AI have mainly impacted clinical settings (enhanced prognosis and diagnosis, robot-assisted surgery, and rehabilitation), data science and precision medicine (collecting individual data for precision medicine), and policy making (raising ethical and legal issues, especially regarding privacy and confidentiality of data). However, AI applications have not been commonly used in resource-poor settings due to the limit in infrastructure and human resources. Conclusions: The application of AI in medicine has grown rapidly and focuses on three leading platforms: clinical practices, clinical material, and policies. AI might be one of the methods to narrow down the inequality in health care and medicine between developing and developed countries. Technology transfer and support from developed countries are essential measures for the advancement of AI application in health care in developing countries. ", doi="10.2196/15511", url="https://www.jmir.org/2019/11/e15511", url="https://doi.org/10.2196/15511", url="http://www.ncbi.nlm.nih.gov/pubmed/31682577" } @Article{info:doi/10.2196/14452, author="Faruqui, Syed Hasib Akhter and Du, Yan and Meka, Rajitha and Alaeddini, Adel and Li, Chengdong and Shirinkam, Sara and Wang, Jing", title="Development of a Deep Learning Model for Dynamic Forecasting of Blood Glucose Level for Type 2 Diabetes Mellitus: Secondary Analysis of a Randomized Controlled Trial", journal="JMIR Mhealth Uhealth", year="2019", month="Nov", day="1", volume="7", number="11", pages="e14452", keywords="type 2 diabetes; long short-term memory (LSTM)-based recurrent neural networks (RNNs); glucose level prediction; mobile health lifestyle data", abstract="Background: Type 2 diabetes mellitus (T2DM) is a major public health burden. Self-management of diabetes including maintaining a healthy lifestyle is essential for glycemic control and to prevent diabetes complications. 
Mobile-based health data can play an important role in the forecasting of blood glucose levels for lifestyle management and control of T2DM. Objective: The objective of this work was to dynamically forecast daily glucose levels in patients with T2DM based on their daily mobile health lifestyle data including diet, physical activity, weight, and glucose level from the day before. Methods: We used data from 10 T2DM patients who were overweight or obese in a behavioral lifestyle intervention using mobile tools for daily monitoring of diet, physical activity, weight, and blood glucose over 6 months. We developed a deep learning model based on long short-term memory--based recurrent neural networks to forecast the next-day glucose levels in individual patients. The neural network used several layers of computational nodes to model how mobile health data (food intake including consumed calories, fat, and carbohydrates; exercise; and weight) were progressing from one day to another from noisy data. Results: The model was validated based on a data set of 10 patients who had been monitored daily for over 6 months. The proposed deep learning model demonstrated considerable accuracy in predicting the next day glucose level based on Clark Error Grid and {\textpm}10{\%} range of the actual values. Conclusions: Using machine learning methodologies may leverage mobile health lifestyle data to develop effective individualized prediction plans for T2DM management. However, predicting future glucose levels is challenging as glucose level is determined by multiple factors. Future study with more rigorous study design is warranted to better predict future glucose levels for T2DM management. ", doi="10.2196/14452", url="https://mhealth.jmir.org/2019/11/e14452", url="https://doi.org/10.2196/14452", url="http://www.ncbi.nlm.nih.gov/pubmed/31682586" } @Article{info:doi/10.2196/15980, author="Spasic, Irena and Krzeminski, Dominik and Corcoran, Padraig and Balinsky, Alexander", title="Cohort Selection for Clinical Trials From Longitudinal Patient Records: Text Mining Approach", journal="JMIR Med Inform", year="2019", month="Oct", day="31", volume="7", number="4", pages="e15980", keywords="natural language processing; machine learning; electronic medical records; clinical trial; eligibility determination", abstract="Background: Clinical trials are an important step in introducing new interventions into clinical practice by generating data on their safety and efficacy. Clinical trials need to ensure that participants are similar so that the findings can be attributed to the interventions studied and not to some other factors. Therefore, each clinical trial defines eligibility criteria, which describe characteristics that must be shared by the participants. Unfortunately, the complexities of eligibility criteria may not allow them to be translated directly into readily executable database queries. Instead, they may require careful analysis of the narrative sections of medical records. Manual screening of medical records is time consuming, thus negatively affecting the timeliness of the recruitment process. Objective: Track 1 of the 2018 National Natural Language Processing Clinical Challenge focused on the task of cohort selection for clinical trials, aiming to answer the following question: Can natural language processing be applied to narrative medical records to identify patients who meet eligibility criteria for clinical trials? 
The task required the participating systems to analyze longitudinal patient records to determine if the corresponding patients met the given eligibility criteria. We aimed to describe a system developed to address this task. Methods: Our system consisted of 13 classifiers, one for each eligibility criterion. All classifiers used a bag-of-words document representation model. To prevent the loss of relevant contextual information associated with such representation, a pattern-matching approach was used to extract context-sensitive features. They were embedded back into the text as lexically distinguishable tokens, which were consequently featured in the bag-of-words representation. Supervised machine learning was chosen wherever a sufficient number of both positive and negative instances was available to learn from. A rule-based approach focusing on a small set of relevant features was chosen for the remaining criteria. Results: The system was evaluated using microaveraged F measure. Overall, 4 machine learning algorithms, including support vector machine, logistic regression, na{\"i}ve Bayesian classifier, and gradient tree boosting (GTB), were evaluated on the training data using 10-fold cross-validation. Overall, GTB demonstrated the most consistent performance. Its performance peaked when oversampling was used to balance the training data. The final evaluation was performed on previously unseen test data. On average, the F measure of 89.04{\%} was comparable to 3 of the top ranked performances in the shared task (91.11{\%}, 90.28{\%}, and 90.21{\%}). With an F measure of 88.14{\%}, we significantly outperformed these systems (81.03{\%}, 78.50{\%}, and 70.81{\%}) in identifying patients with advanced coronary artery disease. Conclusions: The holdout evaluation provides evidence that our system was able to identify eligible patients for the given clinical trial with high accuracy. Our approach demonstrates how rule-based knowledge infusion can improve the performance of machine learning algorithms even when trained on a relatively small dataset. ", doi="10.2196/15980", url="http://medinform.jmir.org/2019/4/e15980/", url="https://doi.org/10.2196/15980", url="http://www.ncbi.nlm.nih.gov/pubmed/31674914" } @Article{info:doi/10.2196/16222, author="Powell, John", title="Trust Me, I'm a Chatbot: How Artificial Intelligence in Health Care Fails the Turing Test", journal="J Med Internet Res", year="2019", month="Oct", day="28", volume="21", number="10", pages="e16222", keywords="artificial intelligence; machine learning; medical informatics; digital health; ehealth; chatbots; conversational agents", doi="10.2196/16222", url="http://www.jmir.org/2019/10/e16222/", url="https://doi.org/10.2196/16222", url="http://www.ncbi.nlm.nih.gov/pubmed/31661083" } @Article{info:doi/10.2196/14166, author="Gaffney, Hannah and Mansell, Warren and Tai, Sara", title="Conversational Agents in the Treatment of Mental Health Problems: Mixed-Method Systematic Review", journal="JMIR Ment Health", year="2019", month="Oct", day="18", volume="6", number="10", pages="e14166", keywords="artificial intelligence; mental health; stress, psychological; psychiatry; therapy, computer-assisted; conversational agent; chatbot; digital health", abstract="Background: The use of conversational agent interventions (including chatbots and robots) in mental health is growing at a fast pace.
Recent existing reviews have focused exclusively on a subset of embodied conversational agent interventions despite other modalities aiming to achieve the common goal of improved mental health. Objective: This study aimed to review the use of conversational agent interventions in the treatment of mental health problems. Methods: We performed a systematic search using relevant databases (MEDLINE, EMBASE, PsycINFO, Web of Science, and Cochrane library). Studies that reported on an autonomous conversational agent that simulated conversation and reported on a mental health outcome were included. Results: A total of 13 studies were included in the review. Among them, 4 full-scale randomized controlled trials (RCTs) were included. The rest were feasibility, pilot RCTs and quasi-experimental studies. Interventions were diverse in design and targeted a range of mental health problems using a wide variety of therapeutic orientations. All included studies reported reductions in psychological distress postintervention. Furthermore, 5 controlled studies demonstrated significant reductions in psychological distress compared with inactive control groups. In addition, 3 controlled studies comparing interventions with active control groups failed to demonstrate superior effects. Broader utility in promoting well-being in nonclinical populations was unclear. Conclusions: The efficacy and acceptability of conversational agent interventions for mental health problems are promising. However, a more robust experimental design is required to demonstrate efficacy and efficiency. A focus on streamlining interventions, demonstrating equivalence to other treatment modalities, and elucidating mechanisms of action has the potential to increase acceptance by users and clinicians and maximize reach. ", doi="10.2196/14166", url="https://mental.jmir.org/2019/10/e14166", url="https://doi.org/10.2196/14166", url="http://www.ncbi.nlm.nih.gov/pubmed/31628789" } @Article{info:doi/10.2196/14316, author="Ye, Tiantian and Xue, Jiaolong and He, Mingguang and Gu, Jing and Lin, Haotian and Xu, Bin and Cheng, Yu", title="Psychosocial Factors Affecting Artificial Intelligence Adoption in Health Care in China: Cross-Sectional Study", journal="J Med Internet Res", year="2019", month="Oct", day="17", volume="21", number="10", pages="e14316", keywords="artificial intelligence; adoption; technology acceptance model; structural equation model; intention; subjective norms; trust; moderation", abstract="Background: Poor quality primary health care is a major issue in China, particularly in blindness prevention. Artificial intelligence (AI) could provide early screening and accurate auxiliary diagnosis to improve primary care services and reduce unnecessary referrals, but the application of AI in medical settings is still an emerging field. Objective: This study aimed to investigate the general public's acceptance of ophthalmic AI devices, with reference to those already used in China, and the interrelated influencing factors that shape people's intention to use these devices. Methods: We proposed a model of ophthalmic AI acceptance based on technology acceptance theories and variables from other health care--related studies. The model was verified via a 32-item questionnaire with 7-point Likert scales completed by 474 respondents (nationally random sampled). 
Structural equation modeling was used to evaluate item and construct reliability and validity via a confirmatory factor analysis, and the model's path effects, significance, goodness of fit, and mediation and moderation effects were analyzed. Results: Standardized factor loadings of items were between 0.583 and 0.876. Composite reliability of 9 constructs ranged from 0.673 to 0.841. The discriminant validity of all constructs met the Fornell and Larcker criteria. Model fit indicators such as standardized root mean square residual (0.057), comparative fit index (0.915), and root mean squared error of approximation (0.049) demonstrated good fit. Intention to use (R2=0.515) is significantly affected by subjective norms (beta=.408; P<.001), perceived usefulness (beta=.336; P=.03), and resistance bias (beta=--.237; P=.02). Subjective norms and perceived behavior control had an indirect impact on intention to use through perceived usefulness and perceived ease of use. Eye health consciousness had an indirect positive effect on intention to use through perceived usefulness. Trust had a significant moderation effect (beta=--.095; P=.049) on the effect path of perceived usefulness to intention to use. Conclusions: The item, construct, and model indicators indicate reliable interpretation power and help explain the levels of public acceptance of ophthalmic AI devices in China. The influence of subjective norms can be linked to Confucian culture, collectivism, authoritarianism, and conformity mentality in China. Overall, the use of AI in diagnostics and clinical laboratory analysis is underdeveloped, and the Chinese public are generally mistrustful of medical staff and the Chinese medical system. Stakeholders such as doctors and AI suppliers should therefore avoid making misleading or over-exaggerated claims in the promotion of AI health care products. ", doi="10.2196/14316", url="http://www.jmir.org/2019/10/e14316/", url="https://doi.org/10.2196/14316", url="http://www.ncbi.nlm.nih.gov/pubmed/31625950" } @Article{info:doi/10.2196/14806, author="Peine, Arne and Hallawa, Ahmed and Sch{\"o}ffski, Oliver and Dartmann, Guido and Fazlic, Lejla Begic and Schmeink, Anke and Marx, Gernot and Martin, Lukas", title="A Deep Learning Approach for Managing Medical Consumable Materials in Intensive Care Units via Convolutional Neural Networks: Technical Proof-of-Concept Study", journal="JMIR Med Inform", year="2019", month="Oct", day="10", volume="7", number="4", pages="e14806", keywords="convolutional neural networks; deep learning, critical care; intensive care; image recognition; medical economics; medical consumables; artificial intelligence; machine learning", abstract="Background: High numbers of consumable medical materials (eg, sterile needles and swabs) are used during the daily routine of intensive care units (ICUs) worldwide. Although medical consumables largely contribute to total ICU hospital expenditure, many hospitals do not track the individual use of materials. Current tracking solutions meeting the specific requirements of the medical environment, like barcodes or radio frequency identification, require specialized material preparation and high infrastructure investment. This impedes the accurate prediction of consumption, leads to high storage maintenance costs caused by large inventories, and hinders scientific work due to inaccurate documentation. Thus, new cost-effective and contactless methods for object detection are urgently needed. 
Objective: The goal of this work was to develop and evaluate a contactless visual recognition system for tracking medical consumable materials in ICUs using a deep learning approach on a distributed client-server architecture. Methods: We developed Consumabot, a novel client-server optical recognition system for medical consumables, based on the convolutional neural network model MobileNet implemented in TensorFlow. The software was designed to run on single-board computer platforms as a detection unit. The system was trained to recognize 20 different materials in the ICU, with 100 sample images of each consumable material provided. We assessed the top-1 recognition rates in the context of different real-world ICU settings: materials presented to the system without visual obstruction, 50{\%} covered materials, and scenarios of multiple items. We further performed an analysis of variance with repeated measures to quantify the effect of adverse real-world circumstances. Results: Consumabot reached a >99{\%} reliability of recognition after about 60 steps of training and 150 steps of validation. A desirable low cross entropy of <0.03 was reached for the training set after about 100 iteration steps and after 170 steps for the validation set. The system showed a high top-1 mean recognition accuracy in a real-world scenario of 0.85 (SD 0.11) for objects presented to the system without visual obstruction. Recognition accuracy was lower, but still acceptable, in scenarios where the objects were 50{\%} covered (P<.001; mean recognition accuracy 0.71; SD 0.13) or multiple objects of the target group were present (P=.01; mean recognition accuracy 0.78; SD 0.11), compared to a nonobstructed view. The approach met the criteria of absence of explicit labeling (eg, barcodes, radio frequency labeling) while maintaining a high standard for quality and hygiene with minimal consumption of resources (eg, cost, time, training, and computational power). Conclusions: Using a convolutional neural network architecture, Consumabot consistently achieved good results in the classification of consumables and thus is a feasible way to recognize and register medical consumables directly to a hospital's electronic health record. The system shows limitations when the materials are partially covered, as identifying characteristics of the consumables are then not presented to the system. Further development of the assessment in different medical circumstances is needed. ", doi="10.2196/14806", url="http://medinform.jmir.org/2019/4/e14806/", url="https://doi.org/10.2196/14806", url="http://www.ncbi.nlm.nih.gov/pubmed/31603430" } @Article{info:doi/10.2196/14401, author="Tran, Bach Xuan and Latkin, Carl A and Sharafeldin, Noha and Nguyen, Katherina and Vu, Giang Thu and Tam, Wilson W S and Cheung, Ngai-Man and Nguyen, Huong Lan Thi and Ho, Cyrus S H and Ho, Roger C M", title="Characterizing Artificial Intelligence Applications in Cancer Research: A Latent Dirichlet Allocation Analysis", journal="JMIR Med Inform", year="2019", month="Sep", day="15", volume="7", number="4", pages="e14401", keywords="scientometrics; cancer; artificial intelligence; global; mapping", abstract="Background: Artificial intelligence (AI)--based therapeutics, devices, and systems are vital innovations in cancer control; particularly, they allow for diagnosis, screening, precise estimation of survival, informing therapy selection, and scaling up treatment services in a timely manner.
Objective: The aim of this study was to analyze the global trends, patterns, and development of interdisciplinary landscapes in AI and cancer research. Methods: An exploratory factor analysis was conducted to identify research domains emerging from abstract contents. The Jaccard similarity index was utilized to identify the most frequently co-occurring terms. Latent Dirichlet Allocation was used for classifying papers into corresponding topics. Results: From 1991 to 2018, the number of studies examining the application of AI in cancer care has grown to 3555 papers covering therapeutics, capacities, and factors associated with outcomes. Topics with the highest volume of publications include (1) machine learning, (2) comparative effectiveness evaluation of AI-assisted medical therapies, and (3) AI-based prediction. Noticeably, this classification has revealed topics examining the incremental effectiveness of AI applications, the quality of life, and functioning of patients receiving these innovations. The growing research productivity and expansion of multidisciplinary approaches are largely driven by machine learning, artificial neural networks, and AI in various clinical practices. Conclusions: The research landscapes show that the development of AI in cancer care is focused on not only improving prediction in cancer screening and AI-assisted therapeutics but also on improving other corresponding areas such as precision and personalized medicine and patient-reported outcomes. ", doi="10.2196/14401", url="https://medinform.jmir.org/2019/4/e14401", url="https://doi.org/10.2196/14401", url="http://www.ncbi.nlm.nih.gov/pubmed/31573929" } @Article{info:doi/10.2196/12163, author="Sena, Gabrielle Ribeiro and Lima, Tiago Pessoa Ferreira and Mello, Maria Julia Gon{\c{c}}alves and Thuler, Luiz Claudio Santos and Lima, Jurema Telles Oliveira", title="Developing Machine Learning Algorithms for the Prediction of Early Death in Elderly Cancer Patients: Usability Study", journal="JMIR Cancer", year="2019", month="Sep", day="26", volume="5", number="2", pages="e12163", keywords="geriatric assessment; aged; machine learning; medical oncology; death", abstract="Background: The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) algorithms. The International Society of Geriatric Oncology recommends the use of the comprehensive geriatric assessment (CGA), a multidisciplinary tool to evaluate health domains, for the follow-up of elderly cancer patients. However, no applications of ML have been proposed using CGA to classify elderly cancer patients. Objective: The aim of this study was to propose and develop predictive models, using ML and CGA, to estimate the risk of early death in elderly cancer patients. Methods: The ability of ML algorithms to predict early mortality in a cohort involving 608 elderly cancer patients was evaluated. The CGA was conducted during admission by a multidisciplinary team and included the following questionnaires: mini-mental state examination (MMSE), geriatric depression scale-short form, international physical activity questionnaire-short form, timed up and go, Katz index of independence in activities of daily living, Charlson comorbidity index, Karnofsky performance scale (KPS), polypharmacy, and mini nutritional assessment-short form (MNA-SF). 
The 10-fold cross-validation algorithm was used to evaluate all possible combinations of these questionnaires to estimate the risk of early death, defined as death occurring within 6 months of diagnosis, in a variety of ML classifiers, including Naive Bayes (NB), decision tree algorithm J48 (J48), and multilayer perceptron (MLP). On each fold of evaluation, tiebreaking is handled by choosing the smallest set of questionnaires. Results: It was possible to select CGA questionnaire subsets with high predictive capacity for early death, which were either statistically similar (NB) or higher (J48 and MLP) when compared with the use of all questionnaires investigated. These results show that CGA questionnaire selection can improve accuracy rates and decrease the time spent to evaluate elderly cancer patients. Conclusions: A simplified predictive model aiming to estimate the risk of early death in elderly cancer patients is proposed herein, minimally composed of the MNA-SF and KPS. We strongly recommend that these questionnaires be incorporated into regular geriatric assessment of older patients with cancer. ", doi="10.2196/12163", url="https://cancer.jmir.org/2019/2/e12163", url="https://doi.org/10.2196/12163", url="http://www.ncbi.nlm.nih.gov/pubmed/31573896" } @Article{info:doi/10.2196/14830, author="Li, Fei and Jin, Yonghao and Liu, Weisong and Rawat, Bhanu Pratap Singh and Cai, Pengshan and Yu, Hong", title="Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)--Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study", journal="JMIR Med Inform", year="2019", month="Sep", day="12", volume="7", number="3", pages="e14830", keywords="natural language processing; entity normalization; deep learning; electronic health record note; BERT", abstract="Background: The bidirectional encoder representations from transformers (BERT) model has achieved great success in many natural language processing (NLP) tasks, such as named entity recognition and question answering. However, little prior work has explored the use of this model for an important task in the biomedical and clinical domains, namely entity normalization. Objective: We aim to investigate the effectiveness of BERT-based models for biomedical or clinical entity normalization. In addition, our second objective is to investigate whether the domains of training data influence the performances of BERT-based models as well as the degree of influence. Methods: Our data comprised 1.5 million unlabeled electronic health record (EHR) notes. We first fine-tuned BioBERT on this large collection of unlabeled EHR notes. This generated our BERT-based model trained using 1.5 million electronic health record notes (EhrBERT). We then further fine-tuned EhrBERT, BioBERT, and BERT on three annotated corpora for biomedical and clinical entity normalization: the Medication, Indication, and Adverse Drug Events (MADE) 1.0 corpus, the National Center for Biotechnology Information (NCBI) disease corpus, and the Chemical-Disease Relations (CDR) corpus. We compared our models with two state-of-the-art normalization systems, namely MetaMap and disease name normalization (DNorm). Results: EhrBERT achieved 40.95{\%} F1 in the MADE 1.0 corpus for mapping named entities to the Medical Dictionary for Regulatory Activities and the Systematized Nomenclature of Medicine---Clinical Terms (SNOMED-CT), which have about 380,000 terms. In this corpus, EhrBERT outperformed MetaMap by 2.36{\%} in F1.
For the NCBI disease corpus and CDR corpus, EhrBERT also outperformed DNorm by improving the F1 scores from 88.37{\%} and 89.92{\%} to 90.35{\%} and 93.82{\%}, respectively. Compared with BioBERT and BERT, EhrBERT outperformed them on the MADE 1.0 corpus and the CDR corpus. Conclusions: Our work shows that BERT-based models have achieved state-of-the-art performance for biomedical and clinical entity normalization. BERT-based models can be readily fine-tuned to normalize any kind of named entities. ", doi="10.2196/14830", url="http://medinform.jmir.org/2019/3/e14830/", url="https://doi.org/10.2196/14830", url="http://www.ncbi.nlm.nih.gov/pubmed/31516126" } @Article{info:doi/10.2196/11966, author="Tobore, Igbe and Li, Jingzhen and Yuhang, Liu and Al-Handarish, Yousef and Kandwal, Abhishek and Nie, Zedong and Wang, Lei", title="Deep Learning Intervention for Health Care Challenges: Some Biomedical Domain Considerations", journal="JMIR Mhealth Uhealth", year="2019", month="Aug", day="02", volume="7", number="8", pages="e11966", keywords="machine learning; deep learning; big data; mHealth; medical imaging; electronic health record; biologicals; biomedical; ECG; EEG; artificial intelligence", doi="10.2196/11966", url="https://mhealth.jmir.org/2019/8/e11966/", url="https://doi.org/10.2196/11966", url="http://www.ncbi.nlm.nih.gov/pubmed/31376272" } @Article{info:doi/10.2196/14499, author="Lin, Chin and Lou, Yu-Sheng and Tsai, Dung-Jang and Lee, Chia-Cheng and Hsu, Chia-Jung and Wu, Ding-Chung and Wang, Mei-Chuen and Fang, Wen-Hui", title="Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study", journal="JMIR Med Inform", year="2019", month="Jul", day="23", volume="7", number="3", pages="e14499", keywords="word embedding; convolutional neural network; artificial intelligence; natural language processing; electronic health records", abstract="Background: Most current state-of-the-art models for searching the International Classification of Diseases, Tenth Revision Clinical Modification (ICD-10-CM) codes use word embedding technology to capture useful semantic properties. However, they are limited by the quality of initial word embeddings. Word embedding trained by electronic health records (EHRs) is considered the best, but the vocabulary diversity is limited by previous medical records. Thus, we require a word embedding model that maintains the vocabulary diversity of open internet databases and the medical terminology understanding of EHRs. Moreover, we need to consider the particularity of the disease classification, wherein discharge notes present only positive disease descriptions. Objective: We aimed to propose a projection word2vec model and a hybrid sampling method. In addition, we aimed to conduct a series of experiments to validate the effectiveness of these methods. Methods: We compared the projection word2vec model and traditional word2vec model using two corpora sources: English Wikipedia and PubMed journal abstracts. We used seven published datasets to measure the medical semantic understanding of the word2vec models and used these embeddings to identify the three--character-level ICD-10-CM diagnostic codes in a set of discharge notes. On the basis of embedding technology improvement, we also tried to apply the hybrid sampling method to improve accuracy. The 94,483 labeled discharge notes from the Tri-Service General Hospital of Taipei, Taiwan, from June 1, 2015, to June 30, 2017, were used. 
To evaluate the model performance, 24,762 discharge notes from July 1, 2017, to December 31, 2017, from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were tested. The F-measure, which is the major global measure of effectiveness, was adopted. Results: In medical semantic understanding, the original EHR embeddings and PubMed embeddings exhibited superior performance to the original Wikipedia embeddings. After projection training technology was applied, the projection Wikipedia embeddings exhibited an obvious improvement but did not reach the level of original EHR embeddings or PubMed embeddings. In the subsequent ICD-10-CM coding experiment, the model that used both projection PubMed and Wikipedia embeddings had the highest testing mean F-measure (0.7362 and 0.6693 in Tri-Service General Hospital and the seven other hospitals, respectively). Moreover, the hybrid sampling method was found to improve the model performance (F-measure=0.7371/0.6698). Conclusions: The word embeddings trained using EHR and PubMed could understand medical semantics better, and the proposed projection word2vec model improved the ability of medical semantics extraction in Wikipedia embeddings. Although the improvement from the projection word2vec model in the real ICD-10-CM coding task was not substantial, the models could effectively handle emerging diseases. The proposed hybrid sampling method enables the model to behave like a human expert. ", doi="10.2196/14499", url="http://medinform.jmir.org/2019/3/e14499/", url="https://doi.org/10.2196/14499" } @Article{info:doi/10.2196/13659, author="Shaw, James and Rudzicz, Frank and Jamieson, Trevor and Goldfarb, Avi", title="Artificial Intelligence and the Implementation Challenge", journal="J Med Internet Res", year="2019", month="Jul", day="10", volume="21", number="7", pages="e13659", keywords="artificial intelligence; machine learning; implementation science; ethics", abstract="Background: Applications of artificial intelligence (AI) in health care have garnered much attention in recent years, but the implementation issues posed by AI have not been substantially addressed. Objective: In this paper, we have focused on machine learning (ML) as a form of AI and have provided a framework for thinking about use cases of ML in health care. We have structured our discussion of challenges in the implementation of ML in comparison with other technologies using the framework of Nonadoption, Abandonment, and Challenges to the Scale-Up, Spread, and Sustainability of Health and Care Technologies (NASSS). Methods: After providing an overview of AI technology, we describe use cases of ML as falling into the categories of decision support and automation. We suggest these use cases apply to clinical, operational, and epidemiological tasks and that the primary function of ML in health care in the near term will be decision support. We then outline unique implementation issues posed by ML initiatives in the categories addressed by the NASSS framework, specifically including meaningful decision support, explainability, privacy, consent, algorithmic bias, security, scalability, the role of corporations, and the changing nature of health care work. Results: Ultimately, we suggest that the future of ML in health care remains positive but uncertain, as support from patients, the public, and a wide range of health care stakeholders is necessary to enable its meaningful implementation. 
Conclusions: If the implementation science community is to facilitate the adoption of ML in ways that stand to generate widespread benefits, the issues raised in this paper will require substantial attention in the coming years. ", doi="10.2196/13659", url="https://www.jmir.org/2019/7/e13659/", url="https://doi.org/10.2196/13659", url="http://www.ncbi.nlm.nih.gov/pubmed/31293245" } @Article{info:doi/10.2196/13664, author="Loveys, Kate and Fricchione, Gregory and Kolappa, Kavitha and Sagar, Mark and Broadbent, Elizabeth", title="Reducing Patient Loneliness With Artificial Agents: Design Insights From Evolutionary Neuropsychiatry", journal="J Med Internet Res", year="2019", month="Jul", day="08", volume="21", number="7", pages="e13664", keywords="loneliness; neuropsychiatry; biological evolution; psychological bonding; interpersonal relations; artificial intelligence; social support; eHealth", doi="10.2196/13664", url="https://www.jmir.org/2019/7/e13664/", url="https://doi.org/10.2196/13664", url="http://www.ncbi.nlm.nih.gov/pubmed/31287067" } @Article{info:doi/10.2196/13930, author="Chan, Kai Siang and Zary, Nabil", title="Applications and Challenges of Implementing Artificial Intelligence in Medical Education: Integrative Review", journal="JMIR Med Educ", year="2019", month="Jun", day="15", volume="5", number="1", pages="e13930", keywords="medical education; evaluation of AIED systems; real world applications of AIED systems; artificial intelligence", abstract="Background: Since the advent of artificial intelligence (AI) in 1955, the applications of AI have increased over the years within a rapidly changing digital landscape where public expectations are on the rise, fed by social media, industry leaders, and medical practitioners. However, there has been little interest in AI in medical education until the last two decades, with only a recent increase in the number of publications and citations in the field. To our knowledge, thus far, a limited number of articles have discussed or reviewed the current use of AI in medical education. Objective: This study aims to review the current applications of AI in medical education as well as the challenges of implementing AI in medical education. Methods: Medline (Ovid), EBSCOhost Education Resources Information Center (ERIC) and Education Source, and Web of Science were searched with explicit inclusion and exclusion criteria. Full text of the selected articles was analyzed using the Extension of Technology Acceptance Model and the Diffusions of Innovations theory. Data were subsequently pooled together and analyzed quantitatively. Results: A total of 37 articles were identified. Three primary uses of AI in medical education were identified: learning support (n=32), assessment of students' learning (n=4), and curriculum review (n=1). The main reasons for use of AI are its ability to provide feedback and a guided learning pathway and to decrease costs. Subgroup analysis revealed that medical undergraduates are the primary target audience for AI use. In addition, 34 articles described the challenges of AI implementation in medical education; two main reasons were identified: difficulty in assessing the effectiveness of AI in medical education and technical challenges while developing AI applications. Conclusions: The primary use of AI in medical education was for learning support mainly due to its ability to provide individualized feedback. 
Little emphasis was placed on curriculum review and assessment of students' learning due to the lack of digitalization and sensitive nature of examinations, respectively. Big data manipulation also warrants the need to ensure data integrity. Methodological improvements are required to increase AI adoption by addressing the technical difficulties of creating an AI application and using novel methods to assess the effectiveness of AI. To better integrate AI into the medical profession, measures should be taken to introduce AI into the medical school curriculum for medical professionals to better understand AI algorithms and maximize its use. ", doi="10.2196/13930", url="http://mededu.jmir.org/2019/1/e13930/", url="https://doi.org/10.2196/13930", url="http://www.ncbi.nlm.nih.gov/pubmed/31199295" } @Article{info:doi/10.2196/13216, author="Fiske, Amelia and Henningsen, Peter and Buyx, Alena", title="Your Robot Therapist Will See You Now: Ethical Implications of Embodied Artificial Intelligence in Psychiatry, Psychology, and Psychotherapy", journal="J Med Internet Res", year="2019", month="May", day="09", volume="21", number="5", pages="e13216", keywords="artificial intelligence; robotics; ethics; psychiatry; psychology; psychotherapy; medicine", abstract="Background: Research in embodied artificial intelligence (AI) has increasing clinical relevance for therapeutic applications in mental health services. With innovations ranging from `virtual psychotherapists' to social robots in dementia care and autism disorder, to robots for sexual disorders, artificially intelligent virtual and robotic agents are increasingly taking on high-level therapeutic interventions that used to be offered exclusively by highly trained, skilled health professionals. In order to enable responsible clinical implementation, ethical and social implications of the increasing use of embodied AI in mental health need to be identified and addressed. Objective: This paper assesses the ethical and social implications of translating embodied AI applications into mental health care across the fields of Psychiatry, Psychology and Psychotherapy. Building on this analysis, it develops a set of preliminary recommendations on how to address ethical and social challenges in current and future applications of embodied AI. Methods: Based on a thematic literature search and established principles of medical ethics, an analysis of the ethical and social aspects of currently embodied AI applications was conducted across the fields of Psychiatry, Psychology, and Psychotherapy. To enable a comprehensive evaluation, the analysis was structured around the following three steps: assessment of potential benefits; analysis of overarching ethical issues and concerns; discussion of specific ethical and social issues of the interventions. Results: From an ethical perspective, important benefits of embodied AI applications in mental health include new modes of treatment, opportunities to engage hard-to-reach populations, better patient response, and freeing up time for physicians. Overarching ethical issues and concerns include: harm prevention and various questions of data ethics; a lack of guidance on development of AI applications, their clinical integration and training of health professionals; `gaps' in ethical and regulatory frameworks; the potential for misuse including using the technologies to replace established services, thereby potentially exacerbating existing health inequalities. 
Specific challenges identified and discussed in the application of embodied AI include: matters of risk-assessment, referrals, and supervision; the need to respect and protect patient autonomy; the role of non-human therapy; transparency in the use of algorithms; and specific concerns regarding long-term effects of these applications on understandings of illness and the human condition. Conclusions: We argue that embodied AI is a promising approach across the field of mental health; however, further research is needed to address the broader ethical and societal concerns of these technologies to negotiate best research and medical practices in innovative mental health care. We conclude by indicating areas of future research and developing recommendations for high-priority areas in need of concrete ethical guidance. ", doi="10.2196/13216", url="https://www.jmir.org/2019/5/e13216/", url="https://doi.org/10.2196/13216", url="http://www.ncbi.nlm.nih.gov/pubmed/31094356" } @Article{info:doi/10.2196/11030, author="Woldaregay, Ashenafi Zebene and {\AA}rsand, Eirik and Botsis, Taxiarchis and Albers, David and Mamykina, Lena and Hartvigsen, Gunnar", title="Data-Driven Blood Glucose Pattern Classification and Anomalies Detection: Machine-Learning Applications in Type 1 Diabetes", journal="J Med Internet Res", year="2019", month="May", day="01", volume="21", number="5", pages="e11030", keywords="type 1 diabetes; blood glucose dynamics; anomalies detection; machine learning", abstract="Background: Diabetes mellitus is a chronic metabolic disorder that results in abnormal blood glucose (BG) regulations. The BG level is preferably maintained close to normality through self-management practices, which involves actively tracking BG levels and taking proper actions including adjusting diet and insulin medications. BG anomalies could be defined as any undesirable reading because of either a precisely known reason (normal cause variation) or an unknown reason (special cause variation) to the patient. Recently, machine-learning applications have been widely introduced within diabetes research in general and BG anomaly detection in particular. However, irrespective of their expanding and increasing popularity, there is a lack of up-to-date reviews that materialize the current trends in modeling options and strategies for BG anomaly classification and detection in people with diabetes. Objective: This review aimed to identify, assess, and analyze the state-of-the-art machine-learning strategies and their hybrid systems focusing on BG anomaly classification and detection including glycemic variability (GV), hyperglycemia, and hypoglycemia in type 1 diabetes within the context of personalized decision support systems and BG alarm events applications, which are important constituents for optimal diabetes self-management. Methods: A rigorous literature search was conducted between September 1 and October 1, 2017, and October 15 and November 5, 2018, through various Web-based databases. Peer-reviewed journals and articles were considered. Information from the selected literature was extracted based on predefined categories, which were based on previous research and further elaborated through brainstorming. Results: The initial results were vetted using the title, abstract, and keywords and retrieved 496 papers. After a thorough assessment and screening, 47 articles remained, which were critically analyzed. 
The interrater agreement was measured using a Cohen kappa test, and disagreements were resolved through discussion. The state-of-the-art classes of machine learning have been developed and tested up to the task and achieved promising performance including artificial neural network, support vector machine, decision tree, genetic algorithm, Gaussian process regression, Bayesian neural network, deep belief network, and others. Conclusions: Despite the complexity of BG dynamics, there are many attempts to capture hypoglycemia and hyperglycemia incidences and the extent of an individual's GV using different approaches. Recently, the advancement of diabetes technologies and continuous accumulation of self-collected health data have paved the way for popularity of machine learning in these tasks. According to the review, most of the identified studies used a theoretical threshold, which suffers from inter- and intrapatient variation. Therefore, future studies should consider the difference among patients and also track its temporal change over time. Moreover, studies should also give more emphasis on the types of inputs used and their associated time lag. Generally, we foresee that these developments might encourage researchers to further develop and test these systems on a large-scale basis. ", doi="10.2196/11030", url="https://www.jmir.org/2019/5/e11030/", url="https://doi.org/10.2196/11030", url="http://www.ncbi.nlm.nih.gov/pubmed/31042157" } @Article{info:doi/10.2196/13445, author="Aboueid, Stephanie and Liu, Rebecca H and Desta, Binyam Negussie and Chaurasia, Ashok and Ebrahim, Shanil", title="The Use of Artificially Intelligent Self-Diagnosing Digital Platforms by the General Public: Scoping Review", journal="JMIR Med Inform", year="2019", month="May", day="01", volume="7", number="2", pages="e13445", keywords="diagnosis; artificial intelligence; symptom checkers; diagnostic self evaluation; self-care", abstract="Background: Self-diagnosis is the process of diagnosing or identifying a medical condition in oneself. Artificially intelligent digital platforms for self-diagnosis are becoming widely available and are used by the general public; however, little is known about the body of knowledge surrounding this technology. Objective: The objectives of this scoping review were to (1) systematically map the extent and nature of the literature and topic areas pertaining to digital platforms that use computerized algorithms to provide users with a list of potential diagnoses and (2) identify key knowledge gaps. Methods: The following databases were searched: PubMed (Medline), Scopus, Association for Computing Machinery Digital Library, Institute of Electrical and Electronics Engineers, Google Scholar, Open Grey, and ProQuest Dissertations and Theses. The search strategy was developed and refined with the assistance of a librarian and consisted of 3 main concepts: (1) self-diagnosis; (2) digital platforms; and (3) public or patients. The search generated 2536 articles from which 217 were duplicates. Following the Tricco et al 2018 checklist, 2 researchers screened the titles and abstracts (n=2316) and full texts (n=104), independently. A total of 19 articles were included for review, and data were retrieved following a data-charting form that was pretested by the research team. Results: The included articles were mainly conducted in the United States (n=10) or the United Kingdom (n=4). 
Among the articles, topic areas included accuracy or correspondence with a doctor's diagnosis (n=6), commentaries (n=2), regulation (n=3), sociological (n=2), user experience (n=2), theoretical (n=1), privacy and security (n=1), ethical (n=1), and design (n=1). Individuals who do not have access to health care and perceive to have a stigmatizing condition are more likely to use this technology. The accuracy of this technology varied substantially based on the disease examined and platform used. Women and those with higher education were more likely to choose the right diagnosis out of the potential list of diagnoses. Regulation of this technology is lacking in most parts of the world; however, regulations are currently under development. Conclusions: There are prominent research gaps in the literature surrounding the use of artificially intelligent self-diagnosing digital platforms. Given the variety of digital platforms and the wide array of diseases they cover, measuring accuracy is cumbersome. More research is needed to understand the user experience and inform regulations. ", doi="10.2196/13445", url="http://medinform.jmir.org/2019/2/e13445/", url="https://doi.org/10.2196/13445", url="http://www.ncbi.nlm.nih.gov/pubmed/31042151" } @Article{info:doi/10.2196/13822, author="Tariq, Qandeel and Fleming, Scott Lanyon and Schwartz, Jessey Nicole and Dunlap, Kaitlyn and Corbin, Conor and Washington, Peter and Kalantarian, Haik and Khan, Naila Z and Darmstadt, Gary L and Wall, Dennis Paul", title="Detecting Developmental Delay and Autism Through Machine Learning Models Using Home Videos of Bangladeshi Children: Development and Validation Study", journal="J Med Internet Res", year="2019", month="Apr", day="24", volume="21", number="4", pages="e13822", keywords="autism; autism spectrum disorder; machine learning; developmental delays; clinical resources; Bangladesh; Biomedical Data Science", abstract="Background: Autism spectrum disorder (ASD) is currently diagnosed using qualitative methods that measure between 20-100 behaviors, can span multiple appointments with trained clinicians, and take several hours to complete. In our previous work, we demonstrated the efficacy of machine learning classifiers to accelerate the process by collecting home videos of US-based children, identifying a reduced subset of behavioral features that are scored by untrained raters using a machine learning classifier to determine children's ``risk scores'' for autism. We achieved an accuracy of 92{\%} (95{\%} CI 88{\%}-97{\%}) on US videos using a classifier built on five features. Objective: Using videos of Bangladeshi children collected from Dhaka Shishu Children's Hospital, we aim to scale our pipeline to another culture and other developmental delays, including speech and language conditions. Methods: Although our previously published and validated pipeline and set of classifiers perform reasonably well on Bangladeshi videos (75{\%} accuracy, 95{\%} CI 71{\%}-78{\%}), this work improves on that accuracy through the development and application of a powerful new technique for adaptive aggregation of crowdsourced labels. We enhance both the utility and performance of our model by building two classification layers: The first layer distinguishes between typical and atypical behavior, and the second layer distinguishes between ASD and non-ASD. In each of the layers, we use a unique rater weighting scheme to aggregate classification scores from different raters based on their expertise. 
We also determine Shapley values for the most important features in the classifier to understand how the classifiers' process aligns with clinical intuition. Results: Using these techniques, we achieved an accuracy (area under the curve [AUC]) of 76{\%} (SD 3{\%}) and sensitivity of 76{\%} (SD 4{\%}) for identifying atypical children from among developmentally delayed children, and an accuracy (AUC) of 85{\%} (SD 5{\%}) and sensitivity of 76{\%} (SD 6{\%}) for identifying children with ASD from those predicted to have other developmental delays. Conclusions: These results show promise for using a mobile video-based and machine learning--directed approach for early and remote detection of autism in Bangladeshi children. This strategy could provide important resources for developmental health in developing countries with few clinical resources for diagnosis, helping children get access to care at an early age. Future research aimed at extending the application of this approach to identify a range of other conditions and determine the population-level burden of developmental disabilities and impairments will be of high value. ", doi="10.2196/13822", url="http://www.jmir.org/2019/4/e13822/", url="https://doi.org/10.2196/13822", url="http://www.ncbi.nlm.nih.gov/pubmed/31017583" } @Article{info:doi/10.2196/12887, author="Palanica, Adam and Flaschner, Peter and Thommandram, Anirudh and Li, Michael and Fossat, Yan", title="Physicians' Perceptions of Chatbots in Health Care: Cross-Sectional Web-Based Survey", journal="J Med Internet Res", year="2019", month="Apr", day="05", volume="21", number="4", pages="e12887", keywords="physician satisfaction; health care; telemedicine; mobile health; health surveys", abstract="Background: Many potential benefits for the uses of chatbots within the context of health care have been theorized, such as improved patient education and treatment compliance. However, little is known about the perspectives of practicing medical physicians on the use of chatbots in health care, even though these individuals are the traditional benchmark of proper patient care. Objective: This study aimed to investigate the perceptions of physicians regarding the use of health care chatbots, including their benefits, challenges, and risks to patients. Methods: A total of 100 practicing physicians across the United States completed a Web-based, self-report survey to examine their opinions of chatbot technology in health care. Descriptive statistics and frequencies were used to examine the characteristics of participants. Results: A wide variety of positive and negative perspectives were reported on the use of health care chatbots, including the importance to patients for managing their own health and the benefits on physical, psychological, and behavioral health outcomes. More consistent agreement occurred with regard to administrative benefits associated with chatbots; many physicians believed that chatbots would be most beneficial for scheduling doctor appointments (78{\%}, 78/100), locating health clinics (76{\%}, 76/100), or providing medication information (71{\%}, 71/100). Conversely, many physicians believed that chatbots cannot effectively care for all of the patients' needs (76{\%}, 76/100), cannot display human emotion (72{\%}, 72/100), and cannot provide detailed diagnosis and treatment because of not knowing all of the personal factors associated with the patient (71{\%}, 71/100). 
Many physicians also stated that health care chatbots could be a risk to patients if they self-diagnose too often (74{\%}, 74/100) and do not accurately understand the diagnoses (74{\%}, 74/100). Conclusions: Physicians believed in both costs and benefits associated with chatbots, depending on the logistics and specific roles of the technology. Chatbots may have a beneficial role to play in health care to support, motivate, and coach patients as well as for streamlining organizational tasks; in essence, chatbots could become a surrogate for nonmedical caregivers. However, concerns remain on the inability of chatbots to comprehend the emotional state of humans as well as in areas where expert medical knowledge and intelligence is required. ", doi="10.2196/12887", url="https://www.jmir.org/2019/4/e12887/", url="https://doi.org/10.2196/12887", url="http://www.ncbi.nlm.nih.gov/pubmed/30950796" } @Article{info:doi/10.2196/12286, author="Triantafyllidis, Andreas K and Tsanas, Athanasios", title="Applications of Machine Learning in Real-Life Digital Health Interventions: Review of the Literature", journal="J Med Internet Res", year="2019", month="Apr", day="05", volume="21", number="4", pages="e12286", keywords="machine learning; data mining; artificial intelligence; digital health; review; telemedicine", abstract="Background: Machine learning has attracted considerable research interest toward developing smart digital health interventions. These interventions have the potential to revolutionize health care and lead to substantial outcomes for patients and medical professionals. Objective: Our objective was to review the literature on applications of machine learning in real-life digital health interventions, aiming to improve the understanding of researchers, clinicians, engineers, and policy makers in developing robust and impactful data-driven interventions in the health care domain. Methods: We searched the PubMed and Scopus bibliographic databases with terms related to machine learning, to identify real-life studies of digital health interventions incorporating machine learning algorithms. We grouped those interventions according to their target (ie, target condition), study design, number of enrolled participants, follow-up duration, primary outcome and whether this had been statistically significant, machine learning algorithms used in the intervention, and outcome of the algorithms (eg, prediction). Results: Our literature search identified 8 interventions incorporating machine learning in a real-life research setting, of which 3 (37{\%}) were evaluated in a randomized controlled trial and 5 (63{\%}) in a pilot or experimental single-group study. The interventions targeted depression prediction and management, speech recognition for people with speech disabilities, self-efficacy for weight loss, detection of changes in biopsychosocial condition of patients with multiple morbidity, stress management, treatment of phantom limb pain, smoking cessation, and personalized nutrition based on glycemic response. The average number of enrolled participants in the studies was 71 (range 8-214), and the average follow-up study duration was 69 days (range 3-180). Of the 8 interventions, 6 (75{\%}) showed statistical significance (at the P=.05 level) in health outcomes. Conclusions: This review found that digital health interventions incorporating machine learning algorithms in real-life studies can be useful and effective. 
Given the low number of studies identified in this review and that they did not follow a rigorous machine learning evaluation methodology, we urge the research community to conduct further studies in intervention settings following evaluation principles and demonstrating the potential of machine learning in clinical practice. ", doi="10.2196/12286", url="https://www.jmir.org/2019/4/e12286/", url="https://doi.org/10.2196/12286", url="http://www.ncbi.nlm.nih.gov/pubmed/30950797" } @Article{info:doi/10.2196/12100, author="van Hartskamp, Michael and Consoli, Sergio and Verhaegh, Wim and Petkovic, Milan and van de Stolpe, Anja", title="Artificial Intelligence in Clinical Health Care Applications: Viewpoint", journal="Interact J Med Res", year="2019", month="Apr", day="05", volume="8", number="2", pages="e12100", keywords="artificial intelligence; deep learning; clinical data; Bayesian modeling; medical informatics", doi="10.2196/12100", url="https://www.i-jmr.org/2019/2/e12100/", url="https://doi.org/10.2196/12100", url="http://www.ncbi.nlm.nih.gov/pubmed/30950806" } @Article{info:doi/10.2196/12422, author="Oh, Songhee and Kim, Jae Heon and Choi, Sung-Woo and Lee, Hee Jeong and Hong, Jungrak and Kwon, Soon Hyo", title="Physician Confidence in Artificial Intelligence: An Online Mobile Survey", journal="J Med Internet Res", year="2019", month="Mar", day="25", volume="21", number="3", pages="e12422", keywords="artificial intelligence; AI; awareness; physicians", abstract="Background: It is expected that artificial intelligence (AI) will be used extensively in the medical field in the future. Objective: The purpose of this study is to investigate the awareness of AI among Korean doctors and to assess physicians' attitudes toward the medical application of AI. Methods: We conducted an online survey composed of 11 closed-ended questions using Google Forms. The survey consisted of questions regarding the recognition of and attitudes toward AI, the development direction of AI in medicine, and the possible risks of using AI in the medical field. Results: A total of 669 participants completed the survey. Only 40 (5.9{\%}) answered that they had good familiarity with AI. However, most participants considered AI useful in the medical field (558/669, 83.4{\%} agreement). The advantage of using AI was seen as the ability to analyze vast amounts of high-quality, clinically relevant data in real time. Respondents agreed that the area of medicine in which AI would be most useful is disease diagnosis (558/669, 83.4{\%} agreement). One possible problem cited by the participants was that AI would not be able to assist in unexpected situations owing to inadequate information (196/669, 29.3{\%}). Less than half of the participants (294/669, 43.9{\%}) agreed that AI is diagnostically superior to human doctors. Only 237 (35.4{\%}) answered that they agreed that AI could replace them in their jobs. Conclusions: This study suggests that Korean doctors and medical students have favorable attitudes toward AI in the medical field. The majority of physicians surveyed believed that AI will not replace their roles in the future. 
", doi="10.2196/12422", url="http://www.jmir.org/2019/3/e12422/", url="https://doi.org/10.2196/12422", url="http://www.ncbi.nlm.nih.gov/pubmed/30907742" } @Article{info:doi/10.2196/12802, author="Blease, Charlotte and Kaptchuk, Ted J and Bernstein, Michael H and Mandl, Kenneth D and Halamka, John D and DesRoches, Catherine M", title="Artificial Intelligence and the Future of Primary Care: Exploratory Qualitative Study of UK General Practitioners' Views", journal="J Med Internet Res", year="2019", month="Mar", day="20", volume="21", number="3", pages="e12802", keywords="artificial intelligence; attitudes; future; general practice; machine learning; opinions; primary care; qualitative research; technology", abstract="Background: The potential for machine learning to disrupt the medical profession is the subject of ongoing debate within biomedical informatics and related fields. Objective: This study aimed to explore general practitioners' (GPs') opinions about the potential impact of future technology on key tasks in primary care. Methods: In June 2018, we conducted a Web-based survey of 720 UK GPs' opinions about the likelihood of future technology to fully replace GPs in performing 6 key primary care tasks, and, if respondents considered replacement for a particular task likely, to estimate how soon the technological capacity might emerge. This study involved qualitative descriptive analysis of written responses (``comments'') to an open-ended question in the survey. Results: Comments were classified into 3 major categories in relation to primary care: (1) limitations of future technology, (2) potential benefits of future technology, and (3) social and ethical concerns. Perceived limitations included the beliefs that communication and empathy are exclusively human competencies; many GPs also considered clinical reasoning and the ability to provide value-based care as necessitating physicians' judgments. Perceived benefits of technology included expectations about improved efficiencies, in particular with respect to the reduction of administrative burdens on physicians. Social and ethical concerns encompassed multiple, divergent themes including the need to train more doctors to overcome workforce shortfalls and misgivings about the acceptability of future technology to patients. However, some GPs believed that the failure to adopt technological innovations could incur harms to both patients and physicians. Conclusions: This study presents timely information on physicians' views about the scope of artificial intelligence (AI) in primary care. Overwhelmingly, GPs considered the potential of AI to be limited. These views differ from the predictions of biomedical informaticians. More extensive, stand-alone qualitative work would provide a more in-depth understanding of GPs' views. 
", doi="10.2196/12802", url="http://www.jmir.org/2019/3/e12802/", url="https://doi.org/10.2196/12802", url="http://www.ncbi.nlm.nih.gov/pubmed/30892270" } @Article{info:doi/10.2196/11990, author="Chen, Jinying and Lalor, John and Liu, Weisong and Druhl, Emily and Granillo, Edgard and Vimalananda, Varsha G and Yu, Hong", title="Detecting Hypoglycemia Incidents Reported in Patients' Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance", journal="J Med Internet Res", year="2019", month="Mar", day="11", volume="21", number="3", pages="e11990", keywords="secure messaging; natural language processing; hypoglycemia; supervised machine learning; imbalanced data; adverse event detection; drug-related side effects and adverse reactions", abstract="Background: Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. Objective: We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients' secure messages. Methods: An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80{\%}) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. Results: The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. Conclusions: Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia. 
", doi="10.2196/11990", url="http://www.jmir.org/2019/3/e11990/", url="https://doi.org/10.2196/11990", url="http://www.ncbi.nlm.nih.gov/pubmed/30855231" } @Article{info:doi/10.2196/10788, author="Li, Rumeng and Hu, Baotian and Liu, Feifan and Liu, Weisong and Cunningham, Francesca and McManus, David D and Yu, Hong", title="Detection of Bleeding Events in Electronic Health Record Notes Using Convolutional Neural Network Models Enhanced With Recurrent Neural Network Autoencoders: Deep Learning Approach", journal="JMIR Med Inform", year="2019", month="Feb", day="08", volume="7", number="1", pages="e10788", keywords="autoencoder; BiLSTM; bleeding; convolutional neural networks; electronic health record", abstract="Background: Bleeding events are common and critical and may cause significant morbidity and mortality. High incidences of bleeding events are associated with cardiovascular disease in patients on anticoagulant therapy. Prompt and accurate detection of bleeding events is essential to prevent serious consequences. As bleeding events are often described in clinical notes, automatic detection of bleeding events from electronic health record (EHR) notes may improve drug-safety surveillance and pharmacovigilance. Objective: We aimed to develop a natural language processing (NLP) system to automatically classify whether an EHR note sentence contains a bleeding event. Methods: We expert annotated 878 EHR notes (76,577 sentences and 562,630 word-tokens) to identify bleeding events at the sentence level. This annotated corpus was used to train and validate our NLP systems. We developed an innovative hybrid convolutional neural network (CNN) and long short-term memory (LSTM) autoencoder (HCLA) model that integrates a CNN architecture with a bidirectional LSTM (BiLSTM) autoencoder model to leverage large unlabeled EHR data. Results: HCLA achieved the best area under the receiver operating characteristic curve (0.957) and F1 score (0.938) to identify whether a sentence contains a bleeding event, thereby surpassing the strong baseline support vector machines and other CNN and autoencoder models. Conclusions: By incorporating a supervised CNN model and a pretrained unsupervised BiLSTM autoencoder, the HCLA achieved high performance in detecting bleeding events. ", doi="10.2196/10788", url="http://medinform.jmir.org/2019/1/e10788/", url="https://doi.org/10.2196/10788", url="http://www.ncbi.nlm.nih.gov/pubmed/30735140" } @Article{info:doi/10.2196/mental.9782, author="Fulmer, Russell and Joerin, Angela and Gentile, Breanna and Lakerink, Lysanne and Rauws, Michiel", title="Using Psychological Artificial Intelligence (Tess) to Relieve Symptoms of Depression and Anxiety: Randomized Controlled Trial", journal="JMIR Ment Health", year="2018", month="Dec", day="13", volume="5", number="4", pages="e64", keywords="artificial intelligence; mental health services; depression; anxiety; students", abstract="Background: Students in need of mental health care face many barriers including cost, location, availability, and stigma. Studies show that computer-assisted therapy and 1 conversational chatbot delivering cognitive behavioral therapy (CBT) offer a less-intensive and more cost-effective alternative for treating depression and anxiety. Although CBT is one of the most effective treatment methods, applying an integrative approach has been linked to equally effective posttreatment improvement. 
Integrative psychological artificial intelligence (AI) offers a scalable solution as the demand for affordable, convenient, lasting, and secure support grows. Objective: This study aimed to assess the feasibility and efficacy of using an integrative psychological AI, Tess, to reduce self-identified symptoms of depression and anxiety in college students. Methods: In this randomized controlled trial, 75 participants were recruited from 15 universities across the United States. All participants completed Web-based surveys, including the Patient Health Questionnaire (PHQ-9), Generalized Anxiety Disorder Scale (GAD-7), and Positive and Negative Affect Scale (PANAS) at baseline and 2 to 4 weeks later (T2). The 2 test groups consisted of 50 participants in total and were randomized to receive unlimited access to Tess for either 2 weeks (n=24) or 4 weeks (n=26). The information-only control group participants (n=24) received an electronic link to the National Institute of Mental Health's (NIMH) eBook on depression among college students and were only granted access to Tess after completion of the study. Results: A sample of 74 participants completed this study with 0{\%} attrition from the test group and less than 1{\%} attrition from the control group (1/24). The average age of participants was 22.9 years, with 70{\%} of participants being female (52/74), mostly Asian (37/74, 51{\%}), and white (32/74, 41{\%}). Group 1 received unlimited access to Tess, with daily check-ins for 2 weeks. Group 2 received unlimited access to Tess with biweekly check-ins for 4 weeks. The information-only control group was provided with an electronic link to the NIMH's eBook. Multivariate analysis of covariance was conducted. We used an alpha level of .05 for all statistical tests. Results revealed a statistically significant difference between the control group and group 1, such that group 1 reported a significant reduction in symptoms of depression as measured by the PHQ-9 (P=.03), whereas those in the control group did not. A statistically significant difference was found between the control group and both test groups 1 and 2 for symptoms of anxiety as measured by the GAD-7. Group 1 (P=.045) and group 2 (P=.02) reported a significant reduction in symptoms of anxiety, whereas the control group did not. A statistically significant difference was found on the PANAS between the control group and group 1 (P=.03) and suggests that Tess did impact scores. Conclusions: This study offers evidence that AI can serve as a cost-effective and accessible therapeutic agent. Although not designed to appropriate the role of a trained therapist, integrative psychological AI emerges as a feasible option for delivering support. Trial Registration: International Standard Randomized Controlled Trial Number: ISRCTN61214172; https://doi.org/10.1186/ISRCTN61214172. 
", doi="10.2196/mental.9782", url="http://mental.jmir.org/2018/4/e64/", url="https://doi.org/10.2196/mental.9782", url="http://www.ncbi.nlm.nih.gov/pubmed/30545815" } @Article{info:doi/10.2196/mhealth.8127, author="Lo, Wai Leung Ambrose and Lei, Di and Li, Le and Huang, Dong Feng and Tong, Kin-Fai", title="The Perceived Benefits of an Artificial Intelligence--Embedded Mobile App Implementing Evidence-Based Guidelines for the Self-Management of Chronic Neck and Back Pain: Observational Study", journal="JMIR Mhealth Uhealth", year="2018", month="Nov", day="26", volume="6", number="11", pages="e198", keywords="low back pain; neck pain; mobile app; exercise therapy; mHealth", abstract="Background: Chronic musculoskeletal neck and back pain are disabling conditions among adults. Use of technology has been suggested as an alternative way to increase adherence to exercise therapy, which may improve clinical outcomes. Objective: The aim was to investigate the self-perceived benefits of an artificial intelligence (AI)--embedded mobile app to self-manage chronic neck and back pain. Methods: A total of 161 participants responded to the invitation. The evaluation questionnaire included 14 questions that were intended to explore if using the AI rehabilitation system may (1) increase time spent on therapeutic exercise, (2) affect pain level (assessed by the 0-10 Numerical Pain Rating Scale), and (3) reduce the need for other interventions. Results: An increase in time spent on therapeutic exercise per day was observed. The median Numerical Pain Rating Scale scores were 6 (interquartile range [IQR] 5-8) before and 4 (IQR 3-6) after using the AI-embedded mobile app (95{\%} CI 1.18-1.81). A 3-point reduction was reported by the participants who used the AI-embedded mobile app for more than 6 months. Reduction in the usage of other interventions while using the AI-embedded mobile app was also reported. Conclusions: This study demonstrated the positive self-perceived beneficiary effect of using the AI-embedded mobile app to provide a personalized therapeutic exercise program. The positive results suggest that it at least warrants further study to investigate the physiological effect of the AI-embedded mobile app and how it compares with routine clinical care. ", doi="10.2196/mhealth.8127", url="http://mhealth.jmir.org/2018/11/e198/", url="https://doi.org/10.2196/mhealth.8127", url="http://www.ncbi.nlm.nih.gov/pubmed/30478019" } @Article{info:doi/10.2196/11510, author="Bickmore, Timothy W and Trinh, Ha and Olafsson, Stefan and O'Leary, Teresa K and Asadi, Reza and Rickles, Nathaniel M and Cruz, Ricardo", title="Patient and Consumer Safety Risks When Using Conversational Assistants for Medical Information: An Observational Study of Siri, Alexa, and Google Assistant", journal="J Med Internet Res", year="2018", month="Sep", day="04", volume="20", number="9", pages="e11510", keywords="conversational assistant; conversational interface; dialogue system; medical error; patient safety", abstract="Background: Conversational assistants, such as Siri, Alexa, and Google Assistant, are ubiquitous and are beginning to be used as portals for medical services. However, the potential safety issues of using conversational assistants for medical information by patients and consumers are not understood. Objective: To determine the prevalence and nature of the harm that could result from patients or consumers using conversational assistants for medical information. 
Methods: Participants were given medical problems to pose to Siri, Alexa, or Google Assistant, and asked to determine an action to take based on information from the system. Assignment of tasks and systems were randomized across participants, and participants queried the conversational assistants in their own words, making as many attempts as needed until they either reported an action to take or gave up. Participant-reported actions for each medical task were rated for patient harm using an Agency for Healthcare Research and Quality harm scale. Results: Fifty-four subjects completed the study with a mean age of 42 years (SD 18). Twenty-nine (54{\%}) were female, 31 (57{\%}) Caucasian, and 26 (50{\%}) were college educated. Only 8 (15{\%}) reported using a conversational assistant regularly, while 22 (41{\%}) had never used one, and 24 (44{\%}) had tried one ``a few times.`` Forty-four (82{\%}) used computers regularly. Subjects were only able to complete 168 (43{\%}) of their 394 tasks. Of these, 49 (29{\%}) reported actions that could have resulted in some degree of patient harm, including 27 (16{\%}) that could have resulted in death. Conclusions: Reliance on conversational assistants for actionable medical information represents a safety risk for patients and consumers. Patients should be cautioned to not use these technologies for answers to medical questions they intend to act on without further consultation from a health care provider. ", doi="10.2196/11510", url="http://www.jmir.org/2018/9/e11510/", url="https://doi.org/10.2196/11510", url="http://www.ncbi.nlm.nih.gov/pubmed/30181110" } @Article{info:doi/10.2196/10454, author="Suganuma, Shinichiro and Sakamoto, Daisuke and Shimoyama, Haruhiko", title="An Embodied Conversational Agent for Unguided Internet-Based Cognitive Behavior Therapy in Preventative Mental Health: Feasibility and Acceptability Pilot Trial", journal="JMIR Ment Health", year="2018", month="Jul", day="31", volume="5", number="3", pages="e10454", keywords="embodied conversational agent; cognitive behavioral therapy; psychological distress; mental well‐being; artificial intelligence technology", abstract="Background: Recent years have seen an increase in the use of internet-based cognitive behavioral therapy in the area of mental health. Although lower effectiveness and higher dropout rates of unguided than those of guided internet-based cognitive behavioral therapy remain critical issues, not incurring ongoing human clinical resources makes it highly advantageous. Objective: Current research in psychotherapy, which acknowledges the importance of therapeutic alliance, aims to evaluate the feasibility and acceptability, in terms of mental health, of an application that is embodied with a conversational agent. This application was enabled for use as an internet-based cognitive behavioral therapy preventative mental health measure. Methods: Analysis of the data from the 191 participants of the experimental group with a mean age of 38.07 (SD 10.75) years and the 263 participants of the control group with a mean age of 38.05 (SD 13.45) years using a 2-way factorial analysis of variance (group {\texttimes} time) was performed. Results: There was a significant main effect (P=.02) and interaction for time on the variable of positive mental health (P=.02), and for the treatment group, a significant simple main effect was also found (P=.002). 
In addition, there was a significant main effect (P=.02) and interaction for time on the variable of negative mental health (P=.005), and for the treatment group, a significant simple main effect was also found (P=.001). Conclusions: This research can be seen to represent a certain level of evidence for the mental health application developed herein, indicating empirically that internet-based cognitive behavioral therapy with the embodied conversational agent can be used in mental health care. In the pilot trial, given the issues related to feasibility and acceptability, it is necessary to pursue higher quality evidence while continuing to further improve the application, based on the findings of the current research. ", doi="10.2196/10454", url="http://mental.jmir.org/2018/3/e10454/", url="https://doi.org/10.2196/10454", url="http://www.ncbi.nlm.nih.gov/pubmed/30064969" } @Article{info:doi/10.2196/10148, author="Morris, Robert R and Kouddous, Kareem and Kshirsagar, Rohan and Schueller, Stephen M", title="Towards an Artificially Empathic Conversational Agent for Mental Health Applications: System Design and User Perceptions", journal="J Med Internet Res", year="2018", month="Jun", day="26", volume="20", number="6", pages="e10148", keywords="conversational agents; mental health; empathy; crowdsourcing; peer support", abstract="Background: Conversational agents cannot yet express empathy in nuanced ways that account for the unique circumstances of the user. Agents that possess this faculty could be used to enhance digital mental health interventions. Objective: We sought to design a conversational agent that could express empathic support in ways that might approach, or even match, human capabilities. Another aim was to assess how users might appraise such a system. Methods: Our system used a corpus-based approach to simulate expressed empathy. Responses from an existing pool of online peer support data were repurposed by the agent and presented to the user. Information retrieval techniques and word embeddings were used to select historical responses that best matched a user's concerns. We collected ratings from 37,169 users to evaluate the system. Additionally, we conducted a controlled experiment (N=1284) to test whether the alleged source of a response (human or machine) might change user perceptions. Results: The majority of responses created by the agent (2986/3770, 79.20{\%}) were deemed acceptable by users. However, users significantly preferred the efforts of their peers (P<.001). This effect was maintained in a controlled study (P=.02), even when the only difference in responses was whether they were framed as coming from a human or a machine. Conclusions: Our system illustrates a novel way for machines to construct nuanced and personalized empathic utterances. However, the design had significant limitations and further research is needed to make this approach viable. Our controlled study suggests that even in ideal conditions, nonhuman agents may struggle to express empathy as well as humans. The ethical implications of empathic agents, as well as their potential iatrogenic effects, are also discussed. 
", doi="10.2196/10148", url="http://www.jmir.org/2018/6/e10148/", url="https://doi.org/10.2196/10148", url="http://www.ncbi.nlm.nih.gov/pubmed/29945856" } @Article{info:doi/10.2196/mental.9423, author="Martinez-Martin, Nicole and Kreitmair, Karola", title="Ethical Issues for Direct-to-Consumer Digital Psychotherapy Apps: Addressing Accountability, Data Protection, and Consent", journal="JMIR Ment Health", year="2018", month="Apr", day="23", volume="5", number="2", pages="e32", keywords="ethics; ethical issues; mental health; technology; telemedicine; mHealth; psychotherapy", doi="10.2196/mental.9423", url="http://mental.jmir.org/2018/2/e32/", url="https://doi.org/10.2196/mental.9423", url="http://www.ncbi.nlm.nih.gov/pubmed/29685865" } @Article{info:doi/10.2196/iproc.8585, author="Howe, Esther and Pedrelli, Paola and Morris, Robert and Nyer, Maren and Mischoulon, David and Picard, Rosalind", title="Feasibility of an Automated System Counselor for Survivors of Sexual Assault", journal="iproc", year="2017", month="Sep", day="22", volume="3", number="1", pages="e37", keywords="CBT; web chat", abstract="Background: Sexual assault (SA) is common and costly to individuals and society, and increases risk of mental health disorders. Stigma and cost of care discourage survivors from seeking help. Norms profiling survivors as heterosexual, cisgendered women dissuade LGBTQIA+ individuals and men from accessing care. Because individuals prefer disclosing sensitive information online rather than in-person, online systems---like instant messaging and chatbots---for counseling may bypass concerns about stigma. These systems' anonymity may increase disclosure and decrease impression management, the process by which individuals attempt to influence others' perceptions. Their low cost may expand reach of care. There are no known evidence-based chat platforms for SA survivors. Objective: To examine feasibility of a chat platform with peer and automated system (chatbot) counseling interfaces to provide cognitive reappraisals (a cognitive behavioral therapy technique) to survivors. Methods: Participants are English-speaking, US-based survivors, 18+ years old. Participants are told they will be randomized to chat with a peer or automated system counselor 5 times over 2 weeks. In reality, all participants chat with a peer counselor. Chats employ a modified-for-context evidence-based cognitive reappraisal script developed by Koko, a company offering support services for emotional distress via social networks. At baseline, participants indicate counselor type preference and complete a basic demographic form, the Brief Fear of Negative Evaluation Scale, and self-disclosure items from the International Personality Item Pool. After 5 chats, participants complete questions from the Client Satisfaction Questionnaire (CSQ), Self-Reported Attitudes Toward Agent, and the Working Alliance Inventory. Hypotheses: 1) Online chatting and automated systems will be acceptable and feasible means of delivering cognitive reappraisals to survivors. 2) High impression management (IM≥25) and low self-disclosure (SD≤45) will be associated with preference for an automated system. 3) IM and SD will separately moderate the relationship between counselor assignment and participant satisfaction. Results: Ten participants have completed the study. Recruitment is ongoing. We will enroll 50+ participants by 10/2017 and outline findings at the Connected Health Conference. 
To date, 70{\%} of participants completed all chats within 24 hours of enrollment, and 60{\%} indicated a pre-chat preference for an automated system, suggesting acceptability of the concept. The post-chat CSQ mean total score of 3.98 on a 5-point Likert scale (1=Poor; 5=Excellent) suggests platform acceptability. Of the 50{\%} reporting high IM, 60{\%} indicated preference for an automated system. Of the 30{\%} reporting low SD, 33{\%} reported preference for an automated system. At recruitment completion, ANOVA analyses will elucidate relationships between IM, SD, and counselor assignment. Correlation and linear regression analyses will show any moderating effect of IM and SD on the relationship between counselor assignment and participant satisfaction. Conclusions: Preliminary results suggest acceptability and feasibility of cognitive reappraisals via chat for survivors, and of the automated system counselor concept. Final results will explore relationships between SD, IM, counselor type, and participant satisfaction to inform the development of new platforms for survivors. ", doi="10.2196/iproc.8585", url="http://www.iproc.org/2017/1/e37/", url="https://doi.org/10.2196/iproc.8585" }