This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Physicians intuitively apply pattern recognition when evaluating a patient. Rational diagnosis making requires that clinical patterns be put in the context of disease prior probability, yet physicians often exhibit flawed probabilistic reasoning. Difficulties in making a diagnosis are reflected in the high rates of deadly and costly diagnostic errors. Introduced 6 decades ago, computerized diagnosis support systems are still not widely used by internists. These systems cannot efficiently recognize patterns and are unable to consider the base rate of potential diagnoses. We review the limitations of current computer-aided diagnosis support systems. We then portray future diagnosis support systems and provide a conceptual framework for their development. We argue for capturing physician knowledge using a novel knowledge representation model of the clinical picture. This model (based on structured patient presentation patterns) holds not only symptoms and signs but also their temporal and semantic interrelations. We call for the collection of crowdsourced, automatically deidentified, structured patient patterns as means to support distributed knowledge accumulation and maintenance. In this approach, each structured patient pattern adds to a self-growing and -maintaining knowledge base, sharing the experience of physicians worldwide. Besides supporting diagnosis by relating the symptoms and signs with the final diagnosis recorded, the collective pattern map can also provide disease base-rate estimates and real-time surveillance for early detection of outbreaks. We explain how health care in resource-limited settings can benefit from using this approach and how it can be applied to provide feedback-rich medical education for both students and practitioners.
Two main questions are key when evaluating a patient in the context of constructing a differential diagnosis. The first is “How representative is the presentation of the patient of a set of manifestations of a known disease?” In other words, to what degree do the patient’s symptoms, signs, and laboratory results match the clinical features of the disease? The second is “What is the likelihood of encountering that disease in a patient like this?” Answering this question requires knowing the base rate (ie, incidence) of the disease and accounting for any patient risk factors that may alter the patient’s prior probability of having the disease.
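The two questions correspond to the two terms of Bayes' rule: the fit between the presentation and a disease acts as a likelihood, and the base rate adjusted for risk factors acts as a prior. The following is a minimal Python sketch of that combination. The function name `rank_diagnoses`, the disease labels, and all numbers are invented for illustration and do not come from any real system or dataset.

```python
from typing import Dict, List, Tuple


def rank_diagnoses(
    likelihood: Dict[str, float],
    prior: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Combine P(presentation | disease) with P(disease) via Bayes' rule.

    Both dictionaries are keyed by disease name. The result is normalized
    over the listed diseases only, which assumes the list is exhaustive
    and the diseases are mutually exclusive (a simplifying assumption).
    """
    unnormalized = {d: likelihood[d] * prior[d] for d in likelihood}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all candidate diagnoses have zero probability")
    posterior = {d: p / total for d, p in unnormalized.items()}
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


# Invented numbers: a rare disease that matches the presentation very well
# can still rank below a common disease that matches it less well.
likelihood = {"common disease": 0.30, "rare disease": 0.90}
prior = {"common disease": 0.0500, "rare disease": 0.0005}

ranking = rank_diagnoses(likelihood, prior)
```

With these illustrative inputs the common disease ranks first, because its much higher prior outweighs the rare disease's better match.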
Good clinicians are characterized by their ability to cluster findings around a single process or cause. Their intuitive clinical assessment heavily relies on pattern recognition [
However, a perfect match between a patient’s presentation and a typical clinical picture of a disease is no guarantee that the patient indeed has that disease. The aphorism coined by Dr Theodore Woodward, “When you hear hoofbeats, think of horses not zebras” [
Although probabilistic reasoning is key to medical diagnosis, physicians, like humans in general, perform poorly in this aspect; probability overestimation and low between-physician agreement are common [
Attempts to develop efficient computer-aided diagnosis support systems (DSSs) [
Poor specificity of DSSs is reflected by the large number of possible diagnoses suggested. Berner and coworkers [
Indeed, by design, DSSs focus primarily on sensitivity at the expense of specificity [
Current DSSs cannot efficiently match patients and diseases on patterns, since they rely on a unidimensional projection of clinical information; typically, the system uses a vector of “findings” (symptoms, signs and laboratory results) provided by the user to generate a differential diagnosis. Some systems differentiate between acute and more prolonged processes [
An incomplete knowledge base further limits DSS performance. INTERNIST-1 included 570 diseases [
Almost 60 years after Ledley and Lusted [
Finally, DSSs do not align well with clinicians’ work flow. A few DSSs now offer variable degrees of direct connectivity to the electronic health record (EHR) [
Examples of the view of a set of findings in a patient by a physician and a traditional diagnosis support system (DSS): chest pain and shortness of breath (upper panel), fever and rash (lower panel). Temporal and semantic interrelations between findings are crucial in putting findings in the right clinical context. RMSF, Rocky Mountain spotted fever.
Reviewing DSSs in 1994 [
We may understand, in theory, how to develop systems that take into account gradations of symptoms, the degrees of uncertainty...the severity of each illness under consideration, the pathophysiologic mechanisms of disease, and/or the time courses of illness. However, it is not yet practical to build such broad-based systems for patient care.
More recently, Weber et al note that “industries have figured out...that big data becomes transformative when disparate data sets can be linked at the individual person level” [
Here we portray NGDSSs and provide a conceptual framework for their development (
An integrated approach to computer-aided diagnosis. The process addresses the 2 questions that lead to likely diagnoses: Left side: How similar is the presentation of the patient to a set of manifestations of a known disease? Right side: What is the likelihood of encountering that disease in a patient like this?
For hundreds of years, physicians have been documenting their thoughts in the form of free text. Computer systems work best with structured data, but free-text input prevails even in the age of EHRs. A digital form of Ledley and Lusted’s learning device could provide a structured representation of a patient’s symptoms, extracted using natural language processing, as they relate to a disease. However, such a representation would, again, be of the “bag of findings” kind, as natural language processing techniques cannot reliably generate structured representations of complex clinical concepts documented in the EHR notes. In particular, this is true for temporal and semantic interrelations between symptoms and signs, which are key in forming the clinical patterns recognized by physicians. Thus, new ways to provide DSSs with a structured clinical picture are needed.
We suggest that a structured, higher cognitive-level patient representation can be constructed in real time through a (graphical) machine-physician interaction. We refer to this representation as a “structured presentation pattern” or “structured pattern.” A structured pattern can be thought of as a model, which can represent physician knowledge and reasoning in a machine-interpretable format. A structured pattern should ideally represent key symptoms and signs associated with a particular patient’s presentation and their temporal and semantic interrelations. This allows for translation of a list of findings (symptoms and signs) into multiple distinct structured patterns according to the temporal course of the disease and other relations between findings. Through this approach, a differential diagnosis constructed by NGDSSs is likely to be more specific than one based on a list of findings.
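One way to picture a structured pattern is as a small graph whose nodes are findings and whose edges are temporal or semantic relations. The Python sketch below is only an illustration under that assumption. The class names (`Finding`, `Relation`, `StructuredPattern`), the field names, and the relation vocabulary ("precedes", "causes", "contradicts") are hypothetical and are not a specification of the model proposed here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Finding:
    """A symptom or sign, with onset relative to presentation."""
    name: str
    onset_days_before_presentation: Optional[float] = None  # None = unknown


@dataclass(frozen=True)
class Relation:
    """A directed link between two findings, referenced by name."""
    source: str
    target: str
    kind: str  # hypothetical vocabulary: "precedes", "causes", "contradicts"


@dataclass
class StructuredPattern:
    findings: List[Finding] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def add_finding(self, finding: Finding) -> None:
        self.findings.append(finding)

    def relate(self, source: str, target: str, kind: str) -> None:
        """Record a semantic relation between two findings already added."""
        names = {f.name for f in self.findings}
        if source not in names or target not in names:
            raise KeyError("both findings must be added before relating them")
        self.relations.append(Relation(source, target, kind))

    def derive_temporal_relations(self) -> None:
        """Add a "precedes" relation for each pair with known, distinct onsets."""
        dated = [f for f in self.findings
                 if f.onset_days_before_presentation is not None]
        for a in dated:
            for b in dated:
                if a.onset_days_before_presentation > b.onset_days_before_presentation:
                    self.relations.append(Relation(a.name, b.name, "precedes"))


# The same two findings yield different patterns depending on their order.
fever_first = StructuredPattern()
fever_first.add_finding(Finding("fever", onset_days_before_presentation=5))
fever_first.add_finding(Finding("rash", onset_days_before_presentation=2))
fever_first.derive_temporal_relations()

rash_first = StructuredPattern()
rash_first.add_finding(Finding("fever", onset_days_before_presentation=1))
rash_first.add_finding(Finding("rash", onset_days_before_presentation=4))
rash_first.derive_temporal_relations()
```

A flat list of findings would treat `fever_first` and `rash_first` as identical. Here they differ in their relations, which is the property that lets one list of findings map to multiple distinct structured patterns.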
The creators of the pioneering INTERNIST-1 attributed its insufficient clinical reliability in part to its being temporally naive [
One piece of evidence can be the cause of another, one may support or contradict the other, or one may be more reliable than another. Interpretation of a symptom or sign is ever dependent on the clinical context, which is, in a sense, a sum of all such interrelations. For example, a patient with suspected brucellosis may be unsure of having consumed potentially unpasteurized milk products in the preceding weeks. A physician auscultating the heart may hear an extra sound during diastole but have doubts as to whether this is an opening snap or a third heart sound. A patient’s record may document contradicting views of the etiology of a prior illness (eg, convulsion vs transient ischemic attack). Making at least some of these semantic or contextual interrelations interpretable by NGDSSs is likely to improve their performance on pattern recognition.
Patient notes include protected health information, which is why individual medical records cannot be readily shared. Free text can be considered deidentified only after it has been manually reviewed. In contrast, user-generated structured patient patterns can readily be deidentified automatically.
This opens the way to
Disease prevalence by parameters such as age, sex, race, and geotemporal distribution can be extracted from various sources, including published reports, large EHR repositories, administrative claims data, social media, and environmental data (eg, weather). These sources can feed an NGDSS knowledge base. Patient demographic data automatically extracted from the EHR can personalize prior-probability estimates. Few findings typical of a particular disease are invariably present in every case of it. The probability of a certain symptom occurring, a certain sign being noted, or a particular laboratory abnormality being found in a given disease is available from published reports [
DSSs partly rely on the fact that disease manifestations change relatively little over time, yet as new diseases arise (with obvious examples being human immunodeficiency virus and Zika virus infections) and new disease correlates are found (eg, genetic traits), continuous updates are necessary [
Using structured patterns, crowdsourcing of knowledge collection and reuse becomes possible. Crowdsourcing may be a sustainable strategy in a reality of exploding knowledge and limited resources (
Each time a physician adds a patient pattern (subsequently labeled with a diagnosis code assigned to that patient), the NGDSS knowledge base is enriched. An initial core body of knowledge may be manually curated by translating disease entries in a textbook into structured patterns of diseases (
Generating a real-time structured representation of a patient presentation supports a computer-aided diagnostic process (blue arrows) and a learning health care system through knowledge reuse (gray arrows).
Structured patient and disease representation. (A) A simulated view of an electronic health record with admission notes. Key terms are highlighted automatically using a real-time natural language processing engine or marked by the user. (B) Selected terms are then manipulated by the user by means of a touch screen to create a pattern representing key temporal and semantic interrelations between terms in a structured format. This pattern is augmented by automatically extracted relevant clinical data, demographics, and other metadata. (C) Structured patterned representation of the manifestations of Rocky Mountain spotted fever derived from a review article. Applying analytics schemes for assessing patient and disease similarity in context of disease prevalence can inform the generation of a ranked differential diagnosis for the patient in question.
Considerations beyond the prior probability of potential diagnoses on the differential diagnosis list come into play when making clinical decisions on investigation and treatment. The most probable disease may be of little practical importance to the patient’s outcome. On the other hand, missing the diagnosis of a severe, albeit less-likely, disease on the differential diagnosis may have grave consequences. Thus, the test and test-treat thresholds [
Likewise, the degree of urgency of conditions on the differential diagnosis list also has practical implications. Some conditions are considered medical emergencies (eg, stroke, malignant hypertension, or myocardial infarction) and require immediate measures to be taken by the physician, whereas in others the course and outcome are not changed by delaying treatment. In presenting information to the user, an NGDSS may indicate the need to act fast when such conditions are considered.
NGDSSs should provide next-step advice to optimize the diagnostic workup. Listing questions that, if answered, could narrow the differential diagnosis can be useful. Performance measures of diagnostic tests, contraindications for their use, and complication rates could be incorporated in their knowledge bases [
Experience with current DSSs shows that their use is hindered by poor alignment with the clinical workflow. Most DSSs require at least some degree of redundant input of clinical information. NGDSSs must seamlessly integrate with EHR systems. Cognitive computing approaches can facilitate the interaction of physicians with NGDSSs. For example, a structured pattern may be interactively created using graphic user interfaces and touch screens (see
A Bayesian network could be continuously trained to match a new patient pattern on a large set of existing patterns, and to rank the diagnoses to which similar patterns are attributed by their prior probability.
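The matching and ranking step can be sketched in a few lines of Python. Note that this is not a Bayesian network: it swaps in a much simpler stand-in, scoring each diagnosis by the Jaccard similarity of its best-matching stored pattern multiplied by its prior probability. The function names, the flattening of patterns into string tokens, the disease labels, and all numbers are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Pattern = FrozenSet[str]  # findings and relations flattened to string tokens


def jaccard(a: Pattern, b: Pattern) -> float:
    """Overlap between two token sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity_and_prior(
    new_pattern: Pattern,
    labeled_patterns: List[Tuple[Pattern, str]],
    prior: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Score each diagnosis by its best-matching stored pattern times its prior."""
    best_match: Dict[str, float] = defaultdict(float)
    for pattern, diagnosis in labeled_patterns:
        similarity = jaccard(new_pattern, pattern)
        best_match[diagnosis] = max(best_match[diagnosis], similarity)
    # Diagnoses without a known prior get a score of 0 in this sketch.
    scores = {d: s * prior.get(d, 0.0) for d, s in best_match.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented knowledge base of labeled patterns and invented priors.
knowledge_base = [
    (frozenset({"fever", "rash", "fever precedes rash"}), "disease A"),
    (frozenset({"fever", "rash", "rash precedes fever"}), "disease B"),
    (frozenset({"fever", "cough"}), "disease C"),
]
prior = {"disease A": 0.01, "disease B": 0.01, "disease C": 0.03}

new_patient = frozenset({"fever", "rash", "fever precedes rash"})
ranking = rank_by_similarity_and_prior(new_patient, knowledge_base, prior)
```

In this toy example, the temporal token separates disease A from disease B even though both share the same findings, and the higher prior lifts the poorly matching disease C above disease B. A trained probabilistic model would replace both the similarity measure and the multiplication used here.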
Implementation of the proposed approach for NGDSSs requires major health care stakeholders to make substantial, prolonged, and coordinated efforts. To bring NGDSSs to life, major technical and regulatory challenges will have to be met. Here, we mention some of the barriers NGDSSs face and propose ways to overcome them.
The medical domain is characterized by tight regulation of knowledge to assure quality. In this sense, crowdsourcing is an unorthodox approach. While offering access to much more knowledge than is possible using traditional methods, crowdsourcing carries an obvious risk of collecting unreliable information. Labeling structured patterns with the diagnosis subsequently made would be accurate in some cases; however, with misdiagnosis being not uncommon [
Real-time sharing of structured patterns would not be possible unless authorities and other stakeholders are convinced that patient privacy is protected. The use of structured patterns eliminates the need to manually deidentify clinical notes and may facilitate sharing. However, the use of many different formats for presenting clinical data will require efforts to align EHR data from various products. The widely used Observational Medical Outcomes Partnership common data model [
Medical students are encouraged to find a single disease that would explain a patient’s symptoms and signs. With an ageing population, comorbidities and polypharmacy are common. Comorbid conditions and medications used to treat them may alter the manifestations of a disease; the interplay between multiple factors with potential bearing on the clinical picture may make it impossible to attribute a pattern to a single etiology. This is true for an NGDSS and, in many cases, for clinicians as well. For example, shortness of breath in the context of a respiratory tract infection may be caused by pneumonia, but in a patient with known heart failure, decompensation with pulmonary congestion may also explain the symptoms. Data extracted from the EHR of a patient and anchored to his or her structured pattern may help gain a better understanding of the clinical picture. For instance, a patient’s problem list and past laboratory results may put current findings in the right context. Unfortunately, the variable quality of EHR data and the mixture of clinical and billing information may limit the degree to which uncertainty can be reduced. New tests enhance our knowledge but may bring to the surface instances where current medical practice is simplistic. For instance, in a recent retrospective analysis, almost 5% of patients with a molecular diagnosis had 2 to 4 diagnoses accounting for their phenotype [
Admittedly, at least in the foreseeable future, even the most user-friendly NGDSSs would require clinicians to invest time in acquainting themselves with their use and interacting with them in the clinical setting. This is a challenge for work-overloaded physicians, many of whom do not trust DSSs. Attempts to structure history taking (eg, [
NGDSSs can realize the vision of Ledley and Lusted [
There are probably as-yet unidentified diseases and syndromes. Some syndromes go unnoticed due to their rarity, and those could be identified through analysis of very large datasets. A structured clinical pattern can serve as an anchor for all patient-related structured data (eg, laboratory results, imaging tests) in a patient record. The result would be a rich representation of the manifestations of the patient’s disease(s), in which laboratory and imaging results are put in clinical context. This can help us understand accumulating genetic, proteomic, and microbiomic information and its association with clinical disease at the individual patient level. Integrated knowledge can help break down syndromes (eg, systemic lupus erythematosus and inflammatory bowel disease) to their underlying causes. Cohorting patients with similar structured patterns could potentially support more accurate outcome prediction and more reliable detection of adverse reactions to medications and other interventions.
Apprenticeship is a major pillar in the training of clinicians, appreciating that effective learning takes place through practice and direct, immediate feedback [
Users’ Internet activity has been shown to detect disease outbreaks before regulatory agencies can detect them [
Populations living in limited-resource settings are typically underrepresented in published medical reports. Cultural issues, limited availability of health professionals and diagnostic tests, and other factors may influence the ways diseases are first encountered and diagnosed by clinicians in such settings. Indeed, diagnostic errors in primary care are more common in low- and medium-income countries [
Computer-aided diagnosis has for decades been the Holy Grail of medical informaticians. The extreme complexity of constructing an efficient and sustainable system is reflected by the infrequent clinical use of DSSs despite the vast efforts that have been put into developing them.
On the one hand are expanding domain knowledge, increasingly complex patients, and a high burden of diagnostic errors. On the other, EHR systems have become ubiquitous; powerful computers enable sophisticated analytics; the Internet can connect physicians from around the globe in real time; and human-computer interaction technologies have ripened. Taken together, there is both a real need for NGDSSs and the technology to meet it. We lay out a conceptual framework for developing NGDSSs that relies on structuring clinical notes; real-time sharing of structured patient patterns; democratization of knowledge generation, maintenance, and reuse; and integration of epidemiologic data to support the complicated task of making a diagnosis.
Development of NGDSSs will be very demanding, yet we argue that their potential utility justifies the investment required to realize them. The future of computer-aided medical diagnosis lies ahead and will likely change the way medicine is practiced.
DSS: diagnosis support system
EHR: electronic health record
NGDSS: next-generation diagnosis support system
No financial support was provided for this study.
The authors thank Peter Santhanam, Yishai Feldman, and Mohammad Sandhogi-Hamedani for their input.
An abstract (poster) related to this work was presented at the Diagnostic Error in Medicine 1st European Conference in Rotterdam, the Netherlands on June 30, 2016.
AC is employed by IBM. JJC has no conflicts of interest to disclose.