This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Phenotypes characterize the clinical manifestations of diseases and provide important information for diagnosis. Therefore, the construction of phenotype knowledge graphs for diseases is valuable to the development of artificial intelligence in medicine. However, phenotype knowledge graphs in current knowledge bases such as WikiData and DBpedia are coarse-grained because they consider only the core concepts of phenotypes and neglect the details (attributes) associated with these phenotypes.
To characterize the details of disease phenotypes for clinical guidelines, we proposed a fine-grained semantic information model named PhenoSSU (semantic structured unit of phenotypes).
PhenoSSU is an “entity-attribute-value” model by its very nature, and it aims to capture the full semantic information underlying phenotype descriptions with a series of attributes and values. A total of 193 clinical guidelines for infectious diseases from Wikipedia were selected as the study corpus, and 12 attributes from SNOMED-CT were introduced into the PhenoSSU model based on the co-occurrences of phenotype concepts and attribute values. The expressive power of the PhenoSSU model was evaluated by analyzing whether PhenoSSU instances could capture the full semantics underlying the descriptions of the corresponding phenotypes. To automatically construct fine-grained phenotype knowledge graphs, a hybrid strategy that first recognized phenotype concepts with the MetaMap tool and then predicted the attribute values of phenotypes with machine learning classifiers was developed.
Fine-grained phenotype knowledge graphs of 193 infectious diseases were manually constructed with the BRAT annotation tool. A total of 4020 PhenoSSU instances were annotated in these knowledge graphs, and 3757 of them (89.5%) were found to be able to capture the full semantics underlying the descriptions of the corresponding phenotypes listed in clinical guidelines. By comparison, other information models, such as the clinical element model and the HL7 fast health care interoperability resource model, could only capture the full semantics underlying 48.4% (2034/4020) and 21.8% (914/4020) of the descriptions of phenotypes listed in clinical guidelines, respectively. The hybrid strategy achieved an F1-score of 0.732 for the subtask of phenotype concept recognition and an average weighted accuracy of 0.776 for the subtask of attribute value prediction.
PhenoSSU is an effective information model for the precise representation of phenotype knowledge for clinical guidelines, and machine learning can be used to improve the efficiency of constructing PhenoSSU-based knowledge graphs. Our work will potentially shift the focus of medical knowledge engineering from a coarse-grained level to a more fine-grained level.
When people are sick, their bodies present a series of observable or perceptible abnormalities, which are called phenotypes. In medicine, the phenotype concept covers signs and symptoms, laboratory test results, and imaging findings [
To date, many structured knowledge bases, such as WikiData [
To precisely represent phenotype knowledge in clinical guidelines, it is necessary to introduce fine-grained semantic information models [
In this work, we aimed to develop a semantic information model that could effectively characterize the details of disease phenotypes for clinical guidelines. A semantic information model named PhenoSSU (semantic structured unit of phenotype) was developed based on the clinical guidelines for 193 infectious diseases from Wikipedia. A total of 12 attributes were included in PhenoSSU, which characterized the details of phenotypes from various aspects. Based on PhenoSSU, we constructed fine-grained phenotype knowledge graphs for these infectious diseases. Considering the increased annotation costs associated with the introduction of PhenoSSU, we also explored the potential of machine learning for automatically recognizing PhenoSSU instances in free text. We hope that our work will contribute to the large-scale construction of fine-grained phenotype knowledge graphs for more diseases.
We collected the clinical guidelines for 193 infectious diseases from Wikipedia [
PhenoSSU, by its very nature, is an entity-attribute-value model that consists of a phenotype concept along with a collection of attributes. Determining the attributes associated with various phenotypes is the key to the design of PhenoSSU. Four inclusion criteria for attributes were considered in this study:
The introduced attribute and its value set should come from a standard medical ontology to avoid the arbitrariness of defining new attributes. Systematized Nomenclature of Medicine–Clinical Terms (SNOMED-CT) [
The introduced attribute should be a modifier associated with phenotypes rather than an entity independent of phenotypes. The concepts found in SNOMED-CT were organized into 19 distinct hierarchies. Phenotypes and attributes were mainly located in the clinical finding and qualifier value hierarchies, respectively (
The value set of the introduced attribute should contain categorical variables with limited dimensionality. For example, the severity attribute in SNOMED-CT contains a value set including mild, moderate, and severe. This criterion makes it convenient to configure attributes in the brat rapid annotation tool (BRAT) [
The introduced attribute should occur at least once in the studied corpus. This criterion avoids the redundancy of introducing many unused attributes.
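The entity-attribute-value structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class name, the `VALUE_SETS` table, and every value other than the severity values (mild, moderate, severe) and the "present" and "none" defaults are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical value sets. Only the severity values and the defaults
# ("present" for assertion, "none" for other attributes) come from the text;
# "absent" and "chronic" are placeholders.
VALUE_SETS = {
    "assertion": {"present", "absent"},
    "severity": {"none", "mild", "moderate", "severe"},
    "temporal pattern": {"none", "acute", "chronic"},
}
DEFAULTS = {"assertion": "present"}  # every other attribute defaults to "none"


@dataclass
class PhenoSSU:
    """One phenotype concept plus its categorical attribute values."""

    concept: str
    attributes: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject attributes or values outside the configured value sets,
        # mirroring the requirement that value sets be closed and categorical.
        for name, value in self.attributes.items():
            if name not in VALUE_SETS:
                raise ValueError(f"unknown attribute: {name}")
            if value not in VALUE_SETS[name]:
                raise ValueError(f"{value!r} is not a valid {name} value")

    def get(self, name):
        """Return the annotated value, falling back to the attribute default."""
        return self.attributes.get(name, DEFAULTS.get(name, "none"))


fever = PhenoSSU("fever", {"severity": "severe", "temporal pattern": "acute"})
cough = PhenoSSU("cough")
```

Keeping the value sets closed is what allows each attribute to be configured as a fixed pick list in an annotation tool and treated as a classification target later on.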
To effectively find the attributes associated with various phenotypes, we developed a simple co-occurrence–based method for attribute filtering (
Modeling process of PhenoSSU: (A) modeling PhenoSSU based on sentence-level co-occurrences of phenotype concepts and attribute values in clinical guidelines and (B) components of the PhenoSSU model, which consists of a phenotype concept and 12 attributes.
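A rough sketch of sentence-level co-occurrence filtering is given below. The exact procedure is not spelled out here, so the naive sentence splitter, the lowercase substring matching, the function names, and the `min_count` parameter are all assumptions; the "laterality" attribute is only a hypothetical candidate used to show an attribute being filtered out.

```python
import re
from collections import Counter


def sentence_cooccurrences(text, phenotype_terms, attribute_values):
    """Count sentences in which a phenotype term and an attribute value co-occur.

    attribute_values maps an attribute name to its candidate value terms.
    Matching is naive lowercase substring search, for illustration only.
    """
    counts = Counter()
    # Crude sentence splitter: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        if not any(term in sentence for term in phenotype_terms):
            continue
        for attribute, values in attribute_values.items():
            if any(value in sentence for value in values):
                counts[attribute] += 1
    return counts


def filter_attributes(text, phenotype_terms, attribute_values, min_count=1):
    """Keep attributes that co-occur with a phenotype at least min_count times."""
    counts = sentence_cooccurrences(text, phenotype_terms, attribute_values)
    return sorted(a for a in attribute_values if counts[a] >= min_count)


corpus = ("Patients often develop severe headache. Mild fever may occur. "
          "The virus spreads by air.")
phenotypes = ["headache", "fever"]
candidates = {
    "severity": ["mild", "moderate", "severe"],
    "laterality": ["left", "right"],  # hypothetical candidate attribute
}
kept = filter_attributes(corpus, phenotypes, candidates)
```

With `min_count=1`, the filter corresponds to the criterion that an attribute must occur at least once in the corpus; a real pipeline would match normalized concepts rather than raw substrings.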
The final PhenoSSU model contained 12 attributes, which could be classified into 3 categories according to the phenotypic details they characterized (
Expressive power of PhenoSSU: (A) prevalence of the 12 attributes in the studied corpus, (B) examples of precise and imprecise representations for original phenotype descriptions with the PhenoSSU model, and (C) comparisons of precise representation percentages among different information models.
The annotation task of PhenoSSU can be divided into 2 steps: annotating a phenotype and annotating the attributes associated with that phenotype. Some annotation examples of the different phenotype attributes defined in PhenoSSU are presented in
The phenotypes annotated in BRAT were normalized with SNOMED-CT. To facilitate the normalization process, we also leveraged the MetaMap tool to obtain candidate concepts from the SNOMED-CT database and then manually selected the concept corresponding to each query phenotype. There was no need to normalize the attribute values because they were already normalized in SNOMED-CT.
One aspect to note about the normalization process is the special treatment used for finding sites of phenotypes. Finding sites were not explicitly included in the PhenoSSU model because they are entities independent of phenotypes. In SNOMED-CT, there were more than 39,000 concepts of finding sites in the body structure hierarchy, and these were hard to set as a value list in the BRAT. However, finding sites are indispensable information for describing phenotypes. Therefore, we also annotated the entities of finding sites associated with phenotypes. Taking the annotation of “bleeding from the nose and gum” as an example, the entities of the phenotype (bleeding) and two finding sites (nose, gum) were annotated separately and connected with a relation curve named locate (
The manual annotation of a PhenoSSU model is a very time-consuming process because annotators not only need to find the mention of a phenotype but also need to determine the existence of attribute trigger terms in the context surrounding a phenotype. To reduce annotation costs, it is necessary to develop algorithms for the automatic annotation of PhenoSSU models.
The recognition task of PhenoSSU can be divided into 2 subtasks: phenotype concept recognition and attribute value prediction. The first subtask aims to recognize the text spans corresponding to phenotypes, and the second subtask aims to select appropriate values for 12 attributes based on a phenotype’s context.
The 193 annotated clinical guidelines were randomly divided into a training set and a test set at a ratio of 6:4. For the subtask of phenotype concept recognition, we again used the MetaMap tool, which can recognize phenotype concepts based on the Metathesaurus in the Unified Medical Language System (2020AA release) [
The subtask of attribute value prediction can be regarded as a classification problem, and two machine learning-based models were explored for this subtask. One model was based on a support vector machine (SVM), and the other model was based on a bidirectional long short-term memory (BiLSTM) neural network. For the value classification model of a specific attribute, the input was the encoded feature vectors of a phenotype’s context and the output was one of the normalized values for this attribute.
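To make the input side of this classification setup concrete, the sketch below encodes the context of a phenotype mention as a binary bag-of-words vector. The features actually used are not described here, so the window size, the whitespace tokenization, and all function names are assumptions. The sketch stops at feature encoding and does not train an SVM or a BiLSTM.

```python
def context_window(tokens, start, end, size=3):
    """Return the tokens within `size` positions left and right of tokens[start:end]."""
    left = tokens[max(0, start - size):start]
    right = tokens[end:end + size]
    return left + right


def build_vocabulary(contexts):
    """Map every token seen in the training contexts to a vector index."""
    vocabulary = sorted({token for context in contexts for token in context})
    return {token: index for index, token in enumerate(vocabulary)}


def encode(context, vocabulary):
    """Binary bag-of-words vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for token in context:
        index = vocabulary.get(token)
        if index is not None:
            vector[index] = 1
    return vector


tokens = "patients may develop a sudden high fever within two days".split()
# The phenotype mention "fever" occupies tokens[6:7].
context = context_window(tokens, 6, 7, size=2)
vocabulary = build_vocabulary([context])
features = encode(context, vocabulary)
```

One classifier per attribute would then take such a vector as input and return one of that attribute's normalized values.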
We chose an SVM for developing attribute value prediction models because SVM-based models have proven their efficiency in the 2010 Informatics for Integrating Biology & the Bedside/Veterans Affairs challenge [
Inspired by recent methodology developments for the assertion status prediction task [
To evaluate the performance of the proposed algorithm in extracting PhenoSSU models from free text, we used the evaluation metrics from SemEval-2015 Task 14: Analysis of Clinical Text [
The evaluation metric for the subtask of phenotype concept recognition was the F1-score. A predicted phenotype concept was regarded as a true positive if its text span overlapped with a gold standard text span. The precision metric was calculated as the fraction of correctly predicted phenotypes among all phenotypes identified by MetaMap, and the recall metric was calculated as the fraction of correctly predicted phenotypes among all phenotypes identified by the annotators. The F1-score was calculated as the harmonic mean of precision and recall.
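The overlap-based scoring described above can be sketched as follows. Spans are assumed to be end-exclusive `(start, end)` character offsets, and the function names are illustrative rather than taken from any evaluation toolkit.

```python
def overlaps(a, b):
    """True if two end-exclusive (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def overlap_prf(predicted, gold):
    """Precision, recall, and F1 under overlap (relaxed) span matching."""
    # A predicted span is correct if it overlaps any gold span.
    correct_predicted = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    # A gold span is recovered if any predicted span overlaps it.
    recovered_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    precision = correct_predicted / len(predicted) if predicted else 0.0
    recall = recovered_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


predicted_spans = [(0, 5), (10, 15), (30, 35)]
gold_spans = [(3, 8), (12, 20), (40, 45), (50, 55)]
precision, recall, f1 = overlap_prf(predicted_spans, gold_spans)
```

Under overlap matching, one predicted span can cover several gold spans, so the sketch counts matches separately on the predicted side (for precision) and on the gold side (for recall).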
We chose the average weighted accuracy as the evaluation metric for the subtask of attribute value prediction because the distributions of different attribute values were very imbalanced. The average weighted accuracy metric considers the prevalence of an attribute value in the corpus, so it can measure how good an algorithm is at predicting the rare values of an attribute. The detailed calculation of the average weighted accuracy can be found in
Since the aim of this work was to develop a semantic information model that was more suitable than current approaches for representing phenotype knowledge in clinical guidelines, it was necessary to evaluate whether the annotated PhenoSSU model could capture the full semantics underlying the original descriptions of phenotypes. For example, in
To evaluate the expressive power of PhenoSSU, we introduced a virtual attribute named “equal to the original description” into the PhenoSSU model. If the annotated PhenoSSU did not capture the full semantics of an original description, we set the value of this attribute to “partial.” Two annotators (TY and SL) independently evaluated the expressive power of the annotated PhenoSSU model. The initial interannotator agreement as measured with Cohen kappa statistic was 0.903 (3631/4020). Inconsistent judgments were resolved by consensus through an adjudication process (TJ).
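As a reminder of how a Cohen kappa statistic is derived from two annotators' judgments, a minimal sketch follows. The labels are toy data rather than the study's annotations, and the label names "full" and "partial" are used only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen kappa for two equally long label sequences from two annotators."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label]
                   for label in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


annotator_1 = ["full"] * 8 + ["partial"] * 2
annotator_2 = ["full"] * 7 + ["partial"] * 3
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case, the two annotators agree on 9 of 10 items, but kappa is lower than 0.9 because part of that agreement is expected by chance.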
To characterize the details of phenotypes for clinical guidelines, a semantic information model named PhenoSSU was proposed. With the introduction of 12 attributes associated with various phenotypes, the obtained knowledge graphs based on PhenoSSU were more fine-grained than those based on phenotype concepts. In this work, 193 PhenoSSU-based knowledge graphs for infectious diseases were constructed. At the concept level, we annotated 4020 phenotypic terms, 3962 of which could be normalized with 1508 concepts in SNOMED-CT. At the attribute level, we annotated 5278 nondefault attribute values (“present” was the default attribute value for the assertion attribute, and “none” was the default attribute value for other attributes), which indicated the widespread presence of contextual properties for phenotypes in clinical guidelines. The most commonly used attributes included assertion, frequency in a population, age specificity, phenotype severity, and temporal pattern (
Since the knowledge graphs in WikiData were also extracted from Wikipedia, we compared our knowledge graphs with those in WikiData at the concept level. WikiData built knowledge graphs for 66 of the 193 diseases, and these graphs included 354 phenotype concepts. Our annotations covered 297 of the 354 (83.9%) phenotypes from WikiData. For the uncovered phenotypes, we could not confirm their existence on the corresponding webpages of Wikipedia (including current and historical webpages). Most of these uncovered phenotypes may come from the manual additions of volunteers, who made use of sources other than Wikipedia (
To evaluate the expressive power of the PhenoSSU model quantitatively, we manually analyzed whether a PhenoSSU instance could capture the full semantics underlying the corresponding descriptions of phenotypes (
In this study, we annotated 4020 PhenoSSU instances, 3757 of which (89.5%) were determined to precisely represent the original phenotype knowledge described by natural language (
With the introduction of attributes, it would take more time to annotate a PhenoSSU model than to annotate phenotype concepts. To increase the efficiency of annotating PhenoSSU models, we developed a hybrid strategy that first recognized phenotype concepts with the MetaMap tool and then predicted the attribute values of phenotypes with SVM-based or BiLSTM-based classifiers (
Automatic recognition of PhenoSSU.
In this work, we designed a fine-grained information model named PhenoSSU, which can precisely represent phenotype knowledge for clinical guidelines. We also developed an automatic strategy to extract PhenoSSU models from clinical guidelines and found that machine learning could be used to improve the efficiency of PhenoSSU annotation. Taken together, our work will provide a useful theoretical and technical guide for the construction of fine-grained phenotype knowledge graphs.
From the design of PhenoSSU, it can be seen that PhenoSSU was derived from SNOMED-CT because both the phenotype concepts and attribute values in PhenoSSU came from SNOMED-CT. PhenoSSU strengthened the expressive power of SNOMED-CT by combining 12 attributes with phenotype concepts. In SNOMED-CT, there was a technique named postcoordination expression [
In recent years, machine learning, especially deep learning, has been widely used for processing medical information [
The improvement of knowledge granularity for disease phenotypes may potentially benefit knowledge-based diagnosis systems because the differential diagnostic capability of a PhenoSSU model is theoretically stronger than that of a single phenotype concept. From the perspective of coarse-grained knowledge graphs, some diseases (eg, the flu and common cold) have many similar symptoms (eg, fever and cough); however, these similar symptoms may differ clearly from the perspective of fine-grained knowledge graphs. For example, fever may be present in both flu and common cold. However, fever is more common in flu patients and usually appears suddenly with a body temperature of 38 degrees or above. By comparison, fever is rarely seen in common cold cases and usually appears gradually. Therefore, a diagnosis system cannot exclude the common cold if a patient has fever; however, it can safely exclude the common cold if a patient has a PhenoSSU instance such as “phenotype: fever; temporal pattern: acute; severity: severe.” PhenoSSU-based knowledge graphs should be very suitable for dialogue-based symptom checkers such as babylon [
One limitation of this work is that we only considered the corpus of infectious diseases during the modeling process of PhenoSSU. In addition, we only considered attributes with categorical values and did not consider attributes with numeric values. Another limitation of this study is that we only tested the effectiveness of the PhenoSSU model for 193 infectious diseases, which is a small number considering that thousands of other diseases exist. In addition, attributes suitable for infectious diseases may not be suitable for other types of diseases. We will address these limitations while constructing PhenoSSU-based knowledge graphs for more diseases in future work.
The annotation guidelines for PhenoSSU and the PhenoSSU-based knowledge graphs for 193 infectious diseases can be found by visiting our website [
PhenoSSU is a fine-grained semantic information model that can precisely represent phenotype knowledge in clinical guidelines, and machine learning can be used to improve the efficiency of constructing PhenoSSU-based knowledge graphs.
Supplementary figures, tables and texts.
BERT: bidirectional encoder representation from transformers
BiLSTM: bidirectional long short-term memory
BRAT: brat rapid annotation tool
CEM: clinical element model
FHIR: fast health care interoperability resource
PhenoSSU: semantic structured unit of phenotype
SNOMED-CT: Systematized Nomenclature of Medicine–Clinical Terms
SVM: support vector machine
This work was supported by grants 32070678 and 31671371 from the National Natural Science Foundation of China, grant EKPG21-12 from the Emergency Key Program of Guangzhou Laboratory, and grants 2016-I2M-1-005 and 2020-I2M-2-003 from the Chinese Academy of Medical Sciences Initiative for Innovative Medicine. We sincerely thank colleagues in our lab and experts in the biomedical field for their thoughtful suggestions to improve this work.
None declared.