This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Named entity recognition (NER) plays an important role in extracting the features of descriptions such as the name and location of a disease for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extracted depends on the dictionary lookup. In particular, the recognition of compound terms is very complicated because of the variety of patterns.
The aim of this study is to develop and evaluate an NER tool concerned with compound terms using RadLex for mining free-text radiology reports.
We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general purpose dictionary). We manually annotated 400 radiology reports for compound terms in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). In addition, we created a compound terms–enhanced dictionary (CtED) by analyzing false negatives and false positives and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: occurrence ratio (OR) and matching ratio (MR).
The F-measure of cTAKES+RadLex+general purpose dictionary was 30.9% (precision 73.3% and recall 19.6%) and that of the combined CtED was 63.1% (precision 82.8% and recall 51%). The OR indicated that the stem terms of
We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance through vocabulary expansion.
The widespread adoption of electronic medical record (EMR) systems in recent years has increasingly brought opportunities to research communities regarding the secondary use of EMR data such as medical images and clinical notes [
In the field of radiology, a large amount of medical imaging data and diagnostic reporting data is stored in the EMR, which has become an important data source for acquiring knowledge. The use of standard code systems is critical for the effective mining of the data source. RadLex, produced by the
NER is usually used for preprocessing unstructured data for machine learning research, for example, extracting features from radiology reports [
Although an ontology such as RadLex can be leveraged to enhance data interoperability and track relationships and hierarchical structure, we consider that the ontology should also be applied to improve the NER of compound terms in radiology reports. However, few studies have been conducted to evaluate the coverage of RadLex for the NER of compound terms for mining radiology reports. To evaluate and extend the coverage of the lexicon for extracting features from radiology reports, the aim of this study is to develop and assess an NER tool based on RadLex, explore the entities included in RadLex, and subsequently extend the ontology for a higher F-measure on feature extraction by dictionary-based NER.
RadLex is a controlled-standard biomedical ontology produced by the
We used a general purpose dictionary (GPD), SentiWordNet [
The clinical Text Analysis and Knowledge Extraction System (cTAKES), which is an NLP system for extracting information from EMR clinical free text, contains an automatic NER tool that uses a dictionary lookup mechanism [
The Medical Information Mart for Intensive Care-III (MIMIC-III) is a free, open database provided by the Massachusetts Institute of Technology Laboratory for Computational Physiology, which includes approximately 60,000 deidentified admissions of patients at the Beth Israel Deaconess Medical Center from 2001 to 2012 [
The overall goal of our study is to clarify the coverage of RadLex-based dictionaries with compound terms and to construct and evaluate the NER tools that use the RadLex-based dictionaries for mining free-text radiology reports. First, we customized cTAKES to build the RadLex and GPD dictionaries. As previously mentioned, the default dictionaries of cTAKES provided by the UMLS are SNOMED-CT and RxNORM. Second, we combined these three dictionaries in the following patterns: Default, Default+RadLex, and Default+RadLex+GPD. Third, we removed single terms from each dictionary and evaluated their performance. Finally, we carried out the three processes of analysis (step 1 to step 3) to obtain profiles of the stem terms for improving the performance of NER (
Overview of methods. cTAKES: clinical Text Analysis and Knowledge Extraction System; CtED: compound terms–enhanced dictionary; FN: false negative; NPI: noun phrase identification; TP: true positive.
We randomly selected 400 reports of computed tomography (CT), magnetic resonance imaging (MRI), positron emission computed tomography (PET), and radiography (x-ray) from the MIMIC-III database (100 reports for each imaging modality type). These reports were in a free-text format and were categorized into sections; we used the
We first removed stop words and converted all characters to lowercase. Next, we used the AggregatePlaintextProcessor of cTAKES to identify noun phrases in the radiology reports. We then manually reviewed these noun phrases to annotate compound terms. The compound terms were also tagged with all conceivable patterns based on the
First, we created a customized NER tool using cTAKES, which uses a dictionary lookup–based parser for NER. It extracts terms that can be looked up in the installed dictionary. Some previous studies have attempted to create customized dictionaries (eg, UMLS) [
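For each customized pipeline, we evaluated the performance of four different sets of the three dictionary patterns using standard measures (ie, precision, recall, and F-measure). The formulas for the measures are as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$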
Here, true positive (TP) is defined as the number of manual annotations matched with the dictionary phrases, false positive (FP) is defined as the number of dictionary phrases matched with entities other than manual annotations, and false negative (FN) is defined as the number of annotations not matched with the dictionary phrases. We also evaluated the performance of four major imaging modalities: CT, MRI, x-ray, and PET. GATE (General Architecture for Text Engineering) developer version 8.4.1 [
We also created a compound terms–enhanced dictionary (CtED) to improve performance (
To obtain the full benefit of using RadLex as an ontology, we created 2 measures for stem terms. We first defined the occurrence ratio (OR) to determine the frequency of stem terms in the TPs and FNs from step 2. The OR helps prioritize which compound terms, grouped by stem term, should be added to RadLex. For example, a high OR for a stem term in TPs means that the pipeline correctly identified many compound terms containing that stem term. In contrast, a high OR for a stem term in FNs means that the pipeline missed many compound terms containing that stem term. Moreover, if a stem term has a high OR in both TPs and FNs, we can hypothesize that entities with this stem are in high demand in the reports but that the dictionary still lacks compound terms containing it. In short, the OR profiles the demand and supply of stem term–oriented compound terms in the corpus.
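The OR described above can be sketched in a few lines of Python. This is only an illustration under two assumptions that the text does not state explicitly: the stem term is taken to be the last word of a compound term (its head noun), and the OR of a stem is its count divided by the total number of compound terms in the group (TPs or FNs). The example terms and function names are hypothetical.

```python
from collections import Counter


def stem_of(compound_term: str) -> str:
    """Assumed stem: the last whitespace-separated token (the head noun)."""
    return compound_term.lower().split()[-1]


def occurrence_ratios(compound_terms: list[str]) -> dict[str, tuple[int, float]]:
    """Map each stem to (count, percentage of all compound terms in the group)."""
    counts = Counter(stem_of(t) for t in compound_terms if t.strip())
    total = sum(counts.values())
    return {stem: (n, 100 * n / total) for stem, n in counts.most_common()}


# Hypothetical annotated compound terms, split by evaluation outcome.
true_positives = ["pleural effusion", "right upper lobe", "left lower lobe"]
false_negatives = [
    "small pleural effusion",
    "focal opacity",
    "pericardial effusion",
    "chest tube",
]

or_tp = occurrence_ratios(true_positives)
or_fn = occurrence_ratios(false_negatives)

# Stems present in both groups: in demand, yet not fully covered by the dictionary.
shared = sorted(set(or_tp) & set(or_fn))
print(or_tp)
print(or_fn)
print(shared)
```

In this toy input, "effusion" appears among both the TPs and the FNs, which is the kind of stem the OR is meant to flag as a candidate for dictionary expansion.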
Second, we defined a measure called the matching ratio (MR) to describe the distribution of stem terms in FNs that are matched with the dictionaries. The MR (%) was calculated using the formula presented below. The MR can guide the basic concept of the RadLex or SNOMED-CT (cTAKES default dictionary) concept that matches the stem terms. For example, if a stem term of
The F-measure of the pipeline with the dictionaries Default+RadLex+GPD for compound terms was nearly the same as that of the pipeline with the dictionaries Default+RadLex (31.5% vs 31.4%;
F-measure, precision, and recall of each dictionary (step 1: number of reports=400).
Dictionaries | F-measure, % | Precision, % | Recall, % |
Default | 27.9 | 93.4 | 16.4 |
Default+RadLex | 31.4 | 94.9 | 18.8 |
Default+RadLex+GPDa | 31.5 | 93.2 | 19 |
aGPD: general purpose dictionary.
F-measure, precision, and recall of each dictionary (step 2: number of reports=100).
Dictionaries | F-measure, % | Precision, % | Recall, % |
Default+RadLex+GPDa without enhancement | 30.9 | 73.3 | 19.6 |
Default+RadLex+GPD with enhancement | 63.1 | 82.8 | 51 |
aGPD: general purpose dictionary.
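As a quick arithmetic check, the F-measures in the step 2 table can be recomputed from the reported precision and recall using the standard harmonic mean. The sketch below is illustrative only; the function names are not from the study, and the raw TP, FP, and FN counts are not given in the text, so the helper that takes counts is included only for completeness.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall in percent from raw counts."""
    precision = 100 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100 * tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Reported (precision, recall) pairs in percent from the step 2 table.
reported = {
    "without enhancement": (73.3, 19.6),
    "with enhancement": (82.8, 51.0),
}
f_scores = {name: round(f_measure(p, r), 1) for name, (p, r) in reported.items()}
print(f_scores)
```

The recomputed values agree with the tabulated F-measures of 30.9% and 63.1%. They also make clear that the gain from the enhanced dictionary comes mainly from recall.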
Regarding each imaging modality (
F-measure of the compound terms–enhanced dictionary of each modality.
Modality | cTAKESa+RadLex+GPDb (%) | cTAKES+RadLex+GPD+CtEDc (%) |
Computed tomography | 33.5 | 62.4 |
MRId | 30.7 | 63.6 |
PETe | 30.3 | 63.4 |
x-ray | 26.7 | 64.3 |
All | 30.9 | 63.1 |
acTAKES: clinical Text Analysis and Knowledge Extraction System.
bGPD: general purpose dictionary.
cCtED: compound terms–enhanced dictionary.
dMRI: magnetic resonance imaging.
ePET: positron emission computed tomography.
Top five occurrence ratios in each imaging modality.
Modality | Group | Stem | ORa, n (%) |
Computed tomography | TPb | lobe | 100 (8.87) |
Computed tomography | TP | effusion | 59 (5.24) |
Computed tomography | TP | node | 50 (4.44) |
Computed tomography | TP | artery | 39 (3.46) |
Computed tomography | TP | hemorrhage | 37 (3.28) |
Computed tomography | FNc | change | 125 (3.54) |
Computed tomography | FN | collection | 98 (2.77) |
Computed tomography | FN | lesion | 95 (2.69) |
Computed tomography | FN | effusion | 94 (2.66) |
Computed tomography | FN | evidence | 69 (1.95) |
MRId | TP | artery | 146 (17.38) |
MRI | TP | lobe | 49 (5.83) |
MRI | TP | sinus | 29 (3.45) |
MRI | TP | matter | 20 (2.38) |
MRI | TP | body | 20 (2.38) |
MRI | FN | change | 176 (4.72) |
MRI | FN | lesion | 144 (3.86) |
MRI | FN | enhancement | 132 (3.54) |
MRI | FN | evidence | 95 (2.55) |
MRI | FN | study | 89 (2.38) |
PETe | TP | node | 192 (17.1) |
PET | TP | lobe | 102 (9.08) |
PET | TP | gland | 69 (6.14) |
PET | TP | nodule | 39 (3.47) |
PET | TP | disease | 36 (3.21) |
PET | FN | uptake | 567 (12.04) |
PET | FN | node | 250 (5.31) |
PET | FN | lesion | 180 (3.82) |
PET | FN | avidity | 169 (3.59) |
PET | FN | disease | 157 (3.33) |
x-ray | TP | effusion | 46 (14.24) |
x-ray | TP | tube | 37 (11.45) |
x-ray | TP | lobe | 27 (8.36) |
x-ray | TP | edema | 18 (5.57) |
x-ray | TP | lung | 17 (5.26) |
x-ray | FN | effusion | 117 (9.15) |
x-ray | FN | tube | 69 (5.39) |
x-ray | FN | opacity | 67 (5.24) |
x-ray | FN | pneumothorax | 62 (4.85) |
x-ray | FN | line | 57 (4.46) |
aOR: occurrence ratio.
bTP: true positive.
cFN: false negative.
dMRI: magnetic resonance imaging.
ePET: positron emission computed tomography.
In addition, the most frequent FPs that were removed from the cTAKES+RadLex+GPD dictionaries were
The ORs of the TPs and FNs in each imaging modality (step 3) are shown in
Occurrence ratio of true positives and false negatives in each imaging modality. CT: computed tomography; FN: false negative; MRI: magnetic resonance imaging; PET: positron emission computed tomography; TP: true positive.
Classification of stem terms in false negatives based on cTAKESa, RadLex, and combined dictionary (n=13,098).
Dictionary | Class of stem terms | Proportion, n (%) |
cTAKES (SNOMED-CTb) | N/Ac | 6349 (48.47) |
cTAKES (SNOMED-CT) | Body structure | 1428 (10.9) |
cTAKES (SNOMED-CT) | Over two categories | 1265 (9.66) |
cTAKES (SNOMED-CT) | Qualifier value | 935 (7.14) |
cTAKES (SNOMED-CT) | Clinical finding | 878 (6.7) |
cTAKES (SNOMED-CT) | SNOMED-CT model component | 723 (5.52) |
cTAKES (SNOMED-CT) | Procedure | 721 (5.5) |
cTAKES (SNOMED-CT) | Environment or geographical location | 217 (1.66) |
cTAKES (SNOMED-CT) | Physical object | 206 (1.57) |
cTAKES (SNOMED-CT) | Substance | 143 (1.09) |
cTAKES (SNOMED-CT) | Other | 233 (1.78) |
RadLex | N/A | 6893 (52.63) |
RadLex | Clinical finding | 1839 (14.04) |
RadLex | Imaging observation | 1508 (11.51) |
RadLex | Process | 1000 (7.63) |
RadLex | Anatomical entity | 997 (7.61) |
RadLex | Property | 295 (2.25) |
RadLex | RadLex descriptor | 248 (1.89) |
RadLex | Object | 210 (1.6) |
RadLex | Procedure | 91 (0.69) |
RadLex | Imaging modality | 11 (0.08) |
RadLex | Nonanatomical substance | 5 (0.04) |
RadLex | Report component | 1 (0.01) |
Combined | cTAKES+RadLex | 9411 (71.85) |
Combined | N/A | 3687 (28.15) |
acTAKES: clinical Text Analysis and Knowledge Extraction System.
bSNOMED-CT: Systematized Nomenclature of Medicine-Clinical Terms.
cN/A: not applicable.
In this study, we first constructed RadLex-based NER tools for mining free-text radiology reports and evaluated the coverage of the pipelines (step 1). Second, we built a CtED extracted from the FNs of step 1 to improve performance (step 2). Third, we defined OR and MR to consider the potential of expanding the dictionary using RadLex ontology (step 3).
First, the F-measure of cTAKES+RadLex+GPD was 30.9% (precision 73.3% and recall 19.6%) on its own and 63.1% (precision 82.8% and recall 51%) with the CtED. The CtED for compound terms increased the F-measure by 32.2 percentage points, whereas adding the GPD left the F-measure essentially unchanged (31.4% vs 31.5%). This indicates that the GPD, in contrast to its coverage of single words, did not cover the specific compound terms used in radiology reports. The merit of using RadLex is that we can use the standard vocabularies and relationships such as
Our cTAKES-based tool supports customized dictionaries through a bar-separated value (BSV) file, which provides a convenient way to leverage vocabulary resources that are not covered by the default dictionary. In addition, the BSV file stores IDs that can be used to track the parent concepts of a particular term, which enables the classification or profiling of extracted terms using high-level concept classes defined in a vocabulary.
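To make the idea of a BSV dictionary concrete, the following sketch builds a small bar-separated dictionary and uses it for a simple lookup. It is not the cTAKES implementation: the two-column `ID|text` layout is an assumption (the column layout expected by a given cTAKES version should be checked against its documentation), the concept IDs are placeholders rather than real RadLex IDs, and the greedy longest-match lookup is a simplified stand-in for the dictionary lookup annotator.

```python
# Hypothetical entries: (concept ID, term text). IDs are placeholders, not real RadLex IDs.
entries = [
    ("RID_EXAMPLE_1", "pleural effusion"),
    ("RID_EXAMPLE_2", "right upper lobe"),
]


def to_bsv(rows) -> str:
    """Serialize entries as one 'ID|text' line each, lowercasing the term text."""
    return "".join(f"{concept_id}|{text.lower()}\n" for concept_id, text in rows)


def parse_bsv(content: str) -> dict[str, str]:
    """Parse BSV content into a term -> concept ID lookup table."""
    lookup = {}
    for line in content.splitlines():
        if not line.strip():
            continue
        concept_id, text = line.split("|", 1)
        lookup[text] = concept_id
    return lookup


def find_terms(sentence: str, lookup: dict[str, str], max_len: int = 4):
    """Greedy longest-match lookup over windows of up to max_len tokens."""
    tokens = sentence.lower().split()
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lookup:
                found.append((candidate, lookup[candidate]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return found


bsv_text = to_bsv(entries)
lookup = parse_bsv(bsv_text)
matches = find_terms("Small pleural effusion in the right upper lobe", lookup)
print(matches)
```

Because each match carries its concept ID, the extracted terms can later be grouped by parent concept, as described above. A term that is missing from the file, such as "small pleural effusion" as a whole, is simply not found, which is the coverage problem the CtED is meant to address.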
The OR provides profiles of
In contrast, our approach is based on an ontology, which enables interoperable processing and data mining of reports. For example, when we identify the term
The limitation of this study is that our pipeline is optimized for identifying short compound terms because we divided compound terms using stop words such as
The annotation tool that we used, GATE, can identify partial matches with TPs, in which the entity type is the same but the span differs. In this study, such partial positives were treated as FNs. We reviewed these uncertain negatives based on the rule of the stem words and found that 35.4% (90/254) of the partial positives had the potential to change into TPs. This was equivalent to a 0.7% increase in the F-measure (cTAKES+RadLex+GPD+CtED). The details of the partial matches require further analysis.
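The distinction between exact and partial matches can be illustrated with a small span comparison. This is a sketch of the general idea only, not GATE's own comparison logic; the `Span` type, the label name, and the character offsets are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type


def match_type(gold: Span, predicted: Span) -> str:
    """Return 'exact', 'partial', or 'none' for a gold/predicted pair."""
    if gold.label != predicted.label:
        return "none"
    if (gold.start, gold.end) == (predicted.start, predicted.end):
        return "exact"
    overlaps = gold.start < predicted.end and predicted.start < gold.end
    return "partial" if overlaps else "none"


# "small pleural effusion": the annotator marked the whole phrase,
# whereas the dictionary matched only "pleural effusion".
gold = Span(0, 22, "finding")
predicted = Span(6, 22, "finding")
result = match_type(gold, predicted)
print(result)

# Share of partial matches judged convertible to TPs, as reported in the text.
convertible_share = round(100 * 90 / 254, 1)
print(convertible_share)
```

Under the evaluation used in this study, the `partial` case above would be counted as an FN, even though the predicted span shares its stem word with the gold annotation.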
The study by Jiang et al [
Lately, Word2Vec technology has been explored for generating synonyms and expanding the radiology-specific dictionary [
In this study, we developed a customized NER tool based on RadLex for the recognition of technical terms. We demonstrated that the CtED and stem term analysis have the potential to improve the performance of the dictionary-based NER with regard to expanding vocabularies.
BSV: bar-separated value
CRF: conditional random field
CT: computed tomography
cTAKES: clinical Text Analysis and Knowledge Extraction System
CtED: compound terms–enhanced dictionary
EMR: electronic medical record
FN: false negative
FP: false positive
GATE: General Architecture for Text Engineering
GPD: general purpose dictionary
MR: matching ratio
MRI: magnetic resonance imaging
NER: named entity recognition
NLP: natural language processing
OR: occurrence ratio
PET: positron emission computed tomography
POS: parts of speech
SNOMED-CT: Systematized Nomenclature of Medicine-Clinical Terms
TP: true positive
UMLS: Unified Medical Language System
None declared.