This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Many comprehensive cancer centers incorporate tumor documentation software supplying structured information from the associated centers’ oncology patients for internal and external audit purposes. However, much of the documentation data included in these systems often remain unused and unknown by most of the clinicians at the sites.
To improve access to such data for analytical purposes, a prerollout of an analysis layer based on the business intelligence software QlikView was implemented. This software allows for the real-time analysis and inspection of oncology-related data. The system is meant to increase access to the data while simultaneously providing tools for user-friendly real-time analytics.
The system combines in-memory capabilities (based on QlikView software) with innovative techniques that compress the complexity of the data, consequently improving its readability as well as its accessibility for designated end users. Aside from the technical and conceptual components, the software’s implementation necessitated a complex system of permission and governance.
A continuously running system including daily updates with a user-friendly Web interface and real-time usage was established. This paper introduces its main components and major design ideas. A commented video summarizing and presenting the work can be found within the Multimedia Appendix.
The system has been well-received by a focus group of physicians within an initial prerollout. Aside from improving data transparency, the system’s main benefits are its quality and process control capabilities, knowledge discovery, and hypothesis generation. Limitations such as run time, governance, or misinterpretation of data are considered.
In recent years, hospitals have been gradually transitioning from paper-based toward electronic documentation systems. The ongoing digitalization of routine data often results in the creation of large and comprehensive datasets, which can, under the right circumstances, open doors to further analysis and research [
The development was tested within a focus group of 10 physicians at the University Hospital of the Ludwig-Maximilians-University (LMU) in Munich. The system was set up together with its partnering site, the University Hospital of the Technical University Munich, Rechts der Isar, with both sites sharing the Cancer Retrieval Evaluation and Documentation System (CREDOS) as their local tumor documentation system. These systems allow the institutes to compile and track most oncology-relevant data [
The primary purpose of the database is to measure specific key performance indicators, such as the number of cancer cases of a specific organ, which serve as indicators of the eligibility of the sites to become, or remain, a certified center according to the German OnkoZert guidelines [
The analysis layer was recently (October 2018) rolled out and tested within the previously mentioned focus group at the Comprehensive Cancer Center Munich (CCCM)-LMU site. We refer to this analysis platform as the Munich Online Comprehensive Cancer Analysis platform (MOCCA). This paper describes how the system was rolled out at our site and explains the major components of this analysis platform, and should thus serve as an inspiration for other institutions interested in making their data more accessible and transparent. Focusing on how to manage and organize large sets of oncology-related data, we present a variety of innovative ideas in terms of browsing, handling, and visualizing large cohorts of medical data, while addressing challenges that arose during the development.
To provide general access to the data, the first step was to set up a server within the clinical intranet, which continuously runs an instance of the QlikView Enterprise Edition. The administrative user interface of the Enterprise Edition allows for data loading routines. Thus, a daily routine was established and implemented, which imports the whole CREDOS dataset into the MOCCA system. QlikView was chosen because it comes with a toolbox, enabling the construction of the contents that can be saved within a proprietary data container (*.qvw file) [
To control the Web access, we set up a connection to the hospital’s active directory using a lightweight directory access protocol (LDAP) [
The next major step in setting up the platform was the conversion of the CREDOS data model into a new QlikView data model. This involved interconnecting all tables by common, unique identifiers, similar to a standard relational database. We tried to stick as closely to the original data model of CREDOS as possible, and only incorporated changes when needed. When completed, this step enabled the system to make full use of its in-memory database capabilities after loading the data. At this point, an end user can select, via a mouse click, a specific data field within the Web view (eg, male or female within the gender field). After selecting the relevant data field, all other data tables react in real time and restrict the contents to the selected cohort. In this way, it is possible, in a matter of seconds, to create different data cohorts restricted by arbitrary filters. Most conventional business intelligence or query systems offer query interfaces where different filters can be applied. However, such an interface is not necessary in QlikView since all data tables and fields displayed within the Web view can directly be used as a cohort filter. QlikView offers an additional functionality through which it shows the data model. A current view of the data model in our latest release is shown in
Screenshot of the current data model illustrating its complexity. The tables are framed in blue, displaying all of their included fields. The tables are linked via specifically chosen key attributes (eg, patient, tumor, or document ID).
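The selection behavior described above can be illustrated with a minimal Python sketch. It is not QlikView's implementation; the three tables, their field names, and the `select` helper are hypothetical and only mimic how a single clicked value in one table restricts all tables linked by key attributes (patient and tumor ID).

```python
# Hypothetical miniature of three linked tables; all names are illustrative only.
patients = [
    {"patient_id": 1, "gender": "female"},
    {"patient_id": 2, "gender": "male"},
    {"patient_id": 3, "gender": "female"},
]
tumors = [
    {"tumor_id": 10, "patient_id": 1, "icd10": "C50"},
    {"tumor_id": 11, "patient_id": 2, "icd10": "C34"},
    {"tumor_id": 12, "patient_id": 3, "icd10": "C22"},
]
therapies = [
    {"tumor_id": 10, "kind": "surgery"},
    {"tumor_id": 11, "kind": "radiation"},
    {"tumor_id": 12, "kind": "surgery"},
    {"tumor_id": 12, "kind": "drug"},
]


def select(field, value):
    """Restrict every table to the cohort matching one value of a patient field."""
    cohort_patients = [p for p in patients if p[field] == value]
    patient_ids = {p["patient_id"] for p in cohort_patients}
    # Follow the patient key into the tumor table ...
    cohort_tumors = [t for t in tumors if t["patient_id"] in patient_ids]
    tumor_ids = {t["tumor_id"] for t in cohort_tumors}
    # ... and the tumor key into the therapy table.
    cohort_therapies = [th for th in therapies if th["tumor_id"] in tumor_ids]
    return cohort_patients, cohort_tumors, cohort_therapies


females, female_tumors, female_therapies = select("gender", "female")
print(sorted(t["icd10"] for t in female_tumors))
print(sorted(th["kind"] for th in female_therapies))
```

For brevity, this sketch only propagates a selection from the patient table outward; in the platform, any displayed field of any table can act as the filter.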
We first organized both the data and the analysis layer in the same *.qvw data file. Subsequently, we decided to strictly separate the data modeling and analysis contents into two segregated QV files. The primary cleaning and preprocessing of the CREDOS data model into QlikView is done in the base module (base.qvw), while more advanced preprocessing as well as further analyses are done in the second module (analysis.qvw). This allowed us to simultaneously improve the data model along with the analytical tools and modules.
As mentioned above, the CREDOS data model is complex, comprising more than 1000 individual data fields, which themselves can have hundreds of different data values (eg, the data field diagnosis can have hundreds of different International Statistical Classification of Diseases and Related Health Problems [ICD]-10 codes [
First assessment
  Diagnosis
  TNM (tumor-node-metastasis; important tumor staging system)
  Classifications
Patient-based/related data
  Cohort view
  Single patient view
Therapies
  General information
  Operations
  Systemic therapies
  Radiation
Progression (follow up)
Trial metadata
Survival
For most categories (including subcategories), we implemented detailed and comprehensive views, including embedded tools that enable easily browsing and visualizing category-specific contents. The views were designed as different tabs, analogous to a standard website navigation bar (see
For each of the remaining categories shown in
As shown in
The image (screen language in German) displays two different selections within the first assessment (diagnosis) view. By simply clicking on different buttons, in this case “tumor location/side” in (a) and “discussion of a chosen case within a tumor board” in (b), it is possible to specifically select and examine the number of cancer cases with the chosen feature (in the chart stratified by date of diagnosis).
With regard to the graphical objects, in addition to reactive standard descriptive techniques such as bar charts or pie charts, we used a QlikView add-on (svgReader) [
SVG-based organ map. Based on the diagnosis code (ICD-10), the map displays the number of documented cases relative to each other, both overall and for ICD-10 groups selected via mouse click (layer 2) or even for specific organs (layer 3). In this example, the gastrointestinal subgroup had been selected by clicking within layer 1 (C15-C26), followed by clicking on the liver within layer 2, thereby restricting the cohort within the module to C22* (liver) and displaying the relative abundance of affected segments within the liver (layer 3). A fully saturated color indicates the most commonly documented segments within the SVG, whereas lower levels of saturation correspond linearly to the number of documented cases for the associated segments. SVG: scalable vector graphic; ICD-10: International Statistical Classification of Diseases and Related Health Problems-10.
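The linear color scaling described in this caption can be sketched as follows. This is an illustrative Python function under stated assumptions, not the svgReader add-on's code; the function name, the segment names, and the counts are hypothetical.

```python
def saturation_by_segment(case_counts):
    """Map case counts per segment to a color saturation between 0.0 and 1.0.

    The most frequently documented segment receives full saturation (1.0);
    all other segments scale linearly with their share of that maximum.
    """
    peak = max(case_counts.values(), default=0)
    if peak == 0:
        # No documented cases at all: nothing is highlighted.
        return {segment: 0.0 for segment in case_counts}
    return {segment: count / peak for segment, count in case_counts.items()}


# Hypothetical numbers of documented cases per liver segment
liver_counts = {"segment_I": 4, "segment_IV": 10, "segment_VIII": 5}
saturation = saturation_by_segment(liver_counts)
print(saturation)
```

Scaling against the maximum rather than the total keeps the most affected segment at 100% saturation regardless of cohort size, which matches the behavior described for the organ map.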
Next, we created two more complex, integrated modules. The first module displays therapy timelines and the second focuses on survival. The survival module and its implementation were previously presented at the International Conference on Informatics, Management, and Technology in Healthcare 2019 [
Median start time of each therapy type for the tumors of a chosen cohort. The x-axis shows the time (in days), while the y-axis displays how many tumors in this cohort have been treated. In this example, for a cohort of 607 tumors, 563 were treated with surgery (median start after 0 days), 116 with drug therapy (median start after 59 days), and 73 with radiotherapy (median start after 109 days).
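The aggregation behind such a timeline can be approximated with a short Python sketch. It assumes that time zero is the date of diagnosis and that only the first start of each therapy type per tumor is counted; neither choice is specified in the text, and the records below are invented for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical records: (tumor ID, therapy type, diagnosis date, therapy start date)
records = [
    (1, "surgery", date(2018, 1, 10), date(2018, 1, 10)),
    (1, "drug therapy", date(2018, 1, 10), date(2018, 3, 10)),
    (2, "surgery", date(2018, 2, 1), date(2018, 2, 3)),
    (2, "radiotherapy", date(2018, 2, 1), date(2018, 5, 22)),
    (3, "surgery", date(2018, 3, 5), date(2018, 3, 5)),
    (3, "drug therapy", date(2018, 3, 5), date(2018, 5, 4)),
]


def therapy_timeline(rows):
    """Count treated tumors and the median start (days after diagnosis) per therapy type."""
    first_start = {}
    for tumor_id, kind, diagnosed, started in rows:
        offset = (started - diagnosed).days
        key = (kind, tumor_id)
        # Keep only the earliest start of each therapy type per tumor.
        first_start[key] = min(offset, first_start.get(key, offset))
    days_by_kind = defaultdict(list)
    for (kind, _tumor_id), offset in first_start.items():
        days_by_kind[kind].append(offset)
    return {
        kind: {"tumors": len(days), "median_days": median(days)}
        for kind, days in days_by_kind.items()
    }


timeline = therapy_timeline(records)
for kind, summary in timeline.items():
    print(kind, summary)
```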
As mentioned above, when granting access to the platform, we had to conform to European as well as to local Bavarian laws of privacy protection [
The Bavarian Law of Hospitals (Bayerisches Krankenhausgesetz (BayKRG) Art 27– Datenschutz (4)) served as the legal basis of our permission system [
This system works for clinicians employed at organ-specific treatment centers (eg, a women’s hospital) or clinicians who are involved in the care of patients suffering from tumors of a specific organ. However, it does not take into consideration clinicians working in interdisciplinary areas such as radiology. Therefore, we also included the possibility to restrict access according to specific types of therapy (radiation, surgery, or drug therapy). Consequently, a radiologist with the corresponding permission would only be able to view data of patients who had indeed received radiotherapy and had been treated at the radiologist’s center. Hence, our system provides the physicians with no more information than they would normally be allowed to access. However, instead of having to sift through the information in all of the individual doctors’ reports, they are now able to directly access the aggregated data extracted from these reports. Hence, QlikView essentially facilitates analysis and helps clinicians visualize the cohort for which they are already responsible.
Such permission arrangements explain how the system has currently been rolled out and how it has been accepted by the privacy protection commissioner of the hospital. We now describe how the permission system should be extended in the future. According to the Bavarian Law of Hospitals, clinicians are also allowed to share data (eg, for research purposes) within the hospital. This is more of a governance problem and has not yet been implemented within our current system. However, for the sake of scientific progress, clinicians interested in organ-specific data should be allowed to request permission to access relevant data even for cases in which they were not part of the original patient care. Representatives of the organ center would be members of a committee that could initially process such requests. The request would then be referred to a board consisting of the initial committee members along with organ center–independent members of the overarching comprehensive cancer center. If all parties of the committee accept the request, extended data access would be granted. However, extended data access would only be given in a pseudonymized form, since full data access seems only reasonable for clinicians directly involved with a patient’s care.
The permission system, including the not-yet implemented extended data access, is organized via a permission table, which directly controls the contents that may be shown to a given user and within which timeframe.
Example of a permission table for user access.
User | Diagnosis (a) | Center | Radiation | Operation | System-therapy | Pseudo |
dnasseh | C34 | LTC (b) | * (c) | * | * | * |
sopsch | * | * | STR (d) | * | * | * |
sopsch | C22 | * | * | * | * | PSD (e) |
(a) ICD-10: International Statistical Classification of Diseases and Related Health Problems-10.
(b) LTC: lung tumor center.
(c) * represents a wildcard, meaning the user has full rights to view the contents within this column.
(d) STR: radiation.
(e) PSD: pseudonymized.
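To make the logic of such a permission table concrete, the following Python sketch evaluates the three example rows against individual cases. It is an interpretation rather than the implemented QlikView mechanism: the `Permission` and `Case` types, the rule that a diagnosis entry matches as an ICD-10 prefix, the reading of a non-wildcard therapy entry as "the patient must have received this therapy", and the center code XYZ are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """One row of the permission table; '*' is the wildcard."""
    user: str
    diagnosis: str = "*"
    center: str = "*"
    radiation: str = "*"
    operation: str = "*"
    system_therapy: str = "*"
    pseudo: str = "*"


@dataclass(frozen=True)
class Case:
    """Hypothetical case record reduced to the fields the table refers to."""
    icd10: str
    center: str
    therapies: frozenset


PERMISSIONS = [
    Permission("dnasseh", diagnosis="C34", center="LTC"),
    Permission("sopsch", radiation="STR"),
    Permission("sopsch", diagnosis="C22", pseudo="PSD"),
]


def row_matches(row, case):
    """Assumed semantics: every non-wildcard column must be satisfied by the case."""
    if row.diagnosis != "*" and not case.icd10.startswith(row.diagnosis):
        return False
    if row.center != "*" and case.center != row.center:
        return False
    required = {
        "radiation": row.radiation,
        "operation": row.operation,
        "system_therapy": row.system_therapy,
    }
    return all(flag == "*" or kind in case.therapies for kind, flag in required.items())


def access_level(user, case):
    """Return 'full', 'pseudonymized', or None if no row grants access."""
    levels = [
        "pseudonymized" if row.pseudo == "PSD" else "full"
        for row in PERMISSIONS
        if row.user == user and row_matches(row, case)
    ]
    if "full" in levels:
        return "full"
    return levels[0] if levels else None


irradiated_case = Case("C50.9", "XYZ", frozenset({"radiation"}))
liver_case = Case("C22.0", "XYZ", frozenset({"operation"}))
print(access_level("sopsch", irradiated_case))
print(access_level("sopsch", liver_case))
```

Under these assumptions, several rows for the same user act as alternatives: a case is visible as soon as one row matches, and a fully permitted row takes precedence over a pseudonymized one.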
We established a platform that is accessible through the clinical intranet via a Web browser and does not require the installation of additional software at the end users’ sites. The data within the platform are updated daily, and provide preprocessed, compact visual access to the vast majority of the CREDOS contents. The platform can only be accessed after a single user license has been acquired. In addition, the cohorts whose data a user can view are limited by a permission system, which was developed in parallel to the technical implementation. A nonlegal contract describes the rules for licensing and access to the platform. Before each login, a disclaimer has to be ticked (see
We structured the contents into six main categories, five of which (first assessment, patient baseline data, therapies, progression, survival) have been included in the first rollout version. To facilitate understanding of this complex system,
As it is hard to describe the dynamic analytical possibilities of the platform, we provide a 15-minute-long commented video in
One of the major benefits of the MOCCA system is transparency. Until the framework was created, clinicians themselves did not have the option to directly access aggregated CREDOS data. Instead, they had to assess individual patient records or send a request for help to the local information technology team. This process restricted interest in the data. Hence, for most physicians, the CREDOS dataset was comparable to a black box that was primarily used by documentation clerks and the information technology department. Thus, most clinicians were not aware of the rich contents of the database. The system was rolled out and evaluated at one of the partnering sites (CCCM-LMU). After providing the doctors access to the platform within our prerollout phase, we received extensive feedback about the contents of our database.
This feedback reflected the high quality of the data, along with areas of further improvement for some aspects. Along these lines, due to the richness of charts and graphs, it is very easy to spot incorrectly documented information. Since a doctor can browse through the data, they will quickly realize if any data are missing or not documented in a correct manner. As an example, some wrongly documented dates could easily be spotted as they showed up within the therapy time chart as a negative time value on the x-axis. Spotting inconsistencies is of high relevance, since high data quality is one of the requirements for clinical research and is a precondition for clinical trials or network activities such as those of the national Network Genomic Medicine for lung cancer, which locally relies on correct, complete, and valid tumor documentation data [
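The kind of inconsistency mentioned above, a negative time value in the therapy time chart, can also be flagged programmatically. The following Python sketch assumes that the time axis is measured from the date of diagnosis; the field names and records are hypothetical, and flagged rows are only candidates for manual review, not proven documentation errors.

```python
from datetime import date


def find_negative_intervals(rows):
    """Return rows whose documented therapy start precedes the diagnosis date."""
    return [row for row in rows if row["therapy_start"] < row["diagnosis_date"]]


# Hypothetical documentation entries
documented = [
    {"tumor_id": 1, "diagnosis_date": date(2018, 4, 2), "therapy_start": date(2018, 4, 20)},
    {"tumor_id": 2, "diagnosis_date": date(2018, 6, 1), "therapy_start": date(2017, 6, 15)},
]

suspicious = find_negative_intervals(documented)
for row in suspicious:
    days = (row["therapy_start"] - row["diagnosis_date"]).days
    print(f"tumor {row['tumor_id']}: therapy starts {days} days relative to diagnosis")
```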
Improving the data quality itself is only one of the purposes of MOCCA; it can also support the control of processes of routine care and identify potential risks. Regarding this aspect of quality control, clinicians did not have the right tools to directly and quickly assess whether quality of care and associated processes were acceptable. As the centers are certified once a year by OnkoZert [
When critically assessed, a potential shortcoming of the system is misinterpretation caused by visual inspection of data, without considering the influences of confounding factors, sample size, and sample bias [
Although our system presents and analyzes data in graphic detail, we recommend that the results and conclusions mined from our system should always be examined with the support of statisticians, medical computer scientists, and in comparison to larger datasets (eg, epidemiological registries) [
In addition to the means and methods with which this system can support research, the system can also quickly and easily provide numbers for formulating scientific proposals. Furthermore, it can be used to discover as-yet-unknown information, also referred to as knowledge or data discovery, that can contribute to creating research ideas by quickly browsing through the data (hypothesis generation) [
Comparison of two different cohorts within the organ map module. Different patterns of occurrence for female and male stomach cancer cases are evident. The color saturation is linear to the occurrence, with 100% saturation being the most affected segment.
The benefits of such data visualization and business intelligence have been previously discussed in multiple contexts [
In terms of measurable benefit, as the system was only recently released, the physicians have not yet utilized it for extensive research projects. Nevertheless, it serves as a quick help for everyday routine requests and is already an essential part of our annual audits. In general, the feedback from the prerollout was primarily positive. The platform generated interest in the data and prompted doctors to engage with it voluntarily, which in turn opened the door to translational interaction, improved data quality, and possibly research. An objective measurement of the benefit of the tool (eg, based on surveys, citations, or login frequencies) is planned for the future.
As for future perspectives, due to the success and mostly positive feedback of our testing rollout, we aim to launch MOCCA at our partnering site (CCCM-Technical University Munich). In terms of contents, we would like to add further standard statistics and integrate some additional modules such as a single patient view. Moreover, these innovative data management techniques, and the handling of permissions and data governance, which is known as a major hurdle for many health-related projects, should serve as an inspiration for similar projects at other sites both nationally and internationally [
Overview of all components (with translations) of the given MOCCA system.
Video summarizing the capabilities of the platform.
CCCM: Comprehensive Cancer Center Munich
CREDOS: Cancer Retrieval Evaluation and Documentation System
ICIMTH: International Conference on Informatics, Management and Technology in Healthcare
LDAP: lightweight directory access protocol
LMU: Ludwig-Maximilians University, Munich
LTC: Lung Tumor Center
MOCCA: Munich Online Comprehensive Cancer Analytics
PSD: Pseudonymized
STR: Radiation
SVG: scalable vector graphics
The project was supported by funding from Deutsche Krebshilfe.
The analysis layer was conceptualized, designed, and partly implemented by DN. The initial data model was created by ML. Implementation of the analysis layer as well as the data model was completed by SS. DS provided support with the survival module. The manuscript was mainly written by DN and SS, and NE refined the manuscript content both linguistically and from a larger perspective. Server access was supported by KK and MM. SM and MN contributed to the project as important beta testers (physicians). RC and LB (students) created vector graphics. With leading positions within the CCCM-LMU, VH, TF, and CB supported the idea by providing the main authors with the time to commit to this work. As the CCCM-LMU coordinator, TF supported DN in creating the governance system, which was implemented by SS.
None declared.