This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
The status of the data-driven management of cancer care as well as the challenges, opportunities, and recommendations aimed at accelerating the rate of progress in this field are topics of great interest. Two international workshops, one conducted in June 2019 in Cordoba, Spain, and one in October 2019 in Athens, Greece, were organized by four Horizon 2020 (H2020) European Union (EU)–funded projects: BOUNCE, CATCH ITN, DESIREE, and MyPal. The issues covered included patient engagement, knowledge and data-driven decision support systems, patient journey, rehabilitation, personalized diagnosis, trust, assessment of guidelines, and interoperability of information and communication technology (ICT) platforms. A series of recommendations was provided as the complex landscape of data-driven technical innovation in cancer care was portrayed.
This study aims to provide information on the current state of the art of technology and data-driven innovations for the management of cancer care through the work of four EU H2020–funded projects.
Two international workshops on ICT in the management of cancer care were held, and several topics were identified through discussion among the participants. A focus group was formed after the second workshop, in which information on the status of technological and data-driven cancer management as well as the challenges, opportunities, and recommendations in this area was collected and analyzed.
Technical and data-driven innovations provide promising tools for the management of cancer care. However, several challenges must be successfully addressed, such as patient engagement, interoperability of ICT-based systems, knowledge management, and trust. This paper analyzes these challenges, which can also be viewed as opportunities for further research and practical implementation, and provides practical recommendations for future work.
Technology and data-driven innovations are becoming an integral part of cancer care management. In this process, specific challenges need to be addressed, such as increasing trust and engaging the whole stakeholder ecosystem, to fully benefit from these innovations.
The morbidity and mortality associated with cancer are rapidly increasing globally because of population growth and aging, reflecting the changes in the prevalence and distribution of major risk factors of cancer [
Pioneering research is now being conducted in the field of cancer care technology, resulting in the development of novel solutions to a diverse spectrum of problems in this area. However, the process of evaluating these innovations and their operation within a
Evidently, as researchers involved in the creation of novel technologies for cancer care, we must consider and share both the innovative concepts being developed as well as how they are being assessed and accepted in real-world or clinical settings.
To this end, an international workshop was convened to consider the current status of technological and data-driven innovations in cancer care, to identify key challenges and opportunities, and to formulate recommendations aimed at accelerating the rate of progress in the data-driven management of cancer. The two workshop instances led to a series of publications. This paper discusses key topics arising from the workshops and subsequent discussions among their participants.
An international expert consensus-building workshop named Tech4Cancer [
The workshop was supported by four European Union (EU) projects, namely BOUNCE, CATCH ITN, DESIREE, and MyPal, funded by Horizon 2020 (H2020). The BOUNCE project [
MyPal [
The primary deliverable of this workshop was a set of articles summarizing key issues, which have already been published by IEEE. During the paper presentations, several topics were identified through discussion among the participants. After the second instance of the workshop, participants were invited to join a focus group discussing the status, challenges, opportunities, and recommendations in the area of technological and data-driven cancer management; the outcome is reported in this study.
The writing group met by teleconference, and participants were asked to propose topics according to their interests and expertise and to select topics to which they could actively contribute. Several topics depicting challenges, opportunities, and recommendations were selected. Subsequently, leaders for each topic were identified, and a structure was proposed. A formal consensus process was not used; however, the structured and open discussions did not reveal any fundamental disagreements about the nature of the topics, although they helped refine the topics and make them more specific. Once all contributions were collected, a homogenization and integration process led to a final draft, on which all participants commented and which they discussed, leading to the final submitted version.
A fundamental requirement for the effectiveness of any eHealth intervention, including interventions addressed to patients with cancer and survivors, is a certain level of
Formulating a concrete definition of patient engagement in the context of eHealth is a challenge by itself [
The situation is not significantly better when it comes to the assessment of patient engagement. The lack of consensus in the conceptualization of user engagement makes the design of appropriate
The scientific literature sheds some light on the techniques that have been employed to achieve, maintain, and improve patient engagement. According to a psychology study [
Behavioral techniques, such as motivational interviewing, goal setting, and planning, which are related to patient actions when managing their health condition.
Cognitive techniques, such as question-asking tasks and psychoeducation sessions, which are related to patient thoughts and received information concerning their health condition.
Emotional techniques, such as positive psychology exercises and expressive writing tasks, which are related to experienced patient feelings and emotions when adjusting to their new health condition.
Most interventions developed for older adults employ patient engagement techniques from behavioral and/or cognitive categories, but not all categories [
However, the main barrier in building a critical mass of literature on patient engagement with eHealth systems is the fact that very few studies address or report the topic. For instance, a systematic review on published trials discovered that only 23 of 2777 reviewed trials reported any patient engagement activities [
In contrast to the other challenges discussed in this work, the solution to patient engagement may not be rooted in technology itself but rather related to the way technology is designed. To this end, there is a growing body of research demonstrating the value of cocreation and participatory design in the development of novel digital health services, including services for cancer care. Participatory design is one of the pillars of the revolutionary predictive, personalized, preventive, and participatory (P4) cancer medicine [
The main barrier is a lack of research culture for considering the involvement of the end user (ie, in our case, the patient with cancer) in the design process. Despite this barrier, participatory design is expected to become the norm in eHealth technology development in the upcoming years.
The MyPal project (see the
The use of the participatory approach as early as possible in the design of innovative technological solutions for patients with cancer presents a good opportunity to improve patient engagement [
Participatory design should start as early as possible in the development lifecycle of an eHealth system or service, and it should rely on established methodological tools.
Representative samples of the intended patient populations need to be selected for participation in the co-design and cocreation activities. This is especially important for heterogeneous patient populations.
Participatory design findings should be fused with other sources of knowledge (eg, a screening of unmet patient needs from cancer care in the scientific literature).
These recommendations can complement pre-existing published efforts to deliver more generic guidelines for developing engaging eHealth technologies. For example, the work presented in Karekla et al [
Data are fuel for any machine learning (ML) project [
ML, especially deep learning, can learn effectively from big data. However, it struggles to learn from small data because of issues such as overfitting, noise, outliers, and sampling bias, which can render the learned model effectively useless. Effective learning from small data therefore remains a challenge.
The annotation problem can be addressed via annotation tools or services in several ways, such as (1) providing annotation tools so that annotation can be performed more effectively and easily; existing tools include Lionbridge artificial intelligence (AI) [
Another research question concerns how to add value to ML results under the constraints of the available knowledge. Moreover, because knowledge and the latest clinical evidence, such as clinical practice guidelines (CPGs), are usually available in paper-based formats and written in natural language, a further research question is how to represent this knowledge automatically in a structured, computerized manner.
To solve the annotation problem scientifically, a desirable approach is to design new ML algorithms that require minimal feedback from human experts. This is only possible if domain-specific constraints can be imposed on the learning process, because such constraints reduce both the model space and the variance of learning. Thus, one research question is how to reduce the model space using domain-specific constraints.
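As a minimal sketch of how a domain-specific constraint shrinks the model space, the Python snippet below fits an event-rate curve that is constrained to be non-decreasing in a marker level, using the pool-adjacent-violators algorithm for isotonic regression. The monotonicity assumption, the function name `isotonic_fit`, and the example data are hypothetical illustrations rather than a method proposed by the projects; the point is only that the constraint rules out every non-monotone curve, leaving far fewer candidate models to fit from few labels.

```python
def isotonic_fit(y, w=None):
    """Least-squares fit of y (ordered by marker level) under the
    HYPOTHETICAL domain constraint that risk never decreases with the marker.
    Uses the pool-adjacent-violators algorithm."""
    w = [1.0] * len(y) if w is None else [float(v) for v in w]
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), wi, 1])
        # Pool adjacent blocks while they violate the monotonicity constraint
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Observed event rates in six groups of patients, sorted by marker level (made-up data)
observed = [0.10, 0.30, 0.20, 0.50, 0.40, 0.70]
print(isotonic_fit(observed))  # noisy dips are pooled into flat, monotone steps
```

Without the constraint, any curve through the six noisy points would be admissible; with it, the two local dips are averaged away.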
An interesting approach to solve the
The multitude of successful instantiations of digital interventions has opened exciting new directions for acquiring, delivering, and sharing data, and has already proven the potential to leverage cost-effective, patient-centered cancer care applications [
Guided by the current state of the art, we identified a set of opportunities for digital interventions related to patient-centered cancer care. We considered the most cited systematic reviews from the last 3 years (to extract innovation opportunities and technological limitations) and older systematic reviews (studies from 2011, to address initial adoption and strategies). These opportunities address the inherent challenges that we identified in practice. The first challenge is data collection and integration. Here, the core opportunity lies in exploring and exploiting the aggregation of heterogeneous and distributed data, including personal, professional, and health-related information [
Given the heterogeneity of clinical data and the specificity of patient data, data projection is another challenge present in any technological intervention. We see a good opportunity to build individualized clinical recommendations by projecting the interpretation of the data onto actions for the patient. Such recommendations would be based on the patient’s risk profile and on evidence-based guidelines (see the
Finally, with regard to data sparsity, such clinical setups face the challenge of data completeness and augmentation. In contrast to the experience in clinical trials, data completeness improves with longitudinal care. Longitudinal data collection may therefore help minimize missing PROM data in research or clinical care settings, in support of learning health care systems capable of augmenting the data [
On the basis of the identified challenges and opportunities, we established a set of recommendations for the systematic development of appropriate patient-centered digital interventions that ensure usefulness, adoption, and sustainability in cancer care [
Next, we map the previously identified challenges to practical recommendations. These recommendations can serve as a reference design for technological interventions with regard to data integration and management.
When addressing data collection and integration, we recommend embedding continuous user feedback and iterative prototyping in the intervention. This can be achieved by exploiting the multimodal nature of patient data (ie, personal, behavioral, professional, and health-related). ML techniques (eg, deep learning and hybrid neural networks) can fuse heterogeneous data into a common representation, efficiently using very large data sets that contain health care use data, clinical data, data from personal devices, and many other sources, as demonstrated by recent deep learning systems applied to multi-omics data sets to drive precision oncology care [
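The deep learning fusion systems referred to above are beyond a short example, but the underlying idea of mapping heterogeneous sources into one common representation can be sketched with simple early (feature-level) fusion. The Python sketch below assumes tabular features per modality, with `None` marking a missing value; the modality and feature names are hypothetical. A learned model, such as a deep network, would then operate on the fused vectors.

```python
import math


def zscore(column):
    """Standardize one feature; missing entries (None) stay missing."""
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)
    mean = sum(observed) / len(observed)
    sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed)) or 1.0
    return [None if v is None else (v - mean) / sd for v in column]


def fuse(modalities):
    """Early (feature-level) fusion of several modalities into one vector per patient.

    modalities: {modality: {feature: [value per patient, None if missing]}}
    Missing standardized values are imputed with 0 (the feature mean), and a
    missingness indicator is appended so a model can tell 'absent' from 'average'.
    """
    names, columns = [], []
    for modality, features in modalities.items():
        for feature, values in features.items():
            z = zscore(values)
            names.append(f"{modality}.{feature}")
            columns.append([0.0 if v is None else v for v in z])
            if any(v is None for v in values):
                names.append(f"{modality}.{feature}.missing")
                columns.append([1.0 if v is None else 0.0 for v in values])
    return names, [list(row) for row in zip(*columns)]  # transpose to patient rows


names, rows = fuse({
    "clinical": {"age": [54, 61, 47]},
    "device": {"daily_steps": [4200, None, 7800]},  # patient 2 has no wearable
})
print(names)  # ['clinical.age', 'device.daily_steps', 'device.daily_steps.missing']
```

The missingness indicator matters when device data exist only for a subset of patients: without it, "no wearable" would be indistinguishable from "average activity".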
In terms of data interpretation and translation, we recommend the use of tools to extract and represent the medical substrate by synthesizing only relevant aspects in a declarative way. ML techniques (ie, deep learning—recurrent networks with word embeddings and distributed representations) can handle very large and sparse data (eg, device data may only be available for a small subset of individuals) to capture the sequential character of the data and are suitable for modeling context dependencies in inputs [
To address the challenges in the area, we recommend the development of clinical projections (ie, mappings) from individualized patient recommendations to therapy plans that embed temporal, procedural, and reasoning processes. ML techniques, such as temporal hierarchical task networks, can dynamically generate personalized therapy plans for oncology patients [
Finally, tackling data augmentation, we recommend incorporating the lived experiences of the patients [
Individual biomedical and nonbiomedical patient characteristics should guide any provided chronic care—digital or not. These insights are used to develop and validate
The series of opportunities we identified target the selection of relevant heterogeneous and multimodal data correlated with the diagnosis. To extract the opportunities and frame our recommendations, we systematically analyzed a series of studies ranging from genomic data and phenotypes to demographics and psychological data. This methodology allowed us to capture the most relevant dimensions for extracting a patient profile. More precisely, the focus was on exploiting the correlations among multiple data sources to build a digital patient portrait consistent along all dimensions.
Such an initiative requires powerful data mining and ML algorithms, which can provide an efficient and compressed representation of a patient's digital profile, subsequently guiding therapeutic schemes.
One challenge identified relates to data relevance. Here, there is an opportunity to identify relevant genetic, phenotypical, physiological, lifestyle, and medical data correlated with the diagnosis. Exploiting such an opportunity can improve a patient's profile and the overall effect of the intervention, especially in progression-free survival [
Given the data deluge describing each patient, exploring and exploiting data correlations is another challenge we identified in the context of extracting a patient’s profile. This challenge provides a clear opportunity to exploit the correlations among the multiple identified modalities toward building an individual or personalized patient digital portrait that consistently captures all dimensions of the patient’s disease evolution.
Such an exploration unveils another challenge, namely multimodal data fusion [
The last challenge we identified as being crucial in any effort to extract a patient’s profile is the possibility of embedding individualized data (eg, age, gender, ethnicity, health conditions, and social position) in patient cohorts [
On the basis of the identified opportunities and related challenges, we established a set of recommendations supporting a digital intervention design that (1) exploits available data, (2) extracts underlying correlations, and (3) integrates the multitude of representations in a structured object guiding therapeutic interventions in cancer care.
When addressing the challenge of identifying data relevance, we recommend selecting the data or feature subset that best characterizes the statistical properties of a certain variable (ie, a certain patient), subject to the constraint that these data or features are as dissimilar to each other as possible (minimum redundancy) but as marginally similar to a certain class of patients as possible (maximum relevance). For this task, ML tools such as minimum redundancy feature selection (ie, Minimum Redundancy Maximum Relevance) [
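To make the minimum redundancy maximum relevance idea concrete, the Python sketch below implements greedy mRMR selection in its difference form on discrete (already binned) features, using mutual information for both relevance and redundancy. The feature names and values are hypothetical; continuous clinical variables would first need discretization or a continuous mutual information estimator.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR in its difference form: repeatedly pick the feature that
    maximizes relevance I(f; target) minus its mean redundancy I(f; s)
    with the features already selected."""
    selected, candidates = [], sorted(features)

    def score(name):
        relevance = mutual_information(features[name], target)
        if not selected:
            return relevance
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance - redundancy / len(selected)

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties resolved alphabetically
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical discretized features for 8 patients and a binary patient class
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "marker_a": [0, 0, 0, 1, 1, 1, 1, 1],
    "marker_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # redundant with marker_a
    "marker_b": [0, 1, 0, 0, 1, 1, 1, 0],
}
print(mrmr(features, target, k=2))  # ['marker_a', 'marker_b']
```

Although `marker_a_copy` is individually more relevant than `marker_b`, it adds no new information once `marker_a` is selected, so the redundancy penalty steers the choice toward `marker_b`.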
To address data correlations, we recommend the use of ML models, such as long short-term memory (LSTM) [
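The gating that lets an LSTM carry information across long sequences can be seen in a single-unit forward pass. The Python sketch below is illustrative only: inputs are scalars, the weights are hypothetical and untrained, and a practical model would use a deep learning library with vector-valued states and learned parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, p):
    """One time step of a single-unit LSTM.
    x: current input; h, c: previous hidden and cell state;
    p: weights w_* (input), u_* (recurrent), and b_* (bias) per gate."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h + p["b_i"])    # input gate: how much to write
    f = sigmoid(p["w_f"] * x + p["u_f"] * h + p["b_f"])    # forget gate: how much to keep
    o = sigmoid(p["w_o"] * x + p["u_o"] * h + p["b_o"])    # output gate: how much to expose
    g = math.tanh(p["w_g"] * x + p["u_g"] * h + p["b_g"])  # candidate memory content
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(sequence, p):
    """Return the hidden state after each element of the sequence."""
    h, c, states = 0.0, 0.0, []
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
        states.append(h)
    return states


# HYPOTHETICAL, untrained parameters; a positive forget-gate bias favors long memory
params = {f"{kind}_{gate}": 0.5 for kind in ("w", "u") for gate in "ifog"}
params.update({"b_i": 0.0, "b_f": 1.0, "b_o": 0.0, "b_g": 0.0})
print(run_lstm([0.2, 0.8, 0.5, 0.9], params))
```

The cell state `c` is the channel through which correlations between distant time points can persist; the forget gate decides how much of it survives each step.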
The next challenge identified is the fusion of the available multimodal data. We recommend the class of data-driven feature learning approaches. These are typically based on deep networks that can directly learn the hidden characteristics of data from different sources. For instance, we recommend using deep neural networks to extract features from genomic and clinical data [
Finally, the last identified challenge is the possibility of embedding individualization data in the patient profile. We recommend performing individual cognitive interviews and focus groups with patients to learn about their relevant needs, experiences, fears, aspirations, and expectations.
From the ML point of view, a solution for developing personalized patient embeddings that is capable of processing such data is a combination of well-proven autoencoder methods with extensions to some of the metrics to account for data sparsity and multimodality [
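The specific metric extensions are not detailed here, so the Python sketch below shows one plausible, hypothetical extension of an autoencoder's reconstruction loss: errors are computed only on observed entries (so sparse measurements are not penalized as if they were zeros), averaged within each modality, and combined with per-modality weights (so a modality with many features does not dominate the embedding). The encoder and decoder are omitted; `reconstructed` stands in for their output.

```python
def multimodal_masked_loss(original, reconstructed, observed, modality_of, weights):
    """Masked, modality-weighted mean squared reconstruction error.

    original, reconstructed: patient x feature matrices (lists of lists)
    observed: same shape, truthy where the value was actually measured
    modality_of: modality name for each feature column
    weights: relative weight of each modality in the total loss
    """
    totals, counts = {}, {}
    for x_row, r_row, m_row in zip(original, reconstructed, observed):
        for j, (x, r, m) in enumerate(zip(x_row, r_row, m_row)):
            if not m:
                continue  # missing entry: no error signal either way
            name = modality_of[j]
            totals[name] = totals.get(name, 0.0) + (x - r) ** 2
            counts[name] = counts.get(name, 0) + 1
    # Average within each modality first, then take the weighted sum
    return sum(weights[name] * totals[name] / counts[name] for name in totals)


original = [[1.0, 2.0, 0.0], [0.5, 0.0, 3.0]]
observed = [[1, 1, 0], [1, 0, 1]]  # 0.0 placeholders mark unmeasured values
reconstructed = [[1.5, 2.0, 9.0], [0.5, 7.0, 2.0]]
loss = multimodal_masked_loss(original, reconstructed, observed,
                              modality_of=["clinical", "clinical", "device"],
                              weights={"clinical": 1.0, "device": 0.5})
print(round(loss, 4))  # 0.5833
```

The reconstructions of the unmeasured entries (9.0 and 7.0) do not affect the loss, so the embedding is not pushed toward predicting placeholder zeros.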
A patient portrait that can capture complex relationships in physiological signals, nonbiomedical data, and personal data embeddings is key to accurately predicting the stages of interventions for different patients and is necessary for successful personalized therapy.
Cancer is remarkably heterogeneous across individuals. This heterogeneity makes treatment difficult for caregivers because they cannot accurately predict how the disease will progress to guide treatment decisions. Therefore, tools that help to predict the individualized trajectory of cancer can help improve the quality of health care [
A significant need relates to making disease predictions by leveraging baseline information and additional time-dependent clinical markers as they are collected. Such an approach is the focal challenge of personalized medicine: integrative analysis of heterogeneous data from an individual’s medical history to improve cancer care. We identified several key challenges and associated opportunities linked to this.
The first challenge is that markers in clinical data are irregularly and sparsely sampled. Here, we identified a valuable opportunity for handling such data by choosing specific latent variable models to summarize and extract information from irregularly sampled, sparse data while sidestepping the need to jointly model the data-generating processes [
Another challenge identified relates to learning a disease trajectory and is linked to the inherent computational complexity. Given the constraints of the clinical setup, we identified an opportunity, in mechanistic models of disease progression, to predict the entire disease progression trajectory from the observed patient records without many training labels on the ground-truth stages that a patient went through. A joint approach is likely to reduce the inherent variability in prediction.
This challenge leads to the next one, namely continuous adaptation and updating in the face of heterogeneous disease progression. Here, there is a clear opportunity for continuous-time adaptation and updating as new observations and new data (markers) become available. This opens the way for novel computational methods for predicting, for instance, disease phenotype from molecular and genetic profiling [
Finally, another challenge we identified refers to the observed versus latent data artifacts. Addressing such a challenge demands tools for capturing latent factors in disease expression and not only observed features as a crucial aspect for patient-tailored cancer therapies. We further elaborate this in the following
In this section, we match the challenges to practical recommendations in designing digital interventions when predicting disease trajectories for patients with cancer.
To handle irregularly and sparsely sampled markers in clinical data, we recommend, from the ML perspective, the use of discriminative models that condition on marker histories instead of jointly modeling them. Such an approach is not sensitive to misspecified dependencies across marker types or to inherent irregularities and sparsity. For example, functional data analysis [
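Functional data analysis is one way to condition on marker histories. As a simpler, swapped-in illustration of the same discriminative idea, the Python sketch below summarizes each irregularly sampled marker up to the prediction time (last value, time since the last measurement, least-squares slope, number of measurements, and a missingness flag) and feeds these summaries to a logistic model, so no joint model of how the markers are generated is needed. Marker names, weights, and data are hypothetical; in practice, the weights would be learned.

```python
import math


def history_features(times, values, t_now):
    """Summarize one irregularly sampled marker using only observations up to t_now.
    Assumes times are sorted in increasing order."""
    obs = [(t, v) for t, v in zip(times, values) if t <= t_now]
    if not obs:
        # No measurement yet: flag it instead of inventing a value
        return {"last": 0.0, "since_last": 0.0, "slope": 0.0, "count": 0.0, "missing": 1.0}
    ts, vs = [t for t, _ in obs], [v for _, v in obs]
    t_mean, v_mean = sum(ts) / len(ts), sum(vs) / len(vs)
    sxx = sum((t - t_mean) ** 2 for t in ts)
    # Least-squares trend; defined as 0 when there is only one time point
    slope = sum((t - t_mean) * (v - v_mean) for t, v in obs) / sxx if sxx > 0 else 0.0
    return {"last": vs[-1], "since_last": t_now - ts[-1], "slope": slope,
            "count": float(len(obs)), "missing": 0.0}


def risk_score(markers, t_now, weights, bias=0.0):
    """Logistic model conditioned on summarized marker histories.
    The cost grows linearly with the number of marker types."""
    z = bias
    for name, (times, values) in markers.items():
        for key, value in history_features(times, values, t_now).items():
            z += weights.get((name, key), 0.0) * value
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL markers (days since diagnosis, value) and untrained weights
markers = {"marker_a": ([0, 30, 95], [2.1, 2.8, 4.0]), "marker_b": ([10], [35.0])}
weights = {("marker_a", "slope"): 20.0, ("marker_a", "last"): 0.3, ("marker_b", "missing"): 0.5}
print(risk_score(markers, t_now=100, weights=weights, bias=-2.0))
```

Because each marker type is summarized independently, adding a new marker adds a fixed number of features, which keeps the cost linear in the number of marker types.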
However, predicting the disease trajectory comes with inherent computational complexity. To address this challenge, we believe that an ideal candidate would be an ML model whose complexity grows linearly in the number of marker types it includes, which makes it applicable to cancer prognosis, where many different markers are recorded over time. Generative models can account for disease trajectory shapes using components at the population, subpopulation, and individual levels, which simultaneously allows for heterogeneity across and within individuals and enables statistical strength to be shared across observations at different
Independent of the prediction model, the challenge is to continuously adapt and update in the face of progression heterogeneity. From the ML point of view, we recommend the use of a model capable of being applied dynamically in continuous time and updated as soon as any new data are available (eg, hidden Markov models). Such approaches can model the transition of disease stages or states, which implies that the progression is continuous, and the transition probability to the future state relies only on the current state and the time span. Instantiation of such causality-based ML was used to infer the underlying somatic staging of tumors from next-generation sequencing data [
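The Markov property described above can be sketched with a continuous-time hidden Markov model: disease stages evolve according to a generator matrix Q, the transition probabilities over a gap of Δt days are P(Δt) = exp(QΔt), and the stage belief is updated by forward filtering whenever a new observation arrives, however irregular the timing. The Python sketch below assumes a hypothetical three-stage, progression-only model with made-up rates and emission probabilities, and computes the matrix exponential with a truncated Taylor series plus scaling and squaring (adequate for small, well-scaled matrices; a library routine would be preferable in practice).

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def expm(q, t, terms=20, squarings=8):
    """exp(Q t) by a truncated Taylor series with scaling and squaring."""
    n = len(q)
    a = [[q[i][j] * t / 2 ** squarings for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[r + s for r, s in zip(rr, ss)] for rr, ss in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # exp(A)^(2^s) = exp(Q t)
    return result


def filter_stages(q, emission, initial, observations, t0=0.0):
    """Forward filtering: belief over stages after each (time, symbol) observation."""
    n, belief, t_prev, history = len(q), list(initial), t0, []
    for t, symbol in observations:
        p = expm(q, t - t_prev)  # depends only on the current stage and the elapsed time
        predicted = [sum(belief[i] * p[i][j] for i in range(n)) for j in range(n)]
        weighted = [predicted[j] * emission[j][symbol] for j in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(belief)
        t_prev = t
    return history


# HYPOTHETICAL model: stages 0 -> 1 -> 2 (no regression), rates per day
Q = [[-0.02, 0.02, 0.0], [0.0, -0.01, 0.01], [0.0, 0.0, 0.0]]
# P(marker normal = 0 / elevated = 1 | stage)
EMISSION = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
beliefs = filter_stages(Q, EMISSION, [1.0, 0.0, 0.0], [(30, 0), (90, 1), (180, 1)])
for b in beliefs:
    print([round(x, 3) for x in b])
```

Each new observation triggers one prediction and one correction step, so the belief can be updated as soon as new data are available, without refitting the model.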
Moving away from the modeling decision, the last challenge lies in the observed versus latent data artifacts. We believe that a powerful tool is an ML model that accounts for latent factors and covariates influencing disease expression, as standard regression models rely on observed features alone to explain variability [
Over the past decades, early diagnosis, new drugs, and more personalized treatment have led to impressive increases in survival rates of patients with cancer. However, the most limiting side effects of commonly used therapies are a severe problem in oncology, leading to dose reduction, treatment delay, or discontinuation [
With the increasing number of cancer survivors, more attention is being paid to persistent sequelae of tumor treatment and supportive measures [
The first challenge we identified is recognizing therapy sequelae. This challenge offers the opportunity to develop interventions capable of assessing which deficits (sensory, motor, and/or cognitive) a specific patient has as a consequence of cancer therapy.
The next challenge arises when the intervention needs to quantify the magnitude of therapy sequelae. Here, we identify a clear opportunity to measure the level of deficit or dysfunction induced by the therapy. This is crucial in (1) designing the follow-up therapy scheme, (2) choosing a rehabilitation strategy, and (3) determining the therapy trajectory and dosage.
The last challenge we identified is the adaptive parametrization of rehabilitation. This challenge brings a valuable opportunity to take steps toward personalized treatment, namely, to parameterize the rehabilitation scheme according to the specific deficit type and level to drive rehabilitation.
To cope with patient sensory, motor, and cognitive deficit variability, it is necessary to perform a precise assessment of the 3 different dimensions. We believe that the 3 main challenges we identified as high-potential opportunities for digital interventions are also good candidates for ML algorithms. This technology can learn underlying correlations in patient data and generalize for robust prediction [
When tackling the identification of therapy sequelae, we recommend exploiting and mining large sets of structured and unstructured data describing a patient to identify correlations among various data types and how they map to a certain type of dysfunction. We propose the use of semisupervised techniques (eg, transductive support-vector machines [
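Transductive support-vector machines are one option. As a compact, swapped-in illustration of the same transductive, semisupervised idea, the Python sketch below uses graph-based label propagation: a few patients with a confirmed dysfunction label (1) or confirmed absence of it (0) spread their labels to unlabeled patients with similar data profiles. The two-dimensional features, the Gaussian affinity width `sigma`, and the data are hypothetical.

```python
import math


def label_propagation(points, labels, sigma=1.0, iterations=100):
    """Semisupervised labeling on a similarity graph.

    points: feature vectors, one per patient
    labels: 1 or 0 for labeled patients, None for unlabeled ones
    Returns the estimated probability of label 1 for every patient.
    """
    n = len(points)

    def affinity(p, q):
        return math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * sigma ** 2))

    w = [[0.0 if i == j else affinity(points[i], points[j]) for j in range(n)] for i in range(n)]
    f = [0.5 if y is None else float(y) for y in labels]
    for _ in range(iterations):
        new_f = []
        for i in range(n):
            if labels[i] is not None:
                new_f.append(float(labels[i]))  # labeled patients stay clamped
            else:
                total = sum(w[i])
                # Weighted average of the neighbors' current estimates
                new_f.append(sum(w[i][j] * f[j] for j in range(n)) / total if total > 0 else f[i])
        f = new_f
    return f


# Hypothetical standardized scores (e.g., two test results) for six patients
points = [(0.0, 0.0), (0.3, 0.1), (5.0, 5.0), (5.2, 4.9), (0.5, 0.2), (4.8, 5.1)]
labels = [0, 0, 1, 1, None, None]
print([round(p, 2) for p in label_propagation(points, labels)])
```

The unlabeled patients inherit the label of the cluster they resemble, which is how a small set of expert-confirmed cases can be stretched across a larger, mostly unlabeled cohort.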
Moreover, to address the magnitude of the therapy sequelae, we recommend the use of deep learning models, especially convolutional neural networks. This is because such networks are capable of learning relevant feature representation from unstructured data, such as pathological images or medical records. This, subsequently, allows deep learning methods to achieve good results in tasks such as regression, detection, and segmentation, which underlie the magnitude estimation.
Finally, to address the challenge of adaptive parametrization of rehabilitation, we recommend methods that combine the capability to learn arbitrary nonlinear regression functions (eg, deep learning, encoding how the type of deficit covaries with its magnitude) with adaptation through guided searches in parameter spaces (eg, reinforcement learning, finding the parameters of the rehabilitation scheme, such as dosage, type, and length, that best fit the regressed function).
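The combination of a regressed response model with a guided search over rehabilitation parameters can be sketched with the simplest form of reinforcement learning, a multi-armed bandit. In the Python sketch below, `expected_benefit` is a hypothetical stand-in for the regressed function (in practice, a trained deep learning model), each arm is a hypothetical combination of session intensity and sessions per week, and an epsilon-greedy strategy balances trying other configurations against exploiting the best one found so far. A full reinforcement learning formulation would also model how the patient's state changes over time.

```python
import random


def expected_benefit(intensity, sessions_per_week, deficit_magnitude):
    """HYPOTHETICAL regressed response: benefit grows with the weekly dose,
    scaled by the deficit magnitude, but is penalized when the dose is too demanding."""
    dose = intensity * sessions_per_week
    return deficit_magnitude * dose - 0.05 * dose ** 2


def epsilon_greedy(arms, reward_fn, steps=300, epsilon=0.1, seed=0):
    """Return the arm with the highest estimated mean reward, plus all estimates."""
    rng = random.Random(seed)
    counts = {arm: 0 for arm in arms}
    means = {arm: 0.0 for arm in arms}

    def pull(arm):
        counts[arm] += 1
        means[arm] += (reward_fn(arm, rng) - means[arm]) / counts[arm]  # running mean

    for arm in arms:  # try every configuration once
        pull(arm)
    for _ in range(steps):
        if rng.random() < epsilon:
            pull(rng.choice(arms))  # explore
        else:
            pull(max(arms, key=means.get))  # exploit the current best estimate
    return max(arms, key=means.get), means


# Arms: (session intensity level, sessions per week); observed outcomes are noisy
arms = [(i, s) for i in (1, 2, 3) for s in (1, 2, 4)]


def noisy_outcome(arm, rng):
    return expected_benefit(*arm, deficit_magnitude=0.6) + rng.gauss(0.0, 0.02)


best, estimates = epsilon_greedy(arms, noisy_outcome)
print(best)  # (3, 2): a weekly dose of 6 maximizes the hypothetical benefit for this deficit
```

Changing `deficit_magnitude` shifts the optimal dose, which is the sense in which the regressed deficit estimate parameterizes the search.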
Patient data needed for the provision of the best treatment are often scattered across different systems rather than stored in a single location. Technologies that facilitate care coordination through interoperability are improving, but a seamless flow of information from one care setting to another still requires more progress. Without having a full picture, it is difficult to provide the best care in an era where cancer is considered to be a chronic disease, and there is an increased demand for consistent follow-up in terms of monitoring and early management of symptoms that indicate that cancer might have returned.
Interoperability is a primary consideration to achieve communication among applications, medical devices, and health care providers [
The currently popular use of EHRs has alleviated some of the barriers in using data from medical records for research, although fully interoperable electronic medical record systems are not yet a reality. Several efforts to develop and apply standards in the collection, extraction, and integration of data by standardization bodies, governments, the research community, and industry are in progress [
Standards, such as those developed in the United States and Canada, to guide EHR vendors and public health central cancer registries in the implementation of standardized electronic reporting [
Some data resolution may be lost when EHR fields are mapped to a formally described abstraction layer; this loss may be alleviated by using knowledge models such as ontologies as background knowledge mappers. Nevertheless, such common interfaces support queries across EHRs and the extraction of patient data in the same format, allowing patient sets from numerous institutions to be merged.
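The trade-off described above, uniform querying across institutions at the cost of some resolution, can be illustrated with a small Python sketch that maps two hypothetical local EHR schemas onto a shared set of field names and coarsens cancer stage to its main group. The field names, the stage-coarsening rule, and the records are made up; a real abstraction layer would rely on standard terminologies or ontologies rather than string handling.

```python
# HYPOTHETICAL local schemas of two institutions, mapped to shared field names
FIELD_MAPPINGS = {
    "hospital_a": {"pt_id": "patient_id", "dx_date": "diagnosis_date", "stage": "stage"},
    "hospital_b": {"PatientNumber": "patient_id", "DiagnosisDate": "diagnosis_date",
                   "TNMStage": "stage"},
}


def to_common(record, source):
    """Map one local record to the common layer.
    Assumes both sources already store dates as ISO 8601 strings."""
    mapping = FIELD_MAPPINGS[source]
    common = {target: record[local] for local, target in mapping.items() if local in record}
    if "stage" in common:
        # The common layer keeps only the main stage (e.g., "IIA" -> "II"):
        # this is where data resolution is lost.
        common["stage"] = common["stage"].strip().upper().rstrip("ABC")
    common["source"] = source
    return common


def merge(records_by_source):
    """Pool patient sets from several institutions in one uniform format."""
    return [to_common(r, src) for src, records in records_by_source.items() for r in records]


pooled = merge({
    "hospital_a": [{"pt_id": "A1", "dx_date": "2019-03-02", "stage": "IIA"}],
    "hospital_b": [{"PatientNumber": "B7", "DiagnosisDate": "2019-05-20", "TNMStage": "IIIC"}],
})
for row in pooled:
    print(row)
```

After mapping, a single query such as "all stage II patients" runs over both sources, but the distinction between substages IIA and IIB can no longer be recovered from the pooled data.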
Health information technology (IT) brings clinical data and patient information together and guides oncologists in making evidence-based care decisions that lead to improved outcomes. The potential benefits of interoperable interconnected tools and health systems are particularly important for oncology [
Data must be able to provide a complete look at the patient’s medical history so that physicians can see what cancer medicines and treatments did or did not work. Clinicians also need to be able to avoid recommending the same procedure twice, prescribing a medicine the patient already tried, or missing results from a diagnostic test.
The need for consolidation and standardization efforts to create interoperable solutions [
As already proposed by [
As mentioned in [
The open use and sharing of big data, without compromising patients’ rights to privacy and confidentiality, should be promoted.
Current clinical care is gradually moving toward patient-clinician shared decision making, as patient involvement can provide insights into best health states or outcomes in each case, apart from establishing a partnership that will help clinicians understand their patients’ preferences [
It is necessary to overcome many barriers to ensure success in the inclusion of patient preferences during the decision-making process and to track their impact in the care services provided [
As a complex disease, cancer involves many clinical specialties, which creates the need for a clear taxonomy (ie, systematic categorization) of patients’ preferences that can serve as a standard across all involved disciplines (eg, analysts, clinical psychologists). Such an approach would help harmonize the different points of view on measuring patients’ preferences by labeling and extracting this information in a processable and understandable way [
As presented in the previous sections, the evidence used in medical practice is based on clinical guidelines, which are often used as the evidence basis for the clinical DSS (CDSS) [
CDSSs provide the opportunity not only to quickly access the latest available evidence but also to incorporate new sources of information that can support the decision-making process. At the same time, the new IT era enables the acquisition of PROMs through questionnaires or even through more sophisticated wearables that measure activity, sleep, or other vital signs that can be translated into patient outcomes. These automatic, ubiquitous technologies increase the knowledge acquired about patients, and processing these data to deduce the desired results will be very helpful. For example, tracking the daily activities of a user provides estimates of how active a patient is, which can be correlated with their depression and/or fatigue levels (variables that are gathered using questionnaires such as the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire with 30 items). Other techniques could also aid the acquisition of patient outcome information, such as Ecological Momentary Assessment (short questionnaires that are sent to the patient frequently to obtain updates on their status). Thus, the collected data could be used to assess how well the treatment worked for each patient, not within the scope of a randomized controlled trial but in the real-world environment, across the whole population rather than a specific patient population.
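A minimal Python sketch of the activity and fatigue example: daily step counts from a wearable are averaged per week to match the questionnaire schedule, and the weekly means are correlated with weekly fatigue scores. All values are hypothetical, and a correlation computed this way indicates association only, not that inactivity causes fatigue or vice versa.

```python
import math


def weekly_means(daily_values):
    """Average a daily wearable stream over complete 7-day weeks."""
    return [sum(daily_values[i:i + 7]) / 7 for i in range(0, len(daily_values) - 6, 7)]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL weekly mean step counts and weekly fatigue scores (0-100, higher = worse)
steps = [6500, 5200, 4100, 3900, 5800]
fatigue = [22, 33, 56, 61, 28]
print(round(pearson(steps, fatigue), 2))  # strongly negative in this made-up example
```

Aligning both streams on the same weekly time scale is the key design step; correlating raw daily steps with a weekly questionnaire score would mix mismatched time windows.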
In this context, the DESIREE project, which focused on primary breast cancer, explored new ways of including PROMs to assess guideline recommendations. Including PROMs does not conflict with the measurement of quality of evidence and strength of recommendation; rather, it provides additional quality attributes that contribute to the decision-making process by considering the patient status as reported by the patients themselves.
Several specifications should be considered for this quality assessment, which could be named questionnaire-based patient-reported outcome measures. They include (1) consideration of standard questionnaires for each medical case, such as the International Consortium for Health Outcomes Measurement questionnaires [
When implementing CPGs, several characteristics must be considered to ensure both high health care quality and clinicians’ satisfaction. CPGs must assure the validity and reliability of their clinical content, along with its applicability in real clinical settings, and must clearly define the procedures to be followed within current clinical practice in a health care system [
Current trends are moving toward highly interactive computerized systems that present complex clinical cases intuitively, where clinicians can access and check computerized clinical data and gain insights from all of this information in a more natural and intuitive way, alleviating the ambiguities of the guidelines through a data-driven approach guided by previous practice [
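A data-driven approach guided by previous practice could start from something as simple as counting which step followed a given guideline node in historical care pathways. The sketch below is a hypothetical first-order transition count over anonymized pathway sequences; the step names (eg, "mdt_review" for a multidisciplinary team review) are invented for illustration, and frequencies learned this way describe past practice rather than what is clinically correct.

```python
from collections import Counter, defaultdict


def transition_counts(pathways):
    """Count how often each step was directly followed by another step."""
    counts = defaultdict(Counter)
    for pathway in pathways:
        for current_step, next_step in zip(pathway, pathway[1:]):
            counts[current_step][next_step] += 1
    return counts


def suggest_next_steps(counts, current_step, top_k=2):
    """Return the top_k most frequent next steps with their relative frequencies."""
    followers = counts.get(current_step, Counter())
    total = sum(followers.values())
    if total == 0:
        return []
    return [(step, n / total) for step, n in followers.most_common(top_k)]


# Hypothetical historical pathways of patients who reached the same ambiguous guideline node.
history = [
    ["biopsy", "mdt_review", "surgery"],
    ["biopsy", "mdt_review", "neoadjuvant_therapy", "surgery"],
    ["biopsy", "mdt_review", "surgery"],
]
counts = transition_counts(history)
for step, share in suggest_next_steps(counts, "mdt_review"):
    print(f"after mdt_review -> {step}: {share:.0%} of past cases")
```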
The proposed directions for realizing these needs include the promotion of standardized clinical terminology that facilitates the understanding and unambiguous interpretation of both the clinical data to be analyzed and the clinical knowledge formalized in the CIGs (eg, the Breast Imaging Reporting and Data System standard for breast anomalies [
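To illustrate how a standardized terminology removes ambiguity, the sketch below normalizes free-text labels from local reports onto the BI-RADS final assessment categories (0-6) before a guideline rule is evaluated. The local vocabulary, the function names, and the single rule (flagging categories 4 and 5, for which BI-RADS recommends tissue sampling) are simplified assumptions for illustration, not a formalized CIG.

```python
from enum import IntEnum


class BIRADS(IntEnum):
    """BI-RADS final assessment categories."""
    INCOMPLETE = 0
    NEGATIVE = 1
    BENIGN = 2
    PROBABLY_BENIGN = 3
    SUSPICIOUS = 4
    HIGHLY_SUGGESTIVE_OF_MALIGNANCY = 5
    KNOWN_BIOPSY_PROVEN_MALIGNANCY = 6


# Hypothetical local wording found in free-text reports, each mapped to one standard code.
LOCAL_TERMS = {
    "needs additional imaging": BIRADS.INCOMPLETE,
    "normal": BIRADS.NEGATIVE,
    "no abnormality": BIRADS.NEGATIVE,
    "benign finding": BIRADS.BENIGN,
    "probably benign": BIRADS.PROBABLY_BENIGN,
    "suspicious": BIRADS.SUSPICIOUS,
    "highly suspicious": BIRADS.HIGHLY_SUGGESTIVE_OF_MALIGNANCY,
    "known cancer": BIRADS.KNOWN_BIOPSY_PROVEN_MALIGNANCY,
}


def normalize(label):
    """Map a local report label to a BI-RADS category, refusing unknown terms."""
    key = " ".join(label.lower().split())  # case- and whitespace-insensitive lookup
    if key not in LOCAL_TERMS:
        raise ValueError(f"term not mapped to BI-RADS: {label!r}")
    return LOCAL_TERMS[key]


def tissue_sampling_step_applies(category):
    """Example guideline condition evaluated on the standard code, not on free text."""
    return category in (BIRADS.SUSPICIOUS, BIRADS.HIGHLY_SUGGESTIVE_OF_MALIGNANCY)


for report_label in ["Normal", "Highly  suspicious", "probably benign"]:
    category = normalize(report_label)
    print(report_label, "->", category.name,
          "| tissue sampling step:", tissue_sampling_step_applies(category))
```

Rejecting unmapped terms, rather than guessing, keeps the guideline logic from silently acting on ambiguous input.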
Clinical guidelines are tedious to develop, and it is even more difficult to interpret them and translate them into a computer-interpretable form. This usually requires close collaboration between knowledge engineers and medical experts. A closely related and important issue is the guideline development process, that is, how CPG development working groups are composed. Usually, these teams comprise quality auditors or managers who are guided by their own opinions, interests, and experience and who aim to formalize evidence and ensure the appropriateness of the recommendations provided, but who ignore the iterative and causal reasoning of clinicians [
In this context, it is crucial to have tools that support the easy updating of the CIGs used by CDSSs, either by clinicians themselves or by knowledge engineers, without requiring them to understand the underlying technology or programming language. Thus, interfaces that are easy to use and understand are required for this purpose. This limitation was highlighted in the DESIREE project; however, it is an issue not only in cancer care but also in other clinical specialties.
Therefore, we propose an authoring tool for CPG formalization [
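A tool for detecting modifications between guideline versions could start by comparing recommendations keyed by a stable identifier. The following is a minimal sketch under that assumption: each version is a hypothetical mapping from recommendation ID to recommendation text, and Python's standard difflib module displays the wording changes so that clinicians can review them before the CIG is updated.

```python
import difflib


def diff_guideline_versions(old, new):
    """Classify recommendations as added, removed, or changed between two versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def show_text_change(old_text, new_text):
    """Return a unified diff of one recommendation's wording."""
    return "\n".join(
        difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(),
            fromfile="previous", tofile="current", lineterm="",
        )
    )


# Hypothetical recommendation texts from two versions of a guideline.
v1 = {
    "R1": "Assess fatigue at every follow-up visit.",
    "R2": "Record patient preferences before treatment planning.",
    "R3": "Review the care plan annually.",
}
v2 = {
    "R1": "Assess fatigue and sleep quality at every follow-up visit.",
    "R2": "Record patient preferences before treatment planning.",
    "R4": "Offer a rehabilitation referral when needed.",
}

changes = diff_guideline_versions(v1, v2)
print(changes)
for rec_id in changes["changed"]:
    print(show_text_change(v1[rec_id], v2[rec_id]))
```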
The change in both hardware and software over the past decade has been remarkable. Equally noteworthy has been the ever-increasing internet speed and the accompanying growth in the demand for connectivity. This growth and development have increased our ability to take on challenging projects to improve early diagnosis and the quality of cancer care. However, discussions relating to health data and associated analytical tools often emphasize data privacy and security at the expense of other topics. These discussions often overlook the dynamic nature of both health data and the software used to analyze those data. In addition, the popularity of internet-based apps and their use by patients for self-diagnosis necessitate a call to action. There is a need to examine the reliability and trustworthiness of the health-related computational tools used by health care providers in diagnosis and in DSSs. Here, we focus on reliance on, and the reliability of, computational tools, with an emphasis on cancer care and diagnosis.
Reliance on, and the reliability of, information in health care are incredibly important. These attributes are particularly critical for genetic information associated with cancer, which is essential to the development of optimal clinical intervention strategies. With an increasing number of people falling victim to medical misinformation and propaganda on the internet [
There is no turning back from this path of dependence on the tools and information available via the internet. People have a reasonable expectation of being able to establish the trustworthiness and validity of these tools. Initiatives such as the Secure Sockets Layer certification system, implemented for encrypting sensitive information sent across the internet, or the use of digital badges to indicate or attest to adherence to an acceptable standard and/or individual skill competency (
Having reliable information alone is not sufficient for people to build a foundation of trust. People may distrust the reliability of official information on certain topics because of previous unpleasant experiences. One conclusion that can be drawn from our experience in developing relevant computational tools over the past decade is that the credibility of the results obtained from such tools is questionable. A review of the literature shows a great deal of volatility in the availability of health IT resources and shows that providing explanations for software errors is an acceptable approach. Building on this notion, a certificate that discloses who is responsible and what tests are done or can be done to validate the trustworthiness of the output, something along the lines of the MD5 checksum, could be envisioned as an acceptable solution to this issue. Such a certificate could include the versions of the data and the software in a report to help explain any deviation from the previous version. This could be seen as a reasonable step in the right direction until peer-approved permanent solutions become available.
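The certificate idea could be sketched as a small record that binds together the responsible party, the software and data versions, and checksums of the input data and the output. This is a hypothetical structure, not an existing standard. It uses MD5 from Python's hashlib because the text mentions MD5 checksums; note that MD5 is no longer collision resistant, so a stronger hash such as SHA-256 would be preferable wherever deliberate tampering is a concern.

```python
import hashlib
from dataclasses import dataclass


def md5_hex(payload: bytes) -> str:
    """Hex MD5 digest used as a fingerprint of data or output."""
    return hashlib.md5(payload).hexdigest()


@dataclass(frozen=True)
class ResultCertificate:
    """Hypothetical certificate accompanying a computational result."""
    responsible: str
    software_version: str
    data_version: str
    data_md5: str
    output_md5: str
    validation_tests: tuple = ()


def explain_deviation(previous: ResultCertificate, current: ResultCertificate) -> list:
    """List the recorded differences that may explain a changed output."""
    if previous.output_md5 == current.output_md5:
        return ["output unchanged"]
    reasons = []
    if previous.software_version != current.software_version:
        reasons.append(f"software {previous.software_version} -> {current.software_version}")
    if previous.data_version != current.data_version or previous.data_md5 != current.data_md5:
        reasons.append(f"data {previous.data_version} -> {current.data_version}")
    if not reasons:
        reasons.append("output changed with identical software and data; investigate")
    return reasons


old_cert = ResultCertificate("Tool maintainer (hypothetical)", "1.4.0", "db-2019-06",
                             md5_hex(b"reference data v1"), md5_hex(b"variant report v1"),
                             ("regression suite",))
new_cert = ResultCertificate("Tool maintainer (hypothetical)", "1.5.0", "db-2019-06",
                             md5_hex(b"reference data v1"), md5_hex(b"variant report v2"),
                             ("regression suite",))
print(explain_deviation(old_cert, new_cert))
```

The last branch covers the case the text is most concerned with: a changed output with no recorded change in software or data is itself a signal that the result needs investigation.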
An important issue emerging for decision support in medical diagnosis is the trust that clinicians might have in the outputs generated by a computer-aided diagnostic system. This is a highly relevant issue for tissue characterization in general and image-based tumor classification in particular. Hence, it should be considered whether metrics such as accuracy (ACC) and area under the curve (AUC) correlate well with confidence in the algorithms used in computer-aided diagnosis (CAD) systems. If the CAD system provides a recommendation with moderate or low confidence, a radiologist may deem the recommendation useless, even if the classifier being used has high overall ACC and/or AUC values. However, if the confidence is much higher, the clinician may deem the recommendation more useful in supporting their decision making. Therefore, building CAD systems in which clinicians have confidence is essential if those systems are to be adopted and are to play a role in fully exploiting the range of digital information available for assisting diagnosis.
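The gap between aggregate metrics and per-case confidence can be made concrete with a toy example. The sketch below assumes a classifier that outputs a probability of malignancy and takes max(p, 1 - p) as its per-case confidence, which is only one simple choice of confidence measure; the scores and the 0.8 confidence threshold are invented for illustration. With these numbers, ACC and AUC are both perfect, yet every individual recommendation falls below the threshold.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """AUC as the probability that a random positive case outranks a random negative one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def low_confidence_fraction(scores, min_confidence=0.8):
    """Share of cases whose confidence max(p, 1 - p) is below min_confidence."""
    return sum(max(s, 1 - s) < min_confidence for s in scores) / len(scores)


# Hypothetical benign (0) and malignant (1) cases with predicted probabilities of malignancy.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.35, 0.40, 0.42, 0.45, 0.55, 0.58, 0.60, 0.65]

print("ACC:", accuracy(labels, scores))
print("AUC:", auc(labels, scores))
print("low-confidence share:", low_confidence_fraction(scores))
```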
Recent studies discuss the failure of CAD systems in terms of a lack of
Exploratory studies for breast mass classification [
To fully exploit the range of digital information available for assisting diagnosis, it is important to identify and implement specific actions to increase physicians’ trust in cancer CAD systems and to overcome the barriers to the adoption of such systems. Most cancer CAD systems have used ACC and AUC as the main performance evaluation metrics; however, confidence measures must also be considered, as the traditionally used metrics are inadequate for informing clinicians about the confidence they might place in the recommendations provided. In addition, research into the design of the classifiers incorporated into CAD systems is essential if future CAD systems are to be trusted by clinicians and adopted as a valued, reliable, and indeed routine element of the cancer diagnosis process.
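Whether a classifier's confidence deserves trust is often checked through calibration, that is, whether cases assigned a probability of about 0.7 turn out to be positive about 70% of the time. As a hedged illustration of one such confidence-related measure, the sketch below computes a binned expected calibration error for the positive-class probability. The bin count and the toy data are assumptions, and this is not a measure prescribed by the projects described here.

```python
def expected_calibration_error(labels, probs, n_bins=10):
    """Binned expected calibration error for positive-class probabilities.

    Each bin contributes |observed positive rate - mean predicted probability|,
    weighted by the share of cases that fall into that bin.
    """
    if len(labels) != len(probs) or not probs:
        raise ValueError("labels and probs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[index].append((y, p))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for _, p in members) / len(members)
        positive_rate = sum(y for y, _ in members) / len(members)
        ece += len(members) / total * abs(positive_rate - mean_prob)
    return ece


# Hypothetical outputs: a well-calibrated model and an overconfident one.
calibrated = expected_calibration_error([1, 0, 0, 0, 1, 1, 1, 0], [0.25] * 4 + [0.75] * 4)
overconfident = expected_calibration_error([1, 0, 1, 0], [0.95] * 4)
print(f"calibrated: {calibrated:.2f}, overconfident: {overconfident:.2f}")
```

Unlike ACC and AUC, this value grows when the stated probabilities overstate how often the model is right, which is closer to the question a clinician asks of a single recommendation.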
Data and image analysis algorithms usually require regulatory approval by the Food and Drug Administration (FDA) in the United States and conformity assessment leading to a Conformité Européenne mark in the EU [
Regulations that aim to validate AI in a safe and transparent way must consider doing so without compromising the potential of AI. Moreover, characteristics of AI that may be seen as risks such as biases in data, its
In the EU, there are no regulatory guidelines that specifically cover AI in health care. Nevertheless, draft ethics guidelines were published by the High-Level Expert Group on AI in April 2019, proposing 7 key requirements that AI systems should meet to be realized as
The frameworks enumerated above are, however, still being piloted or discussed. To properly define a regulatory framework, users, stakeholders, and use cases (data flows) should be identified and defined. New regulatory frameworks should be built to provide guidance for the validation or qualification of AI tools within different scenarios and pathways (for nonclinical or preclinical use or clinical use), taking into consideration the adaptive nature of AI-based tools. The framework should consolidate input from scientific experts and health authorities and should take into consideration relevant published guidelines, for example, from the High-Level Expert Group on AI and the Medical Device Coordination Group, relevant implementing EU legislation, and international regulations (eg, from the FDA).
Topics, opportunities, and recommendations in cancer care.
Topic or section (references) | Opportunities | Recommendations |
Patient engagement and participatory design [ | Involvement of real users; identification of user needs; unique perspective on user acceptability, usability, and feasibility | Participatory design approach early and throughout the design process; focus groups with stakeholder representatives; fuse findings with those from other sources |
Small data analytics [ | Address the annotation problem via appropriate tools; enable experts to teach MLᵃ models that automatically build and annotate their data sets; automatically represent knowledge in a structured and computerized way | Design new machine learning algorithms that need minimal feedback from human experts; use knowledge-based learning that can be easily extended by data-driven findings and that uses standardized terminologies to provide interoperability and ease the updating and maintenance of the latest evidence |
Integration and data management [ | Exploiting aggregated, heterogeneous, and distributed data; translating the clinical findings into an intuitive representation for the patient; building individualized clinical recommendations based on the data interpretation projected in actions upon the patient; data completeness and augmentation | Embed continuous user feedback and iterative prototyping in the intervention; use of tools to extract and represent the medical substrate by synthesizing only relevant aspects in a declarative way; development of clinical projections from individualized patient recommendations to therapy plans that embed temporal, procedural, and reasoning processes; incorporation of the lived experiences of patients |
Extracting patient’s portrait [ | Exploit correlations among multiple data sources to extract the patient profile; use data mining and machine learning to guide therapeutic schemes; identify relevant correlations of genetic, phenotypical, physiological, lifestyle, and medical data with diagnosis; provide an integrative approach to patient-centered data and demonstrate the potential of feature selection in data analysis and in predicting patient-specific outcomes | Exploit available data; extract underlying correlations; integrate the multitude of representations in a structured object guiding therapeutic interventions in cancer care |
Learning patient disease trajectory for personalized diagnosis [ | Handling data and choosing specific latent variable models to summarize and extract information from irregularly sampled and sparse data; learning of a disease trajectory is linked to inherent computational complexity; continuous adaptation and updating in the face of heterogeneous disease progression; observed versus latent data artifacts | Use of discriminative models that exploit conditions on marker histories instead of jointly modeling them; focus on machine learning models that grow linearly in the number of marker types included in those models; use of a model capable of being applied dynamically in continuous time and updated; exploit models that account for latent factors and covariates influencing disease expression |
Technological interventions in cancer rehabilitation [ | Cope with the variability of patients’ sensory, motor, and cognitive deficits; identify therapy sequelae | Perform a precise assessment of the variability of patients’ sensory, motor, or cognitive deficits; use machine learning algorithms to identify underlying correlations in patient data and generalize for robust prediction; exploit and mine large sets of structured and unstructured data to identify correlations and map them to a certain type of dysfunction |
Addressing current interoperability challenges [ | Provide cancer-wide care; support diagnosis assistance for complex patients; provide a complete view of the patient’s medical history so physicians can see ineffective treatments; improve surveillance and research | Develop, test, disseminate, and adopt technical standards for information related to cancer care across the continuum; optimize the flow of information to serve the needs of caregivers, patients, and providers; develop and use standard, open application programming interfaces; promote incentives for the pooling of data and comparison of system-level research; support open use and sharing of big data, without compromising patients’ rights to privacy and confidentiality |
Patient-clinician shared decision-making processes [ | Inclusion of patient preferences during decision making and tracking of their impact on the care services provided; identification of | Create a clear taxonomy (ie, systematic categorization) of patients’ preferences to serve as a standard; harmonize different points of view to facilitate labeling and extraction of information in a processable and understandable way; build a methodology to synthesize knowledge |
Assessment of clinical evidence-based recommendations, including PROMsᵇ [ | Quick access to the latest available evidence; incorporate new sources of information that can support the decision-making process; increase the knowledge required by patients; effectively assess how well the treatment given worked for each patient, not within the scope of a randomized controlled trial, but in the real-world environment | Explore new ways of including PROMs to assess guideline recommendations; exploit PROMs in the decision-making process, considering patient status reported by the patients themselves; consider and use existing quality assessment specifications for PROMs |
Ambiguity in clinical guidelines used for clinical decision support [ | Insight from complex clinical cases in a natural and intuitive way; patient-specific advice when and where needed | Promotion of standardized clinical terminology; integration of clinical guidelines with care flow |
Up-to-date clinical evidence guidelines for CDSSᶜ [ | Create tools that support the easy updating of CIGsᵈ by clinicians; interfaces for CIGs that are easy to use and understand | Generate tools that enable the input of guideline information in an easy and visual manner and enable the modification of CIGs previously formalized in the system; provide a tool for detecting modifications to guidelines; semiautomate the formalization of guidelines using natural language processing |
Trust and reliance in cancer care [ | Assurance of the credibility of results generated by various computational tools available on the web | Provision of a certificate that discloses who is responsible and what tests are done or can be done to validate or test the trustworthiness of the output; inclusion of the versions of the data and the software in a report to help explain the deviation from the previous version |
Trust in computer-aided diagnosis systems [ | Increase confidence in the support provided by a CADᵉ system | CAD support systems must embody reliable confidence measures as one of their key elements; incorporate trust into the initial classifier design when such algorithms are to be embedded into a cancer CAD system |
Regulatory roadmap for validating the effectiveness of AIᶠ-based models for clinical decision making [ | Validation of AI in a safe and transparent way without compromising the potential of AI | Identify and define users, stakeholders, and use cases (data flows); build regulatory frameworks aiming to provide guidance toward the validation or qualification of AI tools within different scenarios and pathways; consolidate input from scientific experts, health authorities, and published guidelines |
ᵃML: machine learning.
ᵇPROM: patient-reported outcome measure.
ᶜCDSS: clinical decision support system.
ᵈCIG: computer-interpretable guideline.
ᵉCAD: computer-aided diagnosis.
ᶠAI: artificial intelligence.
This paper presents the key topics that were discussed as part of two international workshops on the current status of technological and data-driven innovations in cancer care. Key challenges and opportunities have been identified, and several recommendations have been made to facilitate the acceleration of progress in the data-driven management of cancer. The workshops presented the work being conducted in four Horizon 2020 EU–funded projects, namely BOUNCE, CATCH ITN, DESIREE, and MyPal. These projects provide a rich landscape of the challenges and opportunities of the current state of the art of new technologies in cancer care. The authors have presented these issues and discussed recommendations that can be used for further research as well as for the practical implementation of such tools in cancer care.
ACC: accuracy
AI: artificial intelligence
AUC: area under the curve
CDSS: clinical decision support system
CIG: computer-interpretable guideline
CPG: clinical practice guideline
DSS: decision support system
EHR: electronic health record
ePRO: electronic patient-reported outcome
EU: European Union
FDA: Food and Drug Administration
H2020: Horizon 2020
ICT: information and communication technology
IEEE: Institute of Electrical and Electronics Engineers
IT: information technology
LSTM: long short-term memory
MD5: message-digest algorithm 5
mHealth: mobile health
ML: machine learning
NCCN: National Comprehensive Cancer Network
PROM: patient-reported outcome measure
SaMD: software as a medical device
The work presented in this paper is part of the BOUNCE project, which has received funding from the EU’s H2020 Research and Innovation Program under grant agreement number 777167.
None declared.