This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
The preferred evidence of a large randomized controlled trial is difficult to generate in scenarios such as rare conditions or clinical subgroups with high unmet need, and evidence from external sources, including real-world data, is increasingly being considered by decision makers. Real-world data originate from many sources, and identifying suitable real-world data that can be used to contextualize a single-arm trial as an external control arm presents several challenges. In this viewpoint article, we provide an overview of the technical challenges raised by regulatory and health reimbursement agencies when evaluating comparative efficacy, including challenges of data source identification, outcome selection, and time selection. By breaking down these challenges, we offer practical solutions for researchers, spanning detailed planning, data collection, and record linkage, for analyzing external data for comparative efficacy.
Historically, real-world evidence from observational studies has had limited use for demonstrating therapeutic effectiveness for regulatory and reimbursement purposes. However, several recent developments, including the 21st Century Cures Act, increased accessibility of large-scale routinely collected health data, improved standardization of collection, and high-profile regulatory applications, have resulted in increased demand for and availability of these data [
Real-world evidence is derived from rigorous analyses of real-world data. Real-world data are data related to patient health status or the delivery of health care outside of randomized controlled trials (RCTs), with sources commonly including electronic health records (EHRs), medical claims data, and product and disease registries [
Given the adoption and use of EHRs and other data content sources in general practice, real-world data and real-world evidence are being increasingly reported in publications in recent years (
Number of real-world data and real-world evidence publications over time from 1965 to 2021. The following search terms were used in PubMed: “real-world” or “observational” or “nonrandomized” or “standard of care” or “external control” or “single-arm” or “historical-control” or “retrospective” or “noninterventional” or “case series” or “natural history” or “electronic health record” or “electronic medical record” or “claims”.
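For reproducibility, the boolean search string behind these publication counts can be assembled programmatically. The sketch below is a simplified illustration: the term list is taken from the caption above, and the publication-date filter uses PubMed’s [PDAT] field tag with the figure’s 1965 to 2021 range.

```python
# Assemble the PubMed boolean query from the search terms listed in the
# figure caption, restricted to a publication-date range.
TERMS = [
    "real-world", "observational", "nonrandomized", "standard of care",
    "external control", "single-arm", "historical-control", "retrospective",
    "noninterventional", "case series", "natural history",
    "electronic health record", "electronic medical record", "claims",
]

def build_query(terms, start_year=1965, end_year=2021):
    """OR the quoted terms together and restrict by publication year."""
    joined = " OR ".join(f'"{t}"' for t in terms)
    return f'({joined}) AND ("{start_year}"[PDAT] : "{end_year}"[PDAT])'

query = build_query(TERMS)
```

The resulting string can be pasted into the PubMed search box or passed to an E-utilities client to reproduce yearly counts.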
As the majority of criticism from regulatory and reimbursement agencies on applications using real-world data has so far focused predominantly on data features (type, quality, and frequency) and confounding and selection bias limitations, appropriate curation and evaluation of these sources are critical components of any exercise using external evidence [
RCTs have long been considered the gold standard approach to evaluate the comparative effectiveness of a drug or biological product due to the standardized methods to reduce bias, including randomization and blinding, as well as balancing known and unknown confounders. These gold standard trials, however, may not reflect the real-world setting where the patient is treated [
Computational models from machine learning and artificial intelligence have experienced success in early disease diagnosis and trial recruitment across several indications [
Outside of traditional rare diseases, real-world data remain underutilized for comparative efficacy in certain categories, such as the incorporation of medications or surgical procedures in clinical trial conduct [
In a traditional clinical trial, frameworks exist for outcome ascertainment and measurement frequency through defined protocols for the monitoring and timing of key observations used to evaluate clinical effectiveness. Given the structured nature of data from RCTs, analyses using these data are seldom hampered by high levels of missingness or stochastic outcome measurements. In the context of real-world data and external control arms, common challenges include encouraging researchers to be open to the value of real-world data and developing suitable outcome proxies when data availability is insufficient.
EHR systems were not designed for the structured trial environment of clinical effectiveness studies; therefore, additional steps are required to refine data sets for use. EHR data include routinely collected measurements from general practice, which may not exactly match the defined clinical endpoints of interest or disease definitions encountered in clinical trials. Accordingly, where analyses attempt to align these data for comparative efficacy, issues may arise, and plausible proxies need to be constructed. For example, a trial of interest in a hepatology setting might include an outcome of “ascites requiring treatment;” however, in EHR data sets, “ascites” might be captured, but the condition with the additional criterion of requiring treatment would not be preassembled and indexed. It would be possible to identify the treatments for ascites and tie these back to an individual patient, but this may be a manual process. As such, estimating the total number of eligible patients with the outcome of interest may require substantial manual data curation. Specifically, when using EHRs for comparisons with data obtained from a prospective clinical trial, there may be an absence of overlapping outcomes. Clinical trials (randomized or single-arm) have been found to often capture data on outcomes not recorded routinely in clinical practice [
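To make the ascites example concrete, such a proxy can be operationalized by joining diagnosis and treatment records per patient within a time window. The sketch below is purely illustrative: the table layouts, the 30-day window, and the single drug are assumptions, not definitions from any specific EHR or trial protocol.

```python
from datetime import date

# Hypothetical extracts from an EHR: a diagnosis table and a treatment table.
diagnoses = [
    {"pid": 1, "code": "ascites", "date": date(2020, 3, 1)},
    {"pid": 2, "code": "ascites", "date": date(2020, 5, 10)},
    {"pid": 3, "code": "cirrhosis", "date": date(2020, 1, 2)},
]
treatments = [
    {"pid": 1, "drug": "spironolactone", "date": date(2020, 3, 4)},  # within window
    {"pid": 2, "drug": "spironolactone", "date": date(2021, 1, 1)},  # too late
]

WINDOW_DAYS = 30  # assumption: treatment within 30 days counts as "requiring treatment"

def ascites_requiring_treatment(diagnoses, treatments, window=WINDOW_DAYS):
    """Return patient IDs whose ascites diagnosis is followed by a relevant
    treatment within `window` days (an operational proxy for the trial outcome)."""
    hits = set()
    for dx in diagnoses:
        if dx["code"] != "ascites":
            continue
        for rx in treatments:
            if rx["pid"] == dx["pid"]:
                delta = (rx["date"] - dx["date"]).days
                if 0 <= delta <= window:
                    hits.add(dx["pid"])
    return hits
```

In this toy data set, only patient 1 satisfies the proxy; patient 2 has a diagnosis but no treatment inside the window, illustrating how the operational definition narrows the eligible population.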
Similarly, the translation of many gold standard trial endpoints to real-world data initiatives may not always be a straightforward endeavor. These efforts require a series of methodological considerations during the planning, abstraction, and analysis phases to ensure regulatory-grade fit-for-purpose data [
Other endpoints of interest, such as overall survival, may be clearly definable within a given data set, but are subject to limitations such as use restriction, delayed data availability, and missingness [
Real-world studies provide an opportunity to offer additional insights into the patient journey by providing a reflection of traditional care. In an RCT setting, long-term follow-up can be constrained by financial hurdles and trial logistics. To mitigate this, randomized trials have been extended by incorporating long-term follow-ups augmented using routinely collected health care data to investigate long-term outcomes [
Unlike in randomized controlled comparative efficacy studies, defining the start date for measuring patient outcomes can be challenging in routinely collected health data. The concept of immortal time bias, which originated in the 1970s, refers to the period within a cohort’s observation or follow-up window during which the study outcome of interest cannot occur [
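A minimal sketch of how immortal time can be handled at analysis time: the interval between cohort entry and treatment start is attributed to the unexposed state rather than to the treated group. The dates and the simple two-state classification below are illustrative only.

```python
from datetime import date

# Hypothetical follow-up records: cohort entry, treatment start (None if never
# treated), and end of follow-up (event or censoring).
patients = [
    {"entry": date(2020, 1, 1), "treat": date(2020, 4, 1), "end": date(2020, 12, 31)},
    {"entry": date(2020, 1, 1), "treat": None, "end": date(2020, 6, 30)},
]

def person_time(patients):
    """Split each patient's follow-up into unexposed and exposed person-days,
    so the interval between entry and treatment start (the 'immortal' time)
    is counted as unexposed rather than credited to the treated group."""
    unexposed = exposed = 0
    for p in patients:
        if p["treat"] is None:
            unexposed += (p["end"] - p["entry"]).days
        else:
            unexposed += (p["treat"] - p["entry"]).days
            exposed += (p["end"] - p["treat"]).days
    return unexposed, exposed
```

Naively assigning the treated patient’s entire follow-up to the exposed group would inflate exposed person-time by 91 immortal days in this example, which is the mechanism of the bias.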
Related to these issues are those representing general temporal biases associated with discordant timeframes of interest. While statistical methodologies may be used to minimize observable between-group differences, other aspects of care may vary over time in ways that cannot be accounted for in such a direct manner. For example, even where diagnostic criteria for a condition do not vary over the time period of interest, access to the requisite diagnostic technology may increase over time, leading to increased disease incidence outside of any other changes to practice. These patients may differ from patients identified at different times with respect to both measurable and unmeasurable characteristics, which may contribute to temporal bias [
Indeed, concerns regarding temporal bias have formed the basis of negative reception in regulatory submissions using external data for the FDA, the EMA, and Health Canada [
Conditional on the circumstances associated with the application of an external control, geographic representation may be an important characteristic to consider. Regulatory and reimbursement bodies have often identified geographic discordance as part of the rationale behind negative endorsement of candidate products, particularly where these geographic differences are likely to translate to differences in prognostic factors, confounding, effect modifiers, and standards of care [
The abundance of real-world data certainly presents a vast set of challenges, including “data dredging” where the hoped-for result may stem from ad hoc data mining or from selection of one of the many data sources with limited characteristics for adjustment [
The intended steps of using real-world data to evaluate clinical efficacy should be discussed or described in a protocol and statistical analysis plan (SAP) with the associated regulatory or reimbursement body ahead of initiating the externally controlled trial, in an effort to not “cherry pick” the results. Here, sponsors should include a justification for selecting or excluding relevant data sources and should demonstrate that the choice of the final data source for the control arm best answers the research question of interest [
FDA guidance specifies that the protocol and SAP that will be submitted following initial discussions should describe the data provenance curation and transformation procedures in the final study-specific analytic data set and describe how processes could affect data integrity (consistency of data and completeness) and overall study validity [
One approach is to define the patient population for the external control arm, specify the outcome of interest, identify prognostic factors associated with that outcome, and specify the control therapy. During the data source identification phase, it is common to undertake a scoping exercise to understand the breadth of data available and derive top-level patient counts. Data sets lacking the elements of interest can then be excluded, and the top-level counts will shrink as further matching with trial selection criteria is applied. Assessing which data source offers sufficient patient counts and covariates to confirm a match is an iterative process, requiring the strengths and limitations of each candidate source to be weighed simultaneously. Because identifying a suitable real-world database is iterative, a prespecified description of the database is suggested to inform regulators at the time of filing, although a major limitation is that some data element criteria might remain unknown until further exploration [
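The iterative matching of top-level counts against trial selection criteria can be made transparent by recording attrition at each step, in the manner of a trial flow diagram. The fields and criteria below are hypothetical; real scoping exercises run against database queries rather than in-memory lists.

```python
# Hypothetical top-level cohort from a candidate data source.
cohort = [
    {"age": 64, "diagnosis": "cirrhosis", "has_baseline_labs": True},
    {"age": 45, "diagnosis": "cirrhosis", "has_baseline_labs": False},
    {"age": 70, "diagnosis": "hepatitis", "has_baseline_labs": True},
]

# Trial-style selection criteria, applied in order (labels are illustrative).
criteria = [
    ("diagnosis of cirrhosis", lambda p: p["diagnosis"] == "cirrhosis"),
    ("age >= 50",              lambda p: p["age"] >= 50),
    ("baseline labs present",  lambda p: p["has_baseline_labs"]),
]

# Record the patient count remaining after each criterion is applied.
attrition = [("all patients", len(cohort))]
for label, keep in criteria:
    cohort = [p for p in cohort if keep(p)]
    attrition.append((label, len(cohort)))
```

Comparing such attrition tables across candidate sources shows at a glance which database retains sufficient patient counts once all trial criteria are matched.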
Situations where existing data sources are unavailable or unsuitable for an indication, a necessary exposure, an outcome, or a key covariate needed to measure confounding can pose a major limitation to conducting comparative efficacy studies that support regulatory decisions. As other researchers have stated, a common external control arm critique concerns the mitigation of confounding [
A possible solution for these issues is to conduct de novo retrospective data collection involving manual review of patient charts, in conjunction with pre-existing data. Here, clinicians and key experts for the indication create a customized data collection form (also known as an electronic case report form) that can be standardized across multiple sites and gather useful details. In these scenarios, assessments can be obtained and linked directly to events of interest, such as a new medication or an annual physical examination. As EHR databases can provide access to longitudinal data, further evaluation of measures both before and after the event window can be performed. Drawing upon these aspects further emphasizes the value of real-world data for both predictive analytics and the assessment of long-term outcomes in comparative effectiveness analyses.
De novo data capture will not fix challenges in relation to the frequency of visits or collection of variables known to exist exclusively within trial settings, which is a limitation of this solution. However, qualitative assessments and clinical narratives can contain valuable insights absent from the structured data fields of EHRs. Improvements in advanced analytics and natural language processing have provided increased automation to abstract valuable clinical information from unstructured fields, which is traditionally time consuming. A direct comparison between machine learning review and chart-based review in the diagnosis of rheumatoid arthritis demonstrated the ability of the algorithm to identify patients with a strong overlap in disease classification criteria and baseline disease characteristics [
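As a much simpler stand-in for the machine learning approaches cited, the sketch below shows the basic idea of automated abstraction: turning free-text clinical narratives into structured flags. The patterns and labels are illustrative; a production system would use far more robust natural language processing than keyword matching.

```python
import re

# Illustrative patterns mapping free-text mentions to structured flags.
NOTE_PATTERNS = {
    "rheumatoid_arthritis": re.compile(r"\brheumatoid arthritis\b|\bRA\b"),
    "seropositive": re.compile(r"\b(anti-CCP|RF)\s*positive\b", re.IGNORECASE),
}

def abstract_flags(note):
    """Return the set of structured flags whose pattern appears in the note."""
    return {flag for flag, pat in NOTE_PATTERNS.items() if pat.search(note)}
```

Run against a narrative such as “Patient with rheumatoid arthritis, RF positive,” this yields both flags; the cited machine learning approaches generalize the same note-to-structured-field mapping beyond literal keyword hits.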
The ability to characterize the entire patient journey is critical to perform clinical effectiveness analyses using real-world data, though a complete picture of a patient’s medical health is often not available within a single data source. Patients who move in and out of health care networks could have their data omitted from analyses of single data sets. If this movement is related to the patient’s standard of care or access to treatment, the process can result in unintended biases in an analysis that is unable to account for these patient movements and missing data. A solution to the fragmentation of the patient journey and perspective could be to engage with data providers who tokenize their data set. In tokenized data sets, patient “tokens” or unique identifiers are assigned to link a patient within a given data source to assist identification in other separate real-world data sources, avoiding duplication while protecting patient privacy at the same time [
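One common way to implement such tokens is a keyed hash of normalized identifiers, so the same patient yields the same token in every source without the identifiers themselves being exposed. The sketch below uses SHA-256 with a shared salt; the salt, fields, and normalization rules are assumptions, and commercial tokenization vendors use more sophisticated, certified schemes.

```python
import hashlib

SITE_SALT = "example-project-salt"  # assumption: a secret shared across data partners

def tokenize(first, last, dob):
    """Derive a privacy-preserving linkage token from normalized identifiers.

    Normalization (trimming, lowercasing) makes the token robust to trivial
    formatting differences between data sources.
    """
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{dob}"
    return hashlib.sha256((SITE_SALT + normalized).encode()).hexdigest()
```

Because the mapping is deterministic, records for the same individual in two separate sources produce identical tokens, enabling linkage and deduplication without exchanging names or dates of birth.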
Examples of the role of tokenization have been identified for broadening data sets to improve associations of testing with outcomes in diseases such as COVID-19 [
Every scenario in which a suitable real-world data source is identified as an external control arm for comparative efficacy is unique. Tokenization, for example, could work in some instances but might not be appropriate where data coverage is complete yet the variables in the external data source are poorly defined.
Summary of real-world data identification for comparative efficacy using externally controlled trial challenges, examples, and application of solutions.

| Challenges | Examples | Application of solutions |
| --- | --- | --- |
| Indecision gaps due to abundance of real-world data | It is difficult to parse out important data sources, rare disease candidates, and data linkage options. | Machine learning applications can improve accuracy and quality (type and frequency) in data source selection and patient selection [ |
| Poorly defined variables or inconsistent definitions from clinical trial to real-world data, limiting comparability | The conceptual definition of a data element does not align with the operational definition. | De novo data collection [; automated electronic health record (EHR) abstraction [; characterization of real-world variables or surrogate endpoints [; prespecified sensitivity analyses, including quantitative bias analyses [ |
| Medical claims data might have limited use to support regulatory-grade decision-making | Claims data have limited clinical outcome data. | Combine with EHRs to expand the applicability, coverage, and depth of data [ |
| Difficult to capture continuity of care in a single data source | Diagnosis is spread across multiple physicians; if the patient moves and seeks care outside of the care network, follow-up data will be lost. | Tokenization/data linkage and advanced analytics with EHR data to capture a more complete patient journey (particularly helpful for rare conditions where the sample size would be low) [; analytical approaches (ie, imputation) for missing data [ |
| Timing of therapy | Patient has multiple lines of treatment; what should be considered the index date? | Define a proper index date or “time zero” following the target trial emulation framework [ |
| Timing of data collection (inconsistent standard of care over time) | Data may be present but not current enough to provide a reasonable comparison with the current standard of care. | De novo data collection [; tokenization/data linkage [ |
| External control arm nongeneralizable to clinical practice | Geographic representation where the main external control arm data source is from outside the country of interest; two unlinked data sources are selected to obtain a sufficient sample size, but it is unclear if patients overlap in care networks. | Tokenization/data linkage, which improves patient counts with geographic representation while accounting for duplicates [; transportability [ |
| Data loss or insufficient sample size and power | In the analysis phase, during matching, the power to detect an effect is reduced. | De novo data collection [; tokenization/data linkage [; analytical approaches (ie, imputation) for missing data [ |
| Avoid the appearance of the analysis as post hoc or cherry-picked | Data dredging/post hoc analysis (eg, regulators can assume the most appealing analysis was conducted). | Transparent prespecified description of data selection, data provenance, and the statistical analysis plan [ |
Given the considerable costs of novel drug development pipelines and the increasing stratification of diseases and subtypes, real-world data will become increasingly relevant for regulatory and reimbursement discussions in the decades to come. Exclusive reliance on an RCT framework as the entire evidence generation plan, despite its advantages for comparative efficacy research, does not adequately acknowledge the shortcomings of a singular research strategy. Simultaneously, the field of biostatistics has expanded its focus on data missingness to allow effectiveness analyses when a subject presents with incomplete data, while taking the necessary precautions and accounting for bias.
The versatility of real-world data extends beyond comparative efficacy analyses. Safety evaluation raises additional concerns, with differing postmarket authorization requirements. Similarly, important topics outside the scope of this paper include the many challenges associated with patient privacy and reidentification risk, and the considerations needed when performing these analyses. EHR-based real-world data were the focus of this paper, but additional sources have been used successfully in comparative efficacy analyses. Claims, like other sources of real-world data, have their own unique challenges for consideration [
electronic health record
European Medicines Agency
findable, accessible, interoperable, and reusable
Food and Drug Administration
International Classification of Diseases
International Society for Pharmacoeconomics and Outcomes Research
progression-free survival
randomized controlled trial
Response Evaluation Criteria in Solid Tumors
statistical analysis plan
Systematized Nomenclature of Medicine – Clinical Terms
whole genome sequencing
RRV, CM, and LD work for Cytel Inc (paid employment). RRV and LD have shares of Cytel Inc. EAS and LE work for Nashville Biosciences (paid employment). DRB has no conflicts to declare.