This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
An increasing number of studies within the field of telemedicine and e-health are designed as noninferiority studies, aiming to show that the telemedicine/e-health solution is not inferior to the traditional way of treating patients.
The objective is to review and sum up the status of noninferiority studies within this field, describing advantages and pitfalls of this approach.
PubMed was searched according to defined criteria, and 16 relevant articles were identified from the period 2008-June 2011.
Most of the studies were related to the fields of psychiatry and emergency medicine, and most were published in journals relating to these fields or in general scientific or general medicine journals. All the studies claimed to be noninferiority studies, but 7 out of 16 tested for statistical differences as a proxy for noninferiority.
The methodological quality of the studies varied. We discuss optimal procedures for future noninferiority studies within the field of telemedicine and e-health and situations in which this approach is most appropriate.
In the field of telemedicine and e-health, there is often a need to demonstrate that a new solution/application is equal in quality or efficacy of treatment to the traditional or established way of treating patients. Demonstrating superiority of the new solution in terms of quality or efficacy of treatment is not always necessary, as the telemedicine/e-health solution/application may have other types of advantages, including saved travel time or saved costs. Testing that the new solution is not inferior to a traditional counterpart may therefore seem to be sufficient in many cases. As would be expected from this line of reasoning, there has been an increase in published studies within the field of telemedicine and e-health, using a noninferiority design, ie, studies that aim to show that the new telemedical solution is not of a lower quality than the established way of treating patients.
In the present study, we performed a systematic review of the published literature and found 16 studies [
The review aims to follow the criteria outlined in the PRISMA statement [
A good starting point for understanding what a nonsignificant result really means is to consider the famous quote by astronomer Carl Sagan: “Absence of evidence is not evidence of absence” [
Consider an experiment where we evaluate a video-based telemedicine service called T. We have decided to test whether this service is superior to a traditional clinical treatment called C. For simplicity we are looking at one single aspect, the patient’s blood sugar levels.
We do a single sided
The easiest way to understand this is that by reducing the number of participants, we are much more likely to get a nonsignificant result. It should be fairly obvious that a reduction in the number of participants does not make the groups more equal. It results only in a study of lower quality that is less able to detect whether the new service is superior.
Including more persons in the trial will increase the chance of detecting superiority (if it exists). However, whenever we end up with a nonsignificant result, we are still facing Sagan’s observation that the absence of evidence is not evidence of absence.
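The relationship between sample size and the chance of a significant result can be sketched numerically. The following is a minimal illustration, not part of the reviewed studies: it assumes a one-sided two-sample z-test with a known, equal standard deviation in both groups, and the function name `power_one_sided` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(true_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample z-test.

    Assumes a known standard deviation `sd` that is equal in both groups
    and the same number of participants in each group.
    """
    std_normal = NormalDist()
    # Standard error of the difference between the two group means.
    se = sd * sqrt(2.0 / n_per_group)
    # One-sided critical value for the chosen significance level.
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    # Probability that the test statistic exceeds the critical value.
    return 1.0 - std_normal.cdf(z_crit - true_diff / se)


# Hypothetical scenario: T is truly better than C by 0.3 standard deviations.
for n in (10, 50, 200, 800):
    print(n, round(power_one_sided(true_diff=0.3, sd=1.0, n_per_group=n), 3))
```

Under these assumptions the true difference is identical in every row; only the number of participants changes, and with it the probability of obtaining a significant result.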
If the ultimate goal is to prove that service T is not inferior to service C, the only way of approaching this is to first define what we mean by “inferior”. Note that “inferiority” is an empirical definition. When comparing two groups in medical trials, we never end up with exactly the same results, and what margins we define should be based on clinical considerations of what are meaningful margins, not upon our ability to measure them.
In noninferiority trials, we therefore first define a margin (M) such that a result no more than M below C is considered noninferior. How to set this margin is discussed in “Methods”. We then go on to test whether T really is superior to C minus this margin.
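Written as hypotheses, the test described above could be stated as follows. This is a sketch under assumptions that are not spelled out in the text: higher values of the outcome are taken to be better, M is positive, and the symbols \mu_T and \mu_C (the true mean outcomes under T and C) are notation introduced here for illustration.

```latex
H_0:\ \mu_T - \mu_C \le -M
\qquad \text{versus} \qquad
H_1:\ \mu_T - \mu_C > -M
```

An ordinary superiority test corresponds to the special case M = 0. Rejecting H_0 supports the claim that T is not inferior to C by more than the margin.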
Testing for equivalence has become an essential statistical tool in the process of securing approval for new generic drugs [
As described in the Introduction, there are multiple reasons why a failed test of superiority is insufficient for concluding noninferiority, among them a sample that is too small
In order to demonstrate
Setting the margin (M) must be done at the start of the trial, and in a clinical trial it should be related to what experts find clinically relevant. Wellek [
However, not only the difference between C and T is relevant for setting M. The margin must also be set in a way that a certain amount of the real effect of the active control over nontreatment/placebo (C-P) is conserved. Within biomedicine, it is discussed how small M could be in relation to C-P, and values ranging from 50-80% have been mentioned [
In an ordinary trial, a significant result automatically proves the ability to detect a difference, typically called the trial’s
Summing up, the following factors are essential in noninferiority trials:
1. Finding a clinically relevant definition of M. M should be independent of factors like variance and sample size. While some have suggested that M could be in the range of 10-20% of C, this needs to be set individually for each project and must be done before the trial. It is not an error to decide, on clinical grounds, that M should be lower.
2. Making sure that M conserves the main effect between the active control and nontreatment. Values of M should be at least 50% of C-P.
3. Assuring assay sensitivity, either by including a placebo or by drawing on historical data.
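The first two factors above can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure of any reviewed study: higher scores are taken to be better, a normal approximation is used, and the function names `noninferiority_check` and `preserved_fraction` as well as all numbers are hypothetical. The third factor, assay sensitivity, depends on the study design and cannot be checked by a calculation like this.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_check(mean_t, mean_c, sd_t, sd_c, n_t, n_c, margin, alpha=0.025):
    """Return (lower_bound, is_noninferior) for the mean difference T - C.

    Assumes higher scores are better and uses a normal approximation.
    `margin` (M) must be fixed before the data are seen.
    """
    diff = mean_t - mean_c
    se = sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    z = NormalDist().inv_cdf(1.0 - alpha)
    lower_bound = diff - z * se  # lower end of the confidence interval
    return lower_bound, lower_bound > -margin


def preserved_fraction(margin, control_effect):
    """Share of the control-versus-placebo effect (C - P) that is still
    guaranteed when T is within `margin` of C."""
    return 1.0 - margin / control_effect


# Hypothetical summary data: the margin is set to 10% of the control mean.
print(noninferiority_check(9.8, 10.0, 2.0, 2.0, 200, 200, margin=1.0))
print(noninferiority_check(9.8, 10.0, 2.0, 2.0, 20, 20, margin=1.0))
print(preserved_fraction(margin=1.0, control_effect=4.0))
```

With the same observed means, the larger hypothetical trial supports noninferiority while the smaller one does not, because its confidence interval extends below the margin.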
Whether it is possible to find a formal determination of M and whether it is possible to prove assay sensitivity using historical data are both questions that are still discussed vigorously among statisticians [
The inclusion criteria are English-language articles that apply accepted definitions of telemedicine or e-health [
Specific technological channels were included (eg, videoconference, Internet) in order to include articles within an intersection of fields that is not clearly defined as telemedicine or e-health in the article’s title or abstract. After the search, articles were manually scanned to exclude those not fulfilling the inclusion criteria. Eighteen articles were excluded because they were clearly unrelated to telemedicine or e-health (in most cases this was caused by abstracts with the words “video” or “Internet”). One additional article was excluded because the main article was available only in Japanese, and another article was excluded since it referred to other noninferiority trials only in the abstract. This left 16 articles for further analysis (
Of the included articles, three were from 2008, three from 2009, five from 2010, and five from 2011 (until June 2011). No articles meeting the inclusion criteria were published prior to 2008.
Strategy Flowchart.
Description of search criteria.
(noninferior OR noninferiority OR non-inferiority OR ("non inferior") OR ("not inferior")) AND (telemedicine[Title/Abstract] OR videoconference[Title/Abstract] OR video[Title/Abstract] OR videoconferencing[Title/Abstract] OR online[Title/Abstract] OR Internet[Title/Abstract] OR ehealth[Title/Abstract] OR e-health[Title/Abstract])
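The structure of the search string can be made explicit by assembling it from its two groups of terms. The sketch below is illustrative only; the helper `build_query` and the variable names are hypothetical, and the code only builds the string without contacting PubMed.

```python
# Terms describing the study design; they are searched without a field tag.
design_terms = [
    "noninferior",
    "noninferiority",
    "non-inferiority",
    '("non inferior")',
    '("not inferior")',
]

# Terms describing the field or technological channel; each one is
# restricted to the title or abstract.
channel_terms = [
    "telemedicine",
    "videoconference",
    "video",
    "videoconferencing",
    "online",
    "Internet",
    "ehealth",
    "e-health",
]


def build_query(design, channels, field="Title/Abstract"):
    """Combine both groups: any design term AND any channel term."""
    left = " OR ".join(design)
    right = " OR ".join(f"{term}[{field}]" for term in channels)
    return f"({left}) AND ({right})"


print(build_query(design_terms, channel_terms))
```

Keeping the two groups as separate lists makes it easy to see that an article must match at least one term from each group to be retrieved.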
In the review of the articles, two reviewers (Authors 1 and 2) identified how the noninferiority margin was set and the reason that was provided for setting it. They also noted whether an actual noninferiority test was performed or if it was a test for difference. Finally, they registered how assay sensitivity was assured.
Six of the included articles dealt with matters related to psychiatric treatment (post-traumatic stress disorder, generalized anxiety disorder, depression), four of the articles dealt with medical procedures particularly relevant to emergency medicine (vascular access, defibrillation, advanced life support), one was within the field of urology, one within rehabilitation after surgery, one within endocrinology, one within hematology, and two within medical communication studies. With regard to where the papers were published, only one was published in a telemedicine journal, five were published in emergency medicine journals, two in a psychiatric journal, one in an orthopedic surgery journal, one in an endocrinology journal, and six in general scientific or general medical journals.
Various ways of defining the inferiority margin were used in the 16 articles reviewed (
In one article [
Two studies referred to the lower bound of the confidence interval for the scores of the reference group. One of them [
In the four remaining studies [
Five articles [
In one article [
One article [
In the remaining articles [
Another question is whether a noninferiority test was actually performed, ie, that it was tested that the target effect was larger than the noninferiority margin. This could be accomplished either by checking whether the entire confidence interval for the means difference was above the noninferiority margin or by calculating a
Nine of the articles [
Seven of the articles [
Four of the studies [
In [
Articles included in review.
| Article | Noninferiority margin | Reason given for margin | Test performed |
|---|---|---|---|
| Agha et al, 2009 | 0.15 SD | Guided by Cohen | Noninferiority |
| Chenkin et al, 2008 | 10% from mean | Clinically + prior studies | Difference |
| de Vries et al, 2010 | Cohen’s | Defined | Noninferiority |
| Harper & Pollock, 2011 | Unclear: a) Within 5%, b) Lower bound 95% CI | No reason given | Difference |
| Hedman et al, 2011 | Absolute value + Cohen’s | Clinically + prior studies | Noninferiority |
| Merchant et al, 2009 | 10% from mean | Typical in medical trials | Noninferiority |
| Morland et al, 2011 | Not set | Not relevant | Difference |
| Morland et al, 2010 | Absolute value | Clinically | Noninferiority |
| Morland et al, 2009 | Absolute value | Clinically + prior studies | Noninferiority |
| Mpotos et al, 2011 | 10 percentage points difference in proportions | No reason given | Noninferiority |
| Munger et al, 2008 | RR=0.95 | No reason given | Noninferiority |
| Péres-Ferre et al, 2010 | Not set | No reason given | Difference |
| Robinson et al, 2010 | Not set | Not relevant | Difference |
| Russell et al, 2011 | Absolute value | Clinically | Noninferiority |
| Titov et al, 2010 | Not set | Not relevant | Difference |
| Weeks & Molsberry, 2008 | Lower bound 90% CI | No reason given | Difference |
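As a cross-check, the type of test recorded for each article in the table can be tallied. The sketch below simply transcribes the last column of the table into a dictionary; the variable names are hypothetical.

```python
from collections import Counter

# Type of test performed, transcribed from the table of included articles.
test_by_article = {
    "Agha et al, 2009": "Noninferiority",
    "Chenkin et al, 2008": "Difference",
    "de Vries et al, 2010": "Noninferiority",
    "Harper & Pollock, 2011": "Difference",
    "Hedman et al, 2011": "Noninferiority",
    "Merchant et al, 2009": "Noninferiority",
    "Morland et al, 2011": "Difference",
    "Morland et al, 2010": "Noninferiority",
    "Morland et al, 2009": "Noninferiority",
    "Mpotos et al, 2011": "Noninferiority",
    "Munger et al, 2008": "Noninferiority",
    "Péres-Ferre et al, 2010": "Difference",
    "Robinson et al, 2010": "Difference",
    "Russell et al, 2011": "Noninferiority",
    "Titov et al, 2010": "Difference",
    "Weeks & Molsberry, 2008": "Difference",
}

counts = Counter(test_by_article.values())
print(counts["Noninferiority"], counts["Difference"])
```

The tally agrees with the counts reported in the text: nine articles performed an actual noninferiority test and seven tested for a difference.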
As the results show, there are considerable variations in the way the noninferiority trials are performed. The 16 included articles should encompass the majority of the studies that claim to be noninferiority trials within the field of telemedicine and e-health, but a few that have not been indexed in PubMed might have been missed. While the study method seems to be growing in popularity, it is still in its infancy. Most current use of noninferiority trials is within biomedicine, and there are, as we have shown, only a few examples of use within telemedicine and e-health. While noninferiority trials within biomedicine can serve as an inspiration, differences between the fields make it difficult to copy the approaches used in biomedical trials. Below, we discuss some of the central elements of noninferiority trials and how they can be applied to studies within telemedicine and e-health.
To prove that something is equal, or not inferior, we need to define what we mean by equality or noninferiority. This is mainly a clinical issue that primarily should be assessed by experts within the field. Some very rough guidelines have been referred to, and values within 10-20% appear to be considered fairly equal in the literature. What is clinically relevant cannot be decided by this value alone. In some cases, a 10% difference can have enormous impact, while in other cases this value is clinically irrelevant. Only five of the articles included in our review referred to the concept of clinical relevance.
There are other guidelines stating that the margin should be set so that a majority of the effect between the control (C) and the nontreatment (P) should be preserved. In trials where the nontreatment group is not included, the researcher will have to estimate the effect of C-P based on previous trials. This is not a luxury that many telemedicine/e-health trials have.
When performing a traditional hypothesis test, a
One of the main driving forces in the popularity of noninferiority and equality testing within biomedicine is that it enables doing evidence-based medicine without including a nontreatment group. In some cases it might be ethically unacceptable to introduce a placebo. In other cases, this is primarily a question of cost saving. It might be fair to say that the increasing use of noninferiority and equality testing is related to the growth of so-called pragmatic trials, where the main question is not whether a treatment is effective but whether the treatment is worthwhile using in a clinical setting [
Ideally, assay sensitivity should be proven by a previous trial or a meta-analysis of multiple previous trials. It is difficult to replicate studies in this fashion within the field of telemedicine/e-health, and none of the studies examined in our review did this. However, 7 of the 16 studies did use previously validated questionnaires, an alternative that in many cases actually may be sufficient.
There were also four studies that included a placebo/no treatment. For simply proving assay sensitivity, this is definitely sufficient. It is, however, a bit contrary to the original purpose of equality and noninferiority tests, which is to be able to do without a placebo/no treatment group.
The review also identified three studies where there were no explicit indications in the articles that assay sensitivity had been established. The authors might, however, have carried out such procedures without reporting it.
As the analysis shows, the fundamentals of noninferiority testing can be daunting to use in practice, especially for authors who are new to this type of analysis. We recommend that authors pay close attention to the extended CONSORT guidelines for noninferiority testing to the extent that they are applicable for the study in question [
Performing a noninferiority study requires, as with any choice of statistical analysis, strict adherence to protocol to avoid fishing for positive results, which dramatically inflates the probability of falsely concluding noninferiority (a type I error in a noninferiority test). In particular, the noninferiority margin must be set before the study starts. Setting the margin after investigating the data means the investigator can essentially obtain any result wanted. Similarly, if an investigator performs a standard superiority trial and finds a nonsignificant result, the study should never be transformed into a noninferiority study. The intent of determining noninferiority must be clear from the outset.
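Why a margin chosen after seeing the data is worthless can be shown with a very small sketch. It assumes that higher scores are better and that noninferiority is claimed when the lower confidence bound for the difference T - C lies above -M; the function names `passes` and `post_hoc_margin` are hypothetical.

```python
def passes(lower_bound, margin):
    """Noninferiority is claimed when the lower confidence bound of the
    difference T - C lies above -margin (higher scores assumed better)."""
    return lower_bound > -margin


def post_hoc_margin(lower_bound, slack=0.01):
    """A margin picked after looking at the data: just wide enough to pass."""
    return max(0.0, -lower_bound) + slack


# Whatever the observed lower bound is, a post hoc margin always "succeeds".
for lower_bound in (0.5, -0.6, -1.44, -25.0):
    margin = post_hoc_margin(lower_bound)
    print(lower_bound, round(margin, 2), passes(lower_bound, margin))
```

Because the margin is derived from the result, the check passes for every possible outcome and therefore carries no information about the intervention.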
When there is sparse evidence for assay sensitivity, such as if there are few studies to base the analysis on, noninferiority testing may not be the best option. Assay sensitivity is essential for doing a proper noninferiority study since without it, the study could end up proving that the intervention is no worse than doing nothing (ie, does no harm). In such settings, it should be considered if another type of design is more appropriate, eg, an economic evaluation.
Noninferiority testing clearly has a place within telemedicine and e-health. It is, however, always a much more daunting task to prove that something (like a difference) does not exist than to prove that it does exist. As we have discussed in our review, noninferiority trials are not a magic shortcut to solving this fundamental challenge.
While several of the trials included in this review are of a high quality, the review also brings to light an apparent lack of awareness of the pitfalls of performing noninferiority trials. We recommend more stringent adherence to the basic principles of noninferiority testing. We have discussed some points that should be given specific attention, including the importance of not mistaking a failed difference test for proof of noninferiority and the importance of setting a clinically relevant noninferiority margin.
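The difference between a failed difference test and a demonstration of noninferiority can be illustrated by running both on the same summary data. This is a sketch under stated assumptions: a normal approximation, a common standard deviation, equal group sizes, and higher scores being better. The function `compare` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare(mean_t, mean_c, sd, n_per_group, margin, alpha=0.05):
    """Run a two-sided difference test and a noninferiority check
    on the same summary data (normal approximation, common SD)."""
    std_normal = NormalDist()
    diff = mean_t - mean_c
    se = sd * sqrt(2.0 / n_per_group)
    # Two-sided p value for the null hypothesis of no difference.
    p_difference = 2.0 * (1.0 - std_normal.cdf(abs(diff) / se))
    # Lower bound of the two-sided (1 - alpha) confidence interval.
    lower_bound = diff - std_normal.inv_cdf(1.0 - alpha / 2.0) * se
    return {"p_difference": p_difference, "noninferior": lower_bound > -margin}


# Same observed means, different sample sizes (hypothetical numbers).
small = compare(mean_t=9.8, mean_c=10.0, sd=3.0, n_per_group=15, margin=1.0)
large = compare(mean_t=9.8, mean_c=10.0, sd=3.0, n_per_group=400, margin=1.0)
print(small)
print(large)
```

In both hypothetical trials the difference test is nonsignificant, but only the larger trial has a confidence interval narrow enough to exclude the margin and thereby support noninferiority.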
The study was conceived and designed by PEK. All authors analyzed the data and wrote the paper.
None declared.
An ethics statement was not required for this work. No direct funding was received for this study. The authors were personally salaried by their institutions during the period of writing (though no specific salary was set aside or given for the writing of this paper).