This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

Reducing research waste and protecting research participants from unnecessary harm should be top priorities for researchers studying interventions. However, the traditional use of fixed sample sizes exposes trials to risks of under- and overrecruitment by requiring that effect sizes be determined a priori. One mitigating approach is to adopt a Bayesian sequential design, which enables evaluation of the available evidence continuously over the trial period to decide when to stop recruitment. Target criteria are defined, which encode researchers’ intentions for what are considered findings of interest, and the trial is stopped once the scientific question is sufficiently addressed. In this tutorial, we revisit a trial of a digital alcohol intervention that used a fixed sample size of 2129 participants. We show that had a Bayesian sequential design been used, the trial could have ended after collecting data from approximately 300 participants. This would have meant exposing far fewer individuals to trial procedures, including being allocated to the waiting list control condition, and the evidence from the trial could have been made public sooner.

Substantial effort is often expended on recruiting and collecting data from participants in behavioral intervention trials. Delivering interventions to participants often incurs additional costs that need to be accommodated within restricted budgets. These efforts and costs need to be balanced against study objectives, as increasing the number of participants reduces uncertainty in effect estimates. It is, therefore, not surprising that sample size considerations are given serious attention during the planning of trials, mixed in with feelings of despair, disbelief, and above all, hope.

With misguided faith in null hypothesis testing delivering certainty about effects in otherwise uncertain circumstances [

Over- and underrecruiting participants is both costly and unethical [

The objective of this study is to demonstrate how a recently completed trial of a digital alcohol intervention would have played out had a Bayesian sequential design been used, rather than following a traditional fixed sample size based on a priori power calculations. We will show that participants were excessively overrecruited, resulting in costs and efforts wasted when the evidence was already at hand.

The literature on Bayesian statistics and sequential designs is substantial [

To understand Bayesian sequential designs, one needs to have at least a general understanding of Bayesian statistics. Within the Bayesian paradigm, one is interested in estimating the posterior probability distribution of the parameters of interest (eg, an odds ratio [OR] comparing two groups).

The posterior probability distribution is calculated by combining the information available through the data collected with what is known as the prior probability distribution, which represents what was believed about the parameters before the data were collected.
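As a concrete, purely illustrative sketch of this updating, consider a conjugate Beta-Binomial model for an abstinence proportion; all numbers below are hypothetical and chosen only to show how prior and data combine.

```python
# Sketch: combining a prior with data to obtain a posterior (Beta-Binomial).
# All numbers are hypothetical, chosen for illustration only.

# Prior belief about the abstinence proportion: Beta(a, b),
# weakly favoring proportions around 0.2.
a, b = 2.0, 8.0

# Hypothetical data: 30 of 100 participants abstinent.
successes, n = 30, 100

# Conjugate update: posterior is Beta(a + successes, b + failures).
post_a = a + successes
post_b = b + (n - successes)

posterior_mean = post_a / (post_a + post_b)
print(round(posterior_mean, 3))  # 0.291
```

With only a handful of observations, the posterior mean would sit close to the prior's 0.2; with 100 observations, the data dominate and pull it toward the observed 0.3.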

To illustrate this,

Marginal posterior distributions of odds ratios for smoking cessation (prolonged abstinence and point prevalence of smoking abstinence)—comparing study participants who had access to a digital smoking cessation intervention versus waiting list control group participants.

Examples of prior distributions; (A) normal distribution with mean 2 and SD 1; (B) normal distribution with mean 0 and SD 1; (C) normal distribution with mean 0 and SD 0.1.
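To see what priors like these imply on the OR scale, one can exponentiate their quantiles, assuming (as is standard) that the prior is placed on the log-OR. A short sketch using the three priors from the figure:

```python
from math import exp
from statistics import NormalDist

# The three example priors from the figure, assumed to sit on the log-OR scale.
priors = {
    "A: N(2, 1)":   NormalDist(mu=2.0, sigma=1.0),
    "B: N(0, 1)":   NormalDist(mu=0.0, sigma=1.0),
    "C: N(0, 0.1)": NormalDist(mu=0.0, sigma=0.1),
}

for label, dist in priors.items():
    lo, hi = dist.inv_cdf(0.025), dist.inv_cdf(0.975)
    # Exponentiating the log-OR bounds gives the implied OR range.
    print(f"{label}: 95% of prior mass on OR in [{exp(lo):.2f}, {exp(hi):.2f}]")
```

Prior A concentrates nearly all its mass on ORs above 1 (a strong belief in benefit), prior B is permissive in both directions, and the skeptical prior C keeps the OR within roughly 0.82 to 1.22 until the data say otherwise.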

Rather than targeting a fixed sample size, a trial adopting a Bayesian sequential design aims to recruit enough participants so that the posterior distribution of the effect estimate is informative relative to the study objectives. For instance, in a trial of a smoking cessation intervention, where our main concern is the OR of abstinence, we may decide that we want to show that the posterior probability of the OR being greater than 1 is at least 89% (or any other probability we find sufficient relative to the study context). Therefore, we collect data and continuously analyze them until we have reduced the uncertainty enough so that we can show that the OR is greater than 1 with at least 89% probability. There is, however, no need to have only one target; rather, it is often reasonable to include at least one more target defining when the intervention seems ineffective and it is futile to continue the trial. An example of this would be if the posterior probability is at least 92% that the OR is greater than 0.9 and less than 1.1 (ie, close to the null). The targets, often referred to as stopping rules, may be expressed as follows:

Effect: p(OR > 1 | D) > 89%

Futility: p(0.9 < OR < 1.1 | D) > 92%

Harm: p(OR < 1 | D) > 89%

Note that criteria should be defined relative to the study objectives, the context in which they are evaluated, and their potential benefits and harms. If one were evaluating the effects of a surgical procedure, perhaps the 89% probability of effect should be closer to 98%, while the probability for harm should perhaps be revised down to 75%.
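In practice, once posterior draws of the OR are available (eg, from a Markov chain Monte Carlo fit), checking criteria like these reduces to counting draws. A minimal sketch, with hypothetical draws standing in for a real model fit:

```python
import random
from math import exp

random.seed(1)

# Hypothetical posterior draws of the log-OR, standing in for an MCMC fit.
draws = [random.gauss(0.25, 0.10) for _ in range(10_000)]
or_draws = [exp(d) for d in draws]

def prob(event):
    """Posterior probability of an event = share of draws satisfying it."""
    return sum(event(x) for x in or_draws) / len(or_draws)

p_effect = prob(lambda x: x > 1)          # p(OR > 1 | D)
p_futile = prob(lambda x: 0.9 < x < 1.1)  # p(0.9 < OR < 1.1 | D)
p_harm   = prob(lambda x: x < 1)          # p(OR < 1 | D)

if p_effect > 0.89:
    print("stop: effect criterion met")
elif p_futile > 0.92:
    print("stop: futility criterion met")
elif p_harm > 0.89:
    print("stop: harm criterion met")
else:
    print("continue recruitment")
```

With the hypothetical posterior centered on a log-OR of 0.25, the effect criterion is met and recruitment would stop.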

To demonstrate how a trial may develop using a Bayesian sequential design in contrast to a fixed sample size, we revisit a randomized trial of a digital alcohol intervention [

The trial received ethics approval on November 6, 2018, by the regional ethical committee in Linköping, Sweden (DNR 2018/417-31).

In this tutorial, we will only give a brief overview of the trial procedures; a full description of the trial is available in the study protocol [

The core element of the digital intervention was a text message sent to participants each Sunday afternoon. The text message included a prompt to self-monitor one’s current alcohol consumption, with a hyperlink to a web-based tool. Those who decided to click on the link were asked to report their recent drinking and were then given access to personalized support. More information on the intervention is available in the study protocol [

Participants allocated to the control group were advised that they would receive information designed to motivate them to think more about reducing their alcohol consumption and that after 4 months they would receive additional support delivered to their mobile phone. Participants in the control group also received a single text message with basic health information regarding short- and long-term effects of alcohol consumption that also included a link to a website with information about alcohol.

There were two primary outcomes in the trial, as follows:

Frequency of heavy episodic drinking (HED), which was assessed by asking participants how many times they had consumed 4 (for women) or 5 (for men) or more standard drinks on one occasion in the past month.

Total weekly alcohol consumption (TWC), which was measured using a short-term recall method by asking participants how many standard drinks they had consumed in the past week.

Outcomes were assessed at 2 and 4 months postrandomization, initiated by sending text messages to participants with hyperlinks to questionnaires. Participants who did not respond to reminders were called to collect their responses.

The required sample size was determined using Monte Carlo simulations. A full description of the simulations is available in the study protocol [
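The general idea of such a simulation-based power calculation can be sketched in a few lines. The example below is illustrative only: it assumes Poisson-distributed counts, a simple Wald test for the rate ratio, and arbitrary effect and sample sizes, none of which are taken from the trial's actual simulations.

```python
import math
import random

random.seed(7)

def rpois(lam):
    # Knuth's algorithm for Poisson draws (fine for small lambda).
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def one_trial(n_per_arm, base_rate=10.0, irr=0.85, z_crit=1.96):
    """Simulate one two-arm trial; return True if the null is rejected."""
    ctrl = [rpois(base_rate) for _ in range(n_per_arm)]
    trt = [rpois(base_rate * irr) for _ in range(n_per_arm)]
    # Wald test for the log rate ratio under a Poisson model.
    s0, s1 = sum(ctrl), sum(trt)
    est = math.log(s1 / s0)
    se = math.sqrt(1 / s0 + 1 / s1)
    return abs(est / se) > z_crit

# Estimated power = share of simulated trials that reject the null.
power = sum(one_trial(150) for _ in range(500)) / 500
print(f"estimated power with 150 per arm: {power:.2f}")
```

In a real power simulation, one would vary the sample size until the estimated power reaches the desired level (eg, 80% or 90%), and use an outcome model matching the planned analysis.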

Participants were recruited over a series of 6-month periods; after each period, we checked whether the planned sample size had been achieved. Between April 25, 2019, and November 26, 2020, at which time recruitment was stopped, we randomized 2129 participants. This equated to approximately 19 months of recruitment, having allowed an initial grace period of 1 month for advert placement algorithms to optimize their performance.

Putting aside the required sample size of 2129 participants, what would our null hypothesis-based analyses have looked like if we had stopped the trial after collecting data from only 15 participants? What about after 100 or 200 participants? In

Maximum likelihood estimates and

If we had decided not to use a fixed sample size but had rather adopted a Bayesian sequential design, we would have foregone a power calculation and instead defined target criteria for when recruitment should end, expressed in terms of the incidence rate ratio (IRR) comparing the intervention and control groups. These criteria may have been the following:

Effectiveness: p(IRR < 1 | D) > 97.5% and p(IRR < 0.87 | D) > 50%

Futility: p(0.87 < IRR < 1.15 | D) > 97.5%

The effectiveness criterion says that we should stop recruitment if the probability that the intervention group is drinking less than the control group is greater than 97.5%; it also says that the probability of the estimated IRR being less than 0.87 should be greater than 50%. An IRR of 0.87 is comparable with our fixed sample size power calculation assumption of 15% less alcohol consumption in the intervention group versus the control group. The futility target criterion says that we will stop recruitment if it is more than 97.5% likely that the estimated IRR is between 0.87 and 1.15, that is, within a range of effect sizes that are considered too small to be of importance considering the context.
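These compound criteria translate directly into code. A sketch, again using hypothetical posterior draws of the log(IRR) in place of a fitted model:

```python
import random
from math import exp

random.seed(3)

def check_criteria(log_irr_draws):
    """Evaluate the effectiveness and futility criteria on posterior draws."""
    irr = [exp(d) for d in log_irr_draws]
    n = len(irr)
    p_lt_1   = sum(x < 1 for x in irr) / n        # p(IRR < 1 | D)
    p_lt_087 = sum(x < 0.87 for x in irr) / n     # p(IRR < 0.87 | D)
    p_null   = sum(0.87 < x < 1.15 for x in irr) / n
    if p_lt_1 > 0.975 and p_lt_087 > 0.50:
        return "stop: effectiveness"
    if p_null > 0.975:
        return "stop: futility"
    return "continue"

# Hypothetical posterior: log(IRR) roughly N(-0.20, 0.05),
# ie, centered on about 18% less consumption in the intervention group.
draws = [random.gauss(-0.20, 0.05) for _ in range(10_000)]
print(check_criteria(draws))
```

Note that effectiveness requires both conditions to hold at once: near-certainty that the intervention group drinks less, and at least even odds that the reduction is of the anticipated magnitude.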

Just like we did for the null hypothesis analyses in

In

Posterior probability distributions and target criteria plotted over available data from respondents with respect to total weekly consumption (TWC) using both standard normal priors (left) and skeptical priors (right). IRR: incidence rate ratio.


A trial of a digital alcohol intervention could have stopped recruitment after approximately 15% of the prespecified sample size had been recruited if a Bayesian sequential design had been used. The consequences would have been fewer participants recruited to a control condition that made them wait for the novel support tool and reduced costs of recruitment; in addition, evidence of the intervention’s effectiveness could have been made public sooner. Instead, overrecruitment was the result of anticipating small effects from a public health intervention of this type, while also controlling for the risk of type 1 and 2 errors.

Trials are conducted because effects of interventions are not known; thus, the design of trials should facilitate discovery efficiently. This is not to say that prior knowledge cannot be useful when designing Bayesian sequential designs; on the contrary, both conservative views on the effects and data from previous trials can be incorporated into the priors used during analysis. Priors are ideal in this circumstance since they dominate the analysis when data are scarce, protecting from spurious findings, yet their influence is lessened as more data become available.
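This fading influence of the prior can be made concrete with a normal-normal conjugate update on the log(IRR) scale; the observed estimate and standard errors below are hypothetical.

```python
# Sketch of how a prior's influence fades as data accumulate, using a
# normal-normal conjugate update on the log(IRR) scale.
# The estimate (-0.2) and its standard errors are hypothetical.

def posterior(prior_mean, prior_sd, est, se):
    """Precision-weighted combination of prior and data."""
    w0, w1 = 1 / prior_sd**2, 1 / se**2
    mean = (w0 * prior_mean + w1 * est) / (w0 + w1)
    sd = (w0 + w1) ** -0.5
    return mean, sd

skeptical = (0.0, 0.1)  # skeptical prior: log(IRR) ~ N(0, 0.1)

# Scarce data: a noisy estimate (SE 0.30) is pulled strongly toward 0.
print(posterior(*skeptical, est=-0.2, se=0.30))
# Plentiful data: a precise estimate (SE 0.03) dominates the prior.
print(posterior(*skeptical, est=-0.2, se=0.03))
```

With scarce data the posterior mean lands near the prior's 0 (here, -0.02), protecting against spurious early findings; with plentiful data it sits close to the observed estimate (about -0.18).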

Bayesian sequential designs do not rely on an a priori fixed sample size; nevertheless, planning, ethics approval, and grant applications often require one. This can still be achieved by estimating the final sample sizes using simulation [
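One way such a planning simulation might look: repeatedly simulate a sequential trial under an assumed true effect, record where each simulated trial stops, and summarize the distribution of final sample sizes. The model below is deliberately stylized (each participant contributes a noisy normal observation of the log IRR, analyzed with a flat prior and a single stopping criterion) and is not the trial's actual procedure.

```python
import math
import random

random.seed(11)

def simulate_trial(true_log_irr=-0.2, sigma=1.2, batch=50, max_n=2000):
    """One simulated sequential trial: analyze after every batch and
    stop once p(IRR < 1 | D) > 97.5% (normal approximation)."""
    obs = []
    while len(obs) < max_n:
        obs += [random.gauss(true_log_irr, sigma) for _ in range(batch)]
        n = len(obs)
        est = sum(obs) / n
        se = sigma / math.sqrt(n)
        # p(log IRR < 0 | D) under a flat prior, normal approximation.
        p_effect = 0.5 * (1 + math.erf((0 - est) / (se * math.sqrt(2))))
        if p_effect > 0.975:
            break
    return len(obs)

# Distribution of final sample sizes across simulated trials.
sizes = [simulate_trial() for _ in range(200)]
print("median final sample size:", sorted(sizes)[len(sizes) // 2])
```

Summaries such as the median and upper percentiles of this distribution can then be reported in protocols and grant applications as the expected and worst-case recruitment targets.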

One pitfall to avoid when using Bayesian sequential designs is viewing the target criteria as hard-and-fast rules, turning them into shortcuts back to dichotomizing evidence into effect and no effect. Instead, the target criteria should be viewed as researchers’ intentions for what are considered findings of interest. One may have fulfilled some criteria of the trial but not others and still decide to end the trial. The trial should be stopped when, on the basis of accumulated results, the answer to a scientific question is sufficiently well known that the results can be used in a broader context [

In some trials, it will not be possible to access follow-up data continuously throughout the trial period to check the criteria, and so adopting a Bayesian sequential design may not be feasible. This may be the case if data are collected at multiple sites, possibly internationally, and it is time-consuming to collate all data for analysis. However, the benefits of sequential designs can still be realized wherever it is possible to analyze data at least occasionally, for instance after every 50-100 participants; analyses do not have to be done for every new data point but can instead be run for larger batches of participants.

Finally, reducing research waste and protecting research participants from unnecessary harm should be top priorities for researchers studying interventions. Avoiding the under- and overrecruitment that fixed sample sizes risk is an important mitigation, and Bayesian sequential designs allow for exactly this. Examples of their use in behavioral intervention trials can be found in the literature [

HED: heavy episodic drinking

IRR: incidence rate ratio

OR: odds ratio

TWC: total weekly consumption

This project received funding from the Alcohol Research Council of the Swedish Alcohol Retailing Monopoly (DNR 2019-0056 and DNR 2020-0043). The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.

Deidentified data sets generated during or analyzed during this study will be made available upon reasonable request to the corresponding author, after approval of a proposal and with a signed data access agreement.

MB owns a private company (Alexit AB) that maintains and distributes evidence-based lifestyle interventions to be used by the public and in health care settings. Alexit AB played no role in developing the intervention, study design, data analysis, data interpretation, or writing of this report. Services developed and maintained by Alexit AB were used for sending text messages and data collection.