This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.

The dynamics of the COVID-19 pandemic vary owing to local population density and policy measures. During decision-making, policymakers consider an estimate of the effective reproduction number R_{t}, which is the expected number of secondary infections spread by a single infected individual.

We propose a simple method for estimating the time-varying infection rate and the R_{t}.

We used a sliding window approach with a Susceptible-Infectious-Removed (SIR) model. We estimated the infection rate from the reported cases over a 7-day window to obtain a continuous estimation of R_{t}. A proposed adaptive SIR (aSIR) model was applied to analyze the data at the state and county levels.

The aSIR model showed an excellent fit for the number of reported COVID-19 cases, and the 1-day forecast mean absolute prediction error was at most 2.6% across all states. However, the 7-day forecast mean absolute prediction error reached 16.2%, and the 7-day forecast strongly overestimated the number of cases when R_{t} was rapidly decreasing. The maximal R_{t} displayed a wide range of 2.0 to 4.5 across all states, with the highest values for New York (4.4) and Michigan (4.5). We found that the aSIR model can rapidly adapt to an increase in the number of tests and an associated increase in the reported cases of infection. Our results also suggest that intensive testing may be an effective method of reducing R_{t}.

The aSIR model provides a simple and accurate computational tool for continuous R_{t} estimation and evaluation of the efficacy of mitigation measures.

The COVID-19 pandemic is currently underway. As of September 2, 2020, over 6,000,000 individuals in the United States have been reported positive for COVID-19. Modeling studies are key to understanding the factors that drive the spread of the disease and for developing mitigation strategies. Early modeling efforts forecasted very large numbers of infected individuals, which would overwhelm health care systems in many countries.

One of the most fundamental metrics that describes the pandemic’s dynamics is the reproduction number R_{t}, which is the expected number of secondary infections spread by a single infectious individual. R_{t} depends on three factors: (1) the likelihood of infection per contact, (2) the period during which infectious individuals freely interact with susceptible individuals and spread the disease, and (3) the rate of contact. The likelihood of infection per contact (factor 1) is determined by pathogen virulence and protective measures such as social distancing or wearing masks. Free interactions between infectious and susceptible individuals (factor 2) occur until the infectious individual is self-quarantined or hospitalized, either when the individual tests positive or experiences severe symptoms. Finally, the rate of contact (factor 3) is strongly affected by public health measures to mitigate risk. Thus, R_{t} is determined by the biological properties of the pathogen and multiple aspects of social behavior. When R_{t}>1, the number of cases is expected to increase exponentially. The pandemic is considered to have been contained when R_{t} decreases and remains at <1. Real-time R_{t} estimation is critical for determining the effect of implemented mitigation measures and for future planning.

We propose a method for continuous estimation of the infection rate and R_{t} to investigate the effect of mitigation measures and immunity acquired by those who recover from the disease. We estimated R_{t} with a Susceptible-Infectious-Removed (SIR) model. In earlier work, estimation of R_{t} and an assessment of the effect of mitigation measures were carried out on the basis of estimates of the distribution of the serial intervals between symptom onset in the primary and secondary cases. In our approach, the infection rate is estimated from the reported cases within a sliding window, and the R_{t} estimate is then obtained using the infection rate estimate. The SIR model is described as a system of differential equations, and the key idea in our proposed method is that the initial conditions for each window are taken from the values estimated for the previous window. The only additional hyperparameter is the length of the sliding window. The proposed method retains the conceptual and computational simplicity of SIR-type models and can be easily extended through the introduction of additional compartments supported by data.

Data on daily and cumulative confirmed cases between February 29 and September 2, 2020, were obtained from Johns Hopkins University (JHU), and the dates of interventions by state (eg, state of emergency and stay-at-home orders) were obtained from Wikipedia. The JHU data were available at 2 levels of aggregation: county and state. JHU draws on many sources for these data; county-level information was extracted from the websites of the states’ departments of health, and state-level data were extracted directly from the website of the Centers for Disease Control and Prevention.

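The SIR model is a system of ordinary differential equations:

dS/dt = −βSI/N, dI/dt = βSI/N − γI, dR/dt = γI (1)

Here, S, I, and R are the numbers of susceptible, infectious, and removed individuals, respectively; N = S + I + R is the population size; β is the infection rate; and γ is the removal rate.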

The infection rate is determined as follows:

where

The removal rate

The aSIR model contains two parameters, the infection rate β and the removal rate γ.

The reproduction number was calculated as follows:

R_{t} = β/γ
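As an illustration of the model described in this section, the following Python sketch integrates the standard SIR equations (dS/dt = −βSI/N, dI/dt = βSI/N − γI, dR/dt = γI) and evaluates the reproduction number as β/γ. The forward-Euler integration scheme, the function names, and the parameter values (β=0.5 per day, γ=0.2 per day, N=1,000,000) are assumptions chosen for illustration and are not taken from the study.

```python
def sir_step(s, i, r, beta, gamma, dt):
    """Advance the SIR system by one forward-Euler step of size dt (days)."""
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_removals = gamma * i * dt
    return s - new_infections, i + new_infections - new_removals, r + new_removals


def simulate_sir(s0, i0, r0, beta, gamma, days, steps_per_day=20):
    """Return a list of daily (S, I, R) tuples, with day 0 first."""
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            s, i, r = sir_step(s, i, r, beta, gamma, dt)
        trajectory.append((s, i, r))
    return trajectory


def reproduction_number(beta, gamma):
    """Reproduction number as the ratio of infection rate to removal rate."""
    return beta / gamma


if __name__ == "__main__":
    # Illustrative values only (hypothetical population and rates).
    traj = simulate_sir(s0=999_990, i0=10, r0=0, beta=0.5, gamma=0.2, days=30)
    print("R_t =", reproduction_number(0.5, 0.2))
    print("state on day 30 (S, I, R):", traj[-1])
```

With β>γ, the Infectious compartment in this sketch grows in the early phase, which corresponds to R_{t}>1; increasing γ while keeping β fixed lowers the ratio.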

For the first window, we determined the date when the number of confirmed cases began to increase exponentially. This is important because for many states or counties, very few confirmed cases were initially reported for a number of days or even weeks, which suggests that either the epidemic had not started or the true number of infected individuals was unknown. It is not reasonable to apply an SIR model for this initial period. We considered the onset of the pandemic as the first of 4 consecutive days during which the number of reported confirmed cases increased on at least 3 days. The initial conditions for system (1) for window 0 were as follows:

_{0}(0)

where _{0}(0)_{0}(0)_{i}
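The onset rule stated above (the first of 4 consecutive days during which the number of reported cases increased on at least 3 days) can be sketched in Python as follows. This is a minimal illustration under two assumptions that the text does not specify: the input is a cumulative case series, and a day counts as an increase when its value is strictly greater than that of the previous day. The function name and the example series are hypothetical.

```python
def find_onset(cumulative, window=4, min_increases=3):
    """Return the index of the first day of the first `window`-day run in which
    the cumulative count increased on at least `min_increases` days.

    Returns None if no such run exists. Day 0 has no previous day and is
    therefore never counted as an increase (an assumption of this sketch).
    """
    increased = [False] + [b > a for a, b in zip(cumulative, cumulative[1:])]
    for start in range(len(cumulative) - window + 1):
        if sum(increased[start:start + window]) >= min_increases:
            return start
    return None


if __name__ == "__main__":
    # Hypothetical cumulative counts: sporadic early cases, then steady growth.
    cases = [0, 0, 1, 1, 1, 2, 3, 5, 8]
    print("onset index:", find_onset(cases))
```

Series with only sporadic early cases do not satisfy the rule, so the sliding-window fit would start only once sustained growth is observed.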

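The window was slid by s days, and the initial conditions for window i+1 were taken from the solution for window i: S_{i+1}(0) = S_{i}(s), I_{i+1}(0) = I_{i}(s), R_{i+1}(0) = R_{i}(s). The infection rate β_{i+1} was then estimated by fitting the model solution R_{i+1}(t) to the reported cases in window i+1.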

For each window, R_{t.i} was calculated as follows:

R_{t.i} = β_{i}/γ

To smooth the R_{t} estimate obtained from the per-window values R_{t.i}, we used a rolling average of 5 points.
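The sliding-window procedure can be illustrated with the following Python sketch. It is not the authors' implementation. It assumes a forward-Euler SIR solver, a least-squares fit of the modeled Removed compartment to cumulative reported cases, a golden-section search for the infection rate (which presumes a single minimum on the search interval), a 1-day slide, a known initial number of infectious individuals `i0`, and a centered rolling average. All function names and numerical values are hypothetical.

```python
def simulate(state, beta, gamma, days, substeps=10):
    """Forward-Euler SIR; returns daily (S, I, R) states with day 0 first."""
    s, i, r = state
    n, dt = s + i + r, 1.0 / substeps
    states = [(s, i, r)]
    for _ in range(days):
        for _ in range(substeps):
            inf, rem = beta * s * i / n * dt, gamma * i * dt
            s, i, r = s - inf, i + inf - rem, r + rem
        states.append((s, i, r))
    return states


def fit_beta(state, observed, gamma, lo=0.0, hi=2.0, iters=40):
    """Golden-section search for the beta that minimizes the squared error
    between the modeled Removed compartment and the observed cases."""
    def loss(beta):
        traj = simulate(state, beta, gamma, len(observed) - 1)
        return sum((m[2] - o) ** 2 for m, o in zip(traj, observed))

    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if loss(c) < loss(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def asir(reported, n, i0, gamma, window=7, step=1):
    """Return a list of per-window (beta_i, R_t_i) pairs.

    `reported` holds cumulative reported cases, attributed to Removed.
    The initial state of each window is the fitted state of the previous
    window after `step` days.
    """
    state = (n - i0 - reported[0], i0, reported[0])
    estimates = []
    for start in range(0, len(reported) - window + 1, step):
        beta = fit_beta(state, reported[start:start + window], gamma)
        estimates.append((beta, beta / gamma))
        state = simulate(state, beta, gamma, step)[step]
    return estimates


def rolling_mean(values, k=5):
    """Centered k-point rolling mean; the window is truncated at both ends."""
    half = k // 2
    out = []
    for j in range(len(values)):
        chunk = values[max(0, j - half): j + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


if __name__ == "__main__":
    # Synthetic check: data generated with beta=0.4, gamma=0.1 (hypothetical).
    truth = simulate((999_990.0, 10.0, 0.0), 0.4, 0.1, 20)
    reported = [st[2] for st in truth]
    est = asir(reported, n=1_000_000, i0=10, gamma=0.1)
    print(rolling_mean([rt for _, rt in est]))
```

On noise-free synthetic data generated by the same solver, the sketch should recover the infection rate used to generate the data in every window; with real data, the window length trades the stability of each estimate against the speed of reaction to changes.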

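We fit the model for each state and county in the United States. Model performance was evaluated by calculating the quality of fit as the root mean squared error between the actual and fitted numbers of cases and by the mean absolute prediction error of the 1-day, 3-day, and 7-day forecasts. The 7-day forecast overestimated the number of cases when R_{t} was rapidly decreasing.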

(A) Estimated Infectious and forecast Removed. (B) Estimated reproduction number R_{t}. The shaded region indicates the dates of the lockdown. While the 1-day and 3-day forecasts are accurate, the 7-day forecast exhibits marked errors when R_{t}>1 and is rapidly decreasing.

Reproduction numbers and forecast accuracy for 50 US states.

State | R_{t}^{a} max | MAPE^{b} (1-day forecast), % | MAPE (3-day forecast), % | MAPE (7-day forecast), % |

Alabama | 2.9 | 1.5 | 4.2 | 10.0 |

Alaska | 2.8 | 1.6 | 3.7 | 11.3 |

Arizona | 3.3 | 1.3 | 2.9 | 10.1 |

Arkansas | 2.8 | 1.5 | 4.0 | 12.6 |

California | 2.5 | 1.7 | 2.9 | 6.3 |

Colorado | 2.6 | 1.1 | 2.9 | 7.3 |

Connecticut | 4.1 | 2.0 | 3.1 | 9.3 |

Delaware | 2.4 | 1.7 | 2.9 | 7.9 |

District of Columbia | 2.1 | 0.8 | 1.8 | 4.4 |

Florida | 3.6 | 2.0 | 4.4 | 9.3 |

Georgia | 3.0 | 1.8 | 3.7 | 7.4 |

Hawaii | 2.7 | 2.0 | 3.6 | 9.7 |

Idaho | 3.4 | 2.4 | 4.8 | 13.6 |

Illinois | 4.0 | 1.3 | 2.5 | 8.5 |

Indiana | 3.8 | 1.4 | 4.0 | 10.4 |

Iowa | 2.8 | 1.8 | 3.6 | 8.0 |

Kansas | 3.0 | 1.6 | 3.5 | 8.6 |

Kentucky | 3.0 | 2.6 | 4.9 | 11.2 |

Louisiana | 3.7 | 1.8 | 4.0 | 12.1 |

Maine | 2.0 | 1.2 | 2.8 | 6.7 |

Maryland | 3.3 | 1.2 | 2.8 | 6.2 |

Massachusetts | 3.4 | 1.3 | 3.6 | 9.7 |

Michigan | 4.5 | 1.6 | 3.5 | 12.8 |

Minnesota | 2.7 | 1.3 | 2.9 | 8.0 |

Mississippi | 2.9 | 1.2 | 3.0 | 9.3 |

Missouri | 3.6 | 1.8 | 3.3 | 11.4 |

Montana | 3.3 | 1.6 | 3.7 | 11.9 |

Nebraska | 2.5 | 2.0 | 4.1 | 9.7 |

Nevada | 2.9 | 2.3 | 3.8 | 10.0 |

New Hampshire | 2.3 | 1.7 | 3.3 | 8.5 |

New Jersey | 4.1 | 1.5 | 2.3 | 7.8 |

New Mexico | 2.3 | 2.2 | 3.4 | 7.3 |

New York | 4.4 | 1.5 | 4.2 | 16.2 |

North Carolina | 3.2 | 1.3 | 2.4 | 7.2 |

North Dakota | 2.4 | 1.8 | 4.8 | 12.6 |

Ohio | 3.3 | 1.2 | 3.4 | 9.8 |

Oklahoma | 3.1 | 1.4 | 3.5 | 10.2 |

Oregon | 2.5 | 1.3 | 2.8 | 6.6 |

Pennsylvania | 3.2 | 1.7 | 2.9 | 6.3 |

Rhode Island | 2.4 | 1.5 | 3.1 | 6.8 |

South Carolina | 3.5 | 2.1 | 4.3 | 10.6 |

South Dakota | 2.1 | 1.3 | 3.2 | 8.7 |

Tennessee | 3.5 | 2.2 | 4.8 | 12.5 |

Texas | 3.6 | 2.0 | 3.9 | 9.3 |

Utah | 3.2 | 1.4 | 3.1 | 8.3 |

Vermont | 2.9 | 0.8 | 2.4 | 7.7 |

Virginia | 2.5 | 1.1 | 2.1 | 5.1 |

Washington | 3.0 | 2.0 | 4.8 | 8.6 |

West Virginia | 3.5 | 1.6 | 3.9 | 14.0 |

Wisconsin | 3.6 | 1.5 | 3.2 | 10.0 |

Wyoming | 2.9 | 1.9 | 4.7 | 14.2 |

^{a}R_{t}: reproduction number.

^{b}MAPE: mean absolute prediction error.
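For readers who wish to reproduce a forecast error of the kind reported in the table, the following Python sketch computes a mean absolute prediction error in percent. The exact formula used in the study is not given in the text; the sketch assumes the common definition, namely the mean of |forecast − actual| / actual multiplied by 100, and skips days on which the actual value is zero. The function name and the example numbers are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute prediction error in percent.

    Assumed definition: mean of |forecast - actual| / |actual| * 100,
    computed over the days on which the actual value is nonzero.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no nonzero actual values to compare against")
    return 100.0 * sum(abs(f - a) / abs(a) for a, f in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical cumulative counts and the corresponding forecasts.
    actual = [1000, 1100, 1250]
    forecast = [1020, 1080, 1300]
    print(f"MAPE: {mape(actual, forecast):.2f}%")
```

For an h-day forecast, `forecast` would hold the values predicted h days ahead from each window, aligned with the counts later reported for the same dates.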

The time course of R_{t} was estimated for New York and for Nassau County, one of the most affected counties since the beginning of the COVID-19 pandemic. R_{t} declined upon implementation of the lockdown. R_{t} exhibits weekly seasonality, which likely reflects the effect of social interactions and possibly the effect of fluctuations in case reporting on weekdays vs weekends. For New York and Nassau County, R_{t} initially increased, which may reflect the fact that the pandemic in New York was continuously seeded by travelers arriving at John F Kennedy International Airport until a ban on international travel was implemented on March 12, 2020. This may also reflect the fact that not all severe cases were initially recognized and reported as COVID-19 cases. In Florida, R_{t} decreased to almost 1 by mid-April but then began increasing at the end of May, before R_{t} changed again in the second half of July 2020. The reopening of multiple states since June 2020 has been accompanied by an increase in R_{t} beyond 1 (data not shown), and close monitoring of R_{t} is needed to contain another wave of the pandemic.

Next, we compared aSIR with the model developed by Cori et al (EpiEstim) and with the rt.live model. The R_{t} estimate was smoothed with a 7-point rolling average window, same as that in aSIR. While all 3 models show similar estimates when R_{t} approaches 1, their estimates differ considerably in the beginning of the pandemic. In particular, the rt.live model produced an R_{t} that differed from those of the other 2 models and estimated that R_{t} had already decreased to 1 by the time the lockdown was announced in New York on March 22, 2020. EpiEstim and aSIR produced similar time courses of R_{t}, and both models estimated that R_{t} decreased and approached 1 in the first week of April 2020. Although both models show a rapid reduction in R_{t} in March, the aSIR model shows a lagged change. However, we are not aware of ground truth data that would determine which model yields a more accurate estimate.

Comparison of models that generate continuous R_{t} estimates. The three R_{t} estimates differ widely in the beginning of the COVID-19 pandemic. In particular, the R_{t} estimated using the rt.live model of Systrom, Vladeck, and Krieger differs from the other 2 estimates.

Finally, we investigated the effect of an abrupt increase in testing on the estimated R_{t}. The associated abrupt increase in reported cases produced a transient spike in the estimated R_{t}. However, an increase in testing would help identify and quarantine infectious individuals sooner, resulting in a shorter infectious period and larger removal rate γ, and thus a lower R_{t}. We did not model a potential increase in γ; after the transient, the estimate returned to the R_{t} time course estimated without an increase in testing.

Effect of a step-wise 50% increase in testing (left panel, dashed line). The 1-day forecast by the aSIR model adapts within a week. For the R_{t} estimate, both EpiEstim and our aSIR models produced a spike, followed by a reduction (right panel, dashed lines) before returning to the unperturbed R_{t} time course (solid lines). aSIR: adaptive Susceptible-Infectious-Removed.

We developed a simple approach to adaptively estimate the time-varying parameters of the SIR model, using reported data on the number of confirmed COVID-19 cases. This approach adds to the already large literature on COVID-19 modeling in 2 ways. First, we estimate the parameters of the SIR model with a sliding window of a limited duration (7 days) to account for rapid changes in transmissibility and contact patterns in response to changes in social behavior and government mitigation measures. The window duration is a hyperparameter that can be changed as needed, the trade-off being the accuracy of the parameter estimates versus the rapid reaction to changes in the underlying pandemic. Because the proposed model is so simple, a number of scenarios can be explored as needed.

Second, we attribute the data on reported cases to the Removed compartment rather than the Infectious compartment. This modeling decision is based on the realities of the COVID-19 pandemic in the United States, where individuals with confirmed COVID-19 are supposed to self-isolate or be hospitalized. Although these individuals remain infectious and can infect other family members or caregivers even when self-isolated or hospitalized, they would not freely interact with the susceptible population, as would be required to attribute them to the Infectious compartment.

The reported number of positive COVID-19 cases represents a fraction of infected individuals because of the limited testing capacity in March and April 2020; consequently, only those who developed severe symptoms were tested. Up to 80% of infected individuals may have been asymptomatic or may have experienced mild symptoms. Nevertheless, R_{t} estimated from these limited data can be used to guide policy decisions aimed at protecting the most vulnerable population.

Across all US states, the highest maximal R_{t} values were estimated for New York (4.4) and Michigan (4.5), and the maximal R_{t} across states was in the range of 2.0-4.5. One way to reduce R_{t} is to increase the removal rate γ, for example, through intensive testing, which can help maintain an R_{t} of <1 until a vaccine is available and while vaccination efforts are ramping up. Intensive testing combined with social distancing and mask wearing, followed by the isolation of individuals confirmed with COVID-19, are key features of reopening strategies for schools and universities. The aSIR model can be used to monitor R_{t} in different geographic regions of the United States, better understand the effect of government policies on the dynamics of the pandemic, and develop further mitigation strategies as we continue to battle COVID-19.

The SIR model is perhaps the simplest model that captures the dynamics of a pandemic. It is based on several assumptions that are valid only to some degree as we consider real-life scenarios. The 2 main limitations of the original SIR model are that it has constant parameters and it is deterministic. Our proposed aSIR model allows us to estimate time-varying parameters and thus removes the first limitation. The other limitation remains, however. It is assumed that infectious individuals freely interact with the susceptible population. The infection rate _{t}, which is calculated using constants _{t} estimated for a large population does not reflect differences in subpopulations, such as age groups, which is especially relevant for COVID-19 [

SIR-type models, particularly the proposed time-variant aSIR model, have an advantage over more complex models in the initial stages of a pandemic, when critical public policy decisions need to be made while empirical data on interaction dynamics, transmission rates, disease progression, and contagiousness from the moment of infection are not yet readily available. Our model provides a simple and efficient method to assess the efficacy of interventions as the pandemic progresses.

SIR: Susceptible-Infectious-Removed

aSIR: adaptive Susceptible-Infectious-Removed

JHU: Johns Hopkins University

None declared.