This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
The COVID-19 pandemic has caused major disruptions worldwide since March 2020. The experience of the 1918 influenza pandemic demonstrated that a decrease in infection rates does not guarantee that the downward trend will continue.
The aim of this study was to develop a precise spread model of COVID-19 with time-dependent parameters via deep learning to respond promptly to the dynamic situation of the outbreak and proactively minimize damage.
In this study, we investigated a mathematical model with time-dependent parameters via deep learning based on forward-inverse problems. We used data from the Korea Centers for Disease Control and Prevention (KCDC) and the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University for Korea and the other countries, respectively. Because the data consist of confirmed, recovered, and deceased cases, we selected the susceptible-infected-recovered (SIR) model and found approximated solutions as well as model parameters. Specifically, we applied fully connected neural networks to the solutions and parameters and designed suitable loss functions.
We developed an entirely new SIR model with time-dependent parameters via deep learning methods. Furthermore, we validated the model with the conventional Runge-Kutta fourth order model to confirm its convergent nature. In addition, we evaluated our model based on the real-world situation reported from the KCDC, the Korean government, and news media. We also crossvalidated our model using data from the CSSE for Italy, Sweden, and the United States.
The methodology and new model of this study could be employed for short-term prediction of COVID-19, which could help the government prepare for a new outbreak. In addition, from the perspective of measuring medical resources, a key strength of our model is that it treats all parameters as time-dependent, which allows it to reflect the current status of viral spread.
Similar to the 1918 influenza pandemic that occurred more than 100 years ago, the COVID-19 pandemic has created major disruptions worldwide. At the end of World War I, the 1918 influenza pandemic wreaked havoc globally; it killed more than 40 million people, more than 2% of the world’s population [
Recently, there have been numerous studies on developing models to find a mathematical description of a system and translate it to the current situation of COVID-19. These studies typically introduce the susceptible-infected-recovered (SIR) model or its derivatives. In some of these studies, the model parameters are considered as constants due to the complexity of modeling. For instance, a previous study proposed a conceptual model that includes individual behavioral reactions and government actions, while another study reviewed the basic reproduction number of COVID-19 with constant parameters [
Mathematical modeling is a process that aims to find a mathematical description of a system and translate it into a relational expression. When a system (eg, an infectious disease) continuously changes over time, differential equations, which may include parameters, can be used to model it. The process of finding the parameters that best fit the given data from the system is called an inverse problem. In this study, we aimed to analyze COVID-19 spread in South Korea using the SIR model. We approximated each outcome variable (S, I, and R) and parameter (β and γ) in the model using deep learning. Moreover, to address the shortcomings of previous studies, we considered the parameters as functions of time, which allowed us to compute the infection rate, the recovery rate, and the time-dependent reproduction number, RTD. This approach is more interpretable because β(t), γ(t), and RTD can be obtained as functions of time, and the overall dynamics of the actual data can also be obtained. We hypothesized that RTD could be used as a surrogate marker to indicate the pressure on health care resources in a region. This is because the number of available beds for patients with COVID-19 in an area decreases when the infection rate increases or when the recovery rate stagnates or decreases.
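For reference, the SIR system with parameters written as functions of time can be sketched as follows. This is the textbook form with population-normalized compartments, which is an assumption here; the exact formulation used in this study is given in the supplementary material. RTD is written as the ratio of β(t) to γ(t).

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta(t)\, S(t)\, I(t), \\
\frac{dI}{dt} &= \beta(t)\, S(t)\, I(t) - \gamma(t)\, I(t), \\
\frac{dR}{dt} &= \gamma(t)\, I(t), \\
R_{TD}(t) &= \frac{\beta(t)}{\gamma(t)}.
\end{aligned}
```

In this form, the three right-hand sides sum to zero, so S + I + R stays constant over time.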
Additionally, unlike in other models, such as the growth model, we do not assume any distribution type for the modeling. In the traditional growth model, the growth rate is considered as a piecewise constant function to compute the effective reproduction number. However, this assumption is not realistic in many cases, as the reproduction number can dramatically change. In contrast, our model is an appropriate solution for such problems due to its time-dependent nature. Furthermore, we provide numerical simulation results that guarantee the convergence of our deep learning approach. Finally, our methodology is applicable to many areas involving differential equations, and it can be easily implemented without a deep understanding of the model.
The reproduction number has several variants. The basic reproduction number (R0) is defined as the expected number of cases directly generated by one case in a population, assuming all individuals are susceptible to infection. Compared to R0, the effective reproduction number (Rt) does not assume complete susceptibility of the population [
We collected our data from the Korea Centers for Disease Control and Prevention (KCDC) and the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The data consisted of the cumulative numbers of tested people (
Cumulative numbers of infected and recovered COVID-19 cases in South Korea (left) and cumulative numbers of negative cases and tested people (right).
Daily numbers of confirmed cases and cumulative numbers of recovered cases for Seoul, Busan, Daegu, and Gyeonggi-do.
This study received an exemption from informed consent by the institutional review board committee of the Seoul National University Bundang Hospital because we used public data provided by the KCDC.
Infectious disease modeling in mathematics can capture an epidemic of a given infectious disease and aid public health interventions. Modeling usually requires disease-related statistical data, calculation of model parameters, and analysis of the epidemic. We adopted the SIR model, which is suitable for our data (see
Illustration of the SIR model. I: infected; R: recovered; S: susceptible.
We constructed five neural network models for S, I, R, β, and γ, denoted by Snet, Inet, Rnet, βnet, and γnet, respectively. The concrete model structures are presented in
Forward-Inverse SIR model networks. Each network contains 1 input node (time t), 1 output node (value), 4 hidden layers, and 256 nodes in each hidden layer. The hyperbolic tangent tanh(x) is used in the activation functions except for the last layer. The Softplus and Sigmoid functions are used in the last hidden layer to meet the constraints S, I, R, β>0, and 0<γ<1. I: infected; R: recovered; S: susceptible.
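The architecture described in this caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study: the function names, the random initialization, and the reading that Softplus is applied to the outputs of S, I, R, and β while Sigmoid is applied to the output of γ are all assumptions, and no training is shown.

```python
import numpy as np

def init_network(rng, width=256, hidden_layers=4):
    """Random weights for a fully connected net: 1 input, 4 x 256 hidden, 1 output."""
    sizes = [1] + [width] * hidden_layers + [1]
    return [
        (rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def softplus(x):
    # log(1 + exp(x)), computed stably; used to keep an output positive
    return np.logaddexp(0.0, x)

def sigmoid(x):
    # maps an output into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def forward(params, t, output_activation):
    """Evaluate the network at times t; tanh on hidden layers, constraint on the output."""
    h = np.asarray(t, dtype=float).reshape(-1, 1)
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return output_activation(h @ W + b).ravel()

# Hypothetical usage: five untrained networks evaluated on a small time grid.
rng = np.random.default_rng(0)
nets = {name: init_network(rng) for name in ("S", "I", "R", "beta", "gamma")}
t = np.linspace(-26.6, 25.0, 5)
beta_t = forward(nets["beta"], t, softplus)    # constrained to beta > 0
gamma_t = forward(nets["gamma"], t, sigmoid)   # constrained to 0 < gamma < 1
```

Enforcing the constraints through the output activation means that they hold by construction at every time point, regardless of the weights.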
We conducted simulations for four provinces: Seoul, Gyeonggi-do, Busan, and Daegu. We applied five deep neural network (DNN) models, Snet, Inet, Rnet, βnet, and γnet, to approximate the outcome variables and parameters. For a more accurate evaluation of the model parameters, we also computed a numerical solution with the Runge-Kutta fourth order (RK4) method using the estimated parameters. RK4 is a well-known algorithm that is proven to converge to the analytic solution. In contrast, the neural network-based methodology of this study has weaker theoretical guarantees of convergence. Therefore, we used RK4 to show how closely the time-dependent parameters found by the DNNs reproduce the actual solution.
For the RK4 method, we set a step size of h=10⁻³, with 26 observations used for Seoul, Busan, Daegu, and Gyeonggi and 77 observations used for South Korea. The observations are presented in
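The RK4 check can be illustrated with a short sketch that integrates the SIR equations for given time-dependent parameter functions. The right-hand side below is the textbook population-normalized form, and the constant parameter values and initial state are hypothetical placeholders rather than estimates from this study; in practice, βnet and γnet would be passed in as the `beta` and `gamma` callables.

```python
def sir_rhs(t, y, beta, gamma):
    """Right-hand side of the SIR system with time-dependent beta(t) and gamma(t)."""
    s, i, r = y
    b, g = beta(t), gamma(t)
    return (-b * s * i, b * s * i - g * i, g * i)

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for a tuple-valued state."""
    k1 = f(t, y)
    k2 = f(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k1)))
    k3 = f(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k2)))
    k4 = f(t + h, tuple(yi + h * ki for yi, ki in zip(y, k3)))
    return tuple(
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    )

def integrate(beta, gamma, y0, t0, t1, h=1e-3):
    """Integrate from t0 to t1 with fixed step h and return the final (S, I, R)."""
    f = lambda t, y: sir_rhs(t, y, beta, gamma)
    y = y0
    for k in range(round((t1 - t0) / h)):
        y = rk4_step(f, t0 + k * h, y, h)
    return y

# Hypothetical placeholders: constant parameters and an arbitrary initial state.
final = integrate(lambda t: 0.17, lambda t: 0.03, (0.999, 0.001, 0.0), 0.0, 25.0)
```

Comparing the resulting trajectory with the network outputs and with the observations gives the kind of consistency check described above.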
We estimated the model parameters (β and γ) and outcome variables (S, I, and R) in the SIR model via DNNs for South Korea, Seoul, Busan, Daegu, and Gyeonggi. The results for South Korea are presented in
We also estimated RTD for South Korea (
SIR model target values and relative errors from February 7 (t=–26.6) to March 30 (t=25.0), 2020, in South Korea. The red lines in the top three graphs denote the Snet, Inet, and Rnet values for each graph, the green lines in the top and middle three graphs denote the observations, and the blue lines in the middle three graphs denote the RK4 results with the parameters βnet and γnet. The population ratio is the number of people in each group (S, I, and R) divided by the total number of people (N). Relative errors were defined as (|observed value − Network [or RK4] value|/|observed value|) × 100 and were calculated for each parameter. I: infected; R: recovered; RK4: Runge-Kutta fourth order method; S: susceptible.
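The relative error defined in this caption can be computed as in the following sketch. The function name and the sample numbers are hypothetical and serve only to illustrate the formula; the sketch also assumes that every observed value is nonzero.

```python
def relative_error_percent(observed, estimated):
    """Pointwise |observed - estimated| / |observed| * 100 for two equal-length series."""
    if len(observed) != len(estimated):
        raise ValueError("series must have the same length")
    return [abs(o - e) / abs(o) * 100 for o, e in zip(observed, estimated)]

# Hypothetical population ratios, not data from the study.
errors = relative_error_percent([0.0010, 0.0020], [0.0011, 0.0019])
# errors is approximately [10, 5], in percent
```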
Susceptible-infected-recovered model parameter network values and RTD values from February 7 (t=–26.6) to March 30 (t=25.0), 2020, for South Korea. We divided the range of RTD into two parts, shown at bottom left (–26.6≤t≤–19) and bottom right (9≤t≤25.0). On February 18 (t=–15.3), the first case was confirmed to be related to Shincheonji, which was the starting point of the outbreak in Daegu.
We summarized the overall trend by analyzing RTD (t). First, on February 8 (t=–25.3) in South Korea, RTD=1.0610 implies the spread of COVID-19. Starting from February 18, RTD (t) increased dramatically (t=–15.6), and it reached its peak (RTD (t)=124.8454) on February 28 (t=–5.6). After March 13 (t=8.0), RTD decreased below 1 again, signaling a decreasing trend in the spread of COVID-19 from an epidemiological viewpoint. However, RTD began to increase again from March 19 (t=14.0). From February 7 to March 30, the average values of β and γ were 0.1656 and 0.0253, respectively.
In the second case, Seoul, up to March 9 (t=4), β reached 0.2306 while γ only reached 0.0192, resulting in a maximum value of 12.0405 for RTD in this period. After March 16 (t=11), RTD decreased to 3.1244 but then increased again, reaching 3.8255 on March 30 (t=25). This indicates that effective control of the spread of COVID-19 was not achieved. The average values of β and γ were 0.0705 and 0.0140, respectively. In the third case, Busan, on March 5 (t=0), at the beginning of the observation, β was 0.1300 (Supplementary Table), while RTD was 156.7965. This is because R(t), the recovery group, did not change in the initial stage, whereas γ was estimated to be 0.0008 due to the constraint γ>0. On March 8 (t=3), RTD was 0.0908 because of the change in R(t), reaching 0.5401 on March 30 (t=25). The average values of β and γ were 0.0253 and 0.0670, respectively. In the final case of Daegu, similar to Busan, RTD was 521.9075 at the beginning of the observation on March 5 (t=0). After March 11 (t=6), the recovery rate γ began to increase faster than the infection rate β, with RTD having its lowest value of 0.1224 on March 24 (t=19). After March 24, RTD increased, reaching 0.2409 on March 30 (t=25) (see Figure SM3 in
Because RTD is the ratio of β(t) to γ(t), RTD can have a large value when γ is small compared to β. This situation can be observed in the early stage of COVID-19 spread in South Korea, excluding Seoul and Busan (eg, the Shincheonji cult cases). However, following the computation of the basic reproduction number in a previous study, we obtained the effective reproduction number Rt in the usual range found in previous studies [
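The sensitivity of RTD to a small γ can be seen with a short calculation using the rounded Busan values reported above for March 5 (β=0.1300, γ=0.0008). The helper name is hypothetical, and because the inputs are rounded to four decimals, the result only approximates the reported value of 156.7965.

```python
def r_td(beta, gamma):
    """Time-dependent reproduction number as the ratio beta(t) / gamma(t)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return beta / gamma

# Rounded values from the text for Busan on March 5 (t=0).
busan_start = r_td(0.1300, 0.0008)   # about 162.5

# Same beta with a tenfold larger gamma (hypothetical): the ratio drops tenfold.
busan_shifted = r_td(0.1300, 0.0080)  # about 16.25
```

A change in the fourth decimal place of γ is enough to move RTD by several units when γ is this small, which is why very large early values should be read together with β(t) and γ(t).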
Comparison of RTD and Rt for South Korea. Rt was computed based on the growth model. Rt: effective reproduction number; RTD: time-dependent effective reproduction number.
RTD is a more sensitive and responsive marker than Rt, and it reflects subtle changes of situations over time. Especially at the starting point of an outbreak, we can detect increasing trends more accurately with RTD (
Comparison of RTD and real-world confirmed cases. RTD: time-dependent effective reproduction number.
The RTD we developed has important real-world implications for measuring the current status of the viral spread and the effectiveness of interventions. By setting the infection rate and recovery rate as time-dependent parameters, it is possible to accurately evaluate the pressure of depletion of health resources on the community. Indeed, after March 5, when RTD exceeded 500, Daegu was in danger of total depletion of medical resources [
Compared to Daegu, Gyeonggi-do intervened more proactively. The RTD of Daegu at the first opening of the CTC was over 500; however, that of Gyeonggi-do was 2.6. The local government in Gyeonggi-do, which closely monitored the situation in Daegu, prevented the exhaustion of medical resources by providing optimal medical services for each risk group of patients with COVID-19 in cooperation with the central government, along with general policies such as public disclosure of mobile routes of infected people, encouragement of social isolation, and wearing of masks (
Comparison of β(t), γ(t), and RTD in Daegu and Gyeonggi-do. Note the differences in the vertical scales of RTD. RTD: time-dependent effective reproduction number.
We developed an entirely new SIR model with time-dependent parameters via deep learning methods. Furthermore, we validated the model with the conventional RK4 model to confirm its convergent nature. In addition, we evaluated our model based on real-world data reported by the KCDC, the Korean government, and news media.
Compared to previous studies, this research has the following three technical advantages. First, previous studies only dealt with the infected cases under certain assumptions, such as the cumulative number of infected cases increasing exponentially [
In the situation of a novel virus pandemic, it is crucial for every central and local government to maintain appropriate medical resources in readiness for the unexpected arrival of the new disease. South Korea saw one of the most disastrous outbreaks of COVID-19 during the first few weeks of March 2020. In Daegu especially, the entire local medical system was on the brink of collapse. However, the Korean government soon developed a preemptive policy for each local government by learning from the situation in Daegu. The government solved its acute hospital bed shortage by revising the triage criteria more than seven times and implementing CTCs all over the country. Since then, lives have been saved by reserving beds for the most acutely ill patients with COVID-19 and placing patients with less severe disease in CTCs [
In a country such as Korea, where there is no interregional blockade, the spread of the virus can be exacerbated in a few days due to movement of the virus across regions. In fact, the number of COVID-19 cases started increasing again from March 19 (t=14.0), indicating that the containment of COVID-19 cannot be realized without achieving herd immunity or developing therapeutics.
Furthermore, we require a tool that can monitor virus outbreaks simultaneously across regions in the shortest time span.
The same principle applies even if we broaden our view from the spread of viruses between regions to the spread among countries. In the current COVID-19 pandemic, the world must work together to prevent the spread of the virus. This is because the entire world is socially, culturally, and economically intertwined through advanced transportation. Therefore, there is an urgent need for a tool that can respond sensitively over time, provide information about the current virus outbreak, and evaluate the effectiveness of interventions. The methodology and new model of this study could be employed for proactive intervention. In addition, from the perspective of measuring medical resources, a key strength of our model is that it treats all parameters as time-dependent, which allows it to reflect the current status of viral spread. Furthermore, the methodology and modeling approach are scalable and universal; therefore, they can be applied to other new infectious disease pandemics if real-world data are available.
This research has several limitations. First, the time-dependent model of this study was validated only with COVID-19 data from South Korea. However, this model can be easily applied to data from another outbreak because the modeling process and methodology are disclosed fully in this article. To crossvalidate our strategies, we provide results of similar analyses of outbreaks in Italy, Sweden, and the United States in
Mathematical formulation of the SIR model and supplemental figures.
CSSE: Center for Systems Science and Engineering
CTC: Community Treatment Center
DNN: deep neural network
KCDC: Korea Centers for Disease Control and Prevention
R: reproduction number
R0: basic reproduction number
Rt: effective reproduction number
RTD: time-dependent effective reproduction number
RK4: Runge-Kutta fourth order
SIR: susceptible-infected-recovered
HJH was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (Nos. 2017R1E1A1A03070105 and NRF-2019R1A5A1028324) and by an Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korean government (MSIP) (No. 2019-0-01906, Artificial Intelligence Graduate School Program (POSTECH)).
SYJ and HTJ drafted the entire manuscript as first authors. HJS contributed to the discussion of the data. HJH supervised the entire process as the corresponding author.
None declared.