Hospital-Based Back Surgery: Geospatial-Temporal, Explanatory, and Predictive Models

Background: Hospital-based back surgery in the United States increased by 60% from January 2012 to December 2017, yet the supply of neurosurgeons remained relatively constant. During this time, adult obesity grew by 5%. Objective: This study aimed to evaluate the demand and associated costs for hospital-based back surgery by geolocation over time to evaluate provider practice variation. The study then leveraged hierarchical time series to generate tight demand forecasts on an unobserved test set. Finally, explanatory financial, technical, workload, geographical, and temporal factors as well as state-level obesity rates were investigated as predictors for the demand for hospital-based back surgery. Methods: Hospital data from January 2012 to December 2017 were used to generate geospatial-temporal maps and a video of the Current Procedural Terminology codes beginning with the digit 63 claims. Hierarchical time series modeling provided forecasts for each state, the census regions, and the nation for an unobserved test set and then again for the out-years of 2018 and 2019. Stepwise regression, lasso regression, ridge regression, elastic net, and gradient-boosted random forests were built on a training set and evaluated on a test set to evaluate variables important to explaining the demand for hospital-based back surgery. Results: Widespread, unexplained practice variation over time was seen using geographical information systems (GIS) multimedia mapping. Hierarchical time series provided accurate forecasts on a blind dataset and suggested a 6.52% (from 497,325 procedures in 2017 to 529,777 in 2018) growth of hospital-based back surgery in 2018 (529,777 and up to 13.00% by 2019 [from 497,325 procedures in 2017 to 563,023 procedures in 2019]). The increase in payments by 2019 are estimated to be US $323.9 million. Extreme gradient-boosted random forests beat constrained and unconstrained regression models on a 20% unobserved test set and suggested that obesity is one of the most important factors in explaining the increase in demand for hospital-based back surgery. Conclusions: Practice variation and obesity are factors to consider when estimating demand for hospital-based back surgery. Federal, state, and local planners should evaluate demand-side and supply-side interventions for this emerging problem. (J Med Internet Res 2019;21(10):e14609) doi: 10.2196/14609


Background
In 2012, there were 3689 practicing board-certified neurosurgeons in the United States [1]. That number was largely unchanged in 2016 [2]. During these years, demand for back surgery (Current Procedural Terminology [CPT] codes beginning with the digit 63) increased by 49% from 311,028 to 464,391, and by the end of 2017, that increase was 60% [3]. CPT 63 medical codes are a series of spinal procedures including laminectomies, laminotomies, decompressions, and

Data
Definitive Healthcare provided the hospital, zip code, and state-level procedure and cost data from January 2012 to June 2018 through the hospital revenue center analytics query, which includes queries by CPT code. Data in Definitive Healthcare are derived from the Standard Analytical Files by the Centers for Medicare and Medicaid Services (CMS)]. From these data, the organization uses undisclosed algorithms to estimate all-payor claims. Columns with fewer than 11 claims or procedures are not shown because of privacy requirements [3]. For this analysis, only complete annual data from 2012 to 2017 were used, as the CMS datafile and thus the associated estimates from January 2018 through June 2018 were approximately only 93% complete [3].
The Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System (BRFSS) prevalence data provided the information for state-level adult obesity rates by year, from 2012 to 2017 [11]. Guam, Puerto Rico, and the US Virgin Islands were excluded from the analysis because of small sample sizes in both datasets.

Geospatial Analysis
Heat maps are used to plot the zip code unit of analysis procedure data by year. Heat maps provide the intensity of the number of claims by time and geographic region. These types of maps have been used for improving minority health surveillance [12], examining birth outcomes [13], and evaluating a variety of other applications in health care. The value in geospatial-temporal analysis is the graphical depiction of change in demand over time. A video display from 2012 to 2017 with standardized heat intensities provides an animated view of the change in demand by location. An analysis of cost and demand centers is then provided.

Forecasting Analysis
The data in the Definitive Healthcare dataset are nonseasonal as they provide annual-level observations by the hospital unit of analysis. Even so, generating nonseasonal forecasting models that have predictive capability on a blind withhold set at the proper level of aggregation can provide decision support for supply-and demand-side interventions. These types of models have found support in many areas of health care such as radiology [14] and Alzheimer disease [15].
To this end, hierarchical time series (HTS) [16] using R statistical software [17] evaluated the number of claims as a function of time series components. An HTS recognizes that data are aggregated at various levels. In this case, the hierarchy evaluated include the states, the census regions, and the nation. The models are built on a training set of data for the years 2012 to 2015 and forecast on a blind test set, years 2016 and 2017. Although Bayesian hierarchical models have been used for spatially correlated health outcomes and utilization rates [18], there is no readily found use of HTS for prediction in health care.
To understand HTS, one needs to only consider a single medical system that operates in 2 separate states with 3 hospitals per state. There are then 4 basic ways using which one might forecast annual visits as an example: 1. A forecast might be generated for each hospital, aggregated at the state level and then further aggregated at the system level. A variety of different forecast methods might be used to generate the forecasts. The term for this method is bottoms up.
2. Forecasts might be generated at the state level and then disaggregated (eg, via historical proportions) to the hospitals and aggregated to the system level. Again, the forecasts might be generated in multiple ways. The term for this method is middle out. 3. Forecasts might be generated at the system level (via multiple methods) and then disaggregated to all levels below (eg, proportions). The term for this is top down. 4. One might take some combination of the previous methods to minimize forecast error. This is an ensemble method that might be termed the optimal reconciliation approach, which is optimal if the forecasts are unbiased [19].
To avoid selection bias, all methods were evaluated for performance on the test set. Furthermore, the method for forecasting at these levels of hierarchy was using autoregressive integrated moving average (ARIMA) components and as well as smoothed error and trend components (exponential, trend, seasonality [ETS] without seasonality).
ARIMA models focus on autocorrelation of components for stationary time series data. The AR components are autoregressive terms, offset by time. For example, the number of claims at time t might be forecast by using the number of claims at time t-1. This would be an AR1 model, as there is 1 offset. As ARIMA models assume stationarity of the time series for forecasting, 2 other components are necessary. The first is differencing or integration, the I in ARIMA, which helps stabilize the mean (whereas transformations help stabilize the variance). Although seasonality and trend might make an ARIMA nonstationary, differencing often corrects this. Sometimes, more than 1 difference is required to make the time series stationary, for example, y t-y t-1 -(y t-1 -y t-2 ) is a 2d order differencing.

Explanatory Analysis
Stepwise linear regression (both forward and backward), lasso regression, robust regression, elastic net regression, and extreme gradient-boosted random forests are built on unaggregated data as well as state-level aggregated data to estimate the number of claims. These models are built on a random 80% training set (10,771 unaggregated, 245 aggregated observations) and evaluated on a 20% withhold set (2693 unaggregated, 61 aggregated) as well. The total number of valid observations were 13,464 unaggregated and 306 aggregated. The primary hypothesis is that the inclusion of obesity rates as an independent variable will yield better explanatory models for both the number of claims and the payment per claim.
Stepwise linear regression based on minimum Akaike Information Criterion was selected over best subset because of the computational complexity. By using forward and backward simultaneously, variables are added in sequence but might be removed if they no longer contribute to the model [21].
Lasso regression is a form of constrained regression that penalizes a model that selects too many variables by using an L1-norm formulation (absolute value), whereas ridge regression is similar but penalizes using an L2-norm formulation (squared coefficient estimates). Elastic net uses a weighted L1 and L2 norm penalty function to reduce the number of coefficients in the model. Formulae for estimating the parameters of the linear model, the lasso regression, the ridge regression, and the elastic net are shown in Figure 1. Random forests, a machine learning technique, use an ensemble of decorrelated tree models and average the estimates of those trees to build forecasts. A tree model itself classifies counts of observations by splitting variables at points based on some decision criteria. An example of a tree with a depth of 3 (3 branches) is given in Figure 2, which splits observations by obesity rate less than 31.63 and then again by obesity rate less than 25.75 and number of discharges less than 10,558.78 and then finally by net income less than US $35,018,392, cash less than US $25,522,424, and cash less than US $8,122,498 [21]. The graph was produced by the xgboost package of R [22]. Gradient-boosted random forests optimize a cost function based on the (pseudo-)residuals of a function using nonlinear optimization techniques. Essentially, the residuals of each tree in the forest are refitted with the possible independent variables in another tree model to estimate a better fit of the original function. Often, a learning rate (shrinkage) is applied to the residuals to allow for better generalization. A discussion of gradient boosting is available in Chapter 10 of Elements of Statistical Learning [21].

Variables
All the considered variables from the Definitive Healthcare dataset are shown in Table 1 with reasons for exclusion/inclusion. Most variables were, by default, included during analysis; however, those variables that were linear combinations of each other or were necessarily unknowable when forecasting CPT 63 codes were omitted.
There is 1 primary dependent variable of interest: number of claims for CPT 63 codes. This variable is measured at the hospital level over time and is also aggregated by zip code/year for geospatial mapping and by state/year for forecasting and additional modeling analysis. The number of claims include third-party invoices provided by the hospital, regardless of the payer. The number of claims provides a measure of the met demand for services.
For the geospatial and temporal analyses, the variables year and zip code (aggregated hospital-level data) are used to describe the intensity of both the number of claims and the payment per claim. Zip code provides a high resolution for geographic claims data. For the HTS forecasting analysis, time components are used without external regressors.
Explanatory stepwise regression, lasso regression, ridge regression, elastic net, and gradient-boosted random forest models investigate financial variables, technical variables, workload variables, geospatial variables, a temporal variable (year), and obesity rates (defined as the proportion of individuals with a body mass index greater than or equal to 30%). A discussion of each of the variable groups and variables follows.
The financial variables investigated include net patient revenue, net income, cash on hand, total assets, total liabilities, and proportion of Medicare/Medicaid reimbursement. The financial variables were carefully selected from the set of available financials such that they are not a linear combination of other variables or nearly a linear combination (see Table 1). Although available, total payments and charges for CPT 63 were not used in the models, as they (1) would not be known in advance and (2) would necessarily be direct functions of the number of claims.
Quantitative workload variables include the number of staffed beds, discharges, emergency room visits, surgeries, affiliated physicians, and employees. Categorical technical variables include ownership type, medical school affiliation, and hospital type. These variables are investigated because of their availability and possible confounding effects. Geographic variables include urban/rural location, state, and zip code. These variables are important in evaluating practice area variation and associated effects.
Obesity rates are of interest to the study. These rates are assigned based on the state, as county and zip code level data are not available. This independent variable is of importance to the study.

Descriptive Statistics: Missing Data
Missing data were present in the Definitive Healthcare dataset. As the percentage of missing data was small, the data were imputed via regression trees (simple imputation). The total number of valid observations at the hospital unit of analysis from January 2012 to December 2017 was 13,769. There were 2244 unique zip codes with data resulting in 13,464 observations from 2012 to 2017, although many of these were true zeros. Aggregated at the state level, there were 306 observations of the 50 states plus the District of Columbia over the 6-year span.

Descriptive Statistics: Quantitative Data
Important descriptive statistics for the Definitive Healthcare data are shown ( Table 2). The average number of CPT 63 claims by hospital by year was 182, and the average payment was over US $4045.99, about 50.37% of mean charges (US $8,032.13). On average, hospitals performing these claims were large (227 beds with 1629 employees and 299 affiliated physicians). These hospitals had on average positive net income (US $22 million) and assets exceeding liabilities. Overall, 45% of their patients used Medicare or Medicaid reimbursement.  (Table 4).
Obesity rates by state have increased from 2012 to 2017 (Table  5). In 2012, the mean obesity rate per state was 27.95%. By 2017, this rate was 30.59%; however, this increase is not weighted by population size. As discussed previously, the aggregate increase for the United States from 2012 to 2017 was 5% (34.9% to 39.6%) [7]. The state data include the District of Columbia (51 observations per year) but are not population weighted.

Descriptive Statistics: Correlational Analysis
Hierarchical clustered correlational analysis revealed strong relationships among many of the quantitative variables. Payments and claims are (as to be expected) highly correlated (r=0.9). Most financial and workload metrics are highly correlated as well (eg, net patient revenue and the number of employees; r=0.95). Owing to the large sample size, nearly all correlations are statistically significant at the alpha=.05 level (see Figure 3). The matrix was produced using ggcorrplot [23].
The inclusion of obesity in this study is because of a correlational finding that the number of CPT 63 procedures appears to be influenced by state obesity rates at the aggregate level ( Figure 4 [24]). The question, though, is whether this apparent correlation in the logs is sustained when other financial, geographic, technical, and temporal variables are considered.

Exploratory Data Analysis: Feature Engineering and Transformations
Although random forest regression models are scale invariant, traditional regression techniques such as stepwise, lasso, ridge, and elastic net are not [21]. Investigating transformations to achieve multivariate normality (assuming random-effects regression) is important to meet model assumptions. Furthermore, time series forecasting often benefits from these same techniques [16]. In addition, investigating additional features that might be generated from the existing ones through linear combinations and other methods often results in disentangling collinear variables and finding interesting relationships that might otherwise remain undiscovered.
A multivariate Box-Cox transformation was run using the car package in R [25] for all modeled quantitative variables simultaneously after these variables had been location adjusted to make the variables strictly positive, definite, and scale adjusted by dividing by the standard deviation. Multivariate Box-Cox seeks to find power transformations (values of λ for each variable) that make the data multivariate normal enough for use in traditional linear models [26]. These transformations help alleviate the problem of collinearity and address multivariate normal assumptions of random-effects regression. The null hypothesis is that the proposed transformation generated through the transformation is a good fit. The alternative is that it is not a good fit. The proposed transformation was a vector of primarily natural logarithms (values near zero) with some exceptions. The likelihood ratio test resulted in a P value >.99, indicating that the assumption of multivariate normality cannot be rejected. The actual vector of transformations follows: λ={0.1, 0.3, 0.33, 0.38, 0.21, −0.07, 1.66, 0.76, −0.03, 0.55, 0.28, 0.17, 1.03, 1.04} for x={number of claims, number of staffed beds, number of discharges, ER visits, total surgeries, net patient revenue, net income, cash, total assets, total liabilities, affiliated physicians, employees, percent Medicare/Medicaid, obesity rate}, respectively.
Univariate histograms for the number of claims and obesity preand posttransformation are in Figure 5. The transformed graph of the number of claims shows some slight skew but is otherwise unremarkable. However, the graph of obesity rates is telling. Although the transformation fails to reject the assumption of multivariate normality, the obesity graph is bimodal. It is possible that linear-in-parameter models will not be able to correctly fit the importance of this variable, whereas tree-based models will find patterns.

Geospatial Analysis Results: Zip Code Unit of Analysis
Geospatial heat map analysis of CPT 63 number of claims by year and parsed by zip code is shown in panels ( Figure 6). The maximum scale is approximately 7000 claims for each diagram to allow for comparison across years. Multimedia Appendix 1 shows this analysis in video format.
In 2012, there was very little high-intensity activity (Houston and Dallas, Texas, primarily, with some activity in the Carolinas). The Eastern seaboard has activity, but it is not intense, and the Western seaboard has minimal activity, except near Seattle.
By 2013, the Eastern seaboard (particularly New Jersey) has increased in intensity, and the areas around Chicago and Salem, Oregon, are emerging as well. Houston and Dallas remain the most intense regions for the number of claims.
In 2014, the number of claims in Seattle and San Antonio, Texas, shifted these cities to high-intensity areas (some red visible). It must be noted that 3 of the 4 cities with visible red tint are in Texas (San Antonio, Dallas, and Houston). Overall, the maps may suggest small area variations in practice patterns [27]. Although California and Florida have large populations, none of their major population centers reached the high-intensity scale of major cities in Texas. Furthermore, the Eastern seaboard's increasing intensity suggests that something has changed. The questions then become are these changes in demand forecastable and how might they be explained.

Number of Claims
Being able to forecast demand is necessary for decision makers to investigate both supply-and demand-side interventions. To that end, HTS for state, census bureau region, and the nation using both ETS and ARIMA components were built on 2012-2015 training dataset and compared with the 2016-2017 test set using the hts package in R [16]. Bottom-up, top-down, middle-out, and combination approaches to this forecasting were analyzed.
The ETS models performed better on the test set in terms of both variance and bias as shown (Table 6), and the middle-out model performed better on all bias (mean error and mean percentage error) as well as variance (root mean squared error, mean absolute error, and mean absolute percentage error) metrics. The overall forecast from the ETS middle-out model for the unobserved years {2016, 2017} was {454,720.3, 482,049.9}, whereas the actual overall claims were {464,323, 497,325}, resulting in mean absolute percent error (MAPE) of {2.0%, 3.1%}. Table 7 illustrates the forecast and actual number of claims at the region-level hierarchy for the best performing model, whereas Table 8 provides the state-by-state forecasts.  HTS with the middle-out approach and ETS methods was refit on the entire dataset to generate forecasts. Figure 7

Explanatory Modeling Results
To investigate explanatory variables, several models were explored. Stepwise regression for the number of claims at the hospital level using the transformed variables and an 80% training set was successfully able to predict the number of claims on the withheld test set with some accuracy (adjusted R 2 =0.39 on the training set and adjusted R 2 =0.38 on the test set). This indicates that the sum of squared regression accounted for 38% of the variance of the sum of squared total on the test set. Payments and charges were excluded from the analysis as they are necessarily functions of claims. The variables evaluated were the number of staffed beds, discharges, surgeries, net patient revenue, net income, total assets, total liabilities, affiliated physicians, employees, percentage Medicare/Medicaid, state, year, urban/rural status, ownership, medical school status, and hospital type. Table 9 provides the remaining variables generated from the stepwise regression at the hospital unit of analysis. It should be noted that obesity did not remain in the final model.
Stepwise regression for the number of claims with data aggregated (mean) by state and by year (N=306 observations, 51 states/territories × 6 years) resulted in an impressive model using an 80% training set to predict a 20% withhold set. The adjusted R 2 was 0.87 on the training set and 0.77 on the test set after dropping insignificant variables from the analysis. The variables in this model included state, year, number of discharges, and total liabilities (a parsimonious model; Table  10). Again, there is no evidence that obesity rates are predictive of CPT 63 surgery in this model. Lasso, ridge, and elastic net regression models were able to predict the unaggregated test set with some accuracy (R 2 =0.38, 0.37, 0.38, respectively.) None of these penalty-weighted models improved upon the stepwise analysis significantly, although elastic net tied. Obesity was not retained in these models. For the aggregated set (state and year), the associated R 2 were 0.78, 0.75, and 0.78, respectively. The lasso and elastic net models were slightly superior to the stepwise regression model ( Figure  5). The top 10 variables by effect size in the state-aggregated elastic net model are shown in Table 11. The effect size of obesity was near zero (0.0098). If one were to make a conclusion using traditional and constrained linear models, obesity would not be a factor for explaining the number of claims; however, random forests would prove otherwise.   Figure  8 is a plot of the gain (the average improvement when the feature is used in a tree) for the top 5 items in the importance matrix, whereas Figure 9 is a plot of the cover (the average proportion of samples affected by splitting using this feature) for the top 5 items of the unaggregated model. These figures illustrate that obesity is one of the prominent features in both gain and cover of the unaggregated model.  Despite the exceptional gains of the extreme gradient-boosted random forests on the unaggregated, hospital-level data, the application of hyperparameter-tuned models to the aggregated data (by state and year) yielded only nominal improvement over the constrained regression methods, possibly because of the smaller sample due to aggregation. A well-pruned model (depth=3) after 3000 epochs with a slow learning rate (0.1) achieved an R 2 of 0.80. The gain and cover graphs are shown in Figures 10 and 11, and obesity rate is the most important feature at the state-aggregated level.
Most importantly, the gradient-boosted random forests identified obesity as the second most important factor for gain at the hospital level and as the most important factor for both gain and cover at the state level of analysis. Furthermore, the gradient-boosted random forests performed better than any other model considered on a blinded test set.

Principal Findings
In this analysis, we evaluated the location, magnitude, and reasons for the growth of CPT 63 back surgeries in the United States. The GIS heat map analysis shows large-scale growth, particularly in the Northeastern region of the United States, and sustained activity in Texas. The entirety of the Eastern seaboard has seen growth in these procedures, and the associated increased cost is estimated to be US $323.9 million by the end of 2019.
The principal findings of this study are described here. Each of the following results includes a discussion of significance and (if appropriate) policy: 1. The Northeastern seaboard is likely to see continued growth in CPT 63 procedures. The implication for states in this region is that they may see more unplanned expenditures on health care, affecting their budgets. Furthermore, cost controls and reduction of practice variation based on evidence will become more important. 2. The cost associated with these procedures is outstripping inflation and will likely result in national expenditures in the triple-digit billions. The federal government may need to evaluate its own evidence-based, best practice policies associated with funding of procedures that link selected interventions with outcomes and that reasonably limit reimbursement. 3. Interstate practice variation appears to be extreme. For example, large population centers in California have fewer claims than large population centers in Texas. States should also investigate intrastate variation. 4. Hierarchical forecasts suggest an increase in the number of claims of 6.5% for 2018 and 13% in 2019. The initial models were built on a blind test set and performed well. These types of forecasts are reasonably effective for claims analysis. 5. Explanatory regression models for nation-level claims data had only some success in internal predictions. These models excluded obesity as a predictor. Regression models were more successful at predicting aggregated state/year models, though. These traditional models should be abandoned in favor of random forests. 6. Extreme gradient-boosted random forest models were highly successful in predicting both hospital-level unit of analysis number of claims and aggregate-level claims on an unobserved test set. These models identified obesity as an important factor in estimating the number of claims. Furthermore, the use of these models underscores that even after multivariate transformations, nonlinear functions may exist in modeled data. Random forests unearthed patterns not visible to regression and constrained regression models.

Limitations
There are many limitations in this work. First, the algorithms used by Definitive Healthcare to extrapolate CMS data to all-payor data are not divulged. This omission is problematic for verification but understandable because of parochial concerns. Second, only ETS and ARIMA models were considered for the HTS fitting as these models are implemented in the R HTS package. There are an infinite number of models for forecasting, including random forest time series that might have performed better. Third, the explanatory variables are limited to those tracked by CMS and the BRFSS.

Conclusions
Hospital-based back surgeries are likely to increase dramatically over the next several years, yet the supply of neurosurgeons is constant. With that increase, the cost of the procedures (mostly borne by third-party payers) will increase as well. Practice variation appears to be prevalent across the country; however, obesity itself is a factor that must be considered as a significant influence. Policy interventions must be considered at many levels.
Clinical practice variation is something that may require intervention at the federal level. For example, a study in Scandinavia found significant differences among Norway, Sweden, and Denmark in the use of concomitant arthrodesis without any difference in treatment efficacy, increasing the cost without improving outcomes [28]. Controlling costs across states may require federal (and state) reimbursement interventions and incentives.
States should continue educational and financial interventions targeting obesity in adults and children. As the obesity epidemic continues to grow, the medical intervention costs are likely to grow accordingly. Furthermore, states should evaluate county-to-county practice variation as these variations often increase cost without improving quality [27].
Local interventions should consider the targeting of food deserts (urban areas where fresh, quality food is difficult to find) for eradication as well as educational interventions. Several studies have shown that the food environment is directly linked to obesity [29][30][31]. Eliminating or at least reducing the number of food deserts requires incentivizing grocery stores to populate areas where it may not be as lucrative because of poverty or demand.
Insurance companies themselves have a vested interest in both reducing obesity and controlling practice variation. Obesity is linked to numerous health disorders such as heart disease, type 2 diabetes, and bone and joint disease [32], any of which may result in additional costs to the health care system and insurer. Funding prevention efforts and establishing policies to reduce practice area variation are likely to benefit them as well as the population health over time.
Federal, state, and local policy makers need to address the increasing obesity epidemic and the likely associated increase in demand for back surgeries. The implications of not doing so are increased cost, questionable quality/cost trade-offs, and reduced access because of the small and steady number of available neurosurgeons. The fattening of America and the costs associated with it are likely to continue increasing otherwise.

Conflicts of Interest
None declared.