Multiscale GWR using top-down scale approaches

Ghislain Geniaux and Cesar Martinez

2025-02-18

Introduction

In this release, we introduce several estimation methods that accommodate bandwidths varying by variable and adapting to location. Following the initial work of Yang et al. (2011) on a flexible bandwidth approach for GWR, Yang (2014), Leong and Yue (2017), Lu et al. (2017) and Fotheringham et al. (2017) (hereafter FT2017) made significant contributions to this issue by allowing the identification of a different optimal bandwidth for each variable j, using a Jacobi iteration or a backfitting algorithm. However, existing implementations of multiscale GWR based on backfitting are slow and only suitable for datasets with fewer than 2500 observations and fewer than 6 explanatory variables. To handle larger sample sizes, we rely on an innovative approach that departs from the optimization of a single optimal bandwidth per variable and instead uses a decreasing series of bandwidths, in a manner reminiscent of gradient boosting algorithms.

The gwr_multiscale function directly transcribes the backfitting procedure used in gwr.multiscale from the GWmodel package. It leverages the mgwrsar estimation framework to reduce estimation time and enables the estimation of multiscale GWR with spatial dependence. The optimal (univariate) bandwidth at each step is selected using the AICc criterion, while the RMSE serves as the stopping criterion for the backfitting algorithm. For this method, a demo is provided only for the analysis of in-sample Beta estimates.

The tds_mgwr algorithm is an innovative approach for multiscale GWR that departs from the original backfitting procedure proposed by Fotheringham et al. (2017) by employing a set of decreasing bandwidths without fully optimizing the bandwidth at each iteration of the backfitting algorithm. It leverages the top-down scale method, employing a backfitting procedure only to preserve the favorable properties observed in the univariate case. This drastically reduces the number of estimations, and convergence is ensured by a test procedure conducted in parallel on a batch of ascending and descending bandwidths. Throughout the algorithm, the full AICc is computed following the proposal of Yu et al. (2020).

The atds_mgwr procedure constitutes a second step after the estimation of tds_mgwr, once the optimal bandwidth for each variable is known. It launches a backfitting on each variable, performing a univariate tds_gwr estimation to test whether adapting the bandwidth to location improves the estimates. (The tds_gwr model is an innovative top-down scale approach for univariate GWR that allows the bandwidth to adapt to the data's location; it employs a gradient boosting framework instead of a backfitting procedure.)

Both tds_mgwr and atds_mgwr involve several \(n\times n\) matrix multiplications for computing the hat matrix, following Wu et al. (2018) for the backfitting part and Bühlmann and Hothorn (2007) for the gradient boosting part. In both cases we propose an approximation method to speed up the computation.
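
As a rough illustration of why these \(n\times n\) products arise, the hat matrix of an L2-boosting fit can be accumulated recursively from the smoother used at each step (Bühlmann and Hothorn, 2007). The sketch below is a toy version with a simple row-normalised Gaussian kernel smoother, not the package's implementation; the smoother, learning rate and bandwidth sequence are illustrative assumptions.

```r
# Toy sketch (NOT the mgwrsar implementation): recursive update of the
# boosting hat matrix, B_m = B_{m-1} + nu * S_m %*% (I - B_{m-1}),
# following Buhlmann and Hothorn (2007). S_m is the smoother at step m.
set.seed(1)
n <- 5
coords <- matrix(runif(2 * n), ncol = 2)
D <- as.matrix(dist(coords))
make_smoother <- function(h) {
  W <- exp(-(D / h)^2)          # Gaussian kernel weights
  sweep(W, 1, rowSums(W), "/")  # row-normalise so each row sums to 1
}
nu <- 0.5                       # learning rate (illustrative)
B <- matrix(0, n, n)            # B_0 = 0: null starting model
for (h in c(1, 0.5, 0.25)) {    # decreasing bandwidth sequence
  S <- make_smoother(h)
  B <- B + nu * S %*% (diag(n) - B)
}
edf <- sum(diag(B))  # trace of the hat matrix: effective degrees of freedom
```

The trace of B is what feeds the AICc computation; storing and updating B is the \(n \times n\) cost that the approximation methods aim to reduce.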

Data example: simulated DGP with complex spatial patterns.

In the following example, the simu_multiscale function allows simulating the DGP proposed by Geniaux (2025) ("Top down scale approaches for multiscale GWR and other semi-parametric partially linear varying coefficient regression with multiple locally-self-adaptive bandwidths", in prep.), illustrated in the next figure.

library(mgwrsar)
n=1000
simu1=simu_multiscale(n,myseed = 5,config_snr=0.9,config_beta='default',config_eps='normal')
mydata=simu1$mydata
coords=simu1$coords
TRUEBETA<-as.matrix(mydata[,c('Beta1','Beta2','Beta3','Beta4')],ncol=4)
myformula<-as.formula('Y ~ X1 + X2 + X3')
GG2025 multiscale DGP

Estimation of multiscale GWR like models.

GWR estimates as benchmark

With a standard GWR, we obtain for these simulated data a global bandwidth of 0.04339215 and a mean RMSE of the Beta coefficients of 0.8726403.

############
## GWR as a benchmark model (we use the whole dataset for estimation)
############
res1=golden_search_bandwidth(formula=myformula,H2=NULL,data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=F),lower.bound=0, upper.bound=1,tolerance=0.001)
res1$minimum # 0.04339215
## [1] 0.04339215
res1$objective #   4755.424
## [1] 4755.424
summary(res1$model@Betav)
##    Intercept            X1               X2                X3          
##  Min.   :-1.856   Min.   :-1.317   Min.   :-3.0094   Min.   :-3.92712  
##  1st Qu.: 1.167   1st Qu.: 1.116   1st Qu.:-0.5407   1st Qu.:-1.91987  
##  Median : 2.052   Median : 2.304   Median : 0.1700   Median :-0.19165  
##  Mean   : 2.075   Mean   : 2.330   Mean   : 0.2149   Mean   :-0.05694  
##  3rd Qu.: 2.856   3rd Qu.: 3.522   3rd Qu.: 0.9500   3rd Qu.: 1.72766  
##  Max.   : 6.890   Max.   : 6.092   Max.   : 3.2380   Max.   : 5.48828
### Beta RMSE
Beta_RMSE=apply((res1$model@Betav-TRUEBETA %>% as.data.frame()),2,rmse)
Beta_RMSE #0.5719872 0.6340170 1.0473393 1.2372179
##     Beta1     Beta2     Beta3     Beta4 
## 0.5719872 0.6340170 1.0473393 1.2372179
mean(Beta_RMSE) # 0.8726403
## [1] 0.8726403
res1$model@AICc # 4755.426
## [1] 4755.426

multiscale_gwr estimates

The multiscale_gwr function implements the backfitting algorithm proposed by Fotheringham et al. (2017).

While this implementation of multiscale GWR does not significantly reduce the bias of Beta coefficients compared to standard GWR, it highlights the challenges of selecting appropriate bandwidths for complex spatial patterns. For instance, the bandwidth chosen for \(\beta_4\) (1.359149) is too large, leading to suboptimal results.

############
## multiscale_gwr 
############

model_multiscale_gwr <-multiscale_gwr(formula=myformula,data=mydata,coords=coords,kernels='gauss',verbose=T,get_AICg=TRUE,control=list(SE=F,adaptive=F,NN=300,isgcv=FALSE,verbose=F,family=gaussian(),doMC=F, ncore=1),init='GWR',nstable=5,tolerance =0.0001)
## H0= 0.04345371 
## AICc= 4755.423  RMSE= 1.698892 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  1  H  0.05462896 0.05796469 0.03096999 1.359149  RMSE= 3.045944  delta_rmse  0.414929  AICg= 5657.412  stable 1 1 1 1 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  2  H  0.1214272 0.06336201 0.03926689 1.359149  RMSE= 3.007683  delta_rmse  0.01256141  AICg= 5400.491  stable 1 1 1 2 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  3  H  0.131717 0.06479464 0.03917704 1.359149  RMSE= 3.015468  delta_rmse  -0.002588628  AICg= 5397.519  stable 1 1 1 3 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  4  H  0.1337998 0.06502987 0.0391215 1.359149  RMSE= 3.016645  delta_rmse  -0.0003902467  AICg= 5397.2  stable 1 1 1 4 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  5  H  0.1341804 0.06502987 0.0391215 1.359149  RMSE= 3.016867  delta_rmse  -7.342889e-05  AICg= 5397.149  stable 1 2 2 5 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  6  H  0.134236 0.06502987 0.0391215 1.359149  RMSE= 3.016904  delta_rmse  -1.224577e-05  AICg= 5397.144  stable 1 3 3 6 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  7  H  0.1342703 0.06502987 0.0391215 1.359149  RMSE= 3.01692  delta_rmse  -5.271203e-06  AICg= 5397.142  stable 1 4 4 7 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  8  H  0.1342703 0.06502987 0.0391215 1.359149  RMSE= 3.016921  delta_rmse  -3.256739e-07  AICg= 5397.142  stable 2 5 5 8 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  9  H  0.1342703 0.06502987 0.0391215 1.359149  RMSE= 3.016921  delta_rmse  -3.307275e-08  AICg= 5397.142  stable 3 6 6 9 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  10  H  0.1342703 0.06502987 0.0391215 1.359149  RMSE= 3.016921  delta_rmse  -3.812308e-09  AICg= 5397.142  stable 4 7 7 10 
## 
##  backfitting   Intercept    X1    X2    X3  
##  iter  11  H  0.1342703 0.06502987 0.0391215 1.359149  RMSE= 3.016921  delta_rmse  -4.792044e-10  AICg= 5397.142  stable 5 8 8 11 
## Time =  257.014
summary(model_multiscale_gwr)
## Call:
## MGWRSAR(formula = myformula, data = data, coords = coords, fixed_vars = NULL, 
##     kernels = kernels, H = c(h, H2), Model = "multiscale_gwr", 
##     control = control)
## Model: multiscale_gwr 
## Method for spatial autocorrelation: 2SLS 
## Kernels function: gauss 
## Kernels adaptive: NO 
## Kernels type: GD 
## Bandwidth: 0.1342703 0.06502987 0.0391215 1.359149 
## Computation time: 257.014 
## Use of parallel computing: FALSE  ncore = 1 
## Use of rough kernel: YES, 300 neighbors / 1000 
## Use of Target Points: NO 
## Number of data points: 1000 
## Varying parameters: X3 
##         Intercept        X1        X2      X3
## Min.     0.092348 -1.468016 -4.121051 -0.8917
## 1st Qu.  1.403305  1.329966 -0.720194 -0.2933
## Median   2.218123  2.284794  0.260265 -0.1072
## Mean     2.161043  2.343896  0.195910 -0.0408
## 3rd Qu.  2.847813  3.488064  0.981496  0.3363
## Max.     4.485215  5.771464  4.081508  0.7085
## Effective degrees of freedom: 996.5876 
## AICc: 5397.142 
## Residual sum of squares: 9101.81 
## RMSE: 3.016921
# Computation time: 211.175 s on a MacBook Pro M1

### Beta RMSE
Beta_RMSE=apply((model_multiscale_gwr@Betav-TRUEBETA %>% as.data.frame()),2,rmse)
Beta_RMSE # 0.2915894 0.6517318 1.0764612 2.8593433 
##     Beta1     Beta2     Beta3     Beta4 
## 0.2915045 0.6529376 1.0763655 2.8593369
mean(Beta_RMSE) #  1.220036
## [1] 1.220036
model_multiscale_gwr@AICc # 5397.142
## [1] 5397.142

tds_mgwr estimates

The tds_mgwr algorithm is designed to achieve the same objectives as multiscale GWR, specifically to identify a unique bandwidth for each variable. The model is defined as:

\(y_{i}=\sum_{k=1}^{K}\beta_{k}(u_{i},v_{i};K(h_k))x_{ik}+\epsilon_{i}\)

Key Features:

  • Bandwidth Sequence: The algorithm uses a geometrically decreasing sequence of bandwidths (default: 30 values) during the backfitting process. The starting bandwidth is the largest, corresponding to an Ordinary Least Squares (OLS) estimate.
  • Bandwidth Selection: At each step \(m\) of the backfitting algorithm, the bandwidth \(h_m\) is adjusted (typically reduced). The algorithm evaluates whether to retain \(h_m\) or revert to the previous value, based on the AICc criterion.
  • Efficiency: Instead of performing a full optimization at each iteration, the algorithm computes the AICc for up to five bandwidth values in parallel:
    • The most recent optimal bandwidth for the variable.
    • The preceding bandwidth value (upper bandwidth \(h_{m-1}\)).
    • The bandwidth \(h_m\).
    • The subsequent bandwidth value (lower bandwidth \(h_{m+1}\)).
    • The lowest bandwidth across all variables.
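
For illustration, a geometrically decreasing sequence of bandwidths can be generated as below. This is a sketch of the idea only; the exact grid used internally by tds_mgwr (controlled by nns) may differ, and h_max/h_min here are hypothetical bounds.

```r
# Sketch of a geometric sequence of nns decreasing bandwidths, from h_max
# (close to a global/OLS fit) down to h_min (hypothetical bounds).
bandwidth_sequence <- function(h_max, h_min, nns = 30) {
  h_max * (h_min / h_max)^(seq(0, 1, length.out = nns))
}
H <- bandwidth_sequence(h_max = 2, h_min = 0.01, nns = 30)
```
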
############
### tds_mgwr 
############
# model estimate
model_tds_mgwr=tds_mgwr(formula=myformula,Model='tds_mgwr',data=mydata,coords=coords,kernels='gauss',fixed_vars=NULL,H2=NULL,control_tds=list(nns=30,get_AIC=T),control=list(adaptive=F,doMC=F,ncore=1))

# Estimation can be faster using get_AIC=F

### Beta RMSE
Beta_RMSE=apply((model_tds_mgwr@Betav-TRUEBETA %>% as.data.frame()),2,rmse)
Beta_RMSE # 0.1866531 0.4445685 0.8608283 0.9585331
##     Beta1     Beta2     Beta3     Beta4 
## 0.1866531 0.4445685 0.8608283 0.9585331
mean(Beta_RMSE) # 0.6126457
## [1] 0.6126457
model_tds_mgwr@AICc # 4547.775 
##          
## 4547.775

atds_mgwr estimates

The atds_mgwr algorithm extends tds_mgwr by incorporating location-specific bandwidths, enabling local adaptation and improved estimation accuracy. The model is defined as:

\(y_{i}=\sum_{k=1}^{K}\beta_{k}(u_{i},v_{i};K(h_k(u_i,v_i)))x_{ik}+\epsilon_{i}\)

The atds_mgwr algorithm integrates a backfitting approach with a univariate GWR gradient boosting algorithm. Unlike tds_mgwr, which applies GWR at each iteration of the backfitting process, atds_mgwr employs a gradient boosting GWR that follows a sequence of decreasing bandwidths (using the same sequence as in tds_mgwr). For further details, see Geniaux (2025).

Key Features:

  • Initialization: The algorithm uses the tds_mgwr estimates as the starting model (model_stage1=model_tds_mgwr).
  • Gradient Boosting: Instead of a standard GWR, the atds_mgwr algorithm employs a gradient boosting GWR with a sequence of decreasing bandwidths. This approach improves model flexibility and accuracy.
  • Stopping Criterion: Due to the limitations of using relative RMSE in gradient boosting, the algorithm relies on the global AICc proposed by Yu and Fotheringham (2020). This requires estimating the hat matrix at each boosting iteration using the approximation by Bühlmann and Hothorn (2007).
  • Computational Considerations: The computation of the hat matrix involves \(n \times n\) matrix storage and multiplication, which can be time-intensive. To mitigate this, the number of backfitting rounds is limited (e.g., 3 rounds in this example).
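
For reference, the corrected AIC used for GWR-type models can be computed from the trace of the hat matrix \(S\); the sketch below follows the usual formula \(AICc = n\log(\hat\sigma^2) + n\log(2\pi) + n(n + tr(S))/(n - 2 - tr(S))\), which is the form Yu et al. (2020) adapt to multiscale GWR. The helper name and the OLS check are illustrative, not the package's API.

```r
# Sketch: corrected AIC from a hat matrix S (helper name is hypothetical)
aicc_from_hat <- function(y, yhat, S) {
  n <- length(y)
  sigma2 <- sum((y - yhat)^2) / n   # ML variance estimate
  trS <- sum(diag(S))               # effective number of parameters
  n * log(sigma2) + n * log(2 * pi) + n * (n + trS) / (n - 2 - trS)
}
# quick check on an OLS fit, whose hat matrix is X (X'X)^-1 X'
set.seed(1)
n <- 50; x <- rnorm(n); X <- cbind(1, x)
y <- 1 + 2 * x + rnorm(n)
S <- X %*% solve(crossprod(X)) %*% t(X)
aicc_from_hat(y, S %*% y, S)  # trace(S) equals 2 here (two coefficients)
```
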

The atds_mgwr algorithm significantly improves estimation accuracy compared to tds_mgwr:

  • The mean RMSE of \(\beta\) decreases from 0.6163591 (tds_mgwr) to 0.4814645 (atds_mgwr, first round) and further to 0.4556188 (atds_mgwr, second round). This corresponds to reductions of 22% and 26%, respectively.
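
These reduction figures can be checked directly from the reported means:

```r
# Reported mean Beta RMSE values (from the text above)
rmse_tds   <- 0.6163591   # tds_mgwr
rmse_atds1 <- 0.4814645   # atds_mgwr, first round
rmse_atds2 <- 0.4556188   # atds_mgwr, second round
round(100 * (1 - rmse_atds1 / rmse_tds))  # ~22 (%)
round(100 * (1 - rmse_atds2 / rmse_tds))  # ~26 (%)
```
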
############
## atds_mgwr 
############

model_atds_mgwr=tds_mgwr(formula=myformula,Model='atds_mgwr',data=mydata,coords=coords,kernels='gauss',fixed_vars=NULL,control_tds=list(nns=30,model_stage1=model_tds_mgwr),control=list(adaptive=F,verbose=FALSE,isgcv =FALSE,Type ='GD',doMC=F,ncore=1))
summary(model_atds_mgwr)
## Call:
## tds_mgwr(formula = myformula, data = mydata, coords = coords, 
##     Model = "atds_mgwr", kernels = "gauss", fixed_vars = NULL, 
##     control_tds = list(nns = 30, model_stage1 = model_tds_mgwr), 
##     control = list(adaptive = F, verbose = FALSE, isgcv = FALSE, 
##         Type = "GD", doMC = F, ncore = 1))
## Model: atds_mgwr 
## Method for spatial autocorrelation:  
## Kernels function: gauss 
## Kernels adaptive: NO 
## Kernels type: GD 
## Bandwidth: 0.3218526 0.1437018 0.04387009 0.04730317 
## Computation time: 263.314 
## Use of parallel computing: FALSE  ncore = 1 
## Use of rough kernel: NO 
## Use of Target Points: NO 
## Number of data points: 1000 
## Varying parameters: Intercept X1 X2 X3 
##         Intercept       X1       X2      X3
## Min.     -0.65501 -0.92293 -4.32107 -5.6627
## 1st Qu.   1.19592  1.06110 -0.87681 -2.6450
## Median    2.06740  2.19465  0.18820 -0.5112
## Mean      2.05897  2.32279  0.24090 -0.1311
## 3rd Qu.   2.88027  3.56081  1.17471  2.3528
## Max.      4.98107  5.41300  7.74459  6.9789
## Effective degrees of freedom: 747.5733 
## AICc: 4222.661 
## Residual sum of squares:  
## RMSE: 1.422596
### Beta RMSE
Beta_RMSE=apply((model_atds_mgwr@Betav-TRUEBETA %>% as.data.frame()),2,rmse)
Beta_RMSE # nrounds=3 0.1016979 0.2814159 0.7942419 0.6162484;
##      Beta1      Beta2      Beta3      Beta4 
## 0.08675225 0.27901447 0.80444136 0.63377897
mean(Beta_RMSE) # nrounds=1 0.4823437 ; nrounds=2 0.4559616 ; nrounds=3  0.448401 
## [1] 0.4509968
model_atds_mgwr@AICc # rounds=1  4239.097
## [1] 4222.661

Even though it considers a specific bandwidth for each Beta, the multiscale_gwr algorithm performs poorly for estimating \(\beta_4\). In this simulation, the estimates of \(\beta_4\) are significantly improved using tds_mgwr, and even more so with atds_mgwr.

The following table compares the RMSE of the Beta estimates obtained using GWR and the multiscale-GWR-like estimators of the mgwrsar R package, on a single simulation of the DGP proposed by Geniaux (2025).

                \(\beta_1\)  \(\beta_2\)  \(\beta_3\)  \(\beta_4\)
GWR             0.5719872    0.6340170    1.0473393    1.2372179
multiscale_gwr  0.2915894    0.6517318    1.0764612    2.8593433
tds_mgwr        0.1866531    0.4445685    0.8608283    0.9585331
atds_mgwr       0.1437748    0.3569051    0.8220955    0.6234205

Out-of-sample prediction comparisons

### PREDICTION

# build in-sample and out-of-sample data
# folds (note: no seed is set here, so fold composition varies across runs)
length_in=800
index_in=sample(1:1000,length_in)
index_out=(1:1000)[-index_in]
# out sample data
newdata=mydata[index_out,]
newdata_coords=coords[index_out,]

## GWR model
res14=golden_search_bandwidth(formula=myformula,H2=NULL,data = mydata[index_in,],coords=coords[index_in,], fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(verbose=F,NN=300,criterion='AICc',adaptive=F),lower.bound=0, upper.bound=1,tolerance=0.001)

### three methods of prediction with GWR are now tested
# surprisingly, the target-point method ('TP') leads to better out-of-sample
# estimates and predictions than the 'shepard' and 'model' methods

## outsample prediction with the GWR model with target points
res_pred1=predict(res14$model, newdata=newdata, newdata_coords=newdata_coords,method_pred='TP',beta_proj=TRUE)
Y_pred=res_pred1$Y_predicted
head(mydata$Y[index_out])
## [1] -0.7209412  7.2004245 -1.1266273  8.4334616 -2.4885930 -1.5429266
# RMSE for predicted Y on the out sample fold
gwr_out_rmse = sqrt(mean((mydata$Y[index_out]-Y_pred)^2))
sqrt(mean((mydata$Y[index_out]-Y_pred)^2)) #  2.831865
## [1] 2.907076
# in sample RMSE for predicted Y 
res14$model@RMSE # 1.780888
## [1] 1.747453
# out sample RMSE for extrapolated beta coefficients
Beta_RMSE_outsample_gwr=apply((res_pred1$Beta_proj_out-TRUEBETA[index_out,] %>% as.data.frame()),2,rmse)
mean(Beta_RMSE_outsample_gwr) #  1.009409
## [1] 1.096563
# in sample RMSE for beta coefficients
Beta_RMSE_insample_gwr=apply((res14$model@Betav-TRUEBETA[index_in,] %>% as.data.frame()),2,rmse)
Beta_RMSE_insample_gwr
##     Beta1     Beta2     Beta3     Beta4 
## 0.6210909 0.7177753 1.1229993 1.3723732
#    Beta1     Beta2     Beta3     Beta4 
# 0.6129079 0.6707307 1.0983044 1.4200666 
# -> the spatial structures of Beta 4 (and Beta 3) are not well captured
# by the GWR (large RMSE)
mean(Beta_RMSE_insample_gwr) # 0.9505024
## [1] 0.9585597
## Predictions with the GWR model (method_pred='shepard')
res_pred3=predict(res14$model, newdata=newdata, newdata_coords=newdata_coords,method_pred='shepard',k_extra=8,beta_proj=TRUE)
Y_pred=res_pred3$Y_predicted
head(mydata$Y[index_out])
## [1] -0.7209412  7.2004245 -1.1266273  8.4334616 -2.4885930 -1.5429266
sqrt(mean((mydata$Y[index_out]-Y_pred)^2)) # 2.868228
## [1] 2.897205
res14$model@RMSE # 1.780888
## [1] 1.747453
Beta_RMSE_outsample=apply((res_pred3$Beta_proj_out-TRUEBETA[index_out,] %>% as.data.frame()),2,rmse)
mean(Beta_RMSE_outsample) #  1.021705 
## [1] 1.095181
mean(Beta_RMSE_insample_gwr) # 0.9505024
## [1] 0.9585597
## Predictions with the GWR model (method_pred='model')
res_pred4=predict(res14$model, newdata=newdata, newdata_coords=newdata_coords,method_pred='model',k_extra=5,beta_proj=TRUE)
Y_pred=res_pred4$Y_predicted
head(mydata$Y[index_out])
## [1] -0.7209412  7.2004245 -1.1266273  8.4334616 -2.4885930 -1.5429266
sqrt(mean((mydata$Y[index_out]-Y_pred)^2)) # 3.133317
## [1] 3.043557
res14$model@RMSE # 1.780888
## [1] 1.747453
Beta_RMSE_outsample=apply((res_pred4$Beta_proj_out-TRUEBETA[index_out,] %>% as.data.frame()),2,rmse)
mean(Beta_RMSE_outsample) #  1.066749
## [1] 1.11062
mean(Beta_RMSE_insample_gwr) #  0.9505024
## [1] 0.9585597
##############################################
### Algorithm Top down scale MGWR (tds_mgwr)
# model calibration
model_tds_mgwr_in=tds_mgwr(formula=myformula,Model='tds_mgwr',data=mydata[index_in,],coords=coords[index_in,],kernels='gauss',fixed_vars=NULL,control_tds=list(nns=30,get_AIC=T),control=list(adaptive=F,verbose=FALSE,isgcv =FALSE,Type ='GD',doMC=F,ncore=1))

summary(model_tds_mgwr_in)
## Call:
## tds_mgwr(formula = myformula, data = mydata[index_in, ], coords = coords[index_in, 
##     ], Model = "tds_mgwr", kernels = "gauss", fixed_vars = NULL, 
##     control_tds = list(nns = 30, get_AIC = T), control = list(adaptive = F, 
##         verbose = FALSE, isgcv = FALSE, Type = "GD", doMC = F, 
##         ncore = 1))
## Model: tds_mgwr 
## Method for spatial autocorrelation:  
## Kernels function: gauss 
## Kernels adaptive: NO 
## Kernels type: GD 
## Bandwidth: 0.07310735 0.0570755 0.03317522 0.03317522 
## Computation time: 102.001 
## Use of parallel computing: FALSE  ncore = 1 
## Use of rough kernel: NO 
## Use of Target Points: NO 
## Number of data points: 800 
## Varying parameters: Intercept X1 X2 X3 
##         Intercept       X1       X2      X3
## Min.     -1.10029 -0.25180 -3.11937 -4.4422
## 1st Qu.   1.17345  1.27204 -0.73879 -2.2401
## Median    2.34015  2.34312  0.17149 -0.2979
## Mean      2.13598  2.44952  0.19176 -0.1209
## 3rd Qu.   2.94012  3.66287  1.06636  2.0009
## Max.      5.15849  5.63382  4.58067  5.4436
## Effective degrees of freedom: 513.4306 
## AICc: 3721.427 
## Residual sum of squares:  
## RMSE: 1.414451
# prediction on outsample data, beta coefficients extrapolation using (method_pred='tWtp_model')
res_pred2=predict(model_tds_mgwr_in, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model',beta_proj=TRUE)

# predicted values for the dependent variable
Y_pred=res_pred2$Y_predicted
# RMSE for out sample predictions
sqrt(mean((mydata$Y[index_out]-Y_pred)^2)) #  2.565149 (# for GWR 2.831865)
## [1] 2.587307
# RMSE for in sample predictions
model_tds_mgwr_in@RMSE # 1.418821 (# 1.780888 for GWR)
## [1] 1.414451
# note that tds_mgwr improves predictions RMSE comparing to GWR 

## RMSE on coefficients estimates
# out sample
Beta_RMSE_outsample_tds_mgwr=apply((res_pred2$Beta_proj_out-TRUEBETA[index_out,] %>% as.data.frame()),2,rmse)
# outsample rmse for each coefficient
Beta_RMSE_outsample_tds_mgwr
##     Beta1     Beta2     Beta3     Beta4 
## 0.2977538 0.4905907 1.1510240 1.4276576
            #Beta1     Beta2     Beta3     Beta4 
# TDS-MGWR 0.2030172 0.4936683 1.0147774 1.3803936 
# GWR      0.6129079 0.6707307 1.0983044 1.4200666 

## The TDS-MGWR model improves out-of-sample beta estimates, especially for
# Beta_1 and Beta_2. For Beta_4 the improvement is small.

# average RMSE
mean(Beta_RMSE_outsample_tds_mgwr) #  0.7729641 (GWR 1.009409)
## [1] 0.8417565
# in sample
Beta_RMSE_insample_tds_mgwr=apply((model_tds_mgwr_in@Betav-TRUEBETA[index_in,] %>% as.data.frame()),2,rmse)
Beta_RMSE_insample_tds_mgwr
##     Beta1     Beta2     Beta3     Beta4 
## 0.2874578 0.4825926 0.9396987 1.0851012
mean(Beta_RMSE_insample_tds_mgwr) #  0.6784951 (GWR 0.9505024)
## [1] 0.6987126
## prediction with (method_pred='shepard')
# prediction performances close to the ones obtained with (method_pred='tWtp_model')
res_pred6=predict(model_tds_mgwr_in, newdata=newdata, newdata_coords=newdata_coords,method_pred='shepard',k_extra=5,beta_proj=TRUE)
Y_pred=res_pred6$Y_predicted
sqrt(mean((mydata$Y[index_out]-Y_pred)^2)) # 2.617877 
## [1] 2.584122
model_tds_mgwr_in@RMSE # 1.418821
## [1] 1.414451
Beta_RMSE_outsample_tds_mgwr=apply((res_pred6$Beta_proj_out-TRUEBETA[index_out,] %>% as.data.frame()),2,rmse)
mean(Beta_RMSE_outsample_tds_mgwr) #  0.7874575
## [1] 0.8543411
mean(Beta_RMSE_insample_tds_mgwr) # 0.6784951
## [1] 0.6987126
###################################################################
###################################################################
### atds_mgwr algorithm

# model estimate using in sample data, Gaussian non adaptive kernel
model_atds_mgwr_in=tds_mgwr(formula=myformula,Model='atds_mgwr',data=mydata[index_in,],coords=coords[index_in,],kernels='gauss',fixed_vars=NULL,control_tds=list(nns=30,get_AIC=T),control=list(adaptive=F,verbose=FALSE,isgcv =FALSE,Type ='GD',doMC=F,ncore=1))

summary(model_atds_mgwr_in)
## Call:
## tds_mgwr(formula = myformula, data = mydata[index_in, ], coords = coords[index_in, 
##     ], Model = "atds_mgwr", kernels = "gauss", fixed_vars = NULL, 
##     control_tds = list(nns = 30, get_AIC = T), control = list(adaptive = F, 
##         verbose = FALSE, isgcv = FALSE, Type = "GD", doMC = F, 
##         ncore = 1))
## Model: atds_mgwr 
## Method for spatial autocorrelation:  
## Kernels function: gauss 
## Kernels adaptive: NO 
## Kernels type: GD 
## Bandwidth: 0.2532088 0.1367192 0.0432803 0.0522608 
## Computation time: 298.204 
## Use of parallel computing: FALSE  ncore = 1 
## Use of rough kernel: NO 
## Use of Target Points: NO 
## Number of data points: 800 
## Varying parameters: Intercept X1 X2 X3 
##         Intercept       X1       X2      X3
## Min.     -0.93601 -1.02440 -4.16357 -5.2688
## 1st Qu.   1.17042  1.13717 -0.93081 -2.5715
## Median    2.17956  2.33458  0.12425 -0.5127
## Mean      2.10036  2.40935  0.22340 -0.1595
## 3rd Qu.   3.00911  3.67388  1.29080  2.2497
## Max.      5.02085  5.37713 11.29336  7.3030
## Effective degrees of freedom: 574.2321 
## AICc: 3446.372 
## Residual sum of squares:  
## RMSE: 1.403203
# prediction on out sample data, method_pred='tWtp_model' 
res_pred7=predict(model_atds_mgwr_in, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model',beta_proj=TRUE)
Y_pred=res_pred7$Y_predicted
# RMSE for predicted Y on the out sample fold
sqrt(mean((mydata$Y[index_out]-Y_pred)^2)) 
## [1] 2.43769
# ATDS-MGWR 2.445863 # TDS-MGWR 2.565149  # GWR 2.831865)
# RMSE for insample predicted Y
model_atds_mgwr_in@RMSE # 1.380074
## [1] 1.403203
## beta estimates
RMSE_outsample=apply((res_pred7$Beta_proj_out-TRUEBETA[index_out,] %>% as.data.frame()),2,rmse)
mean(RMSE_outsample) # 0.7191038
## [1] 0.778225
RMSE_insample_atds=apply((model_atds_mgwr_in@Betav-TRUEBETA[index_in,] %>% as.data.frame()),2,rmse)
mean(RMSE_insample_atds) # 0.5305797
## [1] 0.5684047
## predictions with method_pred='shepard': 
# in this example this methods leads to outsample RMSE better than 'tWtp_model'

res_pred8=predict(model_atds_mgwr_in, newdata=newdata, newdata_coords=newdata_coords,method_pred='shepard',k_extra=8,beta_proj=TRUE)
# predictions
Y_pred=res_pred8$Y_predicted
sqrt(mean((mydata$Y[index_out]-Y_pred)^2)) #   2.372426
## [1] 2.328874
model_atds_mgwr_in@RMSE # 1.380074
## [1] 1.403203
# RMSE for beta estimates
RMSE_outsample_atds=apply((res_pred8$Beta_proj_out-TRUEBETA[index_out,] %>% as.data.frame()),2,rmse)
RMSE_outsample_atds
##     Beta1     Beta2     Beta3     Beta4 
## 0.1700092 0.3449231 1.0705712 1.1302407
            #Beta1     Beta2     Beta3     Beta4 
# ATDS-MGWR 0.1265100 0.3675659 1.0143203 1.0776401 
# TDS-MGWR  0.2030172 0.4936683 1.0147774 1.3803936 
# GWR       0.6129079 0.6707307 1.0983044 1.4200666 

                     # ATDS-MGWR   # TDS-MGWR  # GWR
mean(RMSE_outsample_atds) #  0.6465091  # 0.7729641 # 1.009409
## [1] 0.678936
mean(RMSE_insample_atds)  #  0.5305797  # 0.6784951 # 0.9505024
## [1] 0.5684047
# the atds_mgwr method significantly improves Beta estimates,
# both for in-sample and out-of-sample data.

The results for out-of-sample prediction of the dependent variable demonstrate that both tds_mgwr and atds_mgwr significantly improve prediction accuracy compared to standard GWR estimates. The Root Mean Squared Prediction Error (RMSPE) values obtained above are: 2.91 for GWR ('TP'), 2.59 for tds_mgwr ('tWtp_model'), 2.44 for atds_mgwr ('tWtp_model'), and 2.33 for atds_mgwr ('shepard').

While atds_mgwr achieves the lowest RMSPE, the improvement in out-of-sample prediction accuracy between atds_mgwr and tds_mgwr is less pronounced than the improvement in the RMSE of the coefficients.

Limitations of GWR

In this simulation, the GWR model, which employs a single bandwidth, fails to adequately capture the spatial structure of the \(\beta\) coefficients. Despite relatively low prediction errors, the estimation errors are significantly higher, indicating potential misspecification issues in the GWR model.

Improvements with tds_mgwr

The tds_mgwr method addresses this limitation by assigning a unique bandwidth to each covariate, leading to substantial improvements in the estimation of \(\beta\) coefficients. This is particularly evident for \(\beta_1\) and \(\beta_2\), where the model achieves significantly better accuracy.

Further Enhancements with atds_mgwr

Building on the tds_mgwr model, atds_mgwr introduces location-specific bandwidths, enabling further improvements for \(\beta_1\) and \(\beta_2\). Additionally, atds_mgwr achieves a notable improvement for \(\beta_4\) by applying a sequence of bandwidths that better capture the spatial pattern of this coefficient.

References:
