If you have yet not read the “Start Here” vignette, please do so by running
vignette("start-here", "spsurvey")spsurvey’s analysis functions are used to analyze data in a variety
of contexts. We focus mainly on using the NLA_PNW analysis
data to introduce some of these analysis functions. The
NLA_PNW data contains response variables measured at
several lakes in the Pacific Northwest Region (PNW) of the United
States. There are several variables in NLA_PNW you will use
throughout this vignette:
SITE_ID: a unique site identifierWEIGHT: the design weightsURBAN: urban land categories (Urban and
Non-Urban)STATE: state name (California,
Oregon, and Washington)BMMI: benthic macroinvertebrate multi-metric indexBMMI_COND: benthic macroinvertebrate multi-metric index
condition categories (Poor and Good)PHOS_COND: phosphorous condition categories
(Poor and Good)NITR_COND: nitrogen condition categories
(Poor Fair, and Good)Before proceeding, we load spsurvey by running
library(spsurvey)Categorical variables are analyzed in spsurvey using the
cat_analysis() function. The cat_analysis()
function estimates the proportion of observations and the total units
(i.e. extent) that belong to each level of the categorical variable
(total units refer to the total number (point resources), total line
length (linear network), or total area (areal network)). Several useful
pieces of information are returned by cat_analysis(),
including estimates, standard errors, margins of error, and confidence
intervals. The analysis results contain columns with a .P
and .U suffixes. The .P suffix corresponds to
estimates of proportions for each category, while the .U
suffix corresponds to estimates of total units (i.e. extent) for each
category.
To estimate the proportion of total lakes in each nitrogen condition category and the total number of lakes in each nitrogen condition category, run
cat_ests <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT"
)
cat_ests
#>        Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1 All_Sites     All Sites NITR_COND     Fair    24   23.69392   6.194024
#> 2 All_Sites     All Sites NITR_COND     Good    38   51.35111   7.430172
#> 3 All_Sites     All Sites NITR_COND     Poor    34   24.95496   5.919180
#> 4 All_Sites     All Sites NITR_COND    Total    96  100.00000   0.000000
#>   MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1        12.14006   11.55386   35.83399   2530.428   693.5593        1359.351
#> 2        14.56287   36.78824   65.91398   5484.120  1223.3708        2397.763
#> 3        11.60138   13.35359   36.55634   2665.103   658.0966        1289.846
#> 4         0.00000  100.00000  100.00000  10679.652  1416.2707        2775.840
#>   LCB95Pct.U UCB95Pct.U
#> 1   1171.077   3889.780
#> 2   3086.357   7881.883
#> 3   1375.258   3954.949
#> 4   7903.812  13455.491The estimate of the proportion of lakes in Good
condition is 51.35% with a 95% confidence interval of (36.8%, 65.9%),
while the estimate of the total number of lakes in Good
condition is 5484 lakes with a 95% confidence interval of (3086, 7882).
In each case, the estimated standard error and margin of error is given.
The confidence level can be changed using the conf
argument. If more than one categorical variable is of interest, then
vars can be a vector of variables and separate analyses are
performed for each variable.
Sometimes the goal is to estimate proportions and totals separately
for different subsets of the population – these subsets are called
subpopulations. To estimate the proportion of total lakes and in each
nitrogen condition category the total number of lakes in each nitrogen
condition category separately for California,
Oregon, and Washington lakes, run
cat_ests_sp <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT",
  subpop = "STATE"
)
cat_ests_sp
#>     Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1  STATE    California NITR_COND     Fair     6   8.239162   4.712515
#> 2  STATE    California NITR_COND     Good     8  73.722638  13.359879
#> 3  STATE    California NITR_COND     Poor     5  18.038200  11.909638
#> 4  STATE    California NITR_COND    Total    19 100.000000   0.000000
#> 5  STATE        Oregon NITR_COND     Fair     8  27.152211   9.763817
#> 6  STATE        Oregon NITR_COND     Good    26  59.670307   9.717093
#> 7  STATE        Oregon NITR_COND     Poor    13  13.177483   4.901892
#> 8  STATE        Oregon NITR_COND    Total    47 100.000000   0.000000
#> 9  STATE    Washington NITR_COND     Fair    10  30.396139  11.938582
#> 10 STATE    Washington NITR_COND     Good     4  22.711979  11.878964
#> 11 STATE    Washington NITR_COND     Poor    16  46.891882  13.041148
#> 12 STATE    Washington NITR_COND    Total    30 100.000000   0.000000
#>    MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1         9.236359   0.000000   17.47552   208.4605   87.63415        171.7598
#> 2        26.184881  47.537757   99.90752  1865.2692  940.12797       1842.6170
#> 3        23.342462   0.000000   41.38066   456.3876  299.29104        586.5997
#> 4         0.000000 100.000000  100.00000  2530.1172  978.65831       1918.1350
#> 5        19.136729   8.015482   46.28894  1298.8470  526.66732       1032.2490
#> 6        19.045152  40.625155   78.71546  2854.3752  674.02641       1321.0675
#> 7         9.607532   3.569950   22.78501   630.3551  198.49966        389.0522
#> 8         0.000000 100.000000  100.00000  4783.5773  706.53216       1384.7776
#> 9        23.399190   6.996949   53.79533  1023.1210  437.69351        857.8635
#> 10       23.282341   0.000000   45.99432   764.4755  453.48899        888.8221
#> 11       25.560181  21.331701   72.45206  1578.3606  556.63044       1090.9756
#> 12        0.000000 100.000000  100.00000  3365.9571  741.89397       1454.0855
#>    LCB95Pct.U UCB95Pct.U
#> 1    36.70069   380.2202
#> 2    22.65222  3707.8861
#> 3     0.00000  1042.9873
#> 4   611.98220  4448.2523
#> 5   266.59801  2331.0960
#> 6  1533.30774  4175.4427
#> 7   241.30288  1019.4072
#> 8  3398.79970  6168.3549
#> 9   165.25751  1880.9845
#> 10    0.00000  1653.2976
#> 11  487.38500  2669.3362
#> 12 1911.87165  4820.0426If more than one type of subpopulation is of interest, then
subpop can be a vector of subpopulation variables and
separate analyses are performed for each subpopulation. If both
vars and subpops are vectors, separate
analyses are performed for each variable and subpopulation
combination.
Analysis results for all sites (ignoring subpopulations) can be
presented alongside the subpopulation analysis results using the
All_Sites argument:
cat_ests_sp <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT",
  subpop = "STATE",
  All_Sites = TRUE
)
cat_ests_sp
#>         Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1      STATE    California NITR_COND     Fair     6   8.239162   4.712515
#> 2      STATE    California NITR_COND     Good     8  73.722638  13.359879
#> 3      STATE    California NITR_COND     Poor     5  18.038200  11.909638
#> 4      STATE    California NITR_COND    Total    19 100.000000   0.000000
#> 5      STATE        Oregon NITR_COND     Fair     8  27.152211   9.763817
#> 6      STATE        Oregon NITR_COND     Good    26  59.670307   9.717093
#> 7      STATE        Oregon NITR_COND     Poor    13  13.177483   4.901892
#> 8      STATE        Oregon NITR_COND    Total    47 100.000000   0.000000
#> 9      STATE    Washington NITR_COND     Fair    10  30.396139  11.938582
#> 10     STATE    Washington NITR_COND     Good     4  22.711979  11.878964
#> 11     STATE    Washington NITR_COND     Poor    16  46.891882  13.041148
#> 12     STATE    Washington NITR_COND    Total    30 100.000000   0.000000
#> 13 All_Sites     All Sites NITR_COND     Fair    24  23.693923   6.194024
#> 14 All_Sites     All Sites NITR_COND     Good    38  51.351112   7.430172
#> 15 All_Sites     All Sites NITR_COND     Poor    34  24.954965   5.919180
#> 16 All_Sites     All Sites NITR_COND    Total    96 100.000000   0.000000
#>    MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1         9.236359   0.000000   17.47552   208.4605   87.63415        171.7598
#> 2        26.184881  47.537757   99.90752  1865.2692  940.12797       1842.6170
#> 3        23.342462   0.000000   41.38066   456.3876  299.29104        586.5997
#> 4         0.000000 100.000000  100.00000  2530.1172  978.65831       1918.1350
#> 5        19.136729   8.015482   46.28894  1298.8470  526.66732       1032.2490
#> 6        19.045152  40.625155   78.71546  2854.3752  674.02641       1321.0675
#> 7         9.607532   3.569950   22.78501   630.3551  198.49966        389.0522
#> 8         0.000000 100.000000  100.00000  4783.5773  706.53216       1384.7776
#> 9        23.399190   6.996949   53.79533  1023.1210  437.69351        857.8635
#> 10       23.282341   0.000000   45.99432   764.4755  453.48899        888.8221
#> 11       25.560181  21.331701   72.45206  1578.3606  556.63044       1090.9756
#> 12        0.000000 100.000000  100.00000  3365.9571  741.89397       1454.0855
#> 13       12.140064  11.553860   35.83399  2530.4285  693.55933       1359.3513
#> 14       14.562870  36.788242   65.91398  5484.1199 1223.37078       2397.7627
#> 15       11.601379  13.353585   36.55634  2665.1033  658.09658       1289.8456
#> 16        0.000000 100.000000  100.00000 10679.6517 1416.27075       2775.8397
#>    LCB95Pct.U UCB95Pct.U
#> 1    36.70069   380.2202
#> 2    22.65222  3707.8861
#> 3     0.00000  1042.9873
#> 4   611.98220  4448.2523
#> 5   266.59801  2331.0960
#> 6  1533.30774  4175.4427
#> 7   241.30288  1019.4072
#> 8  3398.79970  6168.3549
#> 9   165.25751  1880.9845
#> 10    0.00000  1653.2976
#> 11  487.38500  2669.3362
#> 12 1911.87165  4820.0426
#> 13 1171.07716  3889.7798
#> 14 3086.35722  7881.8826
#> 15 1375.25770  3954.9489
#> 16 7903.81199 13455.4913To estimate the proportion of total lakes in each nitrogen condition
category and the total number of lakes in each nitrogen condition
category stratified by URBAN category (whether the lake is
classified as Urban or Non-Urban), run
strat_cat_ests <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT",
  stratumID = "URBAN"
)
strat_cat_ests
#>        Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1 All_Sites     All Sites NITR_COND     Fair    24   23.69392   6.027083
#> 2 All_Sites     All Sites NITR_COND     Good    38   51.35111   7.472377
#> 3 All_Sites     All Sites NITR_COND     Poor    34   24.95496   5.882487
#> 4 All_Sites     All Sites NITR_COND    Total    96  100.00000   0.000000
#>   MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1        11.81287   11.88106   35.50679   2530.428   683.8837        1340.387
#> 2        14.64559   36.70552   65.99670   5484.120  1229.9485        2410.655
#> 3        11.52946   13.42550   36.48443   2665.103   653.6357        1281.102
#> 4         0.00000  100.00000  100.00000  10679.652  1440.4796        2823.288
#>   LCB95Pct.U UCB95Pct.U
#> 1   1190.041   3870.816
#> 2   3073.465   7894.775
#> 3   1384.001   3946.206
#> 4   7856.363  13502.940To then compute these estimates separately for
California, Oregon, and
Washington, run
strat_cat_ests_sp <- cat_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "NITR_COND",
  weight = "WEIGHT",
  stratumID = "URBAN",
  subpop = "STATE"
)
strat_cat_ests_sp
#>     Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1  STATE    California NITR_COND     Fair     6   8.239162   5.481100
#> 2  STATE    California NITR_COND     Good     8  73.722638  13.653267
#> 3  STATE    California NITR_COND     Poor     5  18.038200  12.512366
#> 4  STATE    California NITR_COND    Total    19 100.000000   0.000000
#> 5  STATE        Oregon NITR_COND     Fair     8  27.152211   9.592356
#> 6  STATE        Oregon NITR_COND     Good    26  59.670307   9.845022
#> 7  STATE        Oregon NITR_COND     Poor    13  13.177483   5.325784
#> 8  STATE        Oregon NITR_COND    Total    47 100.000000   0.000000
#> 9  STATE    Washington NITR_COND     Fair    10  30.396139  11.383474
#> 10 STATE    Washington NITR_COND     Good     4  22.711979   9.039351
#> 11 STATE    Washington NITR_COND     Poor    16  46.891882  12.109215
#> 12 STATE    Washington NITR_COND    Total    30 100.000000   0.000000
#>    MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1         10.74276   0.000000   18.98192   208.4605   87.91849        172.3171
#> 2         26.75991  46.962726  100.00000  1865.2692  953.91452       1869.6381
#> 3         24.52379   0.000000   42.56199   456.3876  306.37270        600.4795
#> 4          0.00000 100.000000  100.00000  2530.1172  994.94678       1950.0599
#> 5         18.80067   8.351538   45.95288  1298.8470  531.02949       1040.7987
#> 6         19.29589  40.374417   78.96620  2854.3752  685.86488       1344.2705
#> 7         10.43835   2.739137   23.61583   630.3551  177.79364        348.4691
#> 8          0.00000 100.000000  100.00000  4783.5773  737.53853       1445.5490
#> 9         22.31120   8.084941   52.70734  1023.1210  435.45383        853.4738
#> 10        17.71680   4.995177   40.42878   764.4755  433.94100        850.5087
#> 11        23.73363  23.158256   70.62551  1578.3606  554.91047       1087.6045
#> 12         0.00000 100.000000  100.00000  3365.9571  748.29764       1466.6364
#>    LCB95Pct.U UCB95Pct.U
#> 1     36.1434   380.7775
#> 2      0.0000  3734.9073
#> 3      0.0000  1056.8671
#> 4    580.0574  4480.1771
#> 5    258.0483  2339.6457
#> 6   1510.1048  4198.6457
#> 7    281.8859   978.8242
#> 8   3338.0283  6229.1262
#> 9    169.6472  1876.5948
#> 10     0.0000  1614.9842
#> 11   490.7561  2665.9652
#> 12  1899.3207  4832.5935Continuous variables are analyzed in spsurvey using the
cont_analysis() function. The cont_analysis()
function estimates cumulative distribution functions (CDFs),
percentiles, and means of continuous variables. By default, all these
quantities are estimated (though this can be changed using the
statsitics argument to cont_analysis()). For
the quantities requiring estimation, several useful pieces of
information are returned by cont_analysis(), including
estimates, standard errors, margins of error, and confidence intervals.
The .P suffix corresponds to estimates of proportions for
each variable, while the .U suffix corresponds to estimates
of total units (i.e. extent) for each variable.
To estimate the cumulative distribution function (CDF), certain
percentiles, and means of BMMI, run
cont_ests <- cont_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "BMMI",
  weight = "WEIGHT"
)To view the analysis results for the mean estimates, run
cont_ests$Mean
#>        Type Subpopulation Indicator nResp Estimate StdError MarginofError
#> 1 All_Sites     All Sites      BMMI    96 56.50929 1.782278        3.4932
#>   LCB95Pct UCB95Pct
#> 1 53.01609 60.00249Similarly, the CDF and select percentile estimates can be viewed (the output is omitted here) by running
cont_ests$CDF
cont_ests$PctTo visualize the CDF estimates, run
plot(cont_ests$CDF)The solid line indicates the CDF estimates, and the dashed lines
indicate lower and upper 95% confidence interval bounds for the CDF
estimates. cdf_plot() can equivalently be used in place of
plot() (cdf_plot() is currently maintained for
backwards compatibility with previous spsurvey versions).
To estimate the CDF, certain percentiles, and means of
BMMI separately for each state, run
cont_ests_sp <- cont_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "BMMI",
  weight = "WEIGHT",
  subpop = "STATE"
)To view the analysis results for the mean estimates, run
cont_ests_sp$Mean
#>    Type Subpopulation Indicator nResp Estimate StdError MarginofError LCB95Pct
#> 1 STATE    California      BMMI    19 50.48964 4.049094      7.936079 42.55357
#> 2 STATE        Oregon      BMMI    47 61.29675 2.581032      5.058730 56.23802
#> 3 STATE    Washington      BMMI    30 54.23036 3.143924      6.161977 48.06838
#>   UCB95Pct
#> 1 58.42572
#> 2 66.35548
#> 3 60.39234To estimate the CDF, certain percentiles, and means of
BMMI for a design stratified by URBAN
category, run
strat_cont_ests <- cont_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "BMMI",
  weight = "WEIGHT",
  stratumID = "URBAN"
)To view the analysis results for the mean estimates, run
strat_cont_ests$Mean
#>        Type Subpopulation Indicator nResp Estimate StdError MarginofError
#> 1 All_Sites     All Sites      BMMI    96 56.50929 1.795959      3.520015
#>   LCB95Pct UCB95Pct
#> 1 52.98928 60.02931To then compute these estimates separately for each state, run
strat_cont_ests_sp <- cont_analysis(
  NLA_PNW,
  siteID = "SITE_ID",
  vars = "BMMI",
  weight = "WEIGHT",
  stratumID = "URBAN",
  subpop = "STATE",
)To view the analysis results for the mean estimates, run
strat_cont_ests_sp$Mean
#>    Type Subpopulation Indicator nResp Estimate StdError MarginofError LCB95Pct
#> 1 STATE    California      BMMI    19 50.48964 4.110046      8.055543 42.43410
#> 2 STATE        Oregon      BMMI    47 61.29675 1.630357      3.195441 58.10131
#> 3 STATE    Washington      BMMI    30 54.23036 2.930517      5.743708 48.48665
#>   UCB95Pct
#> 1 58.54519
#> 2 64.49219
#> 3 59.97407Alongside the cat_analysis() and
cont_analysis() functions, spsurvey offers functions for
estimating attributable risk, relative risk, risk difference, change,
and trend. Attributable risk analysis, relative risk analysis, and risk
difference analysis quantify the attributable risk, relative risk, and
risk difference, respectively, of environmental resources being in poor
condition after exposure to a stressor. Attributable risk analysis is
performed using the attrisk_analysis() function, relative
risk analysis is performed using the relrisk_analysis()
function, and risk difference analysis is performed using the
diffrisk_analysis() function. Change and trend analysis
capture the behavior of environmental resources between two samples,
while trend analysis generalizes this approach to include more than two
samples. Often, change and trend analyses are performed to study an
environmental resource through time. Change analysis is performed using
the change_analysis() function, and trend analysis is
performed using the trend_analysis() function. The
attrisk_analysis(), relrisk_analysis(),
diffrisk_analysis(), change_analysis() and
trend_analysis() functions all share very similar syntax
with the cat_analysis() and cont_analysis()
functions.
The attributable risk is defined as \[1 - \frac{P(Response = Poor | Stressor = Good)}{P(Response = Poor)},\] where \(P(\cdot)\) is a probability and \(P(\cdot | \cdot)\) is a conditional probability. The attributable risk measures the proportion of the response variable in poor condition that could be eliminated if the stressor was always in good condition. To estimate the attributable risk of benthic macroinvertebrates with a phosphorous condition stressor, run
attrisk_ests <- attrisk_analysis(
  NLA_PNW, 
  siteID = "SITE_ID",
  vars_response = "BMMI_COND",
  vars_stressor = "PHOS_COND",
  weight = "WEIGHT"
)
attrisk_ests
#>        Type Subpopulation  Response  Stressor nResp  Estimate StdError_log
#> 1 All_Sites     All Sites BMMI_COND PHOS_COND    96 0.6201042     0.624808
#>   MarginofError_log  LCB95Pct  UCB95Pct WeightTotal Count_RespPoor_StressPoor
#> 1          1.224601 -0.292713 0.8883582    10679.65                         5
#>   Count_RespPoor_StressGood Count_RespGood_StressPoor Count_RespGood_StressGood
#> 1                         7                        18                        66
#>   Prop_RespPoor_StressPoor Prop_RespPoor_StressGood Prop_RespGood_StressPoor
#> 1               0.03971418               0.01738181                 0.158931
#>   Prop_RespGood_StressGood
#> 1                 0.783973The relative risk is defined as \[\frac{P(Response = Poor | Stressor = Poor)}{P(Response = Poor | Stressor = Good)},\] which measures the risk of the response variable being in poor condition relative to the stressor’s condition. To estimate the relative risk of benthic macroinvertebrates being in poor condition with a phosphorous condition category stressor, run
relrisk_ests <- relrisk_analysis(
  NLA_PNW, 
  siteID = "SITE_ID",
  vars_response = "BMMI_COND",
  vars_stressor = "PHOS_COND",
  weight = "WEIGHT"
)
relrisk_ests
#>        Type Subpopulation  Response  Stressor nResp Estimate Estimate_num
#> 1 All_Sites     All Sites BMMI_COND PHOS_COND    96 9.217166    0.1999252
#>   Estimate_denom StdError_log MarginofError_log LCB95Pct UCB95Pct WeightTotal
#> 1     0.02169053    0.8753855          1.715724 1.657555 51.25389    10679.65
#>   Count_RespPoor_StressPoor Count_RespPoor_StressGood Count_RespGood_StressPoor
#> 1                         5                         7                        18
#>   Count_RespGood_StressGood Prop_RespPoor_StressPoor Prop_RespPoor_StressGood
#> 1                        66               0.03971418               0.01738181
#>   Prop_RespGood_StressPoor Prop_RespGood_StressGood
#> 1                 0.158931                 0.783973The risk difference is defined as \[P(Response = Poor | Stressor = Poor) - P(Response = Poor | Stressor = Good),\] which measures the risk of the response variable being in poor condition differenced by the stressor’s condition. To estimate the risk difference of benthic macroinvertebrates being in poor condition with a phosphorous condition category stressor, run
diffrisk_ests <- diffrisk_analysis(
  NLA_PNW, 
  siteID = "SITE_ID",
  vars_response = "BMMI_COND",
  vars_stressor = "PHOS_COND",
  weight = "WEIGHT"
)
diffrisk_ests
#>        Type Subpopulation  Response  Stressor nResp  Estimate
#> 1 All_Sites     All Sites BMMI_COND PHOS_COND    96 0.1782347
#>   Estimate_StressPoor Estimate_StressGood  StdError MarginofError   LCB95Pct
#> 1           0.1999252          0.02169053 0.1139557      0.223349 -0.0451143
#>    UCB95Pct WeightTotal Count_RespPoor_StressPoor Count_RespPoor_StressGood
#> 1 0.4015837    10679.65                         5                         7
#>   Count_RespGood_StressPoor Count_RespGood_StressGood Prop_RespPoor_StressPoor
#> 1                        18                        66               0.03971418
#>   Prop_RespPoor_StressGood Prop_RespGood_StressPoor Prop_RespGood_StressGood
#> 1               0.01738181                 0.158931                 0.783973By default, the levels of the variables in vars_response
and vars_stressor are assumed to equal "Poor"
(event occurs) or "Good" (event does not occur). If those
default levels do not match the levels of the variables in
vars_response and vars_stressor, the levels of
vars_response and vars_stressor must be
explicitly stated using the response_levels and
stressor_levels arguments, respectively. Similar to
cat_analysis() and cont_analysis() from the
previous sections, subpopulations and stratification are incorporated
using subpops and stratumID, respectively. For
more on attributable and relative risk in an environmental resource
context, see Van Sickle and Paulsen (2008).
To demonstrate change analysis, we use the NRSA_EPA7
data. There are several variables in NRSA_EPA7 you will use
next:
SITE_ID: a unique site identifierWEIGHT: the survey design weightsNITR_COND: nitrogen condition category
(Good, Fair, and Poor)BMMI: benthic macroinvertebrate multi-metric indexYEAR: probability sample (survey) yearTo estimate the change between samples (time points) for
BMMI (a continuous variable) and NITR_COND (a
categorical variable), run
change_ests <- change_analysis(
  NRSA_EPA7,
  siteID = "SITE_ID",
  vars_cont = "BMMI",
  vars_cat = "NITR_COND",
  surveyID = "YEAR",
  weight = "WEIGHT"
)The surveyID argument is the variable in the data
distinguishing the different samples (YEAR in the previous
example).
To view the analysis results for NITR_COND (the
categorical variable), run
change_ests$catsum
#>   Survey_1 Survey_2      Type Subpopulation Indicator     Category  DiffEst.P
#> 1  2008-09  2013-14 All_Sites     All Sites NITR_COND         Fair -1.8867976
#> 2  2008-09  2013-14 All_Sites     All Sites NITR_COND         Good -2.9648182
#> 3  2008-09  2013-14 All_Sites     All Sites NITR_COND Not Assessed -0.2447633
#> 4  2008-09  2013-14 All_Sites     All Sites NITR_COND         Poor  5.0963791
#>   StdError.P MarginofError.P  LCB95Pct.P UCB95Pct.P  DiffEst.U StdError.U
#> 1  3.5366139       6.9316358  -8.8184334  5.0448382 -2919.8839  4990.4714
#> 2  4.7633367       9.3359683 -12.3007865  6.3711502 -4597.9365  6772.4175
#> 3  0.2072643       0.4062305  -0.6509938  0.1614672  -363.1305   306.1656
#> 4  5.8020788      11.3718655  -6.2754864 16.4682446  6598.4548 19777.9218
#>   MarginofError.U LCB95Pct.U UCB95Pct.U nResp_1 Estimate.P_1 StdError.P_1
#> 1       9781.1441 -12701.028   6861.260      37   11.2929433    2.7579622
#> 2      13273.6943 -17871.631   8675.758      22   18.5076549    3.6712147
#> 3        600.0735   -963.204    236.943       1    0.2447633    0.2072643
#> 4      38764.0144 -32165.560  45362.469     119   69.9546385    4.3636869
#>   MarginofError.P_1 LCB95Pct.P_1 UCB95Pct.P_1 Estimate.U_1 StdError.U_1
#> 1         5.4055067     5.887437   16.6984500   16754.1954    3998.9767
#> 2         7.1954486    11.312206   25.7031034   27457.9317    5388.5407
#> 3         0.4062305     0.000000    0.6509938     363.1305     306.1656
#> 4         8.5526691    61.401969   78.5073076  103784.6070   12266.4461
#>   MarginofError.U_1 LCB95Pct.U_1 UCB95Pct.U_1 nResp_2 Estimate.P_2 StdError.P_2
#> 1         7837.8504     8916.345    24592.046      28     9.406146     2.213884
#> 2        10561.3456    16896.586    38019.277      34    15.542837     3.035055
#> 3          600.0735        0.000      963.204       0     0.000000     0.000000
#> 4        24041.7926    79742.814   127826.400     112    75.051018     3.823919
#>   MarginofError.P_2 LCB95Pct.P_2 UCB95Pct.P_2 Estimate.U_2 StdError.U_2
#> 1          4.339133     5.067013     13.74528     13834.31     2985.463
#> 2          5.948599     9.594238     21.49144     22860.00     4102.349
#> 3          0.000000     0.000000      0.00000         0.00        0.000
#> 4          7.494743    67.556274     82.54576    110383.06    15514.525
#>   MarginofError.U_2 LCB95Pct.U_2 UCB95Pct.U_2
#> 1          5851.400     7982.911     19685.71
#> 2          8040.456    14819.539     30900.45
#> 3             0.000        0.000         0.00
#> 4         30407.911    79975.151    140790.97Estimates are provided for the difference between the two samples and
for each of the two individual samples (the _1 and
_2 suffixes).
To view the analysis results for BMMI (the continuous
variable), run
change_ests$contsum_mean
#>   Survey_1 Survey_2      Type Subpopulation Indicator  DiffEst StdError
#> 1  2008-09  2013-14 All_Sites     All Sites      BMMI 3.971559 2.561155
#>   MarginofError  LCB95Pct UCB95Pct nResp_1 Estimate_1 StdError_1
#> 1      5.019771 -1.048211  8.99133     179   25.88274   1.468744
#>   MarginofError_1 LCB95Pct_1 UCB95Pct_1 nResp_2 Estimate_2 StdError_2
#> 1        2.878686   23.00405   28.76142     174    29.8543   2.098166
#>   MarginofError_2 LCB95Pct_2 UCB95Pct_2
#> 1        4.112331   25.74196   33.96663Though we don’t show an example here, the median can be estimated
using the test argument in
change_anlaysis().
Trend analysis generalizes change analysis to more than two samples
(usually time points). Though we omit an example here, the arguments to
trend_analysis() are very similar to the arguments for
change_analysis(). One difference is that
trend_analysis() contains arguments that specify which
statistical model to apply to the estimates from each sample.
The default variance estimator in spsurvey is the local neighborhood
variance estimator (Stevens and Olsen, 2003). The local neighborhood
variance estimator incorporates the spatial locations of design sites
into the variance estimation process. Because the local neighborhood
variance estimator incorporates spatial locations, the resulting
variance estimate tends to be smaller than variance estimates from
approaches ignoring spatial locations. The local neighborhood variance
estimator requires x-coordinates and y-coordinates of the design sites.
When the analysis data is an sf object, the coordinates
from the data’s geometry are used. When the analysis data is just a data
frame, x-coordinates and y-coordinates (from an appropriate coordinate
reference system) must be provided using the xcoord and
ycoord arguments, respectively, of the analysis function
being used.
Several additional variance estimation options are available in
spsurvey’s analysis functions through the vartype and
jointprob arguments.
Sickle, J. V., & Paulsen, S. G. (2008). Assessing the attributable risks, relative risks, and regional extents of aquatic stressors. Journal of the North American Benthological Society, 27(4), 920-931.
Stevens Jr, D. L., & Olsen, A. R. (2003). Variance estimation for spatially balanced samples of environmental resources. Environmetrics, 14(6), 593-610.