Type: Package
Title: Bootstrap Algorithms for Finite Population Inference
Version: 0.4.6
Date: 2024-03-08
Description: Finite population bootstrap algorithms to estimate the variance of the Horvitz-Thompson estimator for single-stage sampling. For a survey of bootstrap methods for finite populations, see Mashreghi et al. (2016) <doi:10.1214/16-SS113>.
License: GPL-3
Encoding: UTF-8
BugReports: https://github.com/rhobis/bootstrapFP/issues
RoxygenNote: 7.3.1
Imports: sampling
NeedsCompilation: no
Packaged: 2024-03-08 22:29:45 UTC; Roberto
Author: Roberto Sichera [aut, cre]
Maintainer: Roberto Sichera <rob.sichera@gmail.com>
Repository: CRAN
Date/Publication: 2024-03-08 23:00:02 UTC

bootstrapFP: Bootstrap Algorithms for Finite Population Inference

Description

Perform bootstrap variance estimation of the Horvitz-Thompson total estimator in finite population sampling with equal or unequal probabilities.
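For reference, the estimator whose variance the package targets takes only a couple of lines of base R: each sampled value is divided by its first-order inclusion probability and the results are summed. The helper ht_total() below is hypothetical and written for illustration; it is not a bootstrapFP export.

```r
# Horvitz-Thompson estimator of a population total:
# each sampled value ys[k] is weighted by 1 / pks[k].
ht_total <- function(ys, pks) {
  stopifnot(length(ys) == length(pks), all(pks > 0 & pks <= 1))
  sum(ys / pks)
}

ht_total(c(10, 20, 30), c(0.5, 0.5, 0.25))
#> [1] 180
```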

Author(s)

Maintainer: Roberto Sichera <rob.sichera@gmail.com>

References

Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys, 10, 1-52.

See Also

Useful links:

Report bugs at https://github.com/rhobis/bootstrapFP/issues

Antal and Tillé (2011) Bootstrap for Unequal Probability Sampling without replacement

Description

Draw B bootstrap samples according to the direct bootstrap method of Antal and Tillé (2011) for unequal probability sampling. Note that this method does not need a double bootstrap.

Usage

AntalTille2011_ups(
  ys,
  pks,
  B,
  smplFUN,
  approx_method = c("Hajek", "DevilleTille")
)

Arguments

ys

values of the variable of interest for the original sample

pks

vector of first-order inclusion probabilities for sampled units

B

integer scalar, number of bootstrap resamples to draw

smplFUN

a function that takes as input a vector of length N of inclusion probabilities and returns a vector of length N, either logical or of 0s and 1s, where TRUE or 1 indicates a sampled unit and FALSE or 0 indicates a non-sampled unit.
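A function satisfying this contract can be very short. The sketch below uses Poisson sampling, where each unit enters the sample independently with its own inclusion probability; poisson_smplFUN() is a hypothetical name chosen for illustration. Poisson sampling gives a random sample size; fixed-size designs from the sampling package, such as sampling::UPbrewer(), return a vector of 0s and 1s and fit the same contract.

```r
# Poisson sampling: unit k is selected independently with probability pik[k].
# Returns a logical vector of the same length as pik (TRUE = sampled).
poisson_smplFUN <- function(pik) {
  runif(length(pik)) < pik
}

set.seed(42)
s <- poisson_smplFUN(c(0.2, 0.5, 0.9, 1))
length(s)   # one entry per population unit
```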

approx_method

method used to approximate the variance terms D_kk; either "Hajek" or "DevilleTille".

Value

a list of two elements, a vector of K average bootstrap totals and a vector of K variance estimates.

References

Antal, E.; Tillé, Y., 2011. A Direct Bootstrap Method for Complex Sampling Designs From a Finite Population. Journal of the American Statistical Association, 106:494, 534-543, doi: 10.1198/jasa.2011.tm09767

Antal, E.; Tillé, Y., 2014. A new resampling method for sampling designs without replacement: the doubled half bootstrap. Computational Statistics, 29(5), 1345-1363. doi: 10.1007/s00180-014-0495-0


Bootstrap algorithms for Finite Population sampling

Description

Bootstrap variance estimation for finite population sampling.

Usage

bootstrapFP(
  y,
  pik,
  B,
  D = 1,
  method,
  design,
  x = NULL,
  s = NULL,
  distribution = "uniform"
)

Arguments

y

vector of sample values

pik

vector of sample first-order inclusion probabilities

B

scalar, number of bootstrap replications

D

scalar, number of replications for the double bootstrap (when applicable)

method

a string indicating the bootstrap method to be used; see Details for more information

design

sampling procedure to be used for sample selection. Either a string indicating the name of the sampling design or a function; see section "Details" for more information.

x

vector of length N with values of the auxiliary variable for all population units, only required if method "ppHotDeck" is chosen

s

logical vector of length N, TRUE for units in the sample, FALSE otherwise. Alternatively, a vector of length n with the indices of the sample units. Only required for "ppHotDeck" method.

distribution

required only for method = "wGeneralised"; a string indicating the distribution to use for the generalised bootstrap. Available options are "uniform", "normal", "exponential" and "lognormal".

Details

Argument design accepts either a string indicating the sampling design used to draw samples or a function. Accepted designs are "brewer", "tille", "maxEntropy", "poisson", "sampford", "systematic" and "randomSystematic". If a function is passed, it should take as input a vector of inclusion probabilities and return a vector of the same length, either logical or of 0s and 1s, where TRUE or 1 indicates a sampled unit and FALSE or 0 indicates a non-sampled unit.

method must be a string indicating the bootstrap method to use. A list of the currently available methods follows; the sampling design each method should be used with is indicated in square brackets. The prefix "pp" indicates a pseudo-population method, the prefix "d" a direct method, and the prefix "w" a weights method. For more details on these methods see Mashreghi et al. (2016).

Value

The bootstrap variance of the Horvitz-Thompson estimator.

References

Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys, 10, 1-52.

Examples


library(bootstrapFP)

### Generate population data ---
N   <- 20; n <- 5
x   <- rgamma(N, scale=10, shape=5)
y   <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )
pik <- n * x/sum(x)

### Draw a dummy sample ---
s  <- sample(N, n)

### Estimate bootstrap variance ---
bootstrapFP(y = y[s], pik = n/N, B=100, method = "ppSitter")
bootstrapFP(y = y[s], pik = pik[s], B=10, method = "ppHolmberg", design = 'brewer')
bootstrapFP(y = y[s], pik = pik[s], B=10, D=10, method = "ppChauvet")
bootstrapFP(y = y[s], pik = n/N, B=10, method = "dRaoWu")
bootstrapFP(y = y[s], pik = n/N, B=10, method = "dSitter")
bootstrapFP(y = y[s], pik = pik[s], B=10, method = "dAntalTille_UPS", design='brewer')
bootstrapFP(y = y[s], pik = n/N, B=10, method = "wRaoWuYue") 
bootstrapFP(y = y[s], pik = n/N, B=10, method = "wChipperfieldPreston")
bootstrapFP(y = y[s], pik = pik[s], B=10, method = "wGeneralised", distribution = 'normal')




Bootstrap with Adjusted Weights

Description

Compute bootstrap estimates according to the bootstrap weights procedures of Rao et al. (1992) and Chipperfield and Preston (2007).

Usage

bootstrap_weights(ys, N, B, method = c("RaoWuYue", "ChipperfieldPreston"))

Arguments

ys

values of the variable of interest for the original sample

N

scalar, representing the population size

B

integer scalar, number of bootstrap resamples to draw

method

a string indicating the bootstrap method to be used; available methods are "RaoWuYue" and "ChipperfieldPreston".

Value

a list of two elements, a vector of K average bootstrap totals and a vector of K variance estimates.

References

Rao J. N. K.; Wu C. F. J.; Yue K. (1992). Some recent work on resampling methods for complex surveys. Survey Methodology, 18(2), 209-217.

Chipperfield J.; Preston J. (2007). Efficient bootstrap for business surveys. Survey Methodology, 33(2), 167-172.
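As an illustration of the rescaling idea behind the "RaoWuYue" option, the sketch below implements the Rao-Wu-Yue rescaled bootstrap for simple random sampling without replacement with resample size m = n - 1: each replicate draws m units with replacement and rescales the design weights N/n. The helper rwy_bootstrap_totals() is hypothetical, and the choice m = n - 1 is an assumption; bootstrap_weights() may differ in its details.

```r
# Rao-Wu-Yue rescaled bootstrap totals under SRSWOR (sketch, m = n - 1).
rwy_bootstrap_totals <- function(ys, N, B) {
  n <- length(ys)
  m <- n - 1                              # resample size (assumed)
  f <- n / N                              # sampling fraction
  lambda <- sqrt(m * (1 - f) / (n - 1))   # rescaling factor
  w <- rep(N / n, n)                      # design weights under SRSWOR
  replicate(B, {
    idx <- sample.int(n, m, replace = TRUE)
    mk <- tabulate(idx, nbins = n)        # times each unit was drawn
    w_star <- w * (1 - lambda + lambda * (n / m) * mk)
    sum(w_star * ys)
  })
}
```

With m = n - 1 the factor reduces to sqrt(1 - f), and as B grows the variance of the replicate totals approaches the unbiased estimator N^2 (1 - f) s^2 / n.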


Define the phi vector

Description

Define the phi vector used to select the first sample in the Antal and Tillé (2011) bootstrap (Algorithm 4, first step). If the sum of the elements of \phi is not an integer, \phi is decomposed into a convex combination of two vectors \phi_1 and \phi_2 such that \sum_i \phi_{1i} = \lfloor \sum_i \phi_i \rfloor and \sum_i \phi_{2i} = \lfloor \sum_i \phi_i \rfloor + 1 (see Antal and Tillé (2011), bootstrap procedure for unequal probability sampling, p. 539, Algorithm 4, Case 1). The procedure used to decompose \phi is described in the answer to this question: https://math.stackexchange.com/questions/2700483/vector-decomposition-into-a-convex-combination-of-two-vectors-with-constraints-o
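The weight of this convex combination is not a free choice: summing both sides of the decomposition fixes it to the fractional part of \sum_i \phi_i. Writing \lambda for the weight on \phi_2 (a symbol introduced here for illustration):

```latex
\phi = (1-\lambda)\,\phi_1 + \lambda\,\phi_2, \quad 0 < \lambda < 1, \quad
\sum_i \phi_{1i} = m, \quad \sum_i \phi_{2i} = m + 1, \quad
m = \Big\lfloor \sum_i \phi_i \Big\rfloor

\sum_i \phi_i = (1-\lambda)\,m + \lambda\,(m+1) = m + \lambda
\;\Longrightarrow\;
\lambda = \sum_i \phi_i - \Big\lfloor \sum_i \phi_i \Big\rfloor
```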

Usage

define_phi(phi)

Arguments

phi

vector of inclusion probabilities for Antal and Tillé (2011) bootstrap, given by 1 - D_kk

Value

a list with the two vectors in which phi is decomposed


Direct bootstrap methods for simple random sampling

Description

Direct bootstrap methods for simple random sampling

Usage

directBS_srs(y, N, B, method)

Arguments

y

vector of sample values

N

scalar, representing the population size

B

scalar, number of bootstrap replications

method

a string indicating the bootstrap method to be used, available methods are: 'Efron', 'McCarthySnowden', 'RaoWu', 'Sitter'.

Details

See Mashreghi et al. (2016) for details about the algorithm.

References

Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys, 10, 1-52.
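To see why direct methods for simple random sampling modify the classical bootstrap, consider the naive with-replacement bootstrap (we assume this is what the 'Efron' option refers to). Its resampling variance for the estimated total is N^2 ((n - 1)/n) s^2 / n, which ignores the finite population correction 1 - f, with f = n/N, present in the unbiased estimator N^2 (1 - f) s^2 / n. The helper naive_bootstrap_totals() is hypothetical.

```r
# Naive with-replacement bootstrap of the estimated total N * mean(y).
naive_bootstrap_totals <- function(y, N, B) {
  n <- length(y)
  replicate(B, N * mean(y[sample.int(n, n, replace = TRUE)]))
}
```

With a large sampling fraction the gap is substantial: for n = 8 and N = 20 the naive variance is about 1.46 times the unbiased one.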


Select a doubled-half sampling (Antal and Tillé, 2014)

Description

Select a doubled-half sampling (Antal and Tillé, 2014)

Usage

doubled_half(n)

Arguments

n

integer scalar representing sample size

Value

an integer vector of size n, indicating how many times each unit is present in the sample
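For an even sample size, doubled-half sampling can be sketched as selecting n/2 of the n units by simple random sampling without replacement and giving each of them multiplicity 2. This sketch deliberately covers only even n; the odd case in Antal and Tillé (2014) needs an extra step that is not reproduced here. doubled_half_even() is a hypothetical helper, not the package's doubled_half().

```r
# Doubled-half resampling for an even sample size n (sketch):
# n/2 units chosen by SRSWOR get multiplicity 2, the others get 0.
doubled_half_even <- function(n) {
  if (n %% 2 != 0) stop("this sketch only covers even n")
  S <- integer(n)
  S[sample.int(n, n / 2)] <- 2L
  S
}
```

Each S_k is 0 or 2 with probability 1/2, so E(S_k) = 1 and Var(S_k) = 4 * 1/2 - 1 = 1, which is the one-one property.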


Generalised Bootstrap

Description

Compute bootstrap estimates according to the generalised bootstrap procedure of Beaumont and Patak (2012).

Usage

generalised(
  ys,
  pks,
  B,
  distribution = c("uniform", "normal", "exponential", "lognormal")
)

Arguments

ys

values of the variable of interest for the original sample

pks

inclusion probabilities for units in the sample

B

integer scalar, number of bootstrap resamples to draw

distribution

the distribution from which to generate the weight adjustments. One of "uniform", "normal", "exponential" or "lognormal".

Value

a list of two elements, a vector of K average bootstrap totals and a vector of K variance estimates.

References

Bertail, P., & Combris, P. (1997). Bootstrap généralisé d'un sondage. Annales d'Economie et de Statistique, 49-83.

Beaumont, J. F., & Patak, Z. (2012). On the generalized bootstrap for sample surveys with special attention to Poisson sampling. International Statistical Review, 80(1), 127-148.
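A minimal sketch of the generalised bootstrap idea, assuming Poisson sampling and normal adjustments: each replicate multiplies the weights 1/pik by independent adjustments a_k with mean 1 and variance 1 - pik, so the variance of the replicate totals matches the Horvitz-Thompson variance estimator under Poisson sampling, sum((1 - pik) y^2 / pik^2). The helper generalised_normal_totals() and the moment choice are assumptions for illustration, not the package's internals.

```r
# Generalised bootstrap sketch: a_k ~ N(1, 1 - pik), independent across units.
generalised_normal_totals <- function(ys, pks, B) {
  n <- length(ys)
  replicate(B, {
    a <- 1 + sqrt(1 - pks) * rnorm(n)   # weight adjustments
    sum(a * ys / pks)                   # adjusted HT total
  })
}
```

Normal adjustments can occasionally produce negative weights, which is one reason to consider positive distributions such as the lognormal.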


Check if a number is integer

Description

Check whether x is a whole number (up to the tolerance tol), unlike is.integer, which checks whether x is stored as an integer type.

Usage

is_wholenumber(x, tol = .Machine$double.eps^0.5)

Arguments

x

a scalar or a numeric vector

tol

a scalar, indicating the tolerance

Note

From the help page of function is.integer
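The is.integer help page suggests a tolerance-based check along the following lines; we assume is_wholenumber() follows the same definition.

```r
# TRUE where x is within tol of the nearest whole number.
is_wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}

is_wholenumber(c(1, 1.5, 2 + 1e-10))
#> [1]  TRUE FALSE  TRUE
is.integer(1)   # FALSE: the literal 1 is stored as a double
```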


Select a one-one sampling

Description

A one-one sampling is a design for which the random variables S_k, representing the number of times unit k is included in the sample, have expectation and variance both equal to 1. Proposed by Antal and Tillé (2011, 2014).

Usage

one_one(n, method = c("doubled-half", "over-replacement"))

Arguments

n

integer, the sample size

method

algorithm to be used, either doubled half sampling or simple random sampling with over-replacement. See the Details section.

Details

Antal and Tillé proposed two procedures that lead to one-one samplings. The first one (Antal and Tillé, 2011a) is more complex and makes use of simple random sampling with over-replacement (Antal and Tillé, 2011b); it is used by setting method = "over-replacement". The second one (Antal and Tillé, 2014) is the doubled half sampling, which is simpler and quicker to compute; it is used by setting method = "doubled-half", the default option.

Value

an integer vector of size n, indicating how many times each unit is present in the sample


Select a simple random sampling with over-replacement

Description

Used for resampling procedures. Proposed by Antal and Tillé (2011).

Usage

over_replacement(N, n)

Arguments

N

integer, the population size

n

integer, the sample size

Value

an integer vector of size n, indicating how many times each unit is present in the sample

References

Antal, E.; Tillé, Y. (2011). Simple random sampling with over-replacement. Journal of Statistical Planning and Inference, 141(1), 597-601.


Pseudo-population bootstrap for simple random sampling

Description

Pseudo-population bootstrap for simple random sampling

Usage

ppBS_srs(y, N, B, D = 1, method)

Arguments

y

vector of sample values

N

scalar, represents the population size

B

scalar, number of bootstrap replications

D

scalar, number of replications for the double bootstrap (when applicable)

method

a string indicating the bootstrap method to be used, available methods are: 'Gross', 'Booth', 'ChaoLo85', 'ChaoLo94', 'BickelFreedman', 'Sitter'

Details

See Mashreghi et al. (2016) for details about these bootstrap methods.

References

Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys, 10, 1-52.
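The 'Gross' method can be sketched in base R when N/n is an integer: each sampled value is replicated N/n times to form a pseudo-population of size N, and B simple random samples without replacement of size n are drawn from it. gross_bootstrap_variance() is a hypothetical helper; the other listed methods handle a non-integer N/n in different ways.

```r
# Gross-type pseudo-population bootstrap variance of the estimated total.
gross_bootstrap_variance <- function(y, N, B) {
  n <- length(y)
  k <- N / n
  if (k != round(k)) stop("this sketch assumes N/n is an integer")
  U_star <- rep(y, times = k)                            # pseudo-population
  totals <- replicate(B, N * mean(sample(U_star, n)))    # SRSWOR from U_star
  var(totals)
}
```

As B grows the result approaches N^2 (1 - n/N) S*^2 / n, where S*^2 is the variance (divisor N - 1) of the pseudo-population.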


Pseudo-population bootstrap for unequal probability sampling

Description

Pseudo-population bootstrap for unequal probability sampling

Usage

ppBS_ups(y, pik, B, D = 1, method, smplFUN, x = NULL, s = NULL)

Arguments

y

vector of sample values

pik

vector of sample first-order inclusion probabilities

B

scalar, number of bootstrap replications

D

scalar, number of replications for the double bootstrap

method

a string indicating the bootstrap method to be used, available methods are: 'Holmberg', 'Chauvet', 'HotDeck'

smplFUN

a function that takes as input a vector of length N of inclusion probabilities and returns a vector of length N, either logical or of 0s and 1s, where TRUE or 1 indicates a sampled unit and FALSE or 0 indicates a non-sampled unit.

x

vector of length N with values of the auxiliary variable for all population units, only required if method "HotDeck" is chosen

s

logical vector of length N, TRUE for units in the sample, FALSE otherwise. Alternatively, a vector of length n with the indices of the sample units. Only required for "HotDeck" method.

References

Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys, 10, 1-52.
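Under unequal probability sampling, a common way to build a pseudo-population is to replicate each sampled unit about 1/pik times, rounding the fractional part at random so that the expected number of copies is exactly 1/pik. The sketch below shows only this construction step; whether ppBS_ups() uses exactly this randomized rounding is an assumption, and build_pseudo_population() is hypothetical.

```r
# Replicate unit k floor(1/pik) times, plus one extra copy with probability
# equal to the fractional part of 1/pik (randomized rounding).
build_pseudo_population <- function(y, pik) {
  w <- 1 / pik
  reps <- floor(w) + (runif(length(w)) < (w - floor(w)))
  # original pik carried along; a full method may rescale them to sum to n
  list(y = rep(y, times = reps), pik = rep(pik, times = reps))
}
```

Bootstrap samples would then be drawn from the pseudo-population with a function such as smplFUN.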


Select the random part of a pseudo-population

Description

Helper function that generates the random part of a pseudo-population in function ppBS_srs().

Usage

select_Uc(..., method)

Arguments

...

parameters of the function, depending on the bootstrap method chosen.

method

string indicating the bootstrap method
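When N/n is not an integer, a pseudo-population of size N can be split into a fixed part, in which each sampled unit is replicated floor(N/n) times, and a random part U_c that completes it to N units. A minimal sketch, assuming U_c is drawn by simple random sampling without replacement from the sample; schemes differ across methods, so this is not necessarily what select_Uc() does, and make_pseudo_population_srs() is hypothetical.

```r
# Fixed part: every sampled value repeated N %/% n times.
# Random part U_c: the remaining N - n * (N %/% n) values drawn by SRSWOR.
make_pseudo_population_srs <- function(y, N) {
  n <- length(y)
  k <- N %/% n
  r <- N - n * k
  fixed <- rep(y, times = k)
  random <- y[sample.int(n, r)]
  c(fixed, random)
}
```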
