Optimum Sample Allocation in Stratified Sampling with stratallo

The stratallo package provides algorithms for computing optimum sample allocations in stratified sampling designs. The implemented methods use exact analytical algorithms, rather than numerical approximations, to solve several classical optimization problems that arise in survey design.

The package supports a variety of practical constraints, including lower and upper bounds on stratum sample sizes, cost constraints, and multi-domain precision control.
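As a sketch of the basic problem behind opt(), assuming the standard formulation for simple random sampling without replacement within strata (the package vignette is the authoritative statement), the allocation x_1, ..., x_H minimizes the variance subject to a fixed total sample size n and box constraints. Here N_h, S_h, m_h and M_h are the stratum sizes, standard deviations, and lower and upper bounds used in the Examples section.

```latex
\begin{aligned}
V(x_1,\dots,x_H) &= \sum_{h=1}^{H} \frac{A_h^2}{x_h} - A_0
  = \sum_{h=1}^{H} \frac{N_h (N_h - x_h) S_h^2}{x_h},
  \qquad A_h = N_h S_h, \quad A_0 = \sum_{h=1}^{H} N_h S_h^2, \\
\min_{x_1,\dots,x_H} \; & V(x_1,\dots,x_H)
  \quad \text{subject to} \quad
  \sum_{h=1}^{H} x_h = n, \qquad m_h \le x_h \le M_h, \quad h = 1,\dots,H.
\end{aligned}
```

Without bounds, the solution is the Neyman allocation x_h = n A_h / (A_1 + ... + A_H); the bounds are what make the problem require dedicated algorithms.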


Installation

install.packages("stratallo")

# Development version
# install.packages("remotes")
remotes::install_github("wwojciech/stratallo")

User functions

The package provides three main user functions for solving optimum allocation problems:

Function    Description
opt()       Optimum allocation with a fixed total sample size.
optcost()   Minimum-cost allocation under a variance constraint.
dopt()      Multi-domain optimum allocation with controlled precision.
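When only upper bounds are present, a classical way to compute the optimum allocation is the recursive Neyman procedure: apply Neyman allocation x_h proportional to A_h, fix every stratum that exceeds its upper bound M_h at that bound, and repeat on the remaining strata. The base-R sketch below is illustrative only: neyman_upper_sketch() is a hypothetical helper, not a stratallo function, it ignores lower bounds, and it is not the package's implementation.

```r
# Hypothetical helper (not part of stratallo): Neyman allocation with
# upper bounds only, computed by the classical recursive scheme.
neyman_upper_sketch <- function(n, A, M) {
  stopifnot(length(A) == length(M), n <= sum(M))
  x <- numeric(length(A))
  take_max <- rep(FALSE, length(A))  # strata fixed at their upper bound
  repeat {
    free <- !take_max
    n_free <- n - sum(M[take_max])   # sample left for the free strata
    # Neyman allocation among the free strata: x_h proportional to A_h
    x[free] <- n_free * A[free] / sum(A[free])
    x[take_max] <- M[take_max]
    violated <- free & x > M
    if (!any(violated)) break
    take_max <- take_max | violated  # fix violators at M_h and recompute
  }
  x
}

N <- c(3000, 4000, 5000, 2000)
S <- c(48, 79, 76, 16)
x <- neyman_upper_sketch(n = 1284, A = N * S, M = c(300, 400, 800, 90))
round(x, 4)
```

With the data from the Examples section the lower bounds are not active, so this sketch agrees with the opt() result shown there (228.9496, 400, 604.1727, 50.8777). In general, opt() also handles lower bounds, which this sketch does not.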

The package also includes several helper functions:

Function         Description
var_st()         Computes the variance of the stratified π estimator of the population total.
var_stsi()       Computes the variance of the stratified π estimator of the population total under simple random sampling without replacement within each stratum.
alloc_summary()  Summarizes an allocation produced by opt() or optcost().
round_oric()     Deterministic rounding of a non-integer allocation to integers (optimal rounding under integer constraints, ORIC).
round_ran()      Randomized rounding of a non-integer allocation to integers.
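Under simple random sampling without replacement within strata, the variance of the stratified π estimator of a total can be written as the sum of A_h^2 / x_h minus A_0, with A_h = N_h S_h and A_0 = sum of N_h S_h^2. The sketch below assumes this form; var_stsi_sketch() and var_stsi_alt() are hypothetical helpers, not stratallo functions. With the exact allocation from the Examples section, this formula gives about 538073357, which is consistent with the var_stsi() output there.

```r
# Hypothetical helper, not the stratallo source: variance of the stratified
# pi estimator of a total under SRSWOR within strata, for allocation x.
var_stsi_sketch <- function(x, N, S) {
  stopifnot(length(x) == length(N), length(N) == length(S), all(x > 0))
  A <- N * S          # A_h = N_h * S_h
  A0 <- sum(N * S^2)  # A_0 = sum_h N_h * S_h^2
  sum(A^2 / x) - A0
}

# Equivalent form: sum_h N_h * (N_h - x_h) * S_h^2 / x_h
var_stsi_alt <- function(x, N, S) sum(N * (N - x) * S^2 / x)

# Toy population with two strata
N <- c(10, 20)
S <- c(2, 3)
x <- c(5, 10)
var_stsi_sketch(x, N, S)  # 220
var_stsi_alt(x, N, S)     # 220
```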

For a detailed description of the allocation problems and algorithms, see the package vignette.


Example datasets

The package includes several artificial populations that can be used for examples and benchmarking. The available datasets can be listed with:

data(package = "stratallo")

Examples

library(stratallo)

Optimum allocation with fixed sample size

N <- c(3000, 4000, 5000, 2000) # Strata sizes.
S <- c(48, 79, 76, 16) # Standard deviations of a study variable in strata.
A <- N * S # Population constants A_h = N_h * S_h.
m <- c(100, 90, 500, 50) # Lower bounds.
M <- c(300, 400, 800, 90) # Upper bounds.
n <- 1284 # Total sample size.

x <- opt(n = n, A = A, m = m, M = M)
x
#> [1] 228.9496 400.0000 604.1727  50.8777

x_int <- round_oric(x)
x_int
#> [1] 229 400 604  51

var_stsi(x, N, S)
#> [1] 538073357
var_stsi(x_int, N, S)
#> [1] 538073497
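The exact ORIC procedure is not reproduced here. As a simpler stand-in that also keeps the integer total unchanged, the largest-remainder (Hamilton) method rounds every value down and then gives one extra unit to the strata with the largest fractional parts. round_largest_remainder() is a hypothetical helper, not a stratallo function.

```r
# Hypothetical helper: largest-remainder (Hamilton) rounding. This is NOT the
# ORIC method of round_oric(); it is a simpler sum-preserving stand-in.
round_largest_remainder <- function(x) {
  target <- round(sum(x))     # integer total to preserve
  y <- floor(x)
  deficit <- target - sum(y)  # units still to distribute
  if (deficit > 0) {
    # One extra unit for each of the strata with the largest fractional parts
    idx <- order(x - y, decreasing = TRUE)[seq_len(deficit)]
    y[idx] <- y[idx] + 1
  }
  y
}

x <- c(228.9496, 400.0000, 604.1727, 50.8777)
round_largest_remainder(x)
```

For this input it returns 229, 400, 604, 51, the same integers as round_oric() above. In general the two methods can differ, and neither rounding step by itself guarantees that the bounds m and M still hold.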

Multi-domain optimum allocation with controlled precision

# Three domains with 2, 2, and 3 strata, respectively.
H_counts <- c(2, 2, 3)
N <- c(140, 110, 135, 190, 200, 40, 70)
S <- c(180, 20, 5, 4, 35, 9, 40)
total <- c(2, 3, 5)
kappa <- c(0.5, 0.2, 0.3)
n <- 828

dopt(n, H_counts, N, S, total, kappa)
#> [1] 140.00000 108.06261 135.00000 154.02807 200.00000  20.90933  70.00000
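To see how this allocation splits across domains, the stratum-level values can be aggregated with H_counts. The sketch copies the printed dopt() output (so it runs without the package, at the printed precision) and uses only base R.

```r
# Allocation printed by dopt() in the example above, copied verbatim.
H_counts <- c(2, 2, 3)
N <- c(140, 110, 135, 190, 200, 40, 70)
x <- c(140.00000, 108.06261, 135.00000, 154.02807, 200.00000, 20.90933, 70.00000)

domain <- rep(seq_along(H_counts), times = H_counts)  # domain label of each stratum
domain_n <- tapply(x, domain, sum)                     # sample size per domain
domain_n
which(x >= N)  # take-all strata: sample size equals stratum size
```

The domain sample sizes are about 248.06, 289.03 and 290.91, summing to n = 828, and strata 1, 3, 5 and 7 are take-all strata (x_h = N_h).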
