Help for package samplingR

Type:

Package

Title:

Sampling and Estimation Methods

Version:

1.0.1

Date:

2023-06-25

Maintainer:

Javier Estévez <javier.estase@gmail.com>

Description:

Functions to take samples of data, sample size estimation and getting useful estimators such as total, mean, proportion about its population using simple random, stratified, systematic and cluster sampling.

Imports:

dplyr, methods

License:

GPL-2

Encoding:

UTF-8

RoxygenNote:

7.2.3

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2023-06-25 19:53:00 UTC; javie

Author:

Javier Estévez [aut, cre, cph]

Repository:

CRAN

Date/Publication:

2023-06-26 11:00:02 UTC

Parameter estimation for cluster samples

Description

Estimates parameters with optional confidence interval for clustered data of similar cluster size.

Usage

cluster.estimator(
  N,
  data,
  estimator = c("total", "mean", "proportion", "class total"),
  replace = FALSE,
  alpha
)

Arguments

N

Number of clusters for the population

data

Cluster sample

estimator

Estimator to compute. Can be one of "total", "mean", "proportion", "class total". Default is "total".

replace

Whether the sample to be taken can have repeated instances or not.

alpha

Optional value to calculate estimation error and build 1-alpha

Details

This function admits both grouped and non-grouped by cluster data.
Non-grouped data must have interest variable data in the first column and cluster name each individual belongs to in the last column.
Grouped by cluster data must have interest variable data in the first column, cluster size in the second and the cluster name in the last column. Interest values of grouped data must reflect the total value of each cluster.

Value

A list containing different interest values:

estimator
variance
sampling.error
estimation.error
confint

Examples

d<-cbind(rnorm(500, 50, 20), rep(c(1:50),10)) #Non-grouped data
sample<-cluster.sample(d, n=10) #Non-grouped sample
sampleg<-aggregate(sample[,1], by=list(Category=sample[,2]), FUN=sum)
sampleg<-cbind(sampleg[,2], rep(10,10), sampleg[,1]) #Same sample but with grouped data
sum(d[,1])
cluster.estimator(N=50, data=sample, estimator="total", alpha=0.05)
cluster.estimator(N=50, data=sampleg, estimator="total", alpha=0.05)

Cluster sample for non-grouped data.

Description

Retrieves a sample of clusters for data which interest variable values are not grouped by cluster.

Usage

cluster.sample(data, n, replace = FALSE)

Arguments

data

Matrix or data.frame containing the population data in first column and the cluster it belongs to in the last column.

n

Number of clusters of the returning sample.

replace

Whether the sample to be taken can have repeated clusters or not.

Details

#' If your data is grouped by cluster use srs.sample function to retrieve your sample. Remember grouped by cluster data must have interest variable data in the first column, cluster size in the second and the cluster name in the last column. Interest values of grouped data must reflect the total value of each cluster.

Value

Data frame of a clustered sample

Examples

data<-cbind(rnorm(500, 50, 20), rep(c(1:50),10))
sample<-cluster.sample(data, 10);sample

Sample size estimation on cluster sampling

Description

Calculates the required sample size in order to achieve an absolute sampling error less or equal to the specified for an specific estimator and an optional confidence interval in cluster sampling.

Usage

cluster.samplesize(
  N,
  data,
  error,
  alpha,
  estimator = c("total", "mean", "proportion", "class total"),
  replace = FALSE
)

Arguments

N

Number of clusters in the population.

data

Dataset.

error

Sampling error.

alpha

Significance level to obtain confidence intervals.

estimator

The estimator to be estimated. Default is "total".

replace

Whether the samples to be taken can have repeated instances or not.

Details

Value

Number of clusters to be taken.

Examples

d<-cbind(rnorm(500, 50, 20), rep(c(1:50),10)) #Non-grouped data
sample<-cluster.sample(d, n=10) #Non-grouped sample
sampleg<-aggregate(sample[,1], by=list(Category=sample[,2]), FUN=sum)
sampleg<-cbind(sampleg[,2], rep(10,10), sampleg[,1]) #Sample sample with grouped data

#Cluster size to be taken for estimation
cluster.samplesize(N=50, data=sample, error=500, estimator="total", replace=TRUE)

newsample<-cluster.sample(d, n=26) #New sample for estimation
sum(d[,1])
cluster.estimator(N=50, data=newsample, estimator="total", alpha=0.05, replace=TRUE)
cluster.estimator(N=50, data=sampleg, estimator="total", alpha=0.05)

Simple Random Sample parameter estimation of domains.

Description

Function to make estimations of diferent parameters on a given domain based on a Simple Random Sample.

Usage

srs.domainestimator(
  Nh,
  data,
  estimator = c("total", "mean", "proportion", "class total"),
  domain,
  replace = FALSE,
  alpha
)

Arguments

Nh

Number of instances of the data set domain.

data

Sample of the data. It must constain a column with the data to estimate and a second column with the domain of each instance.

estimator

One of "total", "mean". Default is "total".

domain

Domain of the sample from which parameter estimation will be done.

replace

Whether the sample to be taken can have repeated instances or not.

alpha

Optional value to calculate estimation error and build 1-alpha confidence interval.

Details

Data columns must be arranged with interest values on the first column and domain values on the last column.
Domain parameter can be either numeric or character and must be equal to one of the values of the domain column of data.

Value

A list containing different interest values:

estimator
variance
sampling.error
estimation.error
confint

References

Pérez, C. (1999) Técnicas de muestreo estadístico. Teoría, práctica y aplicaciones informáticas. 193-195

Examples

data<-cbind(rnorm(500, 50, 20), rep(c(1:2),250))
sample<-data[srs.sample(500, 100),]
sum(data[which(data[,-1]==1),1])
srs.domainestimator(Nh = 250, data = sample, estimator="total", domain=1)

Simple Ramdom Sampling parameter estimation.

Description

Function to make estimations of diferent parameters based on a Simple Random Sample.

Usage

srs.estimator(
  N,
  data,
  estimator = c("total", "mean", "proportion", "class total"),
  replace = FALSE,
  alpha
)

Arguments

N

Number of instances of the data set.

data

Sample of the data. It must only contain a single column of the data to estimate.

estimator

Estimator to compute. Can be one of "total", "mean", "proportion", "class total". Default is "total".

replace

Whether the sample has been taken with replacement or not.

alpha

Optional value to calculate estimation error and build 1-alpha confidence interval.

Value

A list containing different interest values:

estimator
variance
sampling.error
estimation.error
confint

Examples

data<-rnorm(200, 100, 20)
sample<-data[srs.sample(200, 50)]
tau<-sum(data);tau
srs.estimator(200, sample, "total", alpha=0.05)


mu<-mean(data);mu
srs.estimator(200, sample, "mean", alpha=0.05)

Simple Random Sample

Description

With this function you receive a simple random sample consisting on a list of the instances index

Usage

srs.sample(N, n, replace = FALSE, data)

Arguments

N

Number of instances of the data set.

n

Number of instances of the returning sample.

replace

Whether the sample to be taken can have repeated instances or not.

data

Optional matrix or data.frame containing the population data. If specified an object of same class as data will be returned with sample instances.

Value

List of size n with numbers from 1 to N indicating the index of the data set's instances to be taken.

Examples

srs.sample(10,3)


data<-matrix(data=c(1:24), nrow=8)
N<-dim(data)[1]
sample<-srs.sample(N, 3, data = data)
sample

Simple Random Sample size.

Description

Calculates the required sample size in order to achieve a relative or absolute sampling error less or equal to the specified for an specific estimator and an optional confidence interval in simple random sampling.

Usage

srs.samplesize(
  N,
  var,
  error,
  alpha,
  estimator = c("total", "mean", "proportion", "class total"),
  p,
  mean,
  replace = FALSE,
  relative = FALSE
)

Arguments

N

Number of instances of the data set.

var

Estimated quasivariance.

error

Sampling error

alpha

Significance level to obtain confidence intervals.

estimator

One of "total", "proportion", "mean", "class total". Default is "total"

p

Estimated proportion. If estimator is not "proportion" or "class total" it will be ignored.

mean

Estimated mean. If relative=FALSE it will be ignored.

replace

Whether the sample to be taken can have repeated instances or not.

relative

Whether the specified error is relative or not.

Details

If the sample size result is not a whole number the number returned is the next whole number so srs.samplesize>=n is satisfied.
To estimate sample size of estimators "total" and "mean" estimated quasivariance must be provided. If the error is relative then estimated mean must also be provided.
To estimate sample size of estimator "proportion" and "class total" estimated proportion must be provided. If p is not specified sample size will be estimated based on worst-case scenario of p=0.5.
N must be always be provided for calculations.

Value

Number of instances of the sample to be taken.

Examples

data<-rnorm(200, 100, 20)
n<-srs.samplesize(200, var(data), estimator="total", error=400, alpha=0.05);n
sample<-data[srs.sample(200, n)]
srs.estimator(200, sample, "total", alpha=0.05)$sampling.error

Strata allocation given a sample size

Description

Function to allocate the number of samples to be taken for each strata given the total sample size and the strata.allocation method. The number of allocations returned will be equal to the length of the parameters.

Usage

strata.allocation(
  Nh,
  n,
  var,
  alloc = c("unif", "prop", "min", "optim"),
  C,
  cini,
  ch
)

Arguments

Nh

Vector of population strata sizes.

n

Sample size

var

Vector of strata variances.

alloc

The allocation method to be used. Default is "unif".

C

Total study cost.

cini

Overhead study cost.

ch

Vector of costs to take an individual from a strata for the sample.

Details

alloc="optim" is the only that requires cost function data. Total study and overhead study costs are optional. If given allocation will be done so total study cost is not surpassed.s

Value

Vector of strata sample sizes.

Examples

strata.allocation(Nh=rep(125,4), n=100, alloc="unif") #25, 25, 25, 25
strata.allocation(Nh=c(100, 50, 25), n=100, alloc="prop")

Parameter estimation of stratified data

Description

Function to make estimations of diferent parameters based on a stratified sample.

Usage

strata.estimator(
  N,
  Nh,
  data,
  estimator = c("total", "mean", "proportion", "class total"),
  replace = FALSE,
  alpha
)

Arguments

N

Population size.

Nh

Size of each population strata.

data

Stratified sample.

estimator

Estimator to compute. Can be one of "total", "mean", "proportion", "class total". Default is "total".

replace

Whether the sample to be taken can have repeated instances or not.

alpha

Optional value to calculate estimation error and build 1-alpha

Details

Nh length must be equal to number of strata in data.
data is meant to be a returned object of strata.sample function.

Value

A list containing different interest values:

estimator
variance
sampling.error
estimation.error
confint

Stratified sample

Description

With this function you receive a sample of each strata within your data with specified size for each strata.

Usage

strata.sample(data, n, replace = FALSE)

Arguments

data

Population data consisting of a number of columns of data and a last column specifying the strata each instance belongs to.

n

Numeric array of sample sizes for each strata to be taken.

replace

Whether the sample to be taken can have repeated instances or not.

Details

n length must be equal to number of strata in data.
On return list each strata sample can be accessed calling object$strataname where strataname are values of the last column of the original data.

Value

A list containing one strata sample per index.

Examples

data<-cbind(rnorm(500, 50, 20), rep(c("clase 1", "clase 2","clase 3","clase4"),125))
strata.sample(data=data, n=c(10,20,30,40))

Sample size estimation on stratified sampling

Description

Calculates the required sample size in order to achieve an absolute or relative sampling error less or equal to the specified for an specific estimator and an optional confidence interval in stratified sampling.

Usage

strata.samplesize(
  Nh,
  var,
  error,
  alpha,
  estimator = c("total", "mean", "proportion", "class total"),
  alloc = c("prop", "min", "optim"),
  ch,
  p,
  mean,
  replace = FALSE,
  relative = FALSE
)

Arguments

Nh

Vector of population strata sizes.

var

Vector of estimated strata variances.

error

Sampling error.

alpha

Significance level to obtain confidence intervals.

estimator

The estimator to be estimated. Default is "total".

alloc

The allocation to be used when taking samples. Default is "prop".

ch

Vector of cost per strata to select an individual for the sample.

p

Estimated population proportion. If estimator is not "proportion" or "class total" it will be ignored.

mean

Estimated population mean. If relative=FALSE it will be ignored.

replace

Whether the samples to be taken can have repeated instances or not.

relative

Whether the specified error is relative or not.

Details

With "proportion" and "class total" estimators variance vector must contain var return values equal to \frac{Nh}{(Nh-1)}p*(1-p) values.

Value

Number of instances of the sample to be taken.

Examples

strata.samplesize(c(120,100,110,50), c(458, 313,407,364), error=5, alpha=0.05, "mean", "prop")

Strata sample size by costs function

Description

This function returns the total sample size given a costs function consisting on the fixed total study cost, overhead study cost and a vector of costs by strata. can be given so the allocation is calculated to not exceed the total study cost.

Usage

strata.samplesize.cost(
  Nh,
  var,
  C,
  cini,
  ch,
  alloc = c("unif", "prop", "optim")
)

Arguments

Nh

Vector of population strata sizes.

var

Vector of strata variance values.

C

Total study cost.

cini

Overhead study cost.

ch

Vector of costs to take an individual from a strata for the sample.

alloc

The allocation method to be used. Default is "unif".

Details

Strata variance values are only necessary for optim allocation.

Value

Sample size.

Examples

strata.samplesize.cost(Nh=c(100,500,200), C=1000, cini=70, ch=c(9,5,12), alloc="prop")

Systematic samples

Description

Returns all possible systematic samples of size n

Usage

syst.all.samples(data, n)

Arguments

data

Population data

n

Sample size

Value

List with a sample per entrance

Examples

data<-c(1,3,5,2,4,6,2,7,3)
syst.all.samples(data, 3)

Analysis of variance of population data

Description

Analysis of variance of population data

Usage

syst.anova(data, n)

Arguments

data

Population data

n

Sample size

Value

Summary

Examples

data<-c(1,3,5,2,4,6,2,7,3)
syst.anova(data,3)

Parameter estimation on a systematic sample

Description

Parameter estimation on a systematic sample

Usage

syst.estimator(
  N,
  sample,
  estimator = c("total", "mean", "proportion", "class total"),
  method = c("srs", "strata", "syst"),
  alpha,
  data,
  t
)

Arguments

N

Population size

sample

Vector containing the systematic sample

estimator

Estimator to compute. Can be one of "total", "mean", "proportion", "class total". Default is "total".

method

Method of variance estimation. Can be one of "srs", "strata", "syst".

alpha

Optional value to calculate estimation error and build 1-alpha confidence interval.

data

Population data.

t

Number of systematic samples to take with interpenetrating samples method.

Details

Variance estimation has no direct formula in systematic sampling, thus estimation method must be done. Refer to syst.intracorr and syst.intercorr functions details for more information.
"syst" method uses interpenetrating samples method in which t systematic samples of size=\frac{n}{t} are taken to estimate. \frac{n}{t} must be even.
By choosing the start at random for all the samples they can be considered random taken. With this method population data and t must be given.

Value

A list containing different interest values:

estimator
variance
sampling.error
estimation.error
confint

Examples

data<-c(1,3,5,2,4,6,2,7,3)
sample<-syst.sample(9, 3, data)
syst.estimator(N=9, sample, "mean", "srs", 0.05)

Stratified correlation coefficient

Description

Stratified correlation coefficient

Usage

syst.intercorr(N, n, data)

Arguments

N

Population size

n

Sample size

data

Population data

Details

This value serves as a comparison between systematic and stratified sampling precision.
At value=1 the systematic precision is minimum. At value=0 both sampling methods precision are equal. At value= \frac{-1}{n-1} systematic precision is maximum.
Summarising at values between 1 and 0 stratified sampling estimation has more precision than systematic, so method="strata" should be set at syst.estimator. The other way method="syst" of interpenetrating samples method is better.

Value

Correlation coefficient

Examples

data<-c(1,3,5,2,4,6,2,7,3)
syst.intercorr(9,3,data)  #0.09022556

Intraclass correlation coefficient

Description

Intraclass correlation coefficient

Usage

syst.intracorr(N, n, data)

Arguments

N

Population size

n

Sample size

data

Population data

Details

This value serves as a comparison between systematic and simple random sampling precision.
At value=1 the systematic precision is minimum. At value=0 both sampling methods precision are equal. At value= \frac{-1}{n-1} systematic precision is maximum.
Summarising at values between 1 and 0 simple random sampling estimation has more precision than systematic, so method="srs" should be set at syst.estimator. The other way method="syst" of interpenetrating samples method is better.

Value

Intraclass correlation

Examples

data<-c(1,3,5,2,4,6,2,7,3)
syst.intracorr(9, 3, data)  #0.34375 example 1

Systematic sampling sample

Description

Retrieves a \frac{N}{n} systematic sample

Usage

syst.sample(N, n, data)

Arguments

N

Population size.

n

Sample size

data

Optional data of the population.

Details

If \frac{N}{n} is not an even number a 1 in floor(\frac{N}{n}) sample will be taken.

Value

Vector of size n with numbers from 1 to N indicating the index samples to be taken. If data is provided then the instances will be returned.

Examples

data<-runif(40)
syst.sample(40,8, data)