Help for package RegSDC

Type:

Package

Title:

Information Preserving Regression-Based Tools for Statistical Disclosure Control

Version:

1.0.0

Date:

2025-02-03

Author:

Øyvind Langsrud [aut, cre]

Maintainer:

Øyvind Langsrud <oyl@ssb.no>

Depends:

R (≥ 3.0.0)

Imports:

SSBtools (≥ 1.3.4), MASS, Matrix

Description:

Implementation of the methods described in the paper with the above title: Langsrud, Ø. (2019) <doi:10.1007/s11222-018-9848-9>. The package can be used to generate synthetic or hybrid continuous microdata, and the relationship to the original data can be controlled in several ways. A function for replacing suppressed tabular cell frequencies with decimal numbers is included.

License:

Apache License 2.0 | file LICENSE

Encoding:

UTF-8

URL:

https://github.com/olangsrud/RegSDC, https://olangsrud.github.io/RegSDC/

BugReports:

https://github.com/olangsrud/RegSDC/issues

RoxygenNote:

7.3.1

NeedsCompilation:

Packaged:

2025-02-03 14:37:18 UTC; oyl

Repository:

CRAN

Date/Publication:

2025-02-03 16:30:07 UTC

Calculation of C by solving equation 10 in the paper

Description

The limit calculated by FindAlpha is used when alpha = 1 cannot be chosen (a warning is produced). In the output, alpha is included as an attribute.

Usage

CalculateCdirect(a, b, epsAlpha = 1e-07, AlphaHandler = warning, alpha = NULL)

CalculateC(a, b, ..., viaQR = NULL, returnAlpha = FALSE)

Arguments

a

matrix E in paper

b

matrix Eg in paper

epsAlpha

Precision constant for alpha calculation

AlphaHandler

Function (warning or stop) to be used when alpha<1

alpha

Optional alpha value to be used instead of computing alpha

...

Arguments to CalculateCdirect

viaQR

When TRUE, QR is involved. This may be needed to handle collinear data. When NULL, viaQR is set to TRUE if ordinary computations fail.

returnAlpha

When TRUE, alpha (1 or a value below 1) is returned instead of C. The attribute viaQR is included.

Details

When epsAlpha = NULL, calculations are performed directly (alpha = 1) and alpha is not included as an attribute.

Value

Calculated C with attribute alpha (and, for CalculateC, also attribute viaQR)

Author(s)

Øyvind Langsrud

Examples

x <- 1:10
y <- matrix(rnorm(30) + 1:30, 10, 3)
a <- residuals(lm(y ~ x))
b <- residuals(lm(2 * y + matrix(rnorm(30), 10, 3) ~ x))

a1 <- a
b1 <- b
a1[, 3] <- a[, 1] + a[, 2]
b1[, 3] <- b[, 1] + b[, 2]

alpha <- FindAlpha(a, b)
FindAlphaSimple(a, b)  # Same result as above
CalculateC(a, b)
CalculateCdirect(a, b)  # Same result as above without viaQR attribute 
CalculateCdirect(a, b, alpha = alpha/(1 + 1e-07))  # Same result as above since epsAlpha = 1e-07
CalculateCdirect(a, b, alpha = alpha/2)  # OK
# CalculateCdirect(a,b, alpha = 2*alpha) # Not OK

FindAlpha(a, b1)
# FindAlphaSimple(a,b1) # Not working since b1 is collinear
CalculateC(a, b1, returnAlpha = TRUE)  # Almost same alpha as above (epsAlpha causes the difference)

FindAlpha(b, a)
CalculateC(b, a, returnAlpha = TRUE)  # 1 returned (not same as above)
CalculateC(b, a)

FindAlpha(b1, a)   # alpha smaller than epsAlpha is set to 0 in CalculateC
CalculateC(b1, a)  # When alpha = 0, C is calculated by GenQR instead of chol

Matrix difference (a-b) including checking for equal columns

Description

Each column is checked using all.equal

Usage

Cdiff(a, b, tolerance = sqrt(.Machine$double.eps))

Arguments

a

numerical matrix

b

numerical matrix

tolerance

Tolerance parameter passed to all.equal

Value

(a-b) where equal columns are set to zero
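
A minimal base-R sketch of the column-wise check, assuming Cdiff simply zeroes each column of a - b for which all.equal reports equality. The name cdiff_sketch is hypothetical; this is an illustration, not the package source.

```r
# Hypothetical sketch: a - b, with columns that all.equal() deems equal set to zero
cdiff_sketch <- function(a, b, tolerance = sqrt(.Machine$double.eps)) {
  d <- a - b
  for (j in seq_len(ncol(a))) {
    # all.equal() returns TRUE or a character description of the difference
    if (isTRUE(all.equal(a[, j], b[, j], tolerance = tolerance))) {
      d[, j] <- 0
    }
  }
  d
}

a <- matrix(c(1, 2, 3, 4, 5, 6), 3, 2)
b <- a
b[, 1] <- a[, 1] + 1e-12  # tiny difference, treated as equal
b[, 2] <- a[, 2] + 1      # real difference, kept
cdiff_sketch(a, b)
```

Zeroing whole columns, rather than individual elements, matches the Value description above, which speaks of equal columns.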

Examples

a <- matrix(rnorm(6), 3, 2)
b <- matrix(rnorm(6), 3, 2)
a - b
Cdiff(a, b)
b[, 1] <- a[, 1] + (.Machine$double.eps)^(2/3) * b[, 1]
a - b
Cdiff(a, b)
a[, 2] <- b[, 2]
a - b
Cdiff(a, b)

Ensure constant term in matrix

Description

A column of ones is added when needed, so that the matrix contains a constant term

Usage

EnsureIntercept(x)

Arguments

x

Input matrix

Value

The input matrix possibly with a column of ones added
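
A minimal sketch of one plausible rule. It assumes a constant term counts as present when a column of ones lies (numerically) in the column space of x. The name ensure_intercept_sketch and the 1e-7 threshold are hypothetical, and the package may use a different criterion.

```r
# Hypothetical sketch: prepend a column of ones unless it is already spanned by x
ensure_intercept_sketch <- function(x, tol = 1e-7) {
  ones <- matrix(1, nrow(x), 1)
  if (ncol(x) > 0) {
    # Residual of regressing a constant on x; (near) zero means the
    # constant term is already in the column space of x
    r <- qr.resid(qr(x), ones)
    if (max(abs(r)) < tol) return(x)
  }
  cbind(ones, x)
}

x <- matrix(c(5, 8, 4, 2, 7, 6), 3, 2)
ensure_intercept_sketch(x)            # column of ones added
ensure_intercept_sketch(cbind(x, 2))  # constant already spanned, unchanged
ensure_intercept_sketch(cbind(x, 0))  # zero column does not help, ones added
```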

Author(s)

Øyvind Langsrud

Examples

x <- matrix(c(5, 8, 4, 2, 7, 6), 3, 2)
EnsureIntercept(x)
EnsureIntercept(cbind(x, 2))
EnsureIntercept(cbind(x, 0))
EnsureIntercept(matrix(0, 4, 0))

Ensure that input is matrix (by as.matrix) and check number of rows (and columns)

Description

Ensure that input is matrix (by as.matrix) and check number of rows (and columns)

Usage

EnsureMatrix(x, nRow = NULL, nCol = NULL)

Arguments

x

NULL or input to as.matrix

nRow

Expected number of rows

nCol

Expected number of columns

Value

Input as a matrix
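
A hypothetical sketch of the checks, assuming NULL is turned into a matrix with nRow rows and no columns. That assumption is consistent with EnsureMatrix(NULL, 4) succeeding and EnsureMatrix(NULL, 3, 3) failing in the examples. The name ensure_matrix_sketch and the error messages are illustrative only.

```r
# Hypothetical sketch: coerce with as.matrix() and validate the dimensions
ensure_matrix_sketch <- function(x, nRow = NULL, nCol = NULL) {
  if (is.null(x)) {
    # Assumption: NULL becomes a matrix with nRow rows and no columns
    x <- matrix(0, if (is.null(nRow)) 0 else nRow, 0)
  }
  x <- as.matrix(x)
  if (!is.null(nRow) && nrow(x) != nRow) stop("wrong number of rows")
  if (!is.null(nCol) && ncol(x) != nCol) stop("wrong number of columns")
  x
}

dim(ensure_matrix_sketch(1:4, 4))   # 4 1
dim(ensure_matrix_sketch(NULL, 4))  # 4 0
try(ensure_matrix_sketch(1:3, 4))   # error: wrong number of rows
```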

Author(s)

Øyvind Langsrud

Examples

x <- matrix(c(5, 8, 4, 2, 7, 6), 3, 2)
EnsureMatrix(x)
EnsureMatrix(x, 3)
EnsureMatrix(1:4)
EnsureMatrix(1:4, 4)
EnsureMatrix(NULL, 4)
try(EnsureMatrix(x, 4))
try(EnsureMatrix(1:3, 4))
EnsureMatrix(x, 3, 2)
try(EnsureMatrix(x, 3, 3))
try(EnsureMatrix(NULL, 3, 3))

Calculation of alpha

Description

Function to find the largest alpha that makes equation 10 in the paper solvable.

Usage

FindAlpha(a, b, tryViaQR = TRUE)

FindAlphaSimple(a, b)

Arguments

a

matrix E in paper

b

matrix Eg in paper

tryViaQR

When TRUE, a QR transformation is used (to handle collinearity) if ordinary calculations fail.

Value

alpha

Note

FindAlphaSimple performs the calculations by a simple/direct method. FindAlpha is made to handle problematic special cases.

Author(s)

Øyvind Langsrud

Generalized QR decomposition

Description

Matrix X decomposed as Q and R (X=QR) where columns of Q are orthonormal. Ordinary QR or SVD may be used.

Usage

GenQR(x, doSVD = FALSE, findR = TRUE, makeunique = findR, tol = 1e-07)

Arguments

x

Matrix to be decomposed

doSVD

When TRUE SVD instead of QR

findR

When FALSE only Q returned

makeunique

When TRUE, uniqueness is forced by positive diagonal elements (QR) or by column sums (SVD)

tol

Passed to qr or, in the case of SVD, used similarly to the tol input of MASS::ginv().

Details

To handle dependency, a usual decomposition of X is XP = QR, where P is a permutation matrix. This function returns RP^T as R, so that X = QR. When SVD is used (X = USV^T), Q = U and R = SV^T.

Value

List with Q and R or just Q
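
The pivoting described in Details can be illustrated with base R's qr(), whose default (LINPACK) algorithm moves nearly dependent columns to the end. The sketch below covers only the QR branch, not doSVD. The name gen_qr_sketch is hypothetical, and the sign flip is one way of obtaining the positive diagonal mentioned under makeunique.

```r
# Hypothetical sketch of the QR branch: x = Q %*% R with orthonormal Q and R = R_piv P^T
gen_qr_sketch <- function(x, tol = 1e-07) {
  qx <- qr(x, tol = tol)                   # LINPACK QR: dependent columns pivoted to the end
  k <- seq_len(qx$rank)
  Q <- qr.Q(qx)[, k, drop = FALSE]         # orthonormal columns, one per rank
  R <- qr.R(qx)[k, , drop = FALSE]         # R for the pivoted matrix x[, qx$pivot]
  s <- sign(diag(qr.R(qx))[k])             # signs of the diagonal of R
  Q <- sweep(Q, 2, s, "*")                 # flip signs so that diag(R) > 0
  R <- R * s                               # row i of R multiplied by s[i]
  R <- R[, order(qx$pivot), drop = FALSE]  # R P^T: undo the column pivoting
  list(Q = Q, R = R)
}

set.seed(1)
x <- matrix(rnorm(15), 5, 3)[, c(1, 2, 1, 3)]  # 4 columns, rank 3
g <- gen_qr_sketch(x)
dim(g$Q)  # 5 3
```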

Author(s)

Øyvind Langsrud

Examples

GenQR(matrix(rnorm(15), 5, 3))
GenQR(matrix(rnorm(15), 5, 3)[, c(1, 2, 1, 3)])
GenQR(matrix(rnorm(15), 5, 3)[, c(1, 2, 1, 3)], TRUE)

Extended variant of RegSDCipso

Description

Makes it possible to generate several y's and to rescale residuals. Regression fitting by a sparse matrix algorithm is also possible (see reference).

Usage

IpsoExtra(
  y,
  x = NULL,
  ensureIntercept = TRUE,
  returnParts = FALSE,
  nRep = 1,
  resScale = NULL,
  digits = 9,
  rmse = NULL,
  sparseLimit = 500,
  printInc = TRUE
)

Arguments

y

Matrix of confidential variables

x

Matrix of non-confidential variables

ensureIntercept

Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept

returnParts

When TRUE, the output is instead two matrices: yHat (fitted) and yRes (generated residuals).

nRep

Integer. When > 1, several y's are generated, as extra columns in the output.

resScale

Residuals will be scaled by resScale

digits

Digits used to detect a perfect fit (caused by fitted values as input). This check is done only when rmse is given as input. In the case of a perfect fit, rmse is used instead of resScale.

rmse

Desired root mean square error (residual standard error). Will be used when resScale is NULL or cannot be used (see parameter digits). This parameter forces the rmse value for one y variable (the first).

sparseLimit

Limit for the number of rows of a reduced x-matrix within the algorithm. When exceeded, a sparse algorithm is used (see reference).

Value

Generated version of y
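
The rmse argument fixes the residual standard error, sqrt(RSS / residual degrees of freedom). The sketch below isolates that rescaling step for a single y. It only rescales existing residuals and does not generate new ones as IpsoExtra does. The name rescale_to_rmse is hypothetical.

```r
# Hypothetical sketch: keep the fitted values, scale residuals to a target rmse
rescale_to_rmse <- function(y, x, rmse) {
  fit <- lm(y ~ x)
  e <- residuals(fit)
  # Residual standard error = sqrt(RSS / residual degrees of freedom)
  current <- sqrt(sum(e^2) / fit$df.residual)
  fitted(fit) + e * (rmse / current)  # same fit, residuals scaled by a constant
}

set.seed(2)
x <- 1:10
y <- 3 + 2 * x + rnorm(10)
yNew <- rescale_to_rmse(y, x, rmse = 0.25)
summary(lm(yNew ~ x))$sigma
```

Because the scaled residuals stay orthogonal to the intercept and x, the regression coefficients are unchanged, which mirrors the IpsoExtra example with rmse = 0.25.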

Author(s)

Øyvind Langsrud

References

Douglas Bates and R Development Core Team (2022), Comparing Least Squares Calculations, R Vignette, vignette("Comparisons", package="Matrix").

Examples

x <- matrix(1:5, 5, 1)
y <- matrix(10 * (sample(7:39, 15) + 4 * (1:15)), 5, 3)
colnames(y) <- paste("y", 1:3, sep = "")
y1 <- y[, 1, drop = FALSE]

IpsoExtra(y, x)  # Same as RegSDCipso(y, x)

IpsoExtra(y, x, resScale = 0)  # Fitted values (whole numbers in this case)
IpsoExtra(y, x, nRep = 2, resScale = 1e-05)  # Downscaled residuals 

ySynth <- IpsoExtra(y1, x, nRep = 2, rmse = 0.25)  # Downscaled residuals 
summary(lm(ySynth ~ x))  # Identical regression results with Residual standard error: 0.25

IpsoExtra(fitted(lm(y1 ~ x)), x, nRep = 2, resScale = 0.1)  # resScale has no effect since perfect fit
IpsoExtra(fitted(lm(y1 ~ x)), x, nRep = 2, resScale = 0.1, rmse = 2)  # with warning

# Using data in the paper
IpsoExtra(RegSDCdata("sec7y"), RegSDCdata("sec7x"))  # Similar to Y*
IpsoExtra(RegSDCdata("sec7y"), RegSDCdata("sec7x"), rmse = 1)

Suppressed tabular data: Reduce dummy matrix, X (and estimate Y)

Description

In Section 7 of the paper, Z = t(X) %*% Y, where X is a dummy matrix. Some elements of Y can be found directly as elements of Z. The corresponding rows of X are removed. After removing rows, some columns contain only zeros, and these are also removed.

Usage

ReduceX(x, z = NULL, y = NULL, digits = 9)

Arguments

x

X as a matrix

z

Z as a matrix

y

Y as a matrix

digits

When non-NULL and y is NULL in the input, estimated y values close to whole numbers are rounded using digits as input to RoundWhole.

Details

To estimate Y, this function finds some values directly from Z and other values by running Z2Yhat on reduced versions of X and Z.
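
The reduction can be sketched from the steps in the Examples below: find columns with unit column sums, mark the rows they point to as known, then drop those rows and any columns that become all zero. This is a single-pass sketch that does not estimate the remaining y by Z2Yhat. The names reduce_x_sketch and yKnownValues, and the toy data, are hypothetical.

```r
# Hypothetical single-pass sketch of the reduction described above
reduce_x_sketch <- function(x, z) {
  unit_cols <- which(colSums(x) == 1)
  # Row i is "known" when it has a 1 in a column with column sum 1;
  # that element of y then equals the column's entry in z
  yKnown <- rowSums(x[, unit_cols, drop = FALSE]) > 0
  yVal <- matrix(NA_real_, nrow(x), ncol(z))
  for (j in unit_cols) {
    yVal[which(x[, j] == 1), ] <- z[j, ]
  }
  xRed <- x[!yKnown, , drop = FALSE]
  keep <- colSums(xRed) != 0  # drop columns that became all zero
  # Subtract the contribution of the known rows from the kept z elements
  zRed <- z[keep, , drop = FALSE] -
    crossprod(x[yKnown, keep, drop = FALSE], yVal[yKnown, , drop = FALSE])
  list(x = xRed[, keep, drop = FALSE], z = zRed, yKnown = yKnown,
       yKnownValues = yVal[yKnown, , drop = FALSE])
}

# Toy dummy matrix: total, two groups and one single cell
x <- matrix(c(1, 1, 1, 1,  1, 1, 0, 0,  0, 0, 1, 1,  1, 0, 0, 0), 4, 4)
y <- matrix(c(3, 5, 2, 7), 4, 1)
z <- crossprod(x, y)  # z = t(x) %*% y
r <- reduce_x_sketch(x, z)
r$yKnown
```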

Value

A list of four elements:

x

Reduced x

z

Corresponding reduced z or NULL when no z in input

yKnown

Logical vector specifying elements of y that can be found directly as elements in z

y

As y in the input (not reduced), or the estimated y when y is NULL in the input

Author(s)

Øyvind Langsrud

Examples

# Same data as in the paper
z <- RegSDCdata("sec7z")
x <- RegSDCdata("sec7x")
y <- RegSDCdata("sec7y")  # Now z is t(x) %*% y 

a <- ReduceX(x, z, y)
b <- ReduceX(x, z)
d <- ReduceX(x, z = NULL, y)  # No z in output

# Identical output for x and z
identical(a$x, b$x)
identical(a$x, d$x)
identical(a$z, b$z)

# Same y in output as input
identical(a$y, y)
identical(d$y, y)

# Estimate of y (yHat) when NULL y input
b$y

# These elements of y can be found directly in z
y[a$yKnown, , drop = FALSE]
# They can be found by searching for unit colSums
colSums(x)[colSums(x) == 1]

# These trivial data rows can be omitted when processing data
x[!a$yKnown, ]
# Now several columns can be omitted since zero colSums
colSums0 <- colSums(x[!a$yKnown, ]) == 0
# The resulting matrix is output from the function
identical(x[!a$yKnown, !colSums0], a$x)

# Output z can be computed from this output x
identical(t(a$x) %*% y[!a$yKnown, , drop = FALSE], a$z)

Regression-based SDC Tools - Synthetic addition with residual correlation control

Description

Implementation of equation 6 (arbitrary residual data) and equation 7 (residual correlations) in the paper. The alpha limit is calculated (equation 9). This limit is used when alpha = 1 cannot be chosen (a warning is produced). In the output, alpha is included as an attribute.

Usage

RegSDCadd(y, resCorr = NULL, x = NULL, yStart = NULL, ensureIntercept = TRUE)

Arguments

y

Matrix of confidential variables

resCorr

Required residual correlations (possibly recycled)

x

Matrix of non-confidential variables

yStart

Arbitrary data whose residuals will be used. Will be calculated from resCorr when NULL.

ensureIntercept

Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept

Details

Use epsAlpha = NULL to avoid calculation of alpha. Using alpha < 1 produces a warning. Input matrices are subjected to EnsureMatrix.

Value

Generated version of y with alpha as attribute

Author(s)

Øyvind Langsrud

Examples

x <- matrix(1:10, 10, 1)
y <- matrix(rnorm(30) + 1:30, 10, 3)
yOut <- RegSDCadd(y, c(0.1, 0.2, 0.3), x)

# Correlations between residuals as required
diag(cor(residuals(lm(y ~ x)), residuals(lm(yOut ~ x))))

# Identical covariance matrices
cov(y) - cov(yOut)
cov(residuals(lm(y ~ x))) - cov(residuals(lm(yOut ~ x)))

# Identical regression results
summary(lm(y[, 1] ~ x))
summary(lm(yOut[, 1] ~ x))

# alpha as attribute
attr(yOut, "alpha")

# With yStart as input and alpha limit in use (warning produced)
yOut <- RegSDCadd(y, NULL, x, 2 * y + matrix(rnorm(30), 10, 3))
attr(yOut, "alpha")

# Same correlation for all variables
RegSDCadd(y, 0.2, x)
# But in this case RegSDCcomp is equivalent and faster
RegSDCcomp(y, 0.2, x)


# Make nearly collinear data
y[, 3] <- y[, 1] + y[, 2] + 0.001 * y[, 3]
# Not possible to achieve correlations. Small alpha with warning.
RegSDCadd(y, c(0.1, 0.2, 0.3), x)
# Exact collinear data
y[, 3] <- y[, 1] + y[, 2]
# Zero alpha with warning
RegSDCadd(y, c(0.1, 0.2, 0.3), x)

Regression-based SDC Tools - Synthetic addition

Description

Residuals from arbitrary data with a synthetic addition

Usage

RegSDCaddGen(
  y,
  yStart,
  x = NULL,
  epsAlpha = 1e-07,
  AlphaHandler = warning,
  alphaAttr = TRUE,
  makeunique = TRUE,
  ensureIntercept = TRUE
)

Arguments

y

Matrix of confidential variables

yStart

Arbitrary data whose residuals will be used

x

Matrix of non-confidential variables

epsAlpha

Precision constant for alpha calculation

AlphaHandler

Function (warning or stop) to be used when alpha<1

alphaAttr

When TRUE, alpha is included as an attribute in the output

makeunique

Parameter to be used in GenQR

ensureIntercept

Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept

Details

Use epsAlpha = NULL to avoid calculation of alpha. Using alpha < 1 produces a warning. Input matrices are subjected to EnsureMatrix.

Value

Generated version of y

Author(s)

Øyvind Langsrud

Regression-based SDC Tools - Component score correlation control

Description

Implementation of equation 8 in the paper.

Usage

RegSDCcomp(
  y,
  compCorr = NA,
  x = NULL,
  doSVD = FALSE,
  makeunique = TRUE,
  ensureIntercept = TRUE
)

Arguments

y

Matrix of confidential variables

compCorr

Required component score correlations (possibly recycled)

x

Matrix of non-confidential variables

doSVD

SVD when TRUE and QR when FALSE

makeunique

Parameter to be used in GenQR

ensureIntercept

Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept

Details

An NA component score correlation means that the corresponding component score is generated independently at random. Input matrices are subjected to EnsureMatrix.

Value

Generated version of y

Author(s)

Øyvind Langsrud

Examples

x <- matrix(1:10, 10, 1)
y <- matrix(rnorm(30) + 1:30, 10, 3)

# Same as IPSO (RegSDCipso)
RegSDCcomp(y, NA, x)

# Using QR and SVD
yQR <- RegSDCcomp(y, c(0.1, 0.2, NA), x)
ySVD <- RegSDCcomp(y, c(0.1, 0.2, NA), x, doSVD = TRUE)

# Calculation of residuals
r <- residuals(lm(y ~ x))
rQR <- residuals(lm(yQR ~ x))
rSVD <- residuals(lm(ySVD ~ x))

# Correlations for the first two components as required
diag(cor(GenQR(r)$Q, GenQR(rQR)$Q))
diag(cor(GenQR(r, doSVD = TRUE)$Q, GenQR(rSVD, doSVD = TRUE)$Q))

# Identical covariance matrices
cov(yQR) - cov(ySVD)
cov(rQR) - cov(rSVD)

# Identical regression results
summary(lm(y[, 1] ~ x))
summary(lm(yQR[, 1] ~ x))
summary(lm(ySVD[, 1] ~ x))

Function that returns a dataset

Description

Function that returns a dataset

Usage

RegSDCdata(dataset)

Arguments

dataset

Name of data set within the RegSDC package

Details

sec7data: Data in section 7 of the paper as a data frame

sec7y: Y in section 7 of the paper as a matrix

sec7x: X in section 7 of the paper as a matrix

sec7z: Z in section 7 of the paper as a matrix

sec7xAll: Xall in section 7 of the paper as a matrix

sec7zAll: Zall in section 7 of the paper as a matrix

sec7zAllSupp: As Zall with suppressed values set to NA

Value

A data frame or a matrix (see Details)

Author(s)

Øyvind Langsrud

Examples

RegSDCdata("sec7data")
RegSDCdata("sec7y")
RegSDCdata("sec7x")
RegSDCdata("sec7z")
RegSDCdata("sec7xAll")
RegSDCdata("sec7zAll")
RegSDCdata("sec7zAllSupp")

Regression-based SDC Tools - General data generation

Description

IPSO by QR or SVD, scores from arbitrary data, and ROMM

Usage

RegSDCgen(
  y,
  x = NULL,
  doSVD = FALSE,
  yNew = NULL,
  lambda = Inf,
  makeunique = TRUE,
  ensureIntercept = TRUE,
  returnParts = FALSE
)

Arguments

y

Matrix of confidential variables

x

Matrix of non-confidential variables

doSVD

SVD when TRUE and QR when FALSE

yNew

Matrix of y-data for new scores (simulated when NULL)

lambda

ROMM parameter

makeunique

Parameter to be used in GenQR

ensureIntercept

Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept

returnParts

When TRUE, the output is instead two matrices: yHat (fitted) and yRes (generated residuals).

Details

doSVD affects the decomposition of y and yNew. Input matrices are subjected to EnsureMatrix.

Value

Generated version of y

Author(s)

Øyvind Langsrud

Examples

exY <- matrix(rnorm(15), 5, 3)
RegSDCgen(exY)
RegSDCgen(exY, yNew = exY + 0.001 * matrix(rnorm(15), 5, 3))  # Close to exY
RegSDCgen(exY, lambda = 0.001)  # Close to exY

Regression-based SDC Tools - Generalized microaggregation

Description

Implementation of the methodology in Section 6 of the paper

Usage

RegSDChybrid(
  y,
  clusters = NULL,
  xLocal = NULL,
  xGlobal = NULL,
  clusterPieces = NULL,
  xClusterPieces = NULL,
  groupedClusters = NULL,
  xGroupedClusters = NULL,
  alternative = NULL,
  alpha = NULL,
  ySim = NULL,
  returnParts = FALSE,
  epsAlpha = 1e-07,
  makeunique = TRUE,
  tolerance = sqrt(.Machine$double.eps)
)

Arguments

y

Matrix of confidential variables

clusters

Vector of cluster coding

xLocal

Matrix of x-variables to be crossed with clusters

xGlobal

Matrix of x-variables NOT to be crossed with clusters

clusterPieces

Vector of coding of cluster pieces

xClusterPieces

Matrix of x-variables to be crossed with cluster pieces

groupedClusters

Vector of coding of grouped clusters

xGroupedClusters

Matrix of x-variables to be crossed with grouped clusters

alternative

One of "" (default), "a", "b" or "c"

alpha

Optional value of the parameter used internally by alternative "c"

ySim

Optional manual specification of the internally simulated data

returnParts

When TRUE, the output is instead six matrices: y1 and y2 (fitted), e3s and e4s (new residuals), e3 and e4 (original residuals)

epsAlpha

Precision constant for alpha calculation

makeunique

Parameter to be used in GenQR

tolerance

Parameter to Cdiff used within the algorithm

Details

Input matrices are subjected to EnsureMatrix. Necessary constant terms (intercept) are automatically included. That is, a column of ones is not needed in the input matrices.

Value

Generated version of y

Author(s)

Øyvind Langsrud

Examples

#################################################
# Generate example data for introductory examples
################################################# 
y <- matrix(rnorm(30) + 1:30, 10, 3)
x <- matrix(1:10, 10, 1)  # x <- 1:10 is equivalent

# Same as RegSDCipso(y)
yOut <- RegSDChybrid(y)

# With a single cluster both are same as RegSDCipso(y, x)
yOut <- RegSDChybrid(y, xLocal = x)
yOut <- RegSDChybrid(y, xGlobal = x)

# Define two clusters
clust <- rep(1:2, each = 5)

# MHa and MHb in paper
yMHa <- RegSDChybrid(y, clusters = clust, xLocal = x)
yMHb <- RegSDChybrid(y, clusterPieces = clust, xLocal = x)

# An extended variant of MHb as mentioned in paper paragraph below definition of MHa/MHb
yMHbExt <- RegSDChybrid(y, clusterPieces = clust, xClusterPieces = x)

# Identical means within clusters
aggregate(y, list(clust = clust), mean)
aggregate(yMHa, list(clust = clust), mean)
aggregate(yMHb, list(clust = clust), mean)
aggregate(yMHbExt, list(clust = clust), mean)

# Identical global regression results
summary(lm(y[, 1] ~ x))
summary(lm(yMHa[, 1] ~ x))
summary(lm(yMHb[, 1] ~ x))
summary(lm(yMHbExt[, 1] ~ x))

# MHa: Identical local regression results
summary(lm(y[, 1] ~ x, subset = clust == 1))
summary(lm(yMHa[, 1] ~ x, subset = clust == 1))

# MHb: Different results
summary(lm(yMHb[, 1] ~ x, subset = clust == 1))

# MHbExt: Same estimates and different std. errors
summary(lm(yMHbExt[, 1] ~ x, subset = clust == 1))

###################################################
#  Generate example data for more advanced examples
###################################################
x <- matrix((1:90) * (1 + runif(90)), 30, 3)
x1 <- x[, 1]
x2 <- x[, 2]
x3 <- x[, 3]
y <- matrix(rnorm(90), 30, 3) + x
clust <- paste("c", rep(1:3, each = 10), sep = "")

######## Run main algorithm
z0 <- RegSDChybrid(y, clusters = clust, xLocal = x3, xGlobal = cbind(x1, x2))

# Corresponding models by lm
lmy <- lm(y ~ clust + x1 + x2 + x3:clust)
lm0 <- lm(z0 ~ clust + x1 + x2 + x3:clust)

# Preserved regression coef (x3 within clusters)
coef(lmy) - coef(lm0)

# Preservation of x3 coef locally can also be seen by local regression
coef(lm(y ~ x3, subset = clust == "c2")) - coef(lm(z0 ~ x3, subset = clust == "c2"))

# Covariance matrix preserved
cov(resid(lmy)) - cov(resid(lm0))

# But not preserved within clusters
cov(resid(lmy)[clust == "c2", ]) - cov(resid(lm0)[clust == "c2", ])

######## Modification (a)
za <- RegSDChybrid(y, clusters = clust, xLocal = x3, xGlobal = cbind(x1, x2), alternative = "a")
lma <- lm(za ~ clust + x1 + x2 + x3:clust)

# Now covariance matrices preserved within clusters
cov(resid(lmy)[clust == "c2", ]) - cov(resid(lma)[clust == "c2", ])

# If we estimate coef for x1 and x2 within clusters, 
# they become identical and identical to global estimates
coef(lma)
coef(lm(za ~ clust + x1:clust + x2:clust + x3:clust))

######## Modification (c) with automatic calculation of alpha 
# The result depends on the randomly generated data
# When the result is that alpha=1, modification (b) is equivalent
zc <- RegSDChybrid(y, clusters = clust, xLocal = x3, xGlobal = cbind(x1, x2), alternative = "c")
lmc <- lm(zc ~ clust + x1 + x2 + x3:clust)

# Preserved regression coef as above
coef(lmy) - coef(lmc)

# Again covariance matrices preserved within clusters
cov(resid(lmy)[clust == "c2", ]) - cov(resid(lmc)[clust == "c2", ])

# If we estimate coef for x1 and x2 within clusters, 
# results are different from modification (a) above
coef(lmc)
coef(lm(zc ~ clust + x1:clust + x2:clust + x3:clust))


####################################################
# Make groups of clusters (d) and cluster pieces (e)
####################################################
clustGr <- paste("gr", ceiling(rep(1:3, each = 10)/2 + 0.1), sep = "")
clustP <- c("a", "a", rep("b", 28))

######## Modifications (c), (d) and (e)
zGrP <- RegSDChybrid(y, clusters = clust, clusterPieces = clustP, groupedClusters = clustGr,
                     xLocal = x3, xGroupedClusters = x2, xGlobal = x1, alternative = "c")

# Corresponding models by lm
lmGrP <- lm(zGrP ~ clust:clustP + x1 + x2:clustGr + x3:clust - 1)
lmY <- lm(y ~ clust:clustP + x1 + x2:clustGr + x3:clust - 1)

# Preserved regression coef
coef(lmY) - coef(lmGrP)

# Identical means within cluster pieces
aggregate(y, list(clust = clust, clustP = clustP), mean)
aggregate(zGrP, list(clust = clust, clustP = clustP), mean)

# Covariance matrix preserved
cov(resid(lmY)) - cov(resid(lmGrP))

# Covariance matrices preserved within clusters
cov(resid(lmY)[clust == "c2", ]) - cov(resid(lmGrP)[clust == "c2", ])

# Covariance matrices not preserved within cluster pieces
cov(resid(lmY)[clustP == "a", ]) - cov(resid(lmGrP)[clustP == "a", ])

Regression-based SDC Tools - Ordinary synthetic data (IPSO)

Description

Implementation of equation 4 in the paper.

Usage

RegSDCipso(y, x = NULL, ensureIntercept = TRUE)

Arguments

y

Matrix of confidential variables

x

Matrix of non-confidential variables

ensureIntercept

Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept

Details

Input matrices are subjected to EnsureMatrix.

Value

Generated version of y
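
A minimal sketch of the IPSO idea. It keeps the fitted values and writes the residuals as E = QR, then replaces Q with an orthonormal basis computed from residuals of other data (random by default) while keeping R. This is one way to obtain the properties shown in the Examples below: identical regression results and identical covariance matrices. The name ipso_sketch and its yNew argument are hypothetical, and x must be non-NULL and the residual matrix full rank.

```r
# Hypothetical sketch: same fitted values, new residuals with the same crossproduct
ipso_sketch <- function(y, x, yNew = NULL) {
  y <- as.matrix(y)
  qx <- qr(cbind(1, x))                           # intercept + x
  yHat <- qr.fitted(qx, y)                        # fitted values, kept as they are
  e <- qr.resid(qx, y)                            # original residuals, E = Q R
  if (is.null(yNew)) yNew <- matrix(rnorm(length(y)), nrow(y), ncol(y))
  eNew <- qr.resid(qx, as.matrix(yNew))           # new residuals, orthogonal to x
  qe <- qr(e)
  R <- qr.R(qe)[, order(qe$pivot), drop = FALSE]  # R with any pivoting undone
  Qnew <- qr.Q(qr(eNew))                          # orthonormal basis from new data
  yHat + Qnew %*% R                               # crossprod of new residuals is t(R) %*% R
}

set.seed(3)
x <- 1:5
y <- matrix(rnorm(15) + 1:15, 5, 3)
ySynth <- ipso_sketch(y, x)
cov(y) - cov(ySynth)  # zero up to rounding
```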

Author(s)

Øyvind Langsrud

Examples

x <- matrix(1:5, 5, 1)
y <- matrix(rnorm(15) + 1:15, 5, 3)
ySynth <- RegSDCipso(y, x)

# Identical regression results
summary(lm(y[, 1] ~ x))
summary(lm(ySynth[, 1] ~ x))

# Identical covariance matrices
cov(y) - cov(ySynth)
cov(residuals(lm(y ~ x))) - cov(residuals(lm(ySynth ~ x)))

Regression-based SDC Tools - Scores from new data

Description

Implementation of equation 12 in the paper.

Usage

RegSDCnew(y, yNew, x = NULL, doSVD = FALSE, ensureIntercept = TRUE)

Arguments

y

Matrix of confidential variables

yNew

Matrix of y-data for new scores

x

Matrix of non-confidential variables

doSVD

SVD when TRUE and QR when FALSE

ensureIntercept

Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept

Details

doSVD affects the decomposition of y and yNew. Input matrices are subjected to EnsureMatrix.

Value

Generated version of y

Author(s)

Øyvind Langsrud

Examples

x <- matrix(1:5, 5, 1)
y <- matrix(rnorm(15) + 1:15, 5, 3)

# Same as IPSO (RegSDCipso)
RegSDCnew(y, matrix(rnorm(15), 5, 3), x)

# Close to y
RegSDCnew(y, y + 0.001 * matrix(rnorm(15), 5, 3), x)

Regression-based SDC Tools - Random orthogonal matrix masking (ROMM)

Description

Implementation based on equations 11, 12 and 17 in the paper.

Usage

RegSDCromm(y, lambda = Inf, x = NULL, doSVD = FALSE, ensureIntercept = TRUE)

Arguments

y

Matrix of confidential variables

lambda

ROMM parameter

x

Matrix of non-confidential variables

doSVD

SVD when TRUE and QR when FALSE

ensureIntercept

Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept

Details

doSVD affects the decomposition of y. The exact behaviour of the method depends on the choice of decomposition method because of the sequential phenomenon mentioned in the paper. The similarity to the original data will tend to be highest for the first component. Input matrices are subjected to EnsureMatrix.

Value

Generated version of y

Author(s)

Øyvind Langsrud

Examples

x <- matrix(1:5, 5, 1)
y <- matrix(rnorm(15) + 1:15, 5, 3)

# Same as IPSO (RegSDCipso)
RegSDCromm(y, Inf, x)

# Close to IPSO
RegSDCromm(y, 100, x)

# Close to y
RegSDCromm(y, 0.001, x)

Suppressed tabular data: Inner cell frequencies as decimal numbers

Description

Assume that frequencies to be published, z, can be computed from inner frequencies, y, via z = t(x) %*% y, where x is a dummy matrix. Assuming correct suppression, this function will generate safe inner cell frequencies as decimal numbers.

Usage

SuppressDec(
  x,
  z = NULL,
  y = NULL,
  suppressed = NULL,
  digits = 9,
  nRep = 1,
  yDeduct = NULL,
  resScale = NULL,
  rmse = NULL,
  sparseLimit = 500
)

Arguments

x

Dummy matrix whose dimensions match the z and/or y input. A sparse matrix (Matrix package) is possible.

z

Frequencies to be published: all of them, only the safe ones, or all with suppressed values as NA.

y

Inner cell frequencies (see details).

suppressed

Logical vector defining the suppressed elements of z.

digits

Output close to whole numbers will be rounded using digits as input to RoundWhole.

nRep

Integer. When > 1, several y's are generated, as extra columns in the output.

yDeduct

Values to be subtracted from y and added back after the calculations. Can be used to perform the modulo method described in the paper (see examples).

resScale

Residuals will be scaled by resScale

rmse

Desired root mean square error (residual standard error). Will be used when resScale is NULL or cannot be used.

sparseLimit

Limit for the number of rows of a reduced x-matrix within the algorithm. When exceeded, a sparse algorithm is used (see IpsoExtra).

Details

This function makes use of ReduceX and RegSDCipso. It is not required that y consists of cell frequencies. A multivariate y or z is also possible. In that case, several values can be supplied for digits, resScale and rmse.

Value

The inner cell frequencies as decimal numbers

Note

Capital letters, X, Y and Z, are used in the paper.

Author(s)

Øyvind Langsrud

Examples

# Same data as in the paper
z <- RegSDCdata("sec7z")
x <- RegSDCdata("sec7x")
y <- RegSDCdata("sec7y")  # Now z is t(x) %*% y 
zAll <- RegSDCdata("sec7zAll")
zAllSupp <- RegSDCdata("sec7zAllSupp")
xAll <- RegSDCdata("sec7xAll")

# When no suppression, output is identical to y
SuppressDec(xAll, zAll, y)
SuppressDec(xAll, zAll)  # y can be seen in z

# Similar to Y* in paper (but other random values)
SuppressDec(x, z, y)

# Residual standard error forced to be 1
SuppressDec(x, z, y, rmse = 1)

# Seven ways of obtaining the same output
SuppressDec(x, z, rmse = 1)  # slower, y must be estimated
SuppressDec(x, y = y, rmse = 1)
SuppressDec(xAll, zAllSupp, y, rmse = 1)
SuppressDec(xAll, zAllSupp, rmse = 1)  # slower, y must be estimated
SuppressDec(xAll, zAll, y, is.na(zAllSupp), rmse = 1)
SuppressDec(xAll, zAll, suppressed = is.na(zAllSupp), rmse = 1)  # y seen in z
SuppressDec(xAll, y = y, suppressed = is.na(zAllSupp), rmse = 1)

# YhatMod4 and YhatMod10 in Table 2 in paper
SuppressDec(xAll, zAllSupp, y, yDeduct = 4 * (y%/%4), resScale = 0)
SuppressDec(xAll, zAllSupp, y, yDeduct = 10 * (y%/%10), rmse = 0)

# As data in Table 3 in paper (but other random values)
SuppressDec(xAll, zAllSupp, y, yDeduct = 10 * (y%/%10), resScale = 0.1)

# rmse instead of resScale and 5 draws
SuppressDec(xAll, zAllSupp, y, yDeduct = 10 * (y%/%10), rmse = 1, nRep = 5)

Suppressed tabular data: Yhat from X and Z

Description

Implementation of equation 21 in the paper.

Usage

Z2Yhat(z, x, digits = 9)

Arguments

z

Z as a matrix

x

X as a matrix

digits

When non-NULL, output values close to whole numbers will be rounded using digits as input to RoundWhole.

Details

The generalized inverse is computed by ginv. In practice, the computations can be sped up by using reduced versions of X and Z. See ReduceX.

Value

Yhat as a matrix
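
In the Examples below, fitted(lm(y ~ x - 1)) reproduces Z2Yhat(z, x). That result is consistent with Yhat = X (X^T X)^+ Z, the minimum-norm solution of t(X) %*% Y = Z. The sketch computes this with MASS::ginv. It is illustrative only: the name z2yhat_sketch and the toy data are hypothetical, and the rounding controlled by digits is left out.

```r
# Hypothetical sketch: Yhat = X (X'X)^+ Z using the Moore-Penrose inverse
z2yhat_sketch <- function(z, x) {
  # Equals the projection of y onto the columns of x whenever z = t(x) %*% y
  x %*% MASS::ginv(crossprod(x)) %*% z
}

# Toy dummy matrix: total, two groups and one single cell (rank 3)
x <- matrix(c(1, 1, 1, 1,  1, 1, 0, 0,  0, 0, 1, 1,  1, 0, 0, 0), 4, 4)
y <- matrix(c(3, 5, 2, 7), 4, 1)
z <- crossprod(x, y)  # z = t(x) %*% y
z2yhat_sketch(z, x)   # 3, 5, 4.5, 4.5
```

The last two cells are only known through their sum (9), so both receive the average 4.5.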

Author(s)

Øyvind Langsrud

Examples

# Same data as in the paper
z <- RegSDCdata("sec7z")
x <- RegSDCdata("sec7x")
Z2Yhat(z, x)

# With y known, yHat can be computed in other ways
y <- RegSDCdata("sec7y")  # Now z is t(x) %*% y 
fitted(lm(y ~ x - 1))
IpsoExtra(y, x, FALSE, resScale = 0)
