Type: | Package |
Title: | Information Preserving Regression-Based Tools for Statistical Disclosure Control |
Version: | 1.0.0 |
Date: | 2025-02-03 |
Author: | Øyvind Langsrud [aut, cre] |
Maintainer: | Øyvind Langsrud <oyl@ssb.no> |
Depends: | R (≥ 3.0.0) |
Imports: | SSBtools (≥ 1.3.4), MASS, Matrix |
Description: | Implementation of the methods described in the paper with the above title: Langsrud, Ø. (2019) <doi:10.1007/s11222-018-9848-9>. The package can be used to generate synthetic or hybrid continuous microdata, and the relationship to the original data can be controlled in several ways. A function for replacing suppressed tabular cell frequencies with decimal numbers is included. |
License: | Apache License 2.0 | file LICENSE |
Encoding: | UTF-8 |
URL: | https://github.com/olangsrud/RegSDC, https://olangsrud.github.io/RegSDC/ |
BugReports: | https://github.com/olangsrud/RegSDC/issues |
RoxygenNote: | 7.3.1 |
NeedsCompilation: | no |
Packaged: | 2025-02-03 14:37:18 UTC; oyl |
Repository: | CRAN |
Date/Publication: | 2025-02-03 16:30:07 UTC |
Calculation of C by solving equation 10 in the paper
Description
The limit calculated by FindAlpha is used when alpha = 1 cannot be chosen (a warning is produced). In the output, alpha is included as an attribute.
Usage
CalculateCdirect(a, b, epsAlpha = 1e-07, AlphaHandler = warning, alpha = NULL)
CalculateC(a, b, ..., viaQR = NULL, returnAlpha = FALSE)
Arguments
a |
matrix E in paper |
b |
matrix Eg in paper |
epsAlpha |
Precision constant for alpha calculation |
AlphaHandler |
Function (warning or stop) to be used when alpha<1 |
alpha |
alpha can be supplied as input instead of being computed |
... |
Arguments to CalculateCdirect |
viaQR |
When TRUE, QR is involved. This may be needed to handle collinear data. When NULL, viaQR is set to TRUE if ordinary computations fail. |
returnAlpha |
When TRUE, alpha (1 or a value below 1) is returned instead of C. The attribute viaQR is included. |
Details
When epsAlpha=NULL, calculations are performed directly (alpha=1) and alpha is not included as an attribute.
Value
Calculated C with attributes alpha and viaQR (the latter only when CalculateC is used)
Author(s)
Øyvind Langsrud
Examples
x <- 1:10
y <- matrix(rnorm(30) + 1:30, 10, 3)
a <- residuals(lm(y ~ x))
b <- residuals(lm(2 * y + matrix(rnorm(30), 10, 3) ~ x))
a1 <- a
b1 <- b
a1[, 3] <- a[, 1] + a[, 2]
b1[, 3] <- b[, 1] + b[, 2]
alpha <- FindAlpha(a, b)
FindAlphaSimple(a, b) # Same result as above
CalculateC(a, b)
CalculateCdirect(a, b) # Same result as above without viaQR attribute
CalculateCdirect(a, b, alpha = alpha/(1 + 1e-07)) # Same result as above since epsAlpha = 1e-07
CalculateCdirect(a, b, alpha = alpha/2) # OK
# CalculateCdirect(a,b, alpha = 2*alpha) # Not OK
FindAlpha(a, b1)
# FindAlphaSimple(a,b1) # Not working since b1 is collinear
CalculateC(a, b1, returnAlpha = TRUE) # Almost the same alpha as above (epsAlpha causes the difference)
FindAlpha(b, a)
CalculateC(b, a, returnAlpha = TRUE) # 1 returned (not same as above)
CalculateC(b, a)
FindAlpha(b1, a) # alpha smaller than epsAlpha is set to 0 in CalculateC
CalculateC(b1, a) # When alpha = 0, C is calculated by GenQR instead of chol
Matrix difference (a-b) including checking for equal columns
Description
Each column is checked by all.equal
Usage
Cdiff(a, b, tolerance = sqrt(.Machine$double.eps))
Arguments
a |
numerical matrix |
b |
numerical matrix |
tolerance |
Parameter to all.equal |
Value
(a-b) where equal columns are set to zero
Examples
a <- matrix(rnorm(6), 3, 2)
b <- matrix(rnorm(6), 3, 2)
a - b
Cdiff(a, b)
b[, 1] <- a[, 1] + (.Machine$double.eps)^(2/3) * b[, 1]
a - b
Cdiff(a, b)
a[, 2] <- b[, 2]
a - b
Cdiff(a, b)
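The column-wise check can be pictured with a short base R sketch. The helper name cdiffSketch is hypothetical and is not part of RegSDC. It only assumes, as the description states, that each pair of columns is compared by all.equal with the given tolerance, so it may differ from the package's implementation in details.

```r
# Hypothetical helper (not the RegSDC implementation): column-wise difference
# where columns judged equal by all.equal() are set to exactly zero.
cdiffSketch <- function(a, b, tolerance = sqrt(.Machine$double.eps)) {
  d <- a - b
  for (j in seq_len(ncol(d))) {
    if (isTRUE(all.equal(a[, j], b[, j], tolerance = tolerance))) {
      d[, j] <- 0
    }
  }
  d
}

a <- matrix(c(1, 2, 3, 4, 5, 6), 3, 2)
b <- a
b[, 1] <- a[, 1] + 1e-12  # differs only by numerical noise
b[, 2] <- a[, 2] + 1      # genuinely different
d <- cdiffSketch(a, b)    # first column set to zero, second column kept as a - b
```

Setting near-equal columns to exact zeros avoids carrying tiny rounding differences into later matrix computations.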
Ensure constant term in matrix
Description
A column of ones may be added
Usage
EnsureIntercept(x)
Arguments
x |
Input matrix |
Value
The input matrix possibly with a column of ones added
Author(s)
Øyvind Langsrud
Examples
x <- matrix(c(5, 8, 4, 2, 7, 6), 3, 2)
EnsureIntercept(x)
EnsureIntercept(cbind(x, 2))
EnsureIntercept(cbind(x, 0))
EnsureIntercept(matrix(0, 4, 0))
Ensure that input is matrix (by as.matrix) and check number of rows (and columns)
Description
Ensure that input is matrix (by as.matrix) and check number of rows (and columns)
Usage
EnsureMatrix(x, nRow = NULL, nCol = NULL)
Arguments
x |
NULL or input to as.matrix |
nRow |
Expected number of rows |
nCol |
Expected number of columns |
Value
Input as a matrix
Author(s)
Øyvind Langsrud
Examples
x <- matrix(c(5, 8, 4, 2, 7, 6), 3, 2)
EnsureMatrix(x)
EnsureMatrix(x, 3)
EnsureMatrix(1:4)
EnsureMatrix(1:4, 4)
EnsureMatrix(NULL, 4)
try(EnsureMatrix(x, 4))
try(EnsureMatrix(1:3, 4))
EnsureMatrix(x, 3, 2)
try(EnsureMatrix(x, 3, 3))
try(EnsureMatrix(NULL, 3, 3))
Calculation of alpha
Description
Function to find the largest alpha that makes equation 10 in the paper solvable.
Usage
FindAlpha(a, b, tryViaQR = TRUE)
FindAlphaSimple(a, b)
Arguments
a |
matrix E in paper |
b |
matrix Eg in paper |
tryViaQR |
When TRUE, a QR transformation is used (to handle collinearity) when ordinary calculations fail. |
Value
alpha
Note
FindAlphaSimple performs the calculations by a simple/direct method. FindAlpha is made to handle problematic special cases.
Author(s)
Øyvind Langsrud
See Also
See examples in the documentation of CalculateC
Generalized QR decomposition
Description
Matrix X decomposed as Q and R (X=QR) where columns of Q are orthonormal. Ordinary QR or SVD may be used.
Usage
GenQR(x, doSVD = FALSE, findR = TRUE, makeunique = findR, tol = 1e-07)
Arguments
x |
Matrix to be decomposed |
doSVD |
When TRUE, SVD is used instead of QR |
findR |
When FALSE, only Q is returned |
makeunique |
When TRUE, uniqueness is forced by positive diagonal elements (QR) or by column sums (SVD) |
tol |
As input to qr() or, in the case of svd(), similar to the tol input of MASS::ginv(). |
Details
To handle dependency, a usual decomposition of X is XP=QR, where P is a permutation matrix. This function returns RP^T as R. When SVD is used, Q=U and R=SV^T.
Value
List with Q and R or just Q
Author(s)
Øyvind Langsrud
Examples
GenQR(matrix(rnorm(15),5,3))
GenQR(matrix(rnorm(15),5,3)[,c(1,2,1,3)])
GenQR(matrix(rnorm(15),5,3)[,c(1,2,1,3)],TRUE)
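For the full-rank QR case, the decomposition and the uniqueness convention can be sketched with base R alone. This is only an illustration under the assumption that no column pivoting occurs. It does not cover the dependent-column or SVD variants, and it is not the code used inside GenQR.

```r
set.seed(123)
x <- matrix(rnorm(15), 5, 3)      # full column rank with probability one

qrx <- qr(x)                      # base R QR decomposition
Q <- qr.Q(qrx)                    # 5 x 3 matrix with orthonormal columns
R <- qr.R(qrx)                    # 3 x 3 upper triangular matrix

# Force positive diagonal elements in R. With S a diagonal matrix of +1/-1,
# (Q S)(S R) = Q R, so the product is unchanged.
s <- sign(diag(R))
Q <- Q %*% diag(s)
R <- diag(s) %*% R

errQR <- max(abs(Q %*% R - x))               # reconstruction error
errOrth <- max(abs(crossprod(Q) - diag(3)))  # deviation from orthonormality
```

Both error measures should be at the level of floating-point rounding.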
Extended variant of RegSDCipso
Description
Possible to generate several y's and to re-scale residuals. Regression fitting by a sparse matrix algorithm is also possible (see reference).
Usage
IpsoExtra(
y,
x = NULL,
ensureIntercept = TRUE,
returnParts = FALSE,
nRep = 1,
resScale = NULL,
digits = 9,
rmse = NULL,
sparseLimit = 500,
printInc = TRUE
)
Arguments
y |
Matrix of confidential variables |
x |
Matrix of non-confidential variables |
ensureIntercept |
Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept |
returnParts |
Alternative output: two matrices, yHat (fitted) and yRes (generated residuals). |
nRep |
Integer. When >1, several y's will be generated as extra columns in the output. |
resScale |
Residuals will be scaled by resScale |
digits |
Digits used to detect perfect fit (caused by fitted values as input). This checking will be done only when rmse is in input. When perfect fit, rmse will be used instead of resScale. |
rmse |
Desired root mean square error (residual standard error). Will be used when resScale is NULL or cannot be used (see parameter digits). This parameter forces the rmse value for one y variable (the first). |
sparseLimit |
Limit for the number of rows of a reduced x-matrix within the algorithm. When exceeded, a sparse algorithm is used (see reference). |
Value
Generated version of y
Author(s)
Øyvind Langsrud
References
Douglas Bates and R Development Core Team (2022), Comparing Least Squares Calculations, R Vignette, vignette("Comparisons", package="Matrix").
Examples
x <- matrix(1:5, 5, 1)
y <- matrix(10 * (sample(7:39, 15) + 4 * (1:15)), 5, 3)
colnames(y) <- paste("y", 1:3, sep = "")
y1 <- y[, 1, drop = FALSE]
IpsoExtra(y, x) # Same as RegSDCipso(y, x)
IpsoExtra(y, x, resScale = 0) # Fitted values (whole numbers in this case)
IpsoExtra(y, x, nRep = 2, resScale = 1e-05) # Downscaled residuals
ySynth <- IpsoExtra(y1, x, nRep = 2, rmse = 0.25) # Downscaled residuals
summary(lm(ySynth ~ x)) # Identical regression results with Residual standard error: 0.25
IpsoExtra(fitted(lm(y1 ~ x)), x, nRep = 2, resScale = 0.1) # resScale no effect since perfect fit
IpsoExtra(fitted(lm(y1 ~ x)), x, nRep = 2, resScale = 0.1, rmse = 2) # with warning
# Using data in the paper
IpsoExtra(RegSDCdata("sec7y"), RegSDCdata("sec7x")) # Similar to Y*
IpsoExtra(RegSDCdata("sec7y"), RegSDCdata("sec7x"), rmse = 1)
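The effect of forcing a residual standard error can be illustrated for a single y variable with base R. The sketch below rescales the original residuals, whereas IpsoExtra generates new ones, so it shows only the scaling arithmetic and not the package's algorithm. All variable names are chosen for illustration.

```r
set.seed(2)
n <- 8
x <- 1:n
y <- 3 + 2 * x + rnorm(n)

fit <- lm(y ~ x)
sigmaHat <- summary(fit)$sigma    # residual standard error of the original fit
rmse <- 0.25                      # desired residual standard error

# Scale the residuals by rmse / sigmaHat and add them back to the fitted values
yScaled <- fitted(fit) + residuals(fit) * (rmse / sigmaHat)

fitScaled <- lm(yScaled ~ x)
sigmaScaled <- summary(fitScaled)$sigma   # equals rmse up to rounding error
```

Because the scaled residuals remain orthogonal to the regressors, the estimated coefficients are unchanged and only the residual standard error moves.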
Suppressed tabular data: Reduce dummy matrix, X (and estimate Y)
Description
In section 7 of the paper, Z = t(X) %*% Y, where X is a dummy matrix.
Some elements of Y can be found directly as elements in Z. Corresponding rows of X will be removed.
After removing rows, some columns will only have zeros and these will also be removed.
Usage
ReduceX(x, z = NULL, y = NULL, digits = 9)
Arguments
x |
X as a matrix |
z |
Z as a matrix |
y |
Y as a matrix |
digits |
When non-NULL and when NULL y input, output y estimates close to whole numbers will be rounded using
|
Details
To estimate Y, this function finds some values directly from Z and other values by running Z2Yhat on reduced versions of X and Z.
Value
A list of four elements:
x |
Reduced x |
z |
Corresponding reduced z |
yKnown |
Logical vector specifying elements of y that can be found directly as elements in z |
y |
As |
Author(s)
Øyvind Langsrud
Examples
# Same data as in the paper
z <- RegSDCdata("sec7z")
x <- RegSDCdata("sec7x")
y <- RegSDCdata("sec7y") # Now z is t(x) %*% y
a <- ReduceX(x, z, y)
b <- ReduceX(x, z)
d <- ReduceX(x, z = NULL, y) # No z in output
# Identical output for x and z
identical(a$x, b$x)
identical(a$x, d$x)
identical(a$z, b$z)
# Same y in output as input
identical(a$y, y)
identical(d$y, y)
# Estimate of y (yHat) when NULL y input
b$y
# These elements of y can be found directly in z
y[a$yKnown, , drop = FALSE]
# They can be found by searching for unit colSums
colSums(x)[colSums(x) == 1]
# These trivial data rows can be omitted when processing data
x[!a$yKnown, ]
# Now several columns can be omitted since zero colSums
colSums0 <- colSums(x[!a$yKnown, ]) == 0
# The resulting matrix is output from the function
identical(x[!a$yKnown, !colSums0], a$x)
# Output z can be computed from this output x
identical(t(a$x) %*% y[!a$yKnown, , drop = FALSE], a$z)
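The row and column removal walked through above can be reproduced on a small made-up dummy matrix using base R only. The data below are hypothetical: a 2 x 2 table where one inner cell is published together with the row and column totals. The sketch performs a single pass and is not the ReduceX implementation.

```r
# Rows are the four inner cells; columns are the published cells
x <- cbind(cell1 = c(1, 0, 0, 0),
           row1  = c(1, 1, 0, 0),
           row2  = c(0, 0, 1, 1),
           col1  = c(1, 0, 1, 0),
           col2  = c(0, 1, 0, 1))

# A published cell built from exactly one inner cell reveals that inner cell
unitCols <- colSums(x) == 1
yKnown <- rowSums(x[, unitCols, drop = FALSE]) > 0

# Remove the known rows, then the columns that are left with only zeros
xRed <- x[!yKnown, , drop = FALSE]
xRed <- xRed[, colSums(xRed) > 0, drop = FALSE]
```

Here only the first inner cell is known directly, and the column that published it disappears from the reduced matrix.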
Regression-based SDC Tools - Synthetic addition with residual correlation control
Description
Implementation of equation 6 (arbitrary residual data) and equation 7 (residual correlations) in the paper. The alpha limit is calculated (equation 9). The limit is used when alpha = 1 cannot be chosen (a warning is produced). In the output, alpha is included as an attribute.
Usage
RegSDCadd(y, resCorr = NULL, x = NULL, yStart = NULL, ensureIntercept = TRUE)
Arguments
y |
Matrix of confidential variables |
resCorr |
Required residual correlations (possibly recycled) |
x |
Matrix of non-confidential variables |
yStart |
Arbitrary data whose residuals will be used. Will be calculated from resCorr when NULL. |
ensureIntercept |
Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept |
Details
Use epsAlpha=NULL to avoid calculation of alpha. Use of alpha (<1) will produce a warning. Input matrices are subjected to EnsureMatrix.
Value
Generated version of y with alpha as attribute
Author(s)
Øyvind Langsrud
Examples
x <- matrix(1:10, 10, 1)
y <- matrix(rnorm(30) + 1:30, 10, 3)
yOut <- RegSDCadd(y, c(0.1, 0.2, 0.3), x)
# Correlations between residuals as required
diag(cor(residuals(lm(y ~ x)), residuals(lm(yOut ~ x))))
# Identical covariance matrices
cov(y) - cov(yOut)
cov(residuals(lm(y ~ x))) - cov(residuals(lm(yOut ~ x)))
# Identical regression results
summary(lm(y[, 1] ~ x))
summary(lm(yOut[, 1] ~ x))
# alpha as attribute
attr(yOut, "alpha")
# With yStart as input and alpha limit in use (warning produced)
yOut <- RegSDCadd(y, NULL, x, 2 * y + matrix(rnorm(30), 10, 3))
attr(yOut, "alpha")
# Same correlation for all variables
RegSDCadd(y, 0.2, x)
# But in this case RegSDCcomp is equivalent and faster
RegSDCcomp(y, 0.2, x)
# Make nearly collinear data
y[, 3] <- y[, 1] + y[, 2] + 0.001 * y[, 3]
# Not possible to achieve correlations. Small alpha with warning.
RegSDCadd(y, c(0.1, 0.2, 0.3), x)
# Exact collinear data
y[, 3] <- y[, 1] + y[, 2]
# Zero alpha with warning
RegSDCadd(y, c(0.1, 0.2, 0.3), x)
Regression-based SDC Tools - Synthetic addition
Description
Residuals from arbitrary data with a synthetic addition
Usage
RegSDCaddGen(
y,
yStart,
x = NULL,
epsAlpha = 1e-07,
AlphaHandler = warning,
alphaAttr = TRUE,
makeunique = TRUE,
ensureIntercept = TRUE
)
Arguments
y |
Matrix of confidential variables |
yStart |
Arbitrary data whose residuals will be used |
x |
Matrix of non-confidential variables |
epsAlpha |
Precision constant for alpha calculation |
AlphaHandler |
Function (warning or stop) to be used when alpha<1 |
alphaAttr |
When TRUE, alpha is included as an attribute in the output |
makeunique |
Parameter to be used in GenQR |
ensureIntercept |
Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept |
Details
Use epsAlpha=NULL to avoid calculation of alpha. Use of alpha (<1) will produce a warning. Input matrices are subjected to EnsureMatrix.
Value
Generated version of y
Author(s)
Øyvind Langsrud
Regression-based SDC Tools - Component score correlation control
Description
Implementation of equation 8 in the paper.
Usage
RegSDCcomp(
y,
compCorr = NA,
x = NULL,
doSVD = FALSE,
makeunique = TRUE,
ensureIntercept = TRUE
)
Arguments
y |
Matrix of confidential variables |
compCorr |
Required component score correlations (possibly recycled) |
x |
Matrix of non-confidential variables |
doSVD |
SVD when TRUE and QR when FALSE |
makeunique |
Parameter to be used in GenQR |
ensureIntercept |
Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept |
Details
An NA component score correlation means that the corresponding component is generated independently at random. Input matrices are subjected to EnsureMatrix.
Value
Generated version of y
Author(s)
Øyvind Langsrud
Examples
x <- matrix(1:10, 10, 1)
y <- matrix(rnorm(30) + 1:30, 10, 3)
# Same as IPSO (RegSDCipso)
RegSDCcomp(y, NA, x)
# Using QR and SVD
yQR <- RegSDCcomp(y, c(0.1, 0.2, NA), x)
ySVD <- RegSDCcomp(y, c(0.1, 0.2, NA), x, doSVD = TRUE)
# Calculation of residuals
r <- residuals(lm(y ~ x))
rQR <- residuals(lm(yQR ~ x))
rSVD <- residuals(lm(ySVD ~ x))
# Correlations for two first components as required
diag(cor(GenQR(r)$Q, GenQR(rQR)$Q))
diag(cor(GenQR(r, doSVD = TRUE)$Q, GenQR(rSVD, doSVD = TRUE)$Q))
# Identical covariance matrices
cov(yQR) - cov(ySVD)
cov(rQR) - cov(rSVD)
# Identical regression results
summary(lm(y[, 1] ~ x))
summary(lm(yQR[, 1] ~ x))
summary(lm(ySVD[, 1] ~ x))
Function that returns a dataset
Description
Function that returns a dataset
Usage
RegSDCdata(dataset)
Arguments
dataset |
Name of data set within the RegSDC package |
Details
sec7data: Data in section 7 of the paper as a data frame
sec7y: Y in section 7 of the paper as a matrix
sec7x: X in section 7 of the paper as a matrix
sec7z: Z in section 7 of the paper as a matrix
sec7xAll: Xall in section 7 of the paper as a matrix
sec7zAll: Zall in section 7 of the paper as a matrix
sec7zAllSupp: As Zall with suppressed values set to NA
Value
The requested dataset: a data frame for sec7data and a matrix for the other datasets
Author(s)
Øyvind Langsrud
Examples
RegSDCdata("sec7data")
RegSDCdata("sec7y")
RegSDCdata("sec7x")
RegSDCdata("sec7z")
RegSDCdata("sec7xAll")
RegSDCdata("sec7zAll")
RegSDCdata("sec7zAllSupp")
Regression-based SDC Tools - General data generation
Description
IPSO by QR or SVD, scores from arbitrary data, and ROMM
Usage
RegSDCgen(
y,
x = NULL,
doSVD = FALSE,
yNew = NULL,
lambda = Inf,
makeunique = TRUE,
ensureIntercept = TRUE,
returnParts = FALSE
)
Arguments
y |
Matrix of confidential variables |
x |
Matrix of non-confidential variables |
doSVD |
SVD when TRUE and QR when FALSE |
yNew |
Matrix of y-data for new scores (simulated when NULL) |
lambda |
ROMM parameter |
makeunique |
Parameter to be used in GenQR |
ensureIntercept |
Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept |
returnParts |
Alternative output: two matrices, yHat (fitted) and yRes (generated residuals). |
Details
doSVD affects the decomposition of y and yNew. Input matrices are subjected to EnsureMatrix.
Value
Generated version of y
Author(s)
Øyvind Langsrud
Examples
exY <- matrix(rnorm(15), 5, 3)
RegSDCgen(exY)
RegSDCgen(exY, yNew = exY + 0.001 * matrix(rnorm(15), 5, 3)) # Close to exY
RegSDCgen(exY, lambda = 0.001) # Close to exY
Regression-based SDC Tools - Generalized microaggregation
Description
Implementation of the methodology in section 6 in the paper
Usage
RegSDChybrid(
y,
clusters = NULL,
xLocal = NULL,
xGlobal = NULL,
clusterPieces = NULL,
xClusterPieces = NULL,
groupedClusters = NULL,
xGroupedClusters = NULL,
alternative = NULL,
alpha = NULL,
ySim = NULL,
returnParts = FALSE,
epsAlpha = 1e-07,
makeunique = TRUE,
tolerance = sqrt(.Machine$double.eps)
)
Arguments
y |
Matrix of confidential variables |
clusters |
Vector of cluster coding |
xLocal |
Matrix of x-variables to be crossed with clusters |
xGlobal |
Matrix of x-variables NOT to be crossed with clusters |
clusterPieces |
Vector of coding of cluster pieces |
xClusterPieces |
Matrix of x-variables to be crossed with cluster pieces |
groupedClusters |
Vector of coding of grouped clusters |
xGroupedClusters |
Matrix of x-variables to be crossed with grouped clusters |
alternative |
One of "" (default), "a", "b" or "c" |
alpha |
Possible to specify parameter used internally by alternative "c" |
ySim |
Possible to specify the internally simulated data manually |
returnParts |
Alternative output: six matrices, y1 and y2 (fitted), e3s and e4s (new residuals), e3 and e4 (original residuals) |
epsAlpha |
Precision constant for alpha calculation |
makeunique |
Parameter to be used in GenQR |
tolerance |
Parameter to |
Details
Input matrices are subjected to EnsureMatrix.
Necessary constant terms (intercept) are automatically included.
That is, a column of ones is not needed in the input matrices.
Value
Generated version of y
Author(s)
Øyvind Langsrud
Examples
#################################################
# Generate example data for introductory examples
#################################################
y <- matrix(rnorm(30) + 1:30, 10, 3)
x <- matrix(1:10, 10, 1) # x <- 1:10 is equivalent
# Same as RegSDCipso(y)
yOut <- RegSDChybrid(y)
# With a single cluster both are same as RegSDCipso(y, x)
yOut <- RegSDChybrid(y, xLocal = x)
yOut <- RegSDChybrid(y, xGlobal = x)
# Define two clusters
clust <- rep(1:2, each = 5)
# MHa and MHb in paper
yMHa <- RegSDChybrid(y, clusters = clust, xLocal = x)
yMHb <- RegSDChybrid(y, clusterPieces = clust, xLocal = x)
# An extended variant of MHb as mentioned in paper paragraph below definition of MHa/MHb
yMHbExt <- RegSDChybrid(y, clusterPieces = clust, xClusterPieces = x)
# Identical means within clusters
aggregate(y, list(clust = clust), mean)
aggregate(yMHa, list(clust = clust), mean)
aggregate(yMHb, list(clust = clust), mean)
aggregate(yMHbExt, list(clust = clust), mean)
# Identical global regression results
summary(lm(y[, 1] ~ x))
summary(lm(yMHa[, 1] ~ x))
summary(lm(yMHb[, 1] ~ x))
summary(lm(yMHbExt[, 1] ~ x))
# MHa: Identical local regression results
summary(lm(y[, 1] ~ x, subset = clust == 1))
summary(lm(yMHa[, 1] ~ x, subset = clust == 1))
# MHb: Different results
summary(lm(yMHb[, 1] ~ x, subset = clust == 1))
# MHbExt: Same estimates and different std. errors
summary(lm(yMHbExt[, 1] ~ x, subset = clust == 1))
###################################################
# Generate example data for more advanced examples
###################################################
x <- matrix((1:90) * (1 + runif(90)), 30, 3)
x1 <- x[, 1]
x2 <- x[, 2]
x3 <- x[, 3]
y <- matrix(rnorm(90), 30, 3) + x
clust <- paste("c", rep(1:3, each = 10), sep = "")
######## Run main algorithm
z0 <- RegSDChybrid(y, clusters = clust, xLocal = x3, xGlobal = cbind(x1, x2))
# Corresponding models by lm
lmy <- lm(y ~ clust + x1 + x2 + x3:clust)
lm0 <- lm(z0 ~ clust + x1 + x2 + x3:clust)
# Preserved regression coef (x3 within clusters)
coef(lmy) - coef(lm0)
# Preservation of x3 coef locally can also be seen by local regression
coef(lm(y ~ x3, subset = clust == "c2")) - coef(lm(z0 ~ x3, subset = clust == "c2"))
# Covariance matrix preserved
cov(resid(lmy)) - cov(resid(lm0))
# But not preserved within clusters
cov(resid(lmy)[clust == "c2", ]) - cov(resid(lm0)[clust == "c2", ])
######## Modification (a)
za <- RegSDChybrid(y, clusters = clust, xLocal = x3, xGlobal = cbind(x1, x2), alternative = "a")
lma <- lm(za ~ clust + x1 + x2 + x3:clust)
# Now covariance matrices preserved within clusters
cov(resid(lmy)[clust == "c2", ]) - cov(resid(lma)[clust == "c2", ])
# If we estimate coef for x1 and x2 within clusters,
# they become identical and identical to global estimates
coef(lma)
coef(lm(za ~ clust + x1:clust + x2:clust + x3:clust))
######## Modification (c) with automatic calculation of alpha
# The result depends on the randomly generated data
# When the result is that alpha=1, modification (b) is equivalent
zc <- RegSDChybrid(y, clusters = clust, xLocal = x3, xGlobal = cbind(x1, x2), alternative = "c")
lmc <- lm(zc ~ clust + x1 + x2 + x3:clust)
# Preserved regression coef as above
coef(lmy) - coef(lmc)
# Again covariance matrices preserved within clusters
cov(resid(lmy)[clust == "c2", ]) - cov(resid(lmc)[clust == "c2", ])
# If we estimate coef for x1 and x2 within clusters,
# results are different from modification (a) above
coef(lmc)
coef(lm(zc ~ clust + x1:clust + x2:clust + x3:clust))
####################################################
# Make groups of clusters (d) and cluster pieces (e)
####################################################
clustGr <- paste("gr", ceiling(rep(1:3, each = 10)/2 + 0.1), sep = "")
clustP <- c("a", "a", rep("b", 28))
######## Modifications (c), (d) and (e)
zGrP <- RegSDChybrid(y, clusters = clust, clusterPieces = clustP, groupedClusters = clustGr,
xLocal = x3, xGroupedClusters = x2, xGlobal = x1, alternative = "c")
# Corresponding models by lm
lmGrP <- lm(zGrP ~ clust:clustP + x1 + x2:clustGr + x3:clust - 1)
lmY <- lm(y ~ clust:clustP + x1 + x2:clustGr + x3:clust - 1)
# Preserved regression coef
coef(lmY) - coef(lmGrP)
# Identical means within cluster pieces
aggregate(y, list(clust = clust, clustP = clustP), mean)
aggregate(zGrP, list(clust = clust, clustP = clustP), mean)
# Covariance matrix preserved
cov(resid(lmY)) - cov(resid(lmGrP))
# Covariance matrices preserved within clusters
cov(resid(lmY)[clust == "c2", ]) - cov(resid(lmGrP)[clust == "c2", ])
# Covariance matrices not preserved within cluster pieces
cov(resid(lmY)[clustP == "a", ]) - cov(resid(lmGrP)[clustP == "a", ])
Regression-based SDC Tools - Ordinary synthetic data (IPSO)
Description
Implementation of equation 4 in the paper.
Usage
RegSDCipso(y, x = NULL, ensureIntercept = TRUE)
Arguments
y |
Matrix of confidential variables |
x |
Matrix of non-confidential variables |
ensureIntercept |
Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept |
Details
Input matrices are subjected to EnsureMatrix.
Value
Generated version of y
Author(s)
Øyvind Langsrud
Examples
x <- matrix(1:5, 5, 1)
y <- matrix(rnorm(15) + 1:15, 5, 3)
ySynth <- RegSDCipso(y, x)
# Identical regression results
summary(lm(y[, 1] ~ x))
summary(lm(ySynth[, 1] ~ x))
# Identical covariance matrices
cov(y) - cov(ySynth)
cov(residuals(lm(y ~ x))) - cov(residuals(lm(ySynth ~ x)))
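The preservation properties shown in these examples can be reproduced with a base R sketch of the IPSO idea: keep the fitted values and replace the residuals by random ones that are orthogonal to x and have the same cross-product matrix. This is a sketch under those assumptions, not the package's implementation. In particular, the use of chol and qr here is an illustrative choice.

```r
set.seed(1)
n <- 10
x <- cbind(1, 1:n)                         # design matrix with constant term
y <- matrix(rnorm(3 * n) + 1:(3 * n), n, 3)

fit <- lm.fit(x, y)
yHat <- fit$fitted.values                  # part of y explained by x
e <- fit$residuals                         # original residuals
R <- chol(crossprod(e))                    # t(e) %*% e equals t(R) %*% R

# Random residuals orthogonal to x, then orthonormalised by QR
eNew <- lm.fit(x, matrix(rnorm(3 * n), n, 3))$residuals
Qs <- qr.Q(qr(eNew))                       # n x 3, crossprod(Qs) is the identity

ySynth <- yHat + Qs %*% R                  # synthetic data

covDiff <- max(abs(cov(y) - cov(ySynth)))
coefDiff <- max(abs(lm.fit(x, ySynth)$coefficients - fit$coefficients))
```

Both covDiff and coefDiff should be numerically zero, mirroring the "identical regression results" and "identical covariance matrices" checks above.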
Regression-based SDC Tools - Scores from new data
Description
Implementation of equation 12 in the paper.
Usage
RegSDCnew(y, yNew, x = NULL, doSVD = FALSE, ensureIntercept = TRUE)
Arguments
y |
Matrix of confidential variables |
yNew |
Matrix of y-data for new scores |
x |
Matrix of non-confidential variables |
doSVD |
SVD when TRUE and QR when FALSE |
ensureIntercept |
Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept |
Details
doSVD affects the decomposition of y and yNew. Input matrices are subjected to EnsureMatrix.
Value
Generated version of y
Author(s)
Øyvind Langsrud
Examples
x <- matrix(1:5, 5, 1)
y <- matrix(rnorm(15) + 1:15, 5, 3)
# Same as IPSO (RegSDCipso)
RegSDCnew(y, matrix(rnorm(15), 5, 3), x)
# Close to y
RegSDCnew(y, y + 0.001 * matrix(rnorm(15), 5, 3), x)
Regression-based SDC Tools - Random orthogonal matrix masking (ROMM)
Description
Implementation based on equations 11, 12 and 17 in the paper.
Usage
RegSDCromm(y, lambda = Inf, x = NULL, doSVD = FALSE, ensureIntercept = TRUE)
Arguments
y |
Matrix of confidential variables |
lambda |
ROMM parameter |
x |
Matrix of non-confidential variables |
doSVD |
SVD when TRUE and QR when FALSE |
ensureIntercept |
Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept |
Details
doSVD affects the decomposition of y.
The exact behaviour of the method depends on the choice of decomposition method because of the sequential phenomenon mentioned in the paper.
The similarity to the original data will tend to be highest for the first component.
Input matrices are subjected to EnsureMatrix.
Value
Generated version of y
Author(s)
Øyvind Langsrud
Examples
x <- matrix(1:5, 5, 1)
y <- matrix(rnorm(15) + 1:15, 5, 3)
# Same as IPSO (RegSDCipso)
RegSDCromm(y, Inf, x)
# Close to IPSO
RegSDCromm(y, 100, x)
# Close to y
RegSDCromm(y, 0.001, x)
Suppressed tabular data: Inner cell frequencies as decimal numbers
Description
Assume that the frequencies to be published, z, can be computed from inner frequencies, y, via z = t(x) %*% y, where x is a dummy matrix.
Assuming correct suppression, this function will generate safe inner cell frequencies as decimal numbers.
Usage
SuppressDec(
x,
z = NULL,
y = NULL,
suppressed = NULL,
digits = 9,
nRep = 1,
yDeduct = NULL,
resScale = NULL,
rmse = NULL,
sparseLimit = 500
)
Arguments
x |
Dummy matrix whose dimensions match the z and/or y input. A sparse matrix (Matrix package) is possible. |
z |
Frequencies to be published. All, only the safe ones or with suppressed as NA. |
y |
Inner cell frequencies (see details). |
suppressed |
Logical vector defining the suppressed elements of z. |
digits |
Output close to whole numbers will be rounded using |
nRep |
Integer. When >1, several y's will be generated as extra columns in the output. |
yDeduct |
Values to be subtracted from y and added back after the calculations. Can be used to perform the modulo method described in the paper (see examples). |
resScale |
Residuals will be scaled by resScale |
rmse |
Desired root mean square error (residual standard error). Will be used when resScale is NULL or cannot be used. |
sparseLimit |
Limit for the number of rows of a reduced x-matrix within the algorithm. When exceeded, a sparse algorithm is used
(see |
Details
This function makes use of ReduceX and RegSDCipso.
It is not required that y consists of cell frequencies. A multivariate y or z is also possible. Then several values are possible as digits, resScale and rmse input.
Value
The inner cell frequencies as decimal numbers
Note
Capital letters, X, Y and Z, are used in the paper.
Author(s)
Øyvind Langsrud
Examples
# Same data as in the paper
z <- RegSDCdata("sec7z")
x <- RegSDCdata("sec7x")
y <- RegSDCdata("sec7y") # Now z is t(x) %*% y
zAll <- RegSDCdata("sec7zAll")
zAllSupp <- RegSDCdata("sec7zAllSupp")
xAll <- RegSDCdata("sec7xAll")
# When no suppression, output is identical to y
SuppressDec(xAll, zAll, y)
SuppressDec(xAll, zAll) # y can be seen in z
# Similar to Y* in paper (but other random values)
SuppressDec(x, z, y)
# Residual standard error forced to be 1
SuppressDec(x, z, y, rmse = 1)
# Seven ways of obtaining the same output
SuppressDec(x, z, rmse = 1) # slower, y must be estimated
SuppressDec(x, y = y, rmse = 1)
SuppressDec(xAll, zAllSupp, y, rmse = 1)
SuppressDec(xAll, zAllSupp, rmse = 1) # slower, y must be estimated
SuppressDec(xAll, zAll, y, is.na(zAllSupp), rmse = 1)
SuppressDec(xAll, zAll, suppressed = is.na(zAllSupp), rmse = 1) # y seen in z
SuppressDec(xAll, y = y, suppressed = is.na(zAllSupp), rmse = 1)
# YhatMod4 and YhatMod10 in Table 2 in paper
SuppressDec(xAll, zAllSupp, y, yDeduct = 4 * (y%/%4), resScale = 0)
SuppressDec(xAll, zAllSupp, y, yDeduct = 10 * (y%/%10), rmse = 0)
# As data in Table 3 in paper (but other random values)
SuppressDec(xAll, zAllSupp, y, yDeduct = 10 * (y%/%10), resScale = 0.1)
# rmse instead of resScale and 5 draws
SuppressDec(xAll, zAllSupp, y, yDeduct = 10 * (y%/%10), rmse = 1, nRep = 5)
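The yDeduct expressions used in the modulo examples split each frequency into a multiple of the modulus and a remainder. The small base R sketch below uses made-up frequencies and shows only this arithmetic. How SuppressDec uses the two parts internally is not shown.

```r
y <- c(3, 14, 27, 40)          # hypothetical inner cell frequencies

yDeduct <- 10 * (y %/% 10)     # multiples of 10, set aside before the calculations
yRest <- y - yDeduct           # remainders, the same as y %% 10

yBack <- yRest + yDeduct       # adding yDeduct back recovers y
```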
Suppressed tabular data: Yhat from X and Z
Description
Implementation of equation 21 in the paper.
Usage
Z2Yhat(z, x, digits = 9)
Arguments
z |
Z as a matrix |
x |
X as a matrix |
digits |
When non-NULL, output values close to whole numbers will be rounded using
|
Details
Generalized inverse is computed by ginv.
In practice, the computations can be sped up using reduced versions of X and Z. See ReduceX.
Value
Yhat as a matrix
Author(s)
Øyvind Langsrud
Examples
# Same data as in the paper
z <- RegSDCdata("sec7z")
x <- RegSDCdata("sec7x")
Z2Yhat(z, x)
# With y known, yHat can be computed in other ways
y <- RegSDCdata("sec7y") # Now z is t(x) %*% y
fitted(lm(y ~ x - 1))
IpsoExtra(y, x, FALSE, resScale = 0)
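The relation between the generalized-inverse computation and least-squares fitted values can be checked on a small made-up 2 x 2 table. The sketch assumes that Yhat is obtained by applying MASS::ginv to t(x) and multiplying by z. That is consistent with the lm comparison above, but it is written here as an illustration rather than as the package's code.

```r
# Hypothetical 2 x 2 table: four inner cells, published row and column totals
x <- cbind(row1 = c(1, 1, 0, 0),
           row2 = c(0, 0, 1, 1),
           col1 = c(1, 0, 1, 0),
           col2 = c(0, 1, 0, 1))
y <- matrix(c(3, 1, 4, 6), 4, 1)
z <- t(x) %*% y                        # published totals

yHat <- MASS::ginv(t(x)) %*% z         # estimate of y from x and z only

# With y known, the same values are the least-squares fitted values
yHatLS <- lm.fit(x, y)$fitted.values

maxDiff <- max(abs(yHat - yHatLS))     # numerically zero
zDiff <- max(abs(t(x) %*% yHat - z))   # yHat reproduces the published totals
```

The estimate reproduces z exactly but generally differs from the true y, which is the starting point for the decimal-number methods in SuppressDec.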