Title: | Functions for the Lognormal Distribution |
Version: | 0.1.10 |
Author: | Thomas Wutzler |
Maintainer: | Thomas Wutzler <twutz@bgc-jena.mpg.de> |
Description: | The lognormal distribution (Limpert et al. (2001) <doi:10.1641/0006-3568(2001)051%5B0341:lndats%5D2.0.co;2>) can characterize uncertainty that is bounded by zero. This package provides estimation of distribution parameters, computation of moments and other basic statistics, and an approximation of the distribution of the sum of several correlated lognormally distributed variables (Lo 2013 <doi:10.12988/ams.2013.39511>) and the approximation of the difference of two correlated lognormally distributed variables (Lo 2012 <doi:10.1155/2012/838397>). |
Imports: | Matrix |
Suggests: | markdown, rmarkdown, testthat, knitr, dplyr, ggplot2, mvtnorm, purrr, tidyr |
VignetteBuilder: | knitr |
License: | GPL-2 |
RoxygenNote: | 7.1.1 |
URL: | https://github.com/bgctw/lognorm |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2021-11-21 17:28:10 UTC; twutz |
Repository: | CRAN |
Date/Publication: | 2021-11-21 17:50:10 UTC |
lognorm: Functions for the Lognormal Distribution
Description
The lognormal distribution (Limpert et al. (2001) <doi:10.1641/0006-3568(2001)051 can characterize uncertainty that is bounded by zero. This package provides estimation of distribution parameters, computation of moments and other basic statistics, and an approximation of the distribution of the sum of several correlated lognormally distributed variables (Lo 2013 <doi:10.12988/ams.2013.39511>) and the approximation of the difference of two correlated lognormally distributed variables (Lo 2012 <doi:10.1155/2012/838397>).
Details
Essential functions are
Compute summary statistics:
getLognormMoments
computing distribution parameters from summary statistics:
getParmsLognormForModeAndUpper
estimate distribution parameters from summary sample:
estimateParmsLognormFromSample
Approximate the sum of correlated lognormals:
estimateSumLognormalSample
Approximate the difference of two correlated lognormals:
estimateDiffLognormal
Utilities for correlated data. These functions maybe moved to a separate package in future.
Estimate summary statistics of autocorrelated data
standard error of the mean:
seCor
effective number of observations
computeEffectiveNumObs
variance:
varCor
Return the vector of effective components of the autocorrelation:
computeEffectiveAutoCorr
Otherwise refer to the vignettes
See Also
Useful links:
remove NA values at the start and end
Description
remove NA values at the start and end
Usage
.trimNA(x)
Arguments
x |
numeric vector |
Value
subset of x with leading and trailing NAs removed
Estimate vector of effective components of the autocorrelation
Description
Estimate vector of effective components of the autocorrelation
Usage
computeEffectiveAutoCorr(res, type = "correlation")
Arguments
res |
numeric of autocorrelated numbers, usually observation - model residuals |
type |
type of residuals (see |
Details
Returns all components before first negative autocorrelation
Value
numeric vector: strongest components of the autocorrelation function
References
Zieba 2011 Standard Deviation of the Mean of Autocorrelated
Observations Estimated with the Use of the Autocorrelation Function Estimated
From the Data
Examples
# generate autocorrelated time series
res <- stats::filter(rnorm(1000), filter = rep(1,5), circular = TRUE)
res[100:120] <- NA
(effAcf <- computeEffectiveAutoCorr(res))
Compute the effective number of observations taking into account autocorrelation
Description
Compute the effective number of observations taking into account autocorrelation
Usage
computeEffectiveNumObs(
res,
effAcf = computeEffectiveAutoCorr(res),
na.rm = FALSE,
exact.na = TRUE
)
Arguments
res |
numeric of autocorrelated numbers, usually observation - model residuals |
effAcf |
autocorrelation coefficients. The first entry is fixed at 1 for zero distance. |
na.rm |
if not set to TRUE will return NA in there are missings in the series |
exact.na |
if set to FALSE then do not count and correct for missing in the sum of autocorrelation terms. This is faster, but results are increasingly biased high with increasing number of missings. |
Details
Assumes records of all times present. DO NOT REMOVE OR FILTER NA records before. The length of the time series is used.
Handling of NA values: The formula from Zieba 2011 is extended to subtract the number of missing pairs in the count of correlation terms. If 'exact.na=false' the original formula is used (after trimming edge-NAs).
Value
integer scalar: effective number of observations
References
Zieba & Ramza (2011)
Standard Deviation of the Mean of Autocorrelated
Observations Estimated with the Use of the Autocorrelation Function
Estimated From the Data.
Metrology and Measurement Systems,
Walter de Gruyter GmbH, 18 10.2478/v10178-011-0052-x
Bayley & Hammersley (1946)
The "effective" number of independent observations in an autocorrelated
time series.
Supplement to the Journal of the Royal Statistical Society, JSTOR,8,184-197
Examples
# generate autocorrelated time series
res <- stats::filter(rnorm(1000), filter = rep(1,5), circular = TRUE)
res[100:120] <- NA
# plot the series of autocorrelated random variables
plot(res)
# plot their empirical autocorrelation function
acf(res, na.action = na.pass)
#effAcf <- computeEffectiveAutoCorr(res)
# the effective number of parameters is less than number of 1000 samples
(nEff <- computeEffectiveNumObs(res, na.rm = TRUE))
Inference on the difference of two lognormals
Description
The distribution of y = a - b + s, where a and b are two lognormal random variables and s is a constant to be estimated, can be approximated by a lognormal distribution.
Usage
estimateDiffLognormal(mu_a, mu_b, sigma_a, sigma_b, corr = 0)
pDiffLognormalSample(
mu_a,
mu_b,
sigma_a,
sigma_b,
corr = 0,
q = 0,
nSample = 1e+05
)
Arguments
mu_a |
center parameter of the first term |
mu_b |
center parameter of the second term |
sigma_a |
scale parameter of the first term |
sigma_b |
scale parameter of the second term |
corr |
correlation between the two random variables |
q |
vector of quantiles |
nSample |
number of samples |
Value
estimateDiffLognormal: numeric vector with components mu, sigma, and shift, the components of the shifted lognormal distribution.
pDiffLognormalSample: vector of probabilities
Functions
-
estimateDiffLognormal
: Estimate the shifted-lognormal approximation to difference of two lognormals -
pDiffLognormalSample
: Distribution function for the difference of two lognormals based on sampling. Default provides the probability that the difference is significantly larger than zero.
Estimate lognormal distribution parameters from a sample
Description
Estimate lognormal distribution parameters from a sample
Usage
estimateParmsLognormFromSample(x, na.rm = FALSE)
estimateStdErrParms(x, na.rm = FALSE)
Arguments
x |
numeric vector of sampled values |
na.rm |
a logical value indicating whether NA values should be stripped before the computation proceeds. |
Details
The expected value of a can be determined with higher accuracy the larger the sample. Here, the uncorrelated assumption is applied at the log scale and distribution parameters are returned with the same expected value as the sample, but with uncertainty (sigma) decreased by sqrt(nfin - 1).
Since with low relative error, the lognormal becomes very close to the normal distribution, the distribution of the mean can be well approximated by a normal with sd(mean(x)) ~ sd(x)/sqrt(n-1).
Value
numeric vector with components mu
and sigma
,
i.e., the center parameter (mean at log scale, log(median)) and
the scale parameter (standard deviation at log scale)
Functions
-
estimateParmsLognormFromSample
: Estimate lognormal distribution parameters from a sample -
estimateStdErrParms
: Estimate parameters of the lognormal distribution of the mean from an uncorrelated sample
Examples
.mu <- log(1)
.sigma <- log(2)
n = 200
x <- exp(rnorm(n, mean = .mu, sd = .sigma))
exp(pL <- estimateParmsLognormFromSample(x)) # median and multiplicative stddev
c(mean(x), meanx <- getLognormMoments(pL["mu"],pL["sigma"])[,"mean"])
c(sd(x), sdx <- sqrt(getLognormMoments(pL["mu"],pL["sigma"])[,"var"]))
# stddev decreases (each sample about 0.9) to about 0.07
# for the mean with n replicated samples
se <- estimateStdErrParms(x)
sqrt(getLognormMoments(se["mu"],se["sigma"])[,"var"])
sd(x)/sqrt(n-1) # well approximated by normal
# expected value stays the same
c(meanx, getLognormMoments(se["mu"],se["sigma"])[,"mean"])
Estimate the parameters of the lognormal approximation to the sum
Description
Estimate the parameters of the lognormal approximation to the sum
Estimate the parameters of the lognormal approximation to the sum
Usage
estimateSumLognormalSample(
mu,
sigma,
resLog,
effAcf = computeEffectiveAutoCorr(resLog),
isGapFilled = logical(0),
na.rm = TRUE
)
estimateSumLognormalSampleExpScale(mean, sigmaOrig, ...)
estimateSumLognormal(
mu,
sigma,
effAcf = c(),
corr = Diagonal(length(mu)),
corrLength = if (inherits(corr, "ddiMatrix")) 0 else nTerm,
sigmaSum = numeric(0),
isStopOnNoTerm = FALSE,
na.rm = isStopOnNoTerm
)
Arguments
mu |
numeric vector of center parameters of terms at log scale |
sigma |
numeric vector of scale parameter of terms at log scale |
resLog |
time series of model-residuals at log scale to estimate correlation |
effAcf |
numeric vector of effective autocorrelation
This overrides arguments |
isGapFilled |
logical vector whether entry is gap-filled rather than an original measurement, see details |
na.rm |
neglect terms with NA values in mu or sigma |
mean |
numeric vector of expected values |
sigmaOrig |
numeric vector of standard deviation at original scale |
... |
further arguments passed to |
corr |
numeric matrix of correlations between the random variables |
corrLength |
integer scalar: set correlation length to smaller values to speed up computation by neglecting correlations among terms further apart. Set to zero to omit correlations. |
sigmaSum |
numeric scalar: possibility to specify a precomputed scale parameter instead of computing it. |
isStopOnNoTerm |
if no finite estimate is provided then by default NA is returned for the sum. Set this to TRUE to issue an error instead. |
Details
If there are no gap-filled values, i.e. all(!isGapFilled)
or
!length(isGapFilled)
(the default), distribution parameters
are estimated using all the samples. Otherwise, the scale parameter
(uncertainty) is first estimated using only the non-gapfilled records.
Also use isGapFilled == TRUE for records, where sigma cannot be trusted. When setting sigma to missing, this is also affecting the expected value.
If there are only gap-filled records, assume uncertainty to be (before v0.1.5: the largest uncertainty of given gap-filled records.) the mean of the given multiplicative standard deviation
Value
numeric vector with components mu
, sigma
,
and nEff
,
i.e. the parameters of the lognormal distribution at log scale
and the number of effective observations.
Functions
-
estimateSumLognormalSample
: In addition toestimateSumLognormal
take care of missing values and estimate correlation terms. -
estimateSumLognormalSampleExpScale
: Before callingestimateSumLognormalSample
estimate lognormal parameters from value and its uncertainty given on original scale. -
estimateSumLognormal
: Estimate the parameters of the lognormal approximation to the sum
References
Lo C (2013) WKB approximation for the sum of two
correlated lognormal random variables.
Applied Mathematical Sciences, Hikari, Ltd., 7 , 6355-6367
10.12988/ams.2013.39511
Examples
# distribution of the sum of two lognormally distributed random variables
mu1 = log(110)
mu2 = log(100)
sigma1 = log(1.2)
sigma2 = log(1.6)
(coefSum <- estimateSumLognormal(
c(mu1,mu2), c(sigma1,sigma2) ))
# repeat with correlation
(coefSumCor <- estimateSumLognormal(
c(mu1,mu2), c(sigma1,sigma2), effAcf = c(1,0.9) ))
# expected value is equal, but variance with correlated variables is larger
getLognormMoments(coefSum["mu"],coefSum["sigma"])
getLognormMoments(coefSumCor["mu"],coefSumCor["sigma"])
Construct the full correlation matrix from autocorrelation components.
Description
Construct the full correlation matrix from autocorrelation components.
Usage
getCorrMatFromAcf(nRow, effAcf)
Arguments
nRow |
number of rows in correlation matrix |
effAcf |
numeric vector of effective autocorrelation components . The first entry, which is defined as 1, is not used. |
Compute summary statistics of a log-normal distribution
Description
Compute summary statistics of a log-normal distribution
Usage
getLognormMoments(mu, sigma, m = exp(mu + sigma2/2) - shift, shift = 0)
getLognormMedian(mu, sigma, shift = 0)
getLognormMode(mu, sigma, shift = 0)
Arguments
mu |
numeric vector: location parameter |
sigma |
numeric vector: scale parameter |
m |
mean at original scale, may override default based on mu |
shift |
shift for the shifted lognormal distribution |
Value
for getLognormMoments
a numeric matrix with columns
mean
(expected value at original scale)
, var
(variance at original scale)
, and cv
(coefficient of variation: sqrt(var)/mean).
For the other functions a numeric vector of the required summary.
Functions
-
getLognormMoments
: get the expected value, variance, and coefficient of variation -
getLognormMedian
: get the median -
getLognormMode
: get the mode
References
Limpert E, Stahel W & Abbt M (2001)
Log-normal Distributions across the Sciences: Keys and Clues.
Oxford University Press (OUP) 51, 341,
10.1641/0006-3568(2001)051[0341:lndats]2.0.co;2
See Also
scaleLogToOrig
Examples
# start by estimating lognormal parameters from moments
.mean <- 1
.var <- c(1.3,2)^2
parms <- getParmsLognormForMoments(.mean, .var)
#
# computed moments must equal previous ones
(ans <- getLognormMoments(parms[,"mu"], parms[,"sigma"]))
cbind(.var, ans[,"var"])
#
getLognormMedian(mu = log(1), sigma = log(2))
getLognormMode(mu = log(1), sigma = c(log(1.2),log(2)))
Calculate mu and sigma of lognormal from summary statistics.
Description
Calculate mu and sigma of lognormal from summary statistics.
Usage
getParmsLognormForMedianAndUpper(median, upper, sigmaFac = qnorm(0.99))
getParmsLognormForMeanAndUpper(mean, upper, sigmaFac = qnorm(0.99))
getParmsLognormForLowerAndUpper(lower, upper, sigmaFac = qnorm(0.99))
getParmsLognormForLowerAndUpperLog(lowerLog, upperLog, sigmaFac = qnorm(0.99))
getParmsLognormForModeAndUpper(mle, upper, sigmaFac = qnorm(0.99))
getParmsLognormForMoments(mean, var, sigmaOrig = sqrt(var))
getParmsLognormForExpval(mean, sigmaStar)
Arguments
median |
geometric mu (median at the original exponential scale) |
upper |
numeric vector: value at the upper quantile, i.e. practical maximum |
sigmaFac |
sigmaFac=2 is 95% sigmaFac=2.6 is 99% interval. |
mean |
expected value at original scale |
lower |
value at the lower quantile, i.e. practical minimum |
lowerLog |
value at the lower quantile, i.e. practical minimum at log scale |
upperLog |
value at the upper quantile, i.e. practical maximum at log scale |
mle |
numeric vector: mode at the original scale |
var |
variance at original scale |
sigmaOrig |
standard deviation at original scale |
sigmaStar |
multiplicative standard deviation |
Details
For getParmsLognormForMeanAndUpper
there are two valid solutions, and the one with lower sigma
, i.e. the not so strongly skewed solution is returned.
Value
numeric matrix with columns 'mu' and 'sigma', the parameter of the lognormal distribution. Rows correspond to rows of inputs.
Functions
-
getParmsLognormForMedianAndUpper
: Calculates mu and sigma of lognormal from median and upper quantile. -
getParmsLognormForMeanAndUpper
: Calculates mu and sigma of lognormal from mean and upper quantile. -
getParmsLognormForLowerAndUpper
: Calculates mu and sigma of lognormal from lower and upper quantile. -
getParmsLognormForLowerAndUpperLog
: Calculates mu and sigma of lognormal from lower and upper quantile at log scale. -
getParmsLognormForModeAndUpper
: Calculates mu and sigma of lognormal from mode and upper quantile. -
getParmsLognormForMoments
: Calculate mu and sigma from moments (mean anc variance) -
getParmsLognormForExpval
: Calculate mu and sigma from expected value and geometric standard deviation
References
Limpert E, Stahel W & Abbt M (2001)
Log-normal Distributions across the Sciences: Keys and Clues.
Oxford University Press (OUP) 51, 341,
10.1641/0006-3568(2001)051[0341:lndats]2.0.co;2
Examples
# example 1: a distribution with mode 1 and upper bound 5
(thetaEst <- getParmsLognormForModeAndUpper(1,5))
mle <- exp(thetaEst[1] - thetaEst[2]^2)
all.equal(mle, 1, check.attributes = FALSE)
# plot the distributions
xGrid = seq(0,8, length.out = 81)[-1]
dxEst <- dlnorm(xGrid, meanlog = thetaEst[1], sdlog = thetaEst[2])
plot( dxEst~xGrid, type = "l",xlab = "x",ylab = "density")
abline(v = c(1,5),col = "gray")
# example 2: true parameters, which should be rediscovered
theta0 <- c(mu = 1, sigma = 0.4)
mle <- exp(theta0[1] - theta0[2]^2)
perc <- 0.975 # some upper percentile, proxy for an upper bound
upper <- qlnorm(perc, meanlog = theta0[1], sdlog = theta0[2])
(thetaEst <- getParmsLognormForModeAndUpper(
mle,upper = upper,sigmaFac = qnorm(perc)) )
#plot the true and the rediscovered distributions
xGrid = seq(0,10, length.out = 81)[-1]
dx <- dlnorm(xGrid, meanlog = theta0[1], sdlog = theta0[2])
dxEst <- dlnorm(xGrid, meanlog = thetaEst[1], sdlog = thetaEst[2])
plot( dx~xGrid, type = "l")
#plot( dx~xGrid, type = "n")
#overplots the original, coincide
lines( dxEst ~ xGrid, col = "red", lty = "dashed")
# example 3: explore varying the uncertainty (the upper quantile)
x <- seq(0.01,1.2,by = 0.01)
mle = 0.2
dx <- sapply(mle*2:8,function(q99){
theta = getParmsLognormForModeAndUpper(mle,q99,qnorm(0.99))
#dx <- dDistr(x,theta[,"mu"],theta[,"sigma"],trans = "lognorm")
dx <- dlnorm(x,theta[,"mu"],theta[,"sigma"])
})
matplot(x,dx,type = "l")
# Calculate mu and sigma from expected value and geometric standard deviation
.mean <- 1
.sigmaStar <- c(1.3,2)
(parms <- getParmsLognormForExpval(.mean, .sigmaStar))
# multiplicative standard deviation must equal the specified value
cbind(exp(parms[,"sigma"]), .sigmaStar)
Scale standard deviation between log and original scale.
Description
When comparing values at log scale that have different sd at original scale, better compare log(mean) instead of mu.
Usage
scaleLogToOrig(logmean, sigma)
scaleOrigToLog(mean, sd)
Arguments
logmean |
log of the expected value |
sigma |
standard deviation at log scale |
mean |
expected value at original scale |
sd |
standard deviation at original scale |
Value
numeric matrix with columns mean
, and sd
at original scale
Functions
-
scaleLogToOrig
: get logmean and sigma at log scale -
scaleOrigToLog
: get mean and sd at original scale
Examples
xLog <- data.frame(logmean = c(0.8, 0.8), sigma = c(0.2, 0.3))
xOrig <- as.data.frame(scaleLogToOrig(xLog$logmean, xLog$sigma))
xLog2 <- as.data.frame(scaleOrigToLog(xOrig$mean, xOrig$sd))
all.equal(xLog, xLog2)
xLog3 <- as.data.frame(getParmsLognormForMoments(xOrig$mean, xOrig$sd^2))
all.equal(xLog$sigma, xLog3$sigma) # but mu < logmean
Compute the standard error accounting for empirical autocorrelations
Description
Compute the standard error accounting for empirical autocorrelations
Usage
seCor(
x,
effCor = if (missing(effCov)) computeEffectiveAutoCorr(x) else effCov/var(x, na.rm =
TRUE),
na.rm = FALSE,
effCov,
nEff = computeEffectiveNumObs(x, effCor, na.rm = na.rm)
)
Arguments
x |
numeric vector |
effCor |
numeric vector of effective correlation components
first entry at zero lag equals one. See |
na.rm |
logical. Should missing values be removed? |
effCov |
alternative to specifying effCor: numeric vector of
effective covariance components
first entry is the variance. See |
nEff |
possibility to specify precomputed number of effective observations for speedup. |
Details
The default uses empirical autocorrelation
estimates from the supplied data up to first negative component.
For short series of x
it is strongly recommended to to
provide effCov
that was estimated on a longer time series.
Value
numeric scalar of standard error of the mean of x
set off-diagonal values of a matrix
Description
set off-diagonal values of a matrix
Usage
setMatrixOffDiagonals(x, diag = 1:length(value), value, isSymmetric = FALSE)
Arguments
x |
numeric square matrix |
diag |
integer vector specifying the diagonals 0 is the center +1 the first row to upper and -2 the second row to lower |
value |
numeric vector of values to fill in |
isSymmetric |
set to TRUE to to only specify the upper diagonal element but also set the lower in the mirrored diagonal |
Value
matrix with modified diagonal elements
Compute the unbiased variance accounting for empirical autocorrelations
Description
Compute the unbiased variance accounting for empirical autocorrelations
Usage
varCor(
x,
effCor = computeEffectiveAutoCorr(x),
na.rm = FALSE,
nEff = computeEffectiveNumObs(x, effAcf = effCor)
)
Arguments
x |
numeric vector |
effCor |
numeric vector of effective correlation components
first entry at zero lag equals one. See |
na.rm |
logical. Should missing values be removed? |
nEff |
possibility to specify precomputed number of effective observations for speedup. |
Details
The default uses empirical autocorrelation
estimates from the supplied data up to first negative component.
For short series of x
it is strongly recommended to to
provide effCov
that was estimated on a longer time series.
Value
numeric scalar of unbiased variation of x