Help for package fitPS

Type:

Package

Title:

Fit Zeta Distributions to Forensic Data

Version:

1.0.6

Description:

Fits Zeta distributions (discrete power laws) to data that arises from forensic surveys of clothing on the presence of glass and paint in various populations. The general method is described to some extent in Coulson, S.A., Buckleton, J.S., Gummer, A.B., and Triggs, C.M. (2001) <doi:10.1016/S1355-0306(01)71847-3>, although the implementation differs.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Encoding:

UTF-8

LazyData:

true

Depends:

R (≥ 4.1.0)

Imports:

doParallel, dplyr, foreach, Hmisc, iterators, knitr, ks, methods, pbapply, Rdpack, readxl, VGAM

RdMacros:

Rdpack

RoxygenNote:

7.3.3

URL:

https://github.com/jmcurran/fitPS

BugReports:

https://github.com/jmcurran/fitPS/issues

Suggests:

rmarkdown, sp, testthat (≥ 3.0.0), xtable

VignetteBuilder:

knitr

Config/testthat/edition:

NeedsCompilation:

Packaged:

2026-06-10 05:21:46 UTC; james

Author:

James Curran [aut, cre]

Maintainer:

James Curran <j.curran@auckland.ac.nz>

Repository:

CRAN

Date/Publication:

2026-06-10 21:40:02 UTC

S3 method for objects of class `psData`

Description

Tests whether two objects of class psData are equal. That is, their type is the same and the data contained in the data member are the same. See readData for a description of the psData class.

Usage

## S3 method for class 'psData'
lhs == rhs

Arguments

lhs

an object of class psData.

rhs

an object of class psData.

Details

NOTE: the notes member variable is ignored in this function as it is unlikely that a user would want to see if the notes are the same.

Value

TRUE if the two objects are equal, and FALSE otherwise.

Examples

p = readData(system.file("extdata", "p.xlsx", package = "fitPS"))
p1 = makePSData(n = 0:2, count = c(98, 1, 1), type = "P")
p2 = makePSData(n = 0:2, count = c(97, 2, 1), type = "P")
p == p1 ## TRUE
p == p2 ## FALSE
p1 == p2 ## FALSE

Number of Groups of Glass Data

Description

Count data from six different surveys looking at the number of sources/groups of glass found on the upper surfaces of clothing taken from the general public.

Usage

data(Psurveys)

Format

A list with nine objects of class psData—see readData for more details. The elements of the list are named: coulson, jackson, lau, lewis.all, lewis.clothing, lewis.shoes, pettard, ross, and roux, corresponding to the lead author in each of the references given below. lau, pettard, and ross were taken from Coulson et al. (2001) rather than the original source. The three objects starting with lewis represent the combined data (all), the groups of glass found on the outer clothing (clothing), and the groups of glass found on shoes/footwear (shoes).

Source

Coulson, S. A., Buckleton, J. S., Gummer, A. B., and Triggs, C. M. (2001). Glass on clothing and shoes of members of the general population and people suspected of breaking crimes, Science & Justice, 41(1):39–48. doi:10.1016/S1355-0306(01)71847-3.

References

Lau L, Beveridge AD, Callowhill BC, Conners N, Foster K, Groves RJ, Ohashi KN, Sumner AM, Wong H (1997). “The Frequency of Occurrence of Paint and Glass on the Clothing of High School Students.” Canadian Society of Forensic Science Journal, 30(4), 233–240. doi:10.1080/00085030.1997.10757103.

Lewis AD, Alexander LC, Ovide O, Duffett O, Curran JM, Buzzini P, Trejos T (2023). “A study on the occurrence of glass and paint across various cities in the United States—Part I: Background presence of glass in the general population.” Forensic Chemistry, 34, 100497. doi:10.1016/j.forc.2023.100497.

Petterd CI, McCallum I, Bradford L, Brinch K, Stewart S (1998). “Glass particles in the clothing of the general population in Canberra—a survey.” In Proceedings of the 14th International Symposium on the Forensic Sciences.

Ross P, Nguyen H (1998). “A survey of clothing for the presence of glass fragments.” In Proceedings of the 14th International Symposium on the Forensic Sciences.

Roux C, Kirk R, Benson S, Van Haren T, Petterd CI (2001). “Glass particles in footwear of members of the public in south-eastern Australia—a survey.” Forensic Science International, 116(2), 149–156. doi:10.1016/S0379-0738(00)00355-8.

Jackson F, Maynard P, Cavanagh-Steer K, Dusting T, Roux C (2013). “A survey of glass found on the headwear and head hair of a random population vs. people working with glass.” Forensic Science International, 226(1), 125–131. doi:10.1016/j.forsciint.2012.12.017.

Size of Groups of Glass Data

Description

Count data from different surveys looking at the size of groups of glass, i.e. the number of fragments in each group, found on the upper surfaces of clothing taken from the general public.

Usage

data(Ssurveys)

Format

A list with five objects of class psData—see readData for more details. The elements of the list are named: jackson, lau, pettard, ross, and roux, corresponding to the lead author in each of the references given below. lau, pettard, and ross were taken from Coulson et al. (2001) rather than the original source.

Source

References

Ross P, Nguyen H (1998). “A survey of clothing for the presence of glass fragments.” In Proceedings of the 14th International Symposium on the Forensic Sciences.

Coulson SA, Buckleton JS, Gummer AB, Triggs CM (2001). “Glass on clothing and shoes of members of the general population and people suspected of breaking crimes.” Science & Justice, 41(1), 39–48. doi:10.1016/S1355-0306(01)71847-3.

Add data to a psData object

Description

Add one or more new observations to an existing clothing survey object.

Usage

add(x, newData)

Arguments

x

an object of class psData—see readData for details.

newData

either a vector, matrix or data.frame containing the new data. If a vector or matrix is supplied, then it must either be of length two or have two columns. If a data.frame is supplied, then the columns must be labelled "n" and "rn". The new data MUST NOT contain values that already exist in x$n.

Value

an object of class psData.

Examples

add(Ssurveys$lau, c(11, 1))

Converts an object of class `psData` to a `data.frame`

Description

Converts an object of class psData—see readData—to a data.frame that can be used with functions in other packages, such as vglm, to fit more complicated models.

Usage

## S3 method for class 'psData'
as.data.frame(x, ...)

Arguments

x

an object of class psData—see readData for more details.

...

any other arguments passed to data.frame.

Details

If x is a psData object of type "P", i.e. it relates to numbers of groups of glass, then a data.frame with a single variable count will be returned where count = rep(x$data$n + 1, x$data$rn). The counts have one added to them because the zeta distribution requires that the counts are greater than or equal to one. If x is a psData object of type "S", i.e. it relates to group sizes, then a data.frame with a single variable count will be returned where count = rep(x$data$n, x$data$rn).
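The expansion described above can be sketched in base R without the package. The two small tables below are hypothetical stand-ins for the data member of a psData object (the first re-uses the counts from the == example); they are not created with readData or makePSData.

```r
# Hypothetical "P"-type table: n = number of groups, rn = number of items
survey <- data.frame(n = 0:2, rn = c(98, 1, 1))

# Type "P": add one so that every count is >= 1, as the zeta model requires
countP <- rep(survey$n + 1, survey$rn)

# Hypothetical "S"-type table: n = group size, rn = number of groups
sizes <- data.frame(n = 1:3, rn = c(5, 2, 1))

# Type "S": group sizes are already >= 1, so no shift is applied
countS <- rep(sizes$n, sizes$rn)

table(countP)
table(countS)
```

In both cases the length of the expanded vector equals sum(rn), which matches the number of rows stated in the Value section.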

Value

a data.frame with a single variable count. The number of rows in the data.frame is equal to sum(x$data$rn).

Examples

p = readData(system.file("extdata", "p.xlsx", package = "fitPS"))
p.df = as.data.frame(p)
table(p.df$count)
p$data

Bootstrap confidence intervals or regions

Description

Use bootstrapping to generate confidence intervals, or confidence regions in the case of the zero-inflated model.

Usage

bootCI(x, ...)

## Default S3 method:
bootCI(
  x,
  level = 0.95,
  B = 2000,
  model = c("zeta", "ziz"),
  returnBootValues = FALSE,
  silent = FALSE,
  plot = FALSE,
  parallel = TRUE,
  progressBar = FALSE,
  pbopts = list(type = "txt"),
  ...
)

## S3 method for class 'psData'
bootCI(x, ...)

## S3 method for class 'psFit'
bootCI(x, ...)

Arguments

x

an object either of class psData—see readData for more details—or of class psFit.

...

other arguments.

level

the confidence level required—restricted to [0.75, 1). This may be a vector, in which case multiple intervals, or confidence regions will be returned.

B

the number of bootstrap samples to take.

model

which model to fit to the data, either "zeta" or "ziz". May be abbreviated to "z" and "zi". Default is "zeta".

returnBootValues

if TRUE then the vector (or data.frame) of bootstrapped values is returned. This can be useful for debugging or understanding the results. Default is FALSE.

silent

if TRUE, then no output will be displayed whilst the bootstrapping is being undertaken.

plot

if TRUE and model == "ziz", then a plot of the bootstrapped values will be produced and confidence contour lines will be drawn for each value in level.

parallel

if TRUE, then the package will attempt to use multiple cores to speed up computation.

progressBar

if TRUE, then progress bars will be displayed to show progress on the bootstrapping.

pbopts

a list of arguments for the pboptions function that affect the progress bars. Ignored if progressBar = FALSE.

Details

This function uses bootstrapping to compute a confidence interval for the shape parameter in the case of the zeta model and a confidence region in the case of the zero-inflated zeta model. A smoothed bootstrap approach is taken rather than a simple percentile method. The kernel density estimation is performed by the ks package using a smoothed cross-validated bandwidth selection procedure.
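As a rough illustration of the resampling step only, the sketch below bootstraps the zeta shape parameter in base R. It deliberately uses a simple percentile interval, not the smoothed (kernel density) approach that bootCI is described as using, and everything in it (zetaApprox, zetaLogLik, fitShape, the counts) is a hypothetical helper or made-up data rather than part of fitPS.

```r
set.seed(1)

# Approximate zeta(s), s > 1: partial sum plus an Euler-Maclaurin tail term
zetaApprox <- function(s, N = 1000) {
  stopifnot(s > 1)
  k <- seq_len(N)
  sum(k^(-s)) + N^(1 - s) / (s - 1) - 0.5 * N^(-s)
}

# Zeta log-likelihood for values k (>= 1) observed freq times each
zetaLogLik <- function(s, k, freq) {
  -s * sum(freq * log(k)) - sum(freq) * log(zetaApprox(s))
}

# Maximum likelihood estimate of the shape from a vector of counts
fitShape <- function(counts) {
  tab <- table(counts)
  k <- as.numeric(names(tab))
  freq <- as.numeric(tab)
  optimize(function(s) zetaLogLik(s, k, freq),
           interval = c(1.001, 20), maximum = TRUE)$maximum
}

# Made-up survey on the shifted scale (counts >= 1)
counts <- rep(1:4, c(80, 12, 5, 3))

# Resample the data with replacement and refit B times
B <- 200
bootShapes <- replicate(B, fitShape(sample(counts, replace = TRUE)))

# Percentile interval at the 95% level
ci <- quantile(bootShapes, c(0.025, 0.975))
ci
```

A smoothed bootstrap would replace the final quantile call with an interval read off a kernel density estimate of bootShapes.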

Value

If returnBootValues == TRUE, then the results are returned in a list with elements named ci and bootVals for the zeta model, and confRegion and bootVals for the zero-inflated zeta model. The structure of ci and confRegion is described below.

If model == "zeta", then either a vector or a data.frame with elements/columns named "lower" and "upper", representing the lower and upper bounds of the confidence interval(s), is returned. Multiple bounds are returned in a data.frame when level has more than one value.

If model == "ziz", then a list with length equal to the length of level is returned. Each element of the list is named after its level expressed as a percentage; for example, with level = 0.95 the list has a single element named "95%". It is possible for there to be multiple contours for the confidence region for a given level. If there is only one contour for each value of level, then each element of the list consists of a list with elements named pi and shape which specify the coordinates of the contour(s) for that level. There is a third element named level which gives the height of the kernel density estimate at that contour. If there are multiple contours for a given value of level, then each list element is a list of lists with the structure given above (level, pi, and shape).

NOTE: it is quite possible that there are multiple contours for a given height. One way of thinking about this is to consider a mountain range with two mountains of equal height. If you draw the contours for (almost) any elevation, then you would expect to capture a region from each mountain.

Methods (by class)

bootCI(default): Bootstrap confidence intervals or regions
bootCI(psData): Bootstrap confidence intervals or regions
bootCI(psFit): Bootstrap confidence intervals or regions

Examples

## Not run: 
data(Psurveys)
roux = Psurveys$roux
confRegion = bootCI(roux, model = "ziz", parallel = FALSE, plot = TRUE)

## This will not work unless you have the sp package installed
## Count how many of the points lie within the 95% confidence region
## NOTE: fit is not created above; it must be an object whose pi and shape
## elements hold the points to be tested
lapply(confRegion, function(cr){
  table(sp::point.in.polygon(fit$pi, fit$shape, cr$pi, cr$shape))
})

## End(Not run)

Compare two surveys on the basis of their shape parameters

Description

Compare two surveys on the basis of their shape parameters

Usage

compareSurveys(x, ...)

## Default S3 method:
compareSurveys(
  x,
  y,
  xname = NULL,
  yname = NULL,
  alternative = c("two.sided", "less", "greater"),
  null.value = 0,
  print = TRUE,
  ...
)

## S3 method for class 'psData'
compareSurveys(x, y, ...)

## S3 method for class 'psFit'
compareSurveys(x, y, ...)

compare.surveys(x, ...)

comp.survs(x, ...)

Arguments

x

either an object of class psData—see readData or an object of class psFit—see fitDist.

y

either an object of class psData—see readData or an object of class psFit—see fitDist.

xname

an optional name for the first survey object.

yname

an optional name for the second survey object.

alternative

one of "two.sided", "less", or "greater", depending on the type of hypothesis test you wish to carry out. These may be replaced by single letter (or more) abbreviations.

null.value

the true value of the difference in the shape parameters under the null hypothesis.

print

if TRUE then the function will print summary output to the screen. This lets output be suppressed in situations where the user wants the function to run silently.

...

further arguments to be passed to or from methods.

Details

This function **only** works for the zeta distribution. It does not work for the zero-inflated zeta distribution. If the results from fitting ZIZ models are passed to this function, then it will ignore the zero-inflated part and simply refit a zeta model.

There is very little reason for null.value to be set to be anything other than 0. However it has been included for flexibility.

alternative = "greater" is the alternative that x has a larger shape parameter than y. alternative = "less" is the alternative that x has a smaller shape parameter than y.
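The source does not state which test statistic is used. As a minimal sketch, assuming a Wald-type z statistic built from the two shape estimates and their variances, the three alternatives could be evaluated as follows. The estimates, variances, and the helper waldPValue are all hypothetical and are not taken from fitPS.

```r
# Made-up shape estimates and variances for surveys x and y
shapeX <- 4.8; varX <- 0.30
shapeY <- 3.9; varY <- 0.20
nullValue <- 0

# Estimated difference, its standard error, and the z statistic
estimate <- shapeX - shapeY
stdErr <- sqrt(varX + varY)
statistic <- (estimate - nullValue) / stdErr

# P-value under a standard normal reference distribution (an assumption)
waldPValue <- function(z, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         two.sided = 2 * pnorm(-abs(z)),
         less = pnorm(z),                        # x has the smaller shape
         greater = pnorm(z, lower.tail = FALSE)) # x has the larger shape
}

pTwoSided <- waldPValue(statistic, "two.sided")
pLess <- waldPValue(statistic, "less")
pGreater <- waldPValue(statistic, "greater")
c(two.sided = pTwoSided, less = pLess, greater = pGreater)
```

Because match.arg is used, the alternative may be abbreviated (for example "g"), which mirrors the abbreviation behaviour described for the alternative argument.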

Value

The function returns a list of class "htest" with the following elements:

statistic: – the test statistic.
p.value: – the P-value associated with the estimate.
estimate: – the estimated difference in the shape parameters.
null.value: – the specified hypothesized value of the difference in shape parameters—0 by default.
stderr: – the standard error of the difference.
alternative: – a character string describing the alternative hypothesis.
method: – a character string describing the method.
data.name: – a character string with the names of the two input data sets separated by " and ".

Methods (by class)

compareSurveys(default): Compare two surveys on the basis of their shape parameters
compareSurveys(psData): Compare two surveys on the basis of their shape parameters
compareSurveys(psFit): Compare two surveys on the basis of their shape parameters

Functions

compare.surveys(): Compare two surveys on the basis of their shape parameters
comp.survs(): Compare two surveys on the basis of their shape parameters

Examples

data(Psurveys)
lau = Psurveys$lau
jackson = Psurveys$jackson
compareSurveys(lau, jackson)

## Example with fitted objects - note the function just refits the models
fit.lau = fitDist(lau)
fit.jackson = fitDist(jackson)
compareSurveys(fit.lau, fit.jackson)

## Example with a bigger difference
compareSurveys(Psurveys$roux, lau)

Compare two or more surveys on the basis of their shape parameters using a Likelihood Ratio Test

Description

Compare two or more surveys on the basis of their shape parameters using a Likelihood Ratio Test

Usage

compareSurveysLRT(...)

Arguments

...

two or more objects of class "psData"—see readData.

Details

This function **only** works for the zeta distribution. The function carries out a likelihood ratio test (LRT) to test the null hypothesis

H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_K

versus the alternative

H_1: \alpha_i \neq \alpha_j \mbox{ for some } i \neq j \in \left\{1, \ldots, K\right\},

where \alpha_i is the shape parameter for the zeta distribution of the i^\mathrm{th} survey.
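A generic likelihood ratio test of this hypothesis can be sketched in base R as below. This is a textbook construction under stated assumptions (a chi-squared reference distribution with K - 1 degrees of freedom, and made-up survey counts on the shifted scale), not a transcription of compareSurveysLRT; zetaApprox, zetaLogLik, and maxLogLik are hypothetical helpers.

```r
# Approximate zeta(s), s > 1: partial sum plus an Euler-Maclaurin tail term
zetaApprox <- function(s, N = 1000) {
  stopifnot(s > 1)
  k <- seq_len(N)
  sum(k^(-s)) + N^(1 - s) / (s - 1) - 0.5 * N^(-s)
}

# Zeta log-likelihood for values k (>= 1) observed freq times each
zetaLogLik <- function(s, k, freq) {
  -s * sum(freq * log(k)) - sum(freq) * log(zetaApprox(s))
}

# Maximised log-likelihood over the shape parameter
maxLogLik <- function(k, freq) {
  optimize(function(s) zetaLogLik(s, k, freq),
           interval = c(1.001, 20), maximum = TRUE)$objective
}

# Made-up surveys (values and their frequencies)
surveys <- list(
  a = list(k = 1:3, freq = c(98, 1, 1)),
  b = list(k = 1:4, freq = c(80, 12, 5, 3))
)

# Alternative: each survey has its own shape parameter
llSeparate <- sum(sapply(surveys, function(d) maxLogLik(d$k, d$freq)))

# Null: one common shape parameter, fitted to the pooled data
kAll <- unlist(lapply(surveys, function(d) d$k))
freqAll <- unlist(lapply(surveys, function(d) d$freq))
llPooled <- maxLogLik(kAll, freqAll)

# Likelihood ratio statistic, degrees of freedom, and P-value
statistic <- 2 * (llSeparate - llPooled)
df <- length(surveys) - 1
pValue <- pchisq(statistic, df = df, lower.tail = FALSE)
c(statistic = statistic, df = df, p.value = pValue)
```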

Value

The function returns a list of class "htest" with the following elements:

statistic: – the test statistic.
parameter: – the degrees of freedom for the test
p.value: – the P-value associated with the estimate.
method: – a character string describing the method.
data.name: – the names of the data sets used in the test

Examples

data(Psurveys)
lau = Psurveys$lau
jackson = Psurveys$jackson
compareSurveysLRT(lau, jackson)

## Example with three surveys
roux = Psurveys$roux
compareSurveysLRT(lau, jackson, roux)

S3 confint method for objects of class psFit

Description

S3 confint method for objects of class psFit

Usage

## S3 method for class 'psFit'
confint(object, parm, level = 0.95, ...)

Arguments

object

an object of class psFit—see fitDist for more details

parm

added for compatibility. Should be left empty as it is ignored.

level

the confidence level required—restricted to [0.75, 1)

...

in theory other parameters to be passed to confint, but in reality passed as extra parameters to the internal function plZIZ.

Details

NOTE: the method for ZIZ model is a little computationally intensive and possibly (almost certainly) unstable.

Value

If the zeta model is used (i.e. object comes from a call to fitDist), then a list with two items, wald and prof, containing the Wald and profile likelihood confidence intervals respectively for the shape parameter of the fitted zeta distribution is returned. In general these should be relatively close to each other. These values use the zeta distribution shape parameter and must satisfy shape > 1. If a zero-inflated zeta model is used (i.e. object comes from a call to fitZIDist), then a list of confidence regions is returned with an element for each value of level. The confidence regions are data.frames with variables pi and shape which can be used with lines or polygon to draw the confidence region.

Examples

data(Psurveys)
roux = Psurveys$roux
fit = fitDist(roux)
confint(fit)

## Not run: 
fit.zi = fitZIDist(roux)
cr = confint(fit.zi, level = c(0.80, 0.95))
plot(cr[["0.95"]], type = "l")
polygon(cr[["0.8"]])

## End(Not run)

## Not run: 
fit.zi = fitZIDist(roux, method = "bayes")
cr = confint(fit.zi, level = c(0.80, 0.95))
plot(cr[["0.95"]], type = "l")
polygon(cr[["0.8"]])

## End(Not run)

Bayesian credible intervals or regions

Description

Use kernel density estimation to generate credible intervals, or credible regions in the case of the zero-inflated model.

Usage

credint(psFit, level = 0.95, plot = FALSE, silent = FALSE, ...)

credInt(psFit, level = 0.95, plot = FALSE, silent = FALSE, ...)

Arguments

psFit

an object of class psFit.

level

the credible level required—restricted to [0.75, 1). This may be a vector, in which case multiple intervals, or credible regions will be returned.

plot

if TRUE and model == "ziz", then a plot of the bootstrapped values will be produced and confidence contour lines will be drawn for each value in level.

silent

if TRUE, then no output will be displayed whilst the kernel density estimation is being undertaken.

...

other arguments fed to plot. If plot == FALSE, then these will be ignored

Details

This function uses kernel density estimation to compute a Bayesian credible interval for the shape parameter in the case of the zeta model and a credible region in the case of the zero-inflated zeta model. A smoothing approach is taken rather than a simple percentile method. The kernel density estimation is performed by the ks package using a smoothed cross-validated bandwidth selection procedure.

Value

If psFit$model == "zeta", then either a vector or a data.frame with elements/columns named "lower" and "upper", representing the lower and upper bounds of the credible interval(s), is returned. Multiple bounds are returned in a data.frame when level has more than one value.

If psFit$model == "ziz", then a list with length equal to the length of level is returned. Each element of the list is named after its level expressed as a percentage; for example, with level = 0.95 the list has a single element named "95%". It is possible for there to be multiple contours for the credible region for a given level. If there is only one contour for each value of level, then each element of the list consists of a list with elements named pi and shape which specify the coordinates of the contour(s) for that level. There is a third element named level which gives the height of the kernel density estimate at that contour. If there are multiple contours for a given value of level, then each list element is a list of lists with the structure given above (level, pi, and shape).

NOTE: it is quite possible that there are multiple contours for a given height. One way of thinking about this is to consider a mountain range with two mountains of equal height. If you draw the contours for (almost) any elevation, then you would expect to capture a region from each mountain.

Functions

credInt(): Bayesian credible intervals or regions

Examples

## Not run: 
data(Psurveys)
roux = Psurveys$roux
fit = fitzidist(roux, method = "bayes")
credRegion = credint(fit, plot = TRUE)

## This will not work unless you have the sp package installed
## Count how many of the points lie within the 95% credible region
lapply(credRegion, function(cr){
  table(sp::point.in.polygon(fit$pi, fit$shape, cr$pi, cr$shape))
})

## End(Not run)

Fit a Zeta Distribution to Forensic Data

Description

This function uses maximum likelihood estimation (MLE), or Bayesian estimation (MCMC), to estimate the shape parameter of a zeta distribution from a set of observed counts for either the number of groups/sources of forensically interesting material (mostly glass or paint) recovered from clothing, or the number of fragments/particles in each group. This, in turn, allows the estimation of the P and S probabilities, as described by Evett and Buckleton (1990), which are used in computing the likelihood ratio (LR) for activity level propositions. The data arise from clothing surveys. The general method is described in Coulson et al. (2001), although poor typesetting and a lack of defined terms make it hard to follow. This package improves on the estimation in that linear interpolation is not required, and standard numerical optimisation is used instead. The zeta distribution has probability mass function

p(k) = \frac{k^{-s}}{\zeta(s)}

where \zeta(s) is the Riemann zeta function. Coulson et al. (2001) did not have an easy way to rapidly compute this quantity, hence their use of linear interpolation.
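The probability mass function above is straightforward to evaluate numerically. The sketch below uses only base R and approximates \zeta(s) with a partial sum plus a tail correction; zetaApprox and dzetaSketch are hypothetical helper names, not functions exported by fitPS.

```r
# Approximate zeta(s), s > 1: partial sum plus an Euler-Maclaurin tail term
zetaApprox <- function(s, N = 1000) {
  stopifnot(s > 1)
  k <- seq_len(N)
  sum(k^(-s)) + N^(1 - s) / (s - 1) - 0.5 * N^(-s)
}

# Zeta probability mass function p(k) = k^(-s) / zeta(s), k = 1, 2, ...
dzetaSketch <- function(k, s) {
  k^(-s) / zetaApprox(s)
}

# First ten probabilities for shape s = 2
s <- 2
probs <- dzetaSketch(1:10, s)
round(probs, 4)

# The probabilities sum to (almost) one over a long range of k
total <- sum(dzetaSketch(1:100000, s))
total
```

For s = 2 the normalising constant is known in closed form, \zeta(2) = \pi^2/6, which gives a convenient check on the approximation.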

Usage

fitDist(
  x,
  nterms = 10,
  method = c("mle", "bayes", "integrate", "numerical", "mcmc", "laplace", "importance"),
  prior,
  bayesOptions = NULL,
  ...
)

fitdist(
  x,
  nterms = 10,
  method = c("mle", "bayes", "integrate", "numerical", "mcmc", "laplace", "importance"),
  prior,
  bayesOptions = NULL,
  ...
)

Arguments

x

an object of type psData, usually obtained from readData.

nterms

the number of terms to compute the probability distribution for.

method

primary fitting method. Use "mle" for maximum likelihood estimation or "bayes" for Bayesian estimation. Legacy Bayesian aliases "integrate", "numerical", and "mcmc" are accepted with a deprecation warning and translated to method = "bayes" with the corresponding bayesOptions$posteriorMethod.

prior

optional prior object used by the Bayesian methods. This is retained for backward compatibility. New code should usually pass priors through bayesOptions. If omitted, makePrior() is used.

bayesOptions

optional list controlling Bayesian fitting. The posteriorMethod element selects "numerical", "mcmc", "laplace", or "importance". The default is "numerical". The prior element may contain a prior object returned by makePrior().

...

other arguments that control the estimation methods. If method == "mle", then the user can provide an optional argument start which is the starting value for the numerical optimisation. If this is not provided, then start = 1 by default. If you specify your own starting value, it must satisfy shape > 1.

If method == "bayes", then there are six optional parameters (which, despite the documentation, are actually case-insensitive):

shape0: – The initial value of the shape parameter. The default is 2.
a: – The lower bound for the default uniform prior on \log(\mathrm{shape} - 1). The default is -2.
b: – The upper bound for the default uniform prior on \log(\mathrm{shape} - 1). The default is +2.
nIter: – The number of samples to save from the chain. Must be greater than zero, and ideally greater than 1000.
nBurnIn: – The number of samples to discard from the chain. Must be greater than zero. **NOTE**: the sampler runs for nIter + nBurnIn iterations, so you do not need to factor this number into your number of samples, nIter.
silent: – A logical variable which allows the user to get a progress bar if they want. TRUE by default.

Details

The function returns an object of class psFit which is a list containing seven or eight elements:

psData: – an object of class psData–see readData,
fit: – the fitted object from optim,
shape: – the maximum likelihood estimate, or the posterior mean, of the shape parameter,
var.shape: – the maximum likelihood estimate, or posterior estimate, of the variance of the shape parameter,
fitted: – a named vector containing the first nterms of the fitted distribution.
model: – set to "zeta" for this model.
method: – the method of estimation used, either "mle" or "bayes".
chain: – if method == "bayes", then this element will contain the Markov Chain from the sampler, that is, hopefully a sample from the posterior density of the shape parameter. If method == "mle", then this element does not exist.

The output can be used in a variety of ways. If the interest is just in the shape parameter estimate, then the shape member of the psFit object contains this information. It is also displayed along with a number of fitted probabilities by the print.psFit method. The fitted object can also be plotted using the plot method plot.psFit, and used to create a probability function with probfun. The shape value stored in the fitted object is the zeta distribution shape parameter and must satisfy shape > 1.

This function implements both maximum likelihood estimation (MLE) and Bayesian estimation. Both modes of estimation require additional information such as starting values and parameters for priors. Please read the documentation for the ... argument closely because it explains what you can change and what the default values are.

Currently the Bayesian estimation is done using the prior returned by makePrior. By default this is a Uniform[a, b] prior on \log(\mathrm{shape} - 1), so the prior support always has shape > 1. This may become more flexible in the future. Similarly, the estimation is done using a simple Metropolis-Hastings sampler. It might be more efficient to sample through adaptive rejection sampling, but it is unclear whether it is worth the effort.
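To make the prior and sampler concrete, here is a minimal random-walk Metropolis sketch in base R that works on the scale \log(\mathrm{shape} - 1) with a Uniform[a, b] prior. It is an illustration under assumptions, not the package's sampler: the proposal standard deviation, the data, and the helpers zetaApprox, zetaLogLik, and logPost are all hypothetical.

```r
set.seed(42)

# Approximate zeta(s), s > 1: partial sum plus an Euler-Maclaurin tail term
zetaApprox <- function(s, N = 1000) {
  stopifnot(s > 1)
  k <- seq_len(N)
  sum(k^(-s)) + N^(1 - s) / (s - 1) - 0.5 * N^(-s)
}

# Zeta log-likelihood for values k (>= 1) observed freq times each
zetaLogLik <- function(s, k, freq) {
  -s * sum(freq * log(k)) - sum(freq) * log(zetaApprox(s))
}

# Made-up data on the shifted scale, and prior bounds as in the defaults
k <- 1:4
freq <- c(80, 12, 5, 3)
a <- -2
b <- 2

# Log posterior in theta = log(shape - 1); flat prior on [a, b]
logPost <- function(theta) {
  if (theta < a || theta > b) return(-Inf)
  zetaLogLik(1 + exp(theta), k, freq)
}

nIter <- 2000
nBurnIn <- 500
theta <- log(2 - 1)          # start at shape = 2
current <- logPost(theta)
chain <- numeric(nIter)

for (i in seq_len(nIter + nBurnIn)) {
  proposal <- theta + rnorm(1, sd = 0.2)   # symmetric proposal (assumed sd)
  candidate <- logPost(proposal)
  if (log(runif(1)) < candidate - current) {
    theta <- proposal
    current <- candidate
  }
  # Keep only post burn-in draws, stored on the shape scale
  if (i > nBurnIn) chain[i - nBurnIn] <- 1 + exp(theta)
}

mean(chain)
```

Because every accepted theta lies in [a, b], every saved shape value lies between 1 + exp(a) and 1 + exp(b), so the constraint shape > 1 holds automatically.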

Value

an object of class psFit–see Details.

Functions

fitdist(): Fit a Zeta Distribution to Forensic Data

References

Coulson, S. A., Buckleton, J. S., Gummer, A. B., and Triggs, C.M., "Glass on clothing and shoes of members of the general population and people suspected of breaking crimes", Science & Justice 2001: 41(1): 39–48.

Evett, I. W. and Buckleton, J. S., "The interpretation of glass evidence. A practical approach", Journal of the Forensic Science Society 1990: 30(4): 215–223.

Examples

p = readData(system.file("extdata", "p.xlsx", package = "fitPS"))
fit = fitDist(p)
fit

## Compare to the Bayesian estimates
fit2 = fitDist(p, method = "bayes")
fit2

fit3 = fitDist(
  p,
  method = "bayes",
  bayesOptions = list(posteriorMethod = "numerical")
)
fit3

Fit a Zero-Inflated Zeta Distribution to Forensic Data

Description

This function uses maximum likelihood estimation (MLE) or Bayesian estimation (MCMC) to estimate the mixing parameter and the shape parameter of a zero-inflated zeta distribution from a set of observed counts for either the number of groups/sources of forensically interesting material (mostly glass or paint) recovered from clothing, or the number of fragments/particles in each group. This, in turn, allows the estimation of the P and S probabilities, as described by Evett and Buckleton (1990), which are used in computing the likelihood ratio (LR) for activity level propositions. The data arise from clothing surveys. The zero-inflated zeta distribution has probability mass function

p(k) = \begin{cases} \pi + \frac{(1-\pi)}{\zeta(s)}&,k=0, \\ \frac{(1-\pi)(k+1)^{-s}}{\zeta(s)}&,k=1,2,\ldots \end{cases}

where \zeta(s) is the Riemann zeta function.
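A numerical check of the zero-inflated form can be sketched in base R. The sketch assumes that the zeta component is shifted by one, so that a count of k corresponds to the zeta value k + 1; with that shift the probabilities sum to one. This shift is an assumption made here for the illustration, and zetaApprox and dzizSketch are hypothetical helpers rather than fitPS functions.

```r
# Approximate zeta(s), s > 1: partial sum plus an Euler-Maclaurin tail term
zetaApprox <- function(s, N = 1000) {
  stopifnot(s > 1)
  k <- seq_len(N)
  sum(k^(-s)) + N^(1 - s) / (s - 1) - 0.5 * N^(-s)
}

# Zero-inflated zeta pmf for k = 0, 1, 2, ...
# piMix is the mixing parameter (called pi in the text), s is the shape
dzizSketch <- function(k, piMix, s) {
  zetaPart <- (1 - piMix) * (k + 1)^(-s) / zetaApprox(s)
  ifelse(k == 0, piMix + zetaPart, zetaPart)
}

# First ten probabilities for piMix = 0.5 and s = 2
probs <- dzizSketch(0:9, piMix = 0.5, s = 2)
round(probs, 4)

# The probabilities sum to (almost) one over a long range of k
total <- sum(dzizSketch(0:100000, piMix = 0.5, s = 2))
total
```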

Usage

fitZIDist(
  x,
  nterms = 10,
  method = c("mle", "bayes", "integrate", "numerical", "mcmc", "laplace", "importance"),
  prior,
  bayesOptions = NULL,
  ...
)

fitZIdist(
  x,
  nterms = 10,
  method = c("mle", "bayes", "integrate", "numerical", "mcmc", "laplace", "importance"),
  prior,
  bayesOptions = NULL,
  ...
)

fitzidist(
  x,
  nterms = 10,
  method = c("mle", "bayes", "integrate", "numerical", "mcmc", "laplace", "importance"),
  prior,
  bayesOptions = NULL,
  ...
)

Arguments

x

an object of type psData, usually obtained from readData.

nterms

the number of terms to compute the probability distribution for.

method

primary fitting method. Use "mle" for maximum likelihood estimation or "bayes" for Bayesian estimation. Legacy Bayesian aliases "integrate", "numerical", "mcmc", "laplace", and "importance" are accepted with a deprecation warning and translated to method = "bayes" with the corresponding bayesOptions$posteriorMethod.

prior

optional prior object used by Bayesian posterior approximation methods where applicable. This is retained for consistency with fitDist(); new code should usually pass priors through bayesOptions.

bayesOptions

...

other arguments that control the estimation methods. If method == "mle", then the user can provide an optional argument start which is the starting value for the numerical optimisation. If this is not provided, then start = c(0.5, 2) by default. If you specify your own starting value, keep the mixing parameter greater than 0.5 and use shape > 1.

If method == "bayes", then there are eight optional parameters (which, despite the documentation, are actually case-insensitive):

theta0: – The initial values of the mixing parameter and shape parameter. The default is c(0.5, 2).
a: – The lower bound for the default uniform prior on \log(\mathrm{shape} - 1). The default is -2.
b: – The upper bound for the default uniform prior on \log(\mathrm{shape} - 1). The default is +2.
shape1: – The first shape parameter for the beta prior on the mixing distribution, Beta(shape1, shape2). The default is 1.
shape2: – The second shape parameter for the beta prior on the mixing distribution, Beta(shape1, shape2). The default is 1.
nIter: – The number of samples to save from the chain. Must be greater than zero, and ideally greater than 1000.
nBurnIn: – The number of samples to discard from the chain. Must be greater than zero. **NOTE**: the sampler runs for nIter + nBurnIn iterations, so you do not need to factor this number into your number of samples, nIter.
silent: – A logical variable which allows the user to get a progress bar if they want. TRUE by default.

Details

The function returns an object of class psFit which is a list containing eight or nine elements:

psData: – an object of class psData–see readData,
fit: – the fitted object from optim,
pi: – the maximum likelihood estimate, or the posterior mean, of the mixing parameter,
shape: – the maximum likelihood estimate, or the posterior mean, of the shape parameter,
var.cov: – the estimated (posterior) variance-covariance matrix for the parameters,
fitted: – a named vector containing the first nterms of the fitted distribution.
model: – set to "ziz" for this model,
method: – the method of estimation used, either "mle" or "bayes",
chain: – if method == "bayes", then this element will contain the Markov Chain from the sampler, that is, hopefully a sample from the posterior density of the mixing parameter and the shape parameter. If method == "mle", then this element does not exist.

The output can be used in a variety of ways. If the interest is just in the mixing and shape parameter estimates, then the pi and shape members of the psFit object contain this information. It is also displayed along with a number of fitted probabilities by the print.psFit method. The fitted object can also be plotted using the plot method plot.psFit, and used to create a probability function with probfun. The shape value stored in the fitted object is the zeta distribution shape parameter and must satisfy shape > 1.

Bayesian zero-inflated zeta estimation is selected with method = "bayes". The posterior approximation is selected with bayesOptions$posteriorMethod. The default, "numerical", uses deterministic two-dimensional grid integration over pi and shape. The legacy Metropolis-Hastings sampler remains available with bayesOptions = list(posteriorMethod = "mcmc"). The prior for the mixing proportion is Beta(shape1, shape2), and the shape prior is supplied by bayesOptions$prior or prior. If no shape prior is supplied, makePrior() is used.

Value

an object of class psFit–see Details.

References

Evett, I. W. and Buckleton, J. S., "The interpretation of glass evidence. A practical approach", Journal of the Forensic Science Society 1990: 30(4): 215–223.

Examples

data(Psurveys)
roux = Psurveys$roux
fit = fitZIDist(roux)
fit

Fit a Logarithmic Distribution to Forensic Data

Description

This function uses maximum likelihood estimation (MLE) to estimate the shape parameter of a logarithmic distribution from a set of observed counts for either the number of groups/sources of forensically interesting material (mostly glass or paint) recovered from clothing, or the number of fragments/particles in each group. This, in turn, allows the estimation of the P and S probabilities, as described by Evett and Buckleton (1990), which are used in computing the likelihood ratio (LR) for activity level propositions. The data itself arises from clothing surveys. The logarithmic distribution has probability mass function

p(k) = -\frac{\pi^k}{k\log_e(1-\pi)},\quad k = 1, 2, \ldots,\quad 0<\pi<1.
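
As a quick sanity check on the logarithmic mass function, the following base R sketch evaluates it for k = 1, 2, ... with the leading minus sign that keeps the probabilities positive (since log(1 - π) < 0), and confirms numerically that the terms sum to one. The helper name dlogSketch and the value prob = 0.5 are illustrative assumptions and are not part of fitPS.

```r
# Logarithmic pmf: p(k) = -prob^k / (k * log(1 - prob)), k = 1, 2, ...
# dlogSketch is a hypothetical helper, not a fitPS function.
dlogSketch <- function(k, prob) {
  stopifnot(all(k >= 1), prob > 0, prob < 1)
  -prob^k / (k * log(1 - prob))
}

probs <- dlogSketch(1:10, prob = 0.5)          # first ten terms
total <- sum(dlogSketch(1:1000, prob = 0.5))   # numerically close to 1
```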

Usage

fitlogDist(x, nterms = 10, start = 0.5, ...)

fitLogdist(x, nterms = 10, start = 0.5, ...)

fitlogdist(x, nterms = 10, start = 0.5, ...)

Arguments

x

an object of type psData, usually obtained from readData.

nterms

the number of terms to compute the probability distribution for.

start

a starting value for the optimiser.

...

other parameters - not currently used.

Details

The function returns an object of class psFit which is a list containing seven elements, including:

psData: – an object of class psData–see readData,
fit: – the fitted object from optim,
pi: – the maximum likelihood estimate of the shape parameter,
var: – the estimated variance for the shape parameter,
fitted: – a named vector containing the first nterms of the fitted distribution.

Value

an object of class psFit–see Details.

References

Evett, I. W. and Buckleton, J. S., "The interpretation of glass evidence. A practical approach", Journal of the Forensic Science Society 1990: 30(4): 215–223.

Examples

data(Psurveys)
roux = Psurveys$roux
fit = fitlogDist(roux)
fit

S3 fitted method for an object of class `psFit`

Description

S3 fitted method for an object of class psFit

Usage

## S3 method for class 'psFit'
fitted(object, n = NULL, ...)

Arguments

object

an object of class psFit, usually from fitDist or fitZIDist.

n

This parameter is NULL by default. If it is not NULL then it must be either the number of fitted terms to be returned, or a vector containing the desired fitted values.

...

other arguments passed to fitted—not used.

Value

a named vector of fitted probabilities

Extract the log-likelihood from a zeta model fit

Description

Returns the maximised log-likelihood for a fitted zeta model.

Usage

## S3 method for class 'psFit'
logLik(object, ...)

Arguments

object

An object of class psFit.

...

Additional arguments passed to methods. Currently ignored.

Details

This method allows generic functions such as stats::AIC() and stats::BIC() to work with objects of class psFit.
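
To illustrate why a logLik method is enough for these generics, the base R sketch below builds a "logLik" object by hand and passes it to AIC() and BIC(). The log-likelihood value, df and nobs are made-up numbers chosen only for illustration; they do not come from any fitPS fit.

```r
# A hand-built "logLik" object; the value -12.3, df = 1 and nobs = 100
# are made-up numbers for illustration only.
ll <- structure(-12.3, df = 1, nobs = 100, class = "logLik")

aicValue <- AIC(ll)  # -2 * logLik + 2 * df
bicValue <- BIC(ll)  # -2 * logLik + log(nobs) * df
```

Both generics read the "df" attribute, and BIC() additionally reads "nobs", which is why the returned object carries both attributes.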

Value

An object of class logLik with attributes "df" and "nobs".

Create a survey data set manually

Description

Create a survey data set from the command line rather than reading data in from a file. This function is likely to be useful only when there are very few distinct values to enter, whether numbers of groups or sizes of groups of glass.

Usage

makePSData(n, count = NULL, type = c("P", "S"), notes = NULL)

makeData(n, count = NULL, type = c("P", "S"), notes = NULL)

createPSData(n, count = NULL, type = c("P", "S"), notes = NULL)

Arguments

n

Either the number of groups of glass or the size of different groups of glass, or a vector of observed groups of glass, or group sizes. See details for a longer explanation.

count

Either the number of people in the survey sample who had n groups of glass on their clothing, or the number of people who had a group of glass of size n.

type

either "P" or "S"

notes

a bibentry or a character string which allows extra information about the data to be stored, such as the source, or reference. NULL by default.

Details

If count is NULL, then it is assumed that n consists of actual observed group sizes or numbers of groups of glass found on a survey of N individuals. That is, one could provide n = rep(0:1, c(98, 1)) or n = 0:1, count = c(98, 1). The former is more useful when performing simulation studies.
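
The relationship between the two input styles can be sketched in base R as follows. The variable names are illustrative only; the sketch simply shows that tabulating raw observations recovers the (n, count) form.

```r
# Raw observations: 98 people with 0 groups and 1 person with 1 group
nRaw <- rep(0:1, c(98, 1))

# Tabulate the raw observations into the (n, count) form
tab <- table(nRaw)
n <- as.integer(names(tab))
count <- as.vector(tab)
```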

Value

an object of type psData—see readData for more details.

Examples

## recreate the data read in the readData example
p1 = makePSData(n = c(0, 1, 2), count = c(98, 1, 1), type = "P")
s1 = makePSData(n = 1:3, count = c(1, 1, 1), type = "S")
p1
s1

Define a prior density

Description

Construct a prior that can be used in the fitDist function.

Usage

makePrior(family = c("loguniform", "uniform", "custom"), range, logd)

Arguments

family

One of "loguniform", "uniform" or "custom".

range

Optionally the range for which the prior density is evaluated. It is zero outside of this range. For zeta models this range is on the standard shape scale, where shape > 1.

logd

Optionally (required when family = "custom"), a function that evaluates the log density of the prior inside the range.

Details

The default is a LogUniform[-2, 2] prior on shape - 1. Equivalently, the prior range on the fitPS standard zeta shape parameter is 1 + exp(c(-2, 2)).
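
A minimal sketch of what such a prior density looks like on the shape scale, assuming that LogUniform[a, b] on shape - 1 means log(shape - 1) is uniformly distributed on [a, b]. Under that assumption the density is 1 / ((b - a)(shape - 1)) on the stated range and zero elsewhere. The function logUnifDensity is hypothetical and is not how makePrior is implemented.

```r
# Assumption: LogUniform[a, b] on (shape - 1) means
# log(shape - 1) ~ Uniform(a, b), giving density 1 / ((b - a) * (shape - 1)).
logUnifDensity <- function(shape, a = -2, b = 2) {
  u <- shape - 1
  inside <- u >= exp(a) & u <= exp(b)
  ifelse(inside, 1 / ((b - a) * u), 0)
}

# Prior support on the shape scale, as given in the text
support <- 1 + exp(c(-2, 2))

# The density should integrate to one over its support
area <- integrate(logUnifDensity, lower = support[1], upper = support[2])$value
```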

Value

an object of type psPrior

Examples

## With default parameters, the prior will be LogUniform[-2, 2]
## on shape - 1, so the prior support is above shape = 1.
p1 <- makePrior()

# plot the prior density
xPlot <- seq(from = 1.01, to = 10, length = 100)
plot(xPlot, exp(p1$logd(xPlot)), type = "l")

# Alternatively, a Uniform[a, b] prior can be used on standard shape
p2 <- makePrior(family = "uniform", range = c(1.01, 10))
plot(xPlot, exp(p2$logd(xPlot)), type = "l")

# A custom prior needs the log density function to be specified
# We define an exponential prior with rate = 1 / 10 on shape - 1
logdexp <- function(x) dexp(x - 1, rate = 1 / 10, log = TRUE)
p3 <- makePrior(family = "custom", range = c(1.01, 10), logd = logdexp)

plot(xPlot, exp(p3$logd(xPlot)), type = "l")

An S3 method for computing the mean number of groups, or the mean size of groups, from clothing survey data

Description

An S3 method for computing the mean number of groups, or the mean size of groups, from clothing survey data

Usage

## S3 method for class 'psData'
mean(x, ...)

Arguments

x

an object of class psData; see readData for more details.

...

other arguments which are passed to sum

Value

the mean of the data. If there are r_i observations of the value n_i then the mean is given by

\sum_i\frac{r_i\times n_i}{\sum_i{r_i}}
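
The weighted-mean formula can be checked by hand in base R. The sketch below uses the same counts as the makePSData example (98, 1 and 1 observations of 0, 1 and 2 groups), with column names n and rn chosen to match the data member of a psData object.

```r
# n = value observed, rn = number of times it was observed.
# The counts mirror the makePSData example.
n <- 0:2
rn <- c(98, 1, 1)

# Weighted mean: sum(r_i * n_i) / sum(r_i)
surveyMean <- sum(rn * n) / sum(rn)
```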

Examples

data(Psurveys)
mean(Psurveys$roux)

S3 plot method for an object of class `psFit`

Description

S3 plot method for an object of class psFit

Usage

## S3 method for class 'psFit'
plot(
  x,
  ylim = c(0, 1),
  conf = FALSE,
  conf.level = 0.95,
  ci.type = c("wald", "prof"),
  log.scale = FALSE,
  ...
)

Arguments

x

an object of class psFit, usually from fitDist or fitZIDist.

ylim

the limits of the y-axis.

conf

if TRUE, and the model is the zeta model (as opposed to the zero-inflated zeta (ZIZ) model), then confidence intervals (based on the standard error of the shape parameter) are drawn on the plot. If the ZIZ model has been used, then this is ignored.

conf.level

the confidence level for the confidence intervals. Must be between 0.75 and 0.99.

ci.type

Specifies the type of confidence interval. If conf == TRUE, then ci.type can be either "wald" or "prof" (or an abbreviation), depending on whether the Wald interval or the profile likelihood interval should be used. Note that these are intervals on the shape parameter and not the density heights. Therefore the intervals around the probabilities should not really be thought of as confidence intervals but rather something more similar to a "sensitivity" interval.

log.scale

if TRUE the y-axis is changed to a logarithmic (base 10) axis.

...

other arguments passed to plot.

Value

No return value, called for side effects

Examples

p = readData(system.file("extdata", "p.xlsx", package = "fitPS"))
fit = fitDist(p)
plot(fit)

## An example with Wald generated intervals
plot(fit, conf = TRUE)

plot(fit, conf = TRUE, ci.type = "p")

Plot a posterior density for a fitted power-series model

Description

Plot the marginal posterior density for a parameter in a Bayesian 'psFit' object. The posterior density is estimated from stored MCMC samples when they are available. For numerical integration fits, the stored posterior density function is evaluated on a grid.

Usage

plotPosterior(
  object,
  parameter = "shape",
  level = 0.95,
  showEstimate = TRUE,
  showInterval = TRUE,
  nGrid = 512,
  xlab = NULL,
  ylab = "Posterior density",
  main = NULL,
  ...
)

Arguments

object

an object of class psFit, usually from fitDist or fitZIDist.

parameter

character; the posterior parameter to plot. The default is "shape". Zero-inflated Bayesian fits also support "pi".

level

numeric; credible level for the interval, if displayed.

showEstimate

logical; if TRUE, draw a vertical line at the posterior point estimate stored in the fitted object.

showInterval

logical; if TRUE, draw vertical lines for the equal-tail credible interval.

nGrid

integer; number of grid points used when a stored posterior density function is evaluated directly.

xlab, ylab, main

optional plot labels.

...

other graphical arguments passed to plot.

Details

This function intentionally does not overload plot.psFit, which continues to plot fitted probabilities. For MCMC fits, the density is estimated from object$chain. For numerical integration fits from fitDist(..., method = "integrate"), the stored posterior density function in object$pdf is evaluated on an automatically chosen grid.

Value

Invisibly returns a data frame containing the plotted posterior density grid. The return value has attributes named "estimate" and "interval" when those values are available.

Examples

## Not run: 
data(Psurveys)
roux = Psurveys$roux
fit = fitDist(roux, method = "bayes")
plotPosterior(fit)

## End(Not run)

S3 predict method for an object of class `psFit`

Description

S3 predict method for an object of class psFit

Usage

## S3 method for class 'psFit'
predict(
  object,
  newdata,
  interval = c("none", "prof", "wald"),
  level = 0.95,
  ...
)

Arguments

object

an object of class psFit, usually from fitDist or fitZIDist.

newdata

an optional vector of integers at which to calculate \Pr(X = x).

interval

either "none", "prof", or "wald" and can be abbreviated. If "prof" or "wald" AND the zeta model has been used, then an interval, based on the bounds of a 100 * level confidence interval for the shape parameter, is given for each predicted probability. The interval is provided based on either a profile likelihood or a Wald confidence interval for shape, and therefore cannot really be regarded as a confidence interval for the probabilities. The intervals might be more sensibly regarded as a measure of how sensitive the probabilities are to the choice of shape parameter. NOTE: this parameter is ignored if the Zero-inflated (ZIZ) model has been used.

level

the level of a confidence interval. Ignored if interval == "none".

...

other arguments passed to predict—not used

Value

either a named vector of fitted probabilities, or a data.frame with columns predicted, lower, and upper and the row names set to show what terms are being calculated

Examples

data(Psurveys)
roux = Psurveys$roux
fit = fitDist(roux)
predict(fit, interval = "prof")

S3 print method for an object of class `psData`

Description

S3 print method for an object of class psData

Usage

## S3 method for class 'psData'
print(x, ...)

Arguments

x

an object of class psData, usually from readData or makePSData

...

other arguments passed to print

Value

No return value, called for side effects

S3 print method for an object of class `psFit`

Description

S3 print method for an object of class psFit

Usage

## S3 method for class 'psFit'
print(x, ...)

Arguments

x

an object of class psFit, usually from fitDist or fitZIDist.

...

other arguments passed to print.

Value

No return value, called for side effects.

Probability Functions

Description

Creates a probability function that allows the computation of any P or S term.

Usage

probfun(psFitobj)

Arguments

psFitobj

an object of class psFit–see fitDist and fitZIDist.

Value

a function that can be used to calculate any P or S term.

Examples

p = readData(system.file("extdata", "p.xlsx", package = "fitPS"))
fit = fitDist(p)
P = probfun(fit)
P(0:5)

Generate zero inflated zeta random variates

Description

Generate zero inflated zeta random variates

Usage

rZIzeta(n, pi = 0.5, shape = 2, offset = 0)

rzizeta(n, pi = 0.5, shape = 2, offset = 0)

Arguments

n

the number of observations.

pi

the mixing parameter for the zero-inflated zeta model—must be in (0, 1).

shape

the shape parameter for the zero-inflated zeta. Must be greater than 1.

offset

the zeta distribution returns random variates that are greater than, or equal to one. If the offset is greater than 0, then the distribution is anchored on (has minimum value of) 1 - offset.

Details

Technically this function returns values from the one-inflated zeta distribution. However, if offset is greater than zero (and typically we expect it to be 1), then the minimum random variate value is 1 - offset. We chose the name "zero-inflated zeta" as more people are familiar with zero-inflated models.
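
The idea of a one-inflated zeta variate shifted by an offset can be sketched in base R. This is a hypothetical illustration and not the fitPS implementation: it assumes that with probability pi the inflated value 1 is returned and otherwise a zeta(shape) variate is drawn, and it truncates the zeta support at an arbitrary kMax so that sample() can be used. The function name rZIzetaSketch and the argument kMax are assumptions introduced here.

```r
# Hypothetical sketch, not the fitPS implementation.
# Assumptions: with probability pi the value 1 is returned (the inflated
# point); otherwise a zeta(shape) variate is drawn. The zeta support is
# truncated at kMax so that base R's sample() can be used.
rZIzetaSketch <- function(n, pi = 0.5, shape = 2, offset = 0, kMax = 10000) {
  stopifnot(pi > 0, pi < 1, shape > 1)
  k <- seq_len(kMax)
  # sample() normalises the weights k^(-shape) internally
  zetaDraw <- sample(k, size = n, replace = TRUE, prob = k^(-shape))
  inflated <- runif(n) < pi
  ifelse(inflated, 1, zetaDraw) - offset
}

set.seed(1)
x <- rZIzetaSketch(1000, pi = 0.5, shape = 2, offset = 1)
```

With offset = 1 the smallest value returned is 0, which is the sense in which the model is "zero-inflated".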

Value

a vector of random variates from a zero-inflated zeta model

Examples

data(Psurveys)
roux = Psurveys$roux
fit.zi = fitZIDist(roux)
x = rZIzeta(n = sum(roux$data$rn), pi = fit.zi$pi, shape = fit.zi$shape)
table(x)

Read count data from file

Description

Reads observed counts of either the number of groups or the size of the groups. The file must have only two columns. One of the columns must be labelled P or S and the other count. It does not matter whether the column names are in upper case or not. The P column can have labels 0, 1, 2, ... representing the observation of 0, 1, 2, or more groups. The corresponding count column should contain a positive (non-zero) count for each number of groups. Similarly, if the file contains S counts, then the S column can contain labels 1, 2, ... representing the observation of 1, 2, ... fragments in a group. Note that zero counts are neither allowed nor useful in the file, as they simply result in log-likelihood terms of zero and therefore make no difference.
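
As an illustration of the two-column layout described here, the base R sketch below writes a small P-type table to a temporary csv file and reads it back. The counts mirror the package's makePSData example and are used only for illustration; the variable names are assumptions.

```r
# Build a two-column P-type table: one column labelled P, the other count.
# Every count is positive, as the format requires.
pCounts <- data.frame(P = 0:2, count = c(98, 1, 1))

# Write to a temporary csv file and read it back with base R
csvFile <- tempfile(fileext = ".csv")
write.csv(pCounts, csvFile, row.names = FALSE)
roundTrip <- read.csv(csvFile)
```

A file laid out this way should be in the form that readData(csvFile) expects.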

Usage

readData(fileName, notes = NULL, ...)

Arguments

fileName

the name of the file to be read. Must be either a modern (xlsx) Excel file or a csv file.

notes

any additional information about the data, such as the source or a reference.

...

any additional parameters which will be passed to either read_excel or read.csv depending on the extension of your input file.

Value

an object of class psData which is a list containing member variables:

type: – either "P" or "S"
data: – a data.frame which contains columns n and rn, representing the number of groups/fragments, and the number of times that was seen, respectively.
notes: — either a bibentry or a character string which allows extra information about the data to be stored, such as the source, or reference.

Examples

p = readData(system.file("extdata", "p.xlsx", package = "fitPS"))
p
s = readData(system.file("extdata", "s.xlsx", package = "fitPS"))
s

Generate random variates from a zeta distribution

Description

Generate random variates from a zeta distribution

Usage

rzeta(n, shape)

Arguments

n

The number of random values to return, as for rpois.

shape

The standard zeta shape parameter, greater than 1.

See rzeta.

S3 summary method for an object of class `psFit`

Description

S3 summary method for an object of class psFit

Usage

## S3 method for class 'psFit'
summary(object, ...)

Arguments

object

an object of class psFit, usually from fitDist or fitZIDist

...

other arguments passed to summary

Details

Experimental because I am unsure if it is useful. If object is a zero-inflated zeta fitted object, then the function carries out a likelihood ratio test for the value of pi. Currently not implemented for the logarithmic distribution because we are currently not interested in the logarithmic distribution.

Value

No return value, called for side effects

Variance generic

Description

Variance generic

Usage

var(x, ...)

Arguments

x

an object for which we want to compute the sample variance.

...

Any additional arguments to be passed to var.

An S3 method for computing the variance of the number of groups, or of the size of groups, from clothing survey data

Description

An S3 method for computing the variance of the number of groups, or of the size of groups, from clothing survey data

Usage

## S3 method for class 'psData'
var(x, ...)

Arguments

x

an object of class psData; see readData for more details.

...

other arguments which are passed to sum

Value

the variance of the data. If there are r_i observations of the value n_i then the variance is computed by \mathrm{E}[X^2]-\mathrm{E}[X]^2, where \mathrm{E}[X] is computed using

\sum_i\frac{r_i\times n_i}{\sum_i{r_i}}

and \mathrm{E}[X^2] is computed using

\sum_i\frac{r_i\times n_i^2}{\sum_i{r_i}}

We realise that the computational formula, \mathrm{E}[X^2]-\mathrm{E}[X]^2, is usually not regarded as numerically stable, but the magnitude of the numbers involved is such that this is unlikely to cause an issue.
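
The computational formula can be worked through by hand in base R. The sketch below reuses the counts from the makePSData example (98, 1 and 1 observations of 0, 1 and 2 groups); the variable names are illustrative only.

```r
# n = value observed, rn = number of times it was observed
n <- 0:2
rn <- c(98, 1, 1)

ex  <- sum(rn * n) / sum(rn)     # E[X]
ex2 <- sum(rn * n^2) / sum(rn)   # E[X^2]
surveyVar <- ex2 - ex^2          # E[X^2] - E[X]^2
```

Because both expectations divide by the total count, this quantity differs slightly from what stats::var would give on the expanded raw observations, which divides by one less than the total.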

Examples

data(Psurveys)
var(Psurveys$roux)
