Type: Package
Title: Bayesian Methods to Estimate the Proportion of Liars in Coin Flip Experiments
Version: 0.2.0
Author: David Hugh-Jones <davidhughjones@gmail.com>
Maintainer: David Hugh-Jones <davidhughjones@gmail.com>
Description: Implements Bayesian methods, described in Hugh-Jones (2019) <doi:10.1007/s40881-019-00069-x>, for estimating the proportion of liars in coin flip-style experiments, where subjects report a random outcome and are paid for reporting a "good" outcome.
License: MIT + file LICENSE
URL: https://github.com/hughjonesd/truelies
BugReports: https://github.com/hughjonesd/truelies/issues
Imports: hdrcde
Suggests: dplyr, ggplot2, MASS, purrr, tidyr
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
NeedsCompilation: no
Packaged: 2019-08-26 20:30:58 UTC; david
Repository: CRAN
Date/Publication: 2019-08-26 20:40:03 UTC

truelies: Bayesian Methods to Estimate the Proportion of Liars in Coin Flip Experiments

Description

Implements Bayesian methods, described in Hugh-Jones (2019) <doi:10.1007/s40881-019-00069-x>, for estimating the proportion of liars in coin flip-style experiments, where subjects report a random outcome and are paid for reporting a "good" outcome.

Usage

To estimate the proportion of liars in an experiment, use update_prior() followed by dist_mean():

posterior <- update_prior(heads = 33, N = 50, P = 0.5, prior = dunif)
dist_mean(posterior)

To get confidence intervals for an estimate, use dist_hdr():

dist_hdr(posterior, conf_level = 0.95)

To test whether two different samples have the same proportion of liars, use difference_dist() followed by dist_hdr():

p2 <- update_prior(heads = 42, N = 49, P = 0.5, prior = dunif)
dd <- difference_dist(posterior, p2)
dist_hdr(dd, 0.95, bounds = c(-1, 1))

To test power for detecting a given proportion of liars, use power_calc():

power_calc(N = 100, P = 0.5, lambda = 0.2)

To test power for detecting differences between groups, use power_calc_difference():

power_calc_difference(N1 = 100, P = 5/6, lambda1 = 0.1, lambda2 = 0.25)

To compare different samples by empirical Bayes estimation, use empirical_bayes():

heads <- c(Baseline = 30, Treatment1 = 38, Treatment2 = 45)
N <- c(50, 52, 57)
result <- empirical_bayes(heads, N, P = 0.5)

Testing the package

To run tests on the package:

source(system.file("test-statistics.R", package = "truelies"))

You will need dplyr, purrr, tidyr and ggplot2 installed.

This will take some time and will produce data frames of test results for different parameter values, along with some plots.

Author(s)

David Hugh-Jones

References

Hugh-Jones, David (2019). True Lies: Comment on Garbarino, Slonim and Villeval (2018). Journal of the Economic Science Association. https://link.springer.com/article/10.1007/s40881-019-00069-x.

See Also

Useful links:


Calculate probability that one posterior is larger than another

Description

Given two distributions with density functions \phi_1, \phi_2, this calculates:

\int_0^1 \int_0^{l_1}\phi_1(l_1) \phi_2(l_2) d l_2 d l_1,

the probability that the value of the first distribution is greater.

Usage

compare_dists(dist1, dist2)

Arguments

dist1

Density of distribution 1, as a one-argument function.

dist2

Density of distribution 2.

Value

A probability scalar.

Examples


d1 <- update_prior(30, 50, P = 0.5, prior = stats::dunif)
d2 <- update_prior(25, 40, P = 0.5, prior = stats::dunif)
compare_dists(d1, d2)


Find density of the difference of two distributions

Description

Given two probability density functions dist1 and dist2, difference_dist returns the density of “dist1 - dist2'.

Usage

difference_dist(dist1, dist2)

Arguments

dist1, dist2

Probability density functions

Details

At the moment this only works when dist1 and dist2 are defined on [0, 1].

Value

A probability density function defined on [-1, 1].

Examples


d1 <- update_prior(30, 50, P = 0.5, prior = stats::dunif)
d2 <- update_prior(32, 40, P = 0.5, prior = stats::dunif)
dd <- difference_dist(d1, d2)
dist_hdr(dd, 0.95)

Compute highest density region for a density function

Description

This is a wrapper for hdrcde::hdr. The highest density region is the interval that covers conf_level of the data and has the highest average density. See:

Usage

dist_hdr(dist, conf_level, bounds = attr(dist, "limits"))

Arguments

dist

A one-argument function

conf_level

A scalar between 0 and 1

bounds

A length 2 vector of the bounds of the distribution's support

Details

Rob J Hyndman (1996) “Computing and graphing highest density regions”. American Statistician, 50, 120-126.

Value

A length 2 vector of region endpoints

Examples


d1 <- update_prior(33, 50, P = 0.5, prior = stats::dunif)
dist_hdr(d1, 0.95)


Find mean of a probability density function

Description

Find mean of a probability density function

Usage

dist_mean(dist, l = attr(dist, "limits")[1], r = attr(dist,
  "limits")[2])

Arguments

dist

A one-argument function returned from update_prior()

l

Lower bound of the density's support

r

Upper bound of the density's support

Value

A scalar

Examples


d1 <- update_prior(10, 40, P = 5/6, prior = stats::dunif)
dist_mean(d1)


Find quantiles given a probability density function

Description

Find quantiles given a probability density function

Usage

dist_quantile(dist, probs, bounds = attr(dist, "limits"))

Arguments

dist

A one argument function

probs

A vector of probabilities

bounds

A length 2 vector of the bounds of the distribution's support

Value

A vector of quantiles

Examples


d1 <- update_prior(33, 50, P = 0.5, prior = stats::dunif)
dist_quantile(d1, c(0.025, 0.975))


Estimate proportions of liars in multiple samples using empirical Bayes

Description

This function creates a prior by fitting a Beta distribution to the heads/N vector, using MASS::fitdistr(). The prior is then updated using data from each individual sample to give the posterior distributions.

Usage

empirical_bayes(heads, ...)

## Default S3 method:
empirical_bayes(heads, N, P, ...)

## S3 method for class 'formula'
empirical_bayes(formula, data, P, subset, ...)

Arguments

heads

A vector of numbers of the good outcome reported

...

Ignored

N

A vector of sample sizes

P

Probability of bad outcome

formula

A two-sided formula of the form heads ~ group. heads is a logical vector specifying whether the "good" outcome was reported. group specifies the sample.

data

A data frame or matrix. Each row represents one individual.

subset

A logical or numeric vector specifying the subset of data to use

Details

The formula interface allows calling the function directly on experimental data.

Value

A list with two components:

Examples


heads <- c(Baseline = 30, Treatment1 = 38, Treatment2 = 45)
N <- c(50, 52, 57)
res <- empirical_bayes(heads, N, P = 0.5)

compare_dists(res$posteriors$Baseline, res$posteriors$Treatment1)
plot(res$prior, ylim = c(0, 4), col = "grey", lty = 2)
plot(res$posteriors$Baseline, add = TRUE, col = "blue")
plot(res$posteriors$Treatment1, add = TRUE, col = "orange")
plot(res$posteriors$Treatment2, add = TRUE, col = "red")


# starting from raw data:
raw_data <- data.frame(
        report = sample(c("heads", "tails"),
          size = 300,
          replace = TRUE,
          prob = c(.8, .2)
        ),
        group = rep(LETTERS[1:10], each = 30)
    )
empirical_bayes(I(report == "heads") ~ group, data = raw_data, P = 0.5)

Calculate power to detect non-zero lying

Description

This uses simulations to estimate the power to detect a given level of lying in a sample of size N by this package's methods.

Usage

power_calc(N, P, lambda, alpha = 0.05, prior = stats::dunif,
  nsims = 200)

Arguments

N

Total number in sample

P

Probability of bad outcome

lambda

Probability of a subject lying

alpha

Significance level to use for the null hypothesis

prior

Prior over lambda. A function which takes a vector of values between 0 and 1, and returns the probability density. The default is the uniform distribution.

nsims

Number of simulations to run

Value

Estimated power, a scalar between 0 and 1.

Examples


power_calc(N = 50, P = 0.5, lambda = 0.2)


Estimate power to detect differences in lying between two samples

Description

Using simulations, estimate power to detect differences in lying using compare_dists(), given values for \lambda, the probability of lying, in each sample.

Usage

power_calc_difference(N1, N2 = N1, P, lambda1, lambda2, alpha = 0.05,
  alternative = c("two.sided", "greater", "less"),
  prior = stats::dunif, nsims = 200)

Arguments

N1

N of sample 1

N2

N of sample 2

P

Probability of bad outcome

lambda1

Probability of lying in sample 1

lambda2

Probability of lying in sample 2

alpha

Significance level

alternative

"two.sided", "greater" (sample 1 is greater), or "less". Can be abbreviated

prior

Prior over lambda. A function which takes a vector of values between 0 and 1, and returns the probability density. The default is the uniform distribution.

nsims

Number of simulations to run

Value

Estimated power, a scalar between 0 and 1.

Examples


power_calc_difference(N1 = 100, P = 0.5, lambda = 0, lambda2 = 0.25)


Print/plot an object of class densityFunction.

Description

Print/plot an object of class densityFunction.

Usage

## S3 method for class 'densityFunction'
print(x, ...)

## S3 method for class 'densityFunction'
plot(x, ...)

Arguments

x

The object

...

Unused

Examples


d1 <- update_prior(33, 50, P = 0.5, prior = stats::dunif)
d1
plot(d1)

# show the actual R code (techies only)
unclass(d1)


Calculate posterior distribution of the proportion of liars

Description

update_prior uses the equation for the posterior:

\phi(\lambda | R; N,P) = Pr(R|\lambda; N,P) \phi(\lambda) / \int Pr(R | \lambda'; N,P) \phi(\lambda') d \lambda'

where \phi is the prior and Pr(R | \lambda; N, P) is the probability of R reports of heads given that people lie with probability \lambda:

Pr(R | \lambda; N, P) = binom(N, (1-P) + \lambda P)

Usage

update_prior(heads, N, P, prior = stats::dunif, npoints = 1000)

Arguments

heads

Number of good outcomes reported

N

Total number in sample

P

Probability of bad outcome

prior

Prior over lambda. A function which takes a vector of values between 0 and 1, and returns the probability density. The default is the uniform distribution.

npoints

How many points to integrate on?

Value

The probability density of the posterior distribution, as a one-argument function.

Examples

posterior <- update_prior(heads = 30, N = 50, P = 0.5, prior = stats::dunif)
plot(posterior)

mirror server hosted at Truenetwork, Russian Federation.