Type: | Package |
Title: | Nonparametric Statistical Methods |
Version: | 2.0.0 |
Date: | 2024-05-25 |
Author: | John Kloke [aut, cre], Joseph McKean [aut] |
Maintainer: | John Kloke <johndkloke@gmail.com> |
Description: | Accompanies the book "Nonparametric Statistical Methods Using R, 2nd Edition" by Kloke and McKean (2024, ISBN:9780367651350). Includes methods, datasets, and random number generation useful for the study of robust and/or nonparametric statistics. Emphasizes classical nonparametric methods for a variety of designs — especially one-sample and two-sample problems. Includes methods for general scores, including estimation and testing for the two-sample location problem as well as Hogg's adaptive method. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
LazyLoad: | yes |
LazyData: | yes |
URL: | https://github.com/kloke/npsm, https://github.com/kloke/book |
Depends: | R (≥ 3.5.0), Rfit |
Imports: | methods, class, plyr |
Suggests: | boot, survival, sm, HSAUR2, remotes, profileR, car, dplyr, tree |
NeedsCompilation: | no |
Packaged: | 2024-05-26 02:51:46 UTC; john |
Repository: | CRAN |
Date/Publication: | 2024-05-26 08:00:04 UTC |
Hogg's Q1 and Q2.
Description
Q1 is a measure of skewness and Q2 is a measure of tail heaviness.
Usage
Q1(z)
Arguments
z |
n by 1 vector |
Details
Used as selector statistics in adaptive schemes. Both Q1 and Q2 are ratios. For Q1, the numerator is the upper 5% mean minus the middle 50% mean, while the denominator is the difference between the middle 50% mean and the lower 5% mean. For Q2, the numerator is the upper 5% mean minus the lower 5% mean, while the denominator is the difference between the upper 50% mean and the lower 50% mean. These statistics are not robust.
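The two ratios can be sketched in base R as follows. This is only an illustration: the helper names (lower_mean, upper_mean, middle_mean, q1_sketch, q2_sketch) are hypothetical, the tail means use whole observations only, and the middle 50% mean is used in both the numerator and the denominator of Q1, as in Hogg's usual definition. The package's own computation may differ in how it handles fractional counts.

```r
# Illustrative helpers; the names and the rounding rule are assumptions,
# not the package's internal functions.
lower_mean <- function(z, p) {
  z <- sort(z)
  k <- max(1, floor(length(z) * p))   # whole observations only
  mean(z[seq_len(k)])
}
upper_mean <- function(z, p) -lower_mean(-z, p)
middle_mean <- function(z) {
  z <- sort(z)
  n <- length(z)
  trim <- floor(n * 0.25)             # drop the lowest and highest 25%
  mean(z[(trim + 1):(n - trim)])
}

# Q1: (upper 5% mean - middle 50% mean) / (middle 50% mean - lower 5% mean)
q1_sketch <- function(z) {
  (upper_mean(z, 0.05) - middle_mean(z)) /
    (middle_mean(z) - lower_mean(z, 0.05))
}
# Q2: (upper 5% mean - lower 5% mean) / (upper 50% mean - lower 50% mean)
q2_sketch <- function(z) {
  (upper_mean(z, 0.05) - lower_mean(z, 0.05)) /
    (upper_mean(z, 0.5) - lower_mean(z, 0.5))
}

z <- -50:50      # a perfectly symmetric sample
q1_sketch(z)     # symmetric data give a ratio of 1
q2_sketch(z)     # larger values indicate heavier tails
```

Because both ratios are built from extreme-tail means, a single outlier can change them substantially, which is why the statistics are described as not robust.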
Value
Returns the calculated ratio as a numeric scalar.
Author(s)
John Kloke
References
Hogg, R., McKean, J., and Craig, A. (2013), Introduction to Mathematical Statistics, 7th ed., Boston: Pearson.
See Also
Cyclone Data
Description
A data set discussed in Hollander and Wolfe (1999) and Exercise 5.8.9 of Kloke and McKean (2014)/Exercise 5.9.15 of Kloke and McKean (2024). It contains part of a study on the effects of cloud seeding of cyclones.
Usage
data(SCUD)
Format
Twenty-one observations on three variables.
trt
treatment indicator: 1 is seeded and 2 is control
M
predictor M, the geostrophic meridional circulation index
RI
measure of precipitation
References
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
plot(RI ~ M,data=SCUD)
Analysis of Covariance Example for a two by three two-way design
Description
This is a simulated data set which is used as an example of analysis of covariance. The data frame acov231 contains the data. The responses are in column 1, column 2 contains the levels of factor A, column 3 contains the levels of factor B, and column 4 contains the covariate. All true parameters (effects) are 0 in this generated data set.
Usage
data(acov231)
Format
A data frame with 33 observations and 4 variables.
response
numeric. the response.
fA
numeric. factor A with 2 levels.
fB
numeric. factor B with 3 levels.
covariate
numeric. a covariate.
References
Kloke, J. and McKean J.W. (2014), Nonparametric Statistical Methods using R, Boca Raton, FL: Chapman-Hall.
Examples
levs = c(2,3)
data = acov231[,1:3]
xcov = matrix(acov231[,4],ncol=1)
temp = kancova(levs,data,xcov)
Aligned Rank Test
Description
Aligned rank test for a group/treatment effect after adjusting for covariates.
Usage
aligned.test(x, y, g, scores = Rfit::wscores,...)
Arguments
x |
n by p design matrix |
y |
n by 1 response vector |
g |
n by 1 vector denoting group/treatment membership. |
scores |
Which scores should be used for the fit and the test. An object of class scores. |
... |
Optional arguments passed to rfit. |
Details
Data are aligned based on the design matrix x using a rank-based fit via rfit.
Value
statistic |
The value of the test statistic. |
p.value |
The p-value based on a chisq(k-1) distribution where k is the number of groups/treatments. |
Author(s)
John Kloke
References
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
See Also
rfit
Examples
y<-rt(30,2)
x<-runif(30)
g<-rep(1:3,each=10)
aligned.test(x,y,g)
Career Information for a Random Sample of 1000 Baseball Players
Description
Demographics and position information on 1000 randomly selected baseball players who debuted after 1945.
Usage
data("baseball_players1000")
Format
A data frame with 1000 observations on the following 28 variables.
playerID
a character vector
birthYear
a numeric vector
birthMonth
a numeric vector
birthDay
a numeric vector
birthCountry
a character vector
birthState
a character vector
nameFirst
a character vector
nameLast
a character vector
weight
a numeric vector
height
a numeric vector
bats
a character vector
throws
a character vector
debutYear
a numeric vector
G_all
a numeric vector
G_p
a numeric vector
G_c
a numeric vector
G_1b
a numeric vector
G_2b
a numeric vector
G_3b
a numeric vector
G_ss
a numeric vector
G_lf
a numeric vector
G_cf
a numeric vector
G_rf
a numeric vector
G_of
a numeric vector
G_dh
a numeric vector
G_ph
a numeric vector
G_pr
a numeric vector
pitcher
a logical vector
Details
A random subset of baseball players who debuted after 1945 and played in at least 160 games. Includes information on birth (date and location); height (inches) and weight (pounds); whether they bat left (L), right (R), or switch (B); and games played at each position. The variable pitcher is a derived variable indicating whether the majority of games were played as a pitcher (i.e., G_p/G_all > 0.5).
Source
https://github.com/chadwickbureau/baseballdatabank
References
https://github.com/chadwickbureau/baseballdatabank/blob/master/readme2014.txt
Examples
data(baseball_players1000)
hist(baseball_players1000$weight,xlab="Weight (lbs)",
probability=TRUE, ylim=c(0,0.02),
main="Histogram of Weight for 1000 Baseball Players")
lines(density(baseball_players1000$weight,na.rm=TRUE))
Batting statistics for the 2010 baseball season.
Description
Batting statistics (average, home runs, RBIs) for 2010 full-time players. By full-time we mean that the batter had at least 450 official at bats during the season.
Usage
data(bb2010)
Format
A data frame with 122 observations on the following 3 variables.
ave
batting average
hr
home runs
rbi
runs batted in
Source
baseballguru.com
Examples
plot(hr~ave,data=bb2010)
Blood plasma measurements related to total triglyceride level
Description
Data table from Table 9.11 of Hollander and Wolfe (1999). The data consists of triglyceride levels on 13 patients. Two factors, each at two levels, were recorded: Sex and Obesity. The concomitant variables are chylomicrons, age, and three lipid variables (very low-density lipoproteins (VLDL), low-density lipoproteins (LDL), and high-density lipoproteins (HDL)).
Usage
data(blood.plasma)
Format
A data frame with 13 observations on 8 variables.
Total
Triglyceride level, response
Sex
Sex, 2 levels
Obese
Obesity, 2 levels
Chylo
Chylomicrons, covariate
VLDL
Very low density, lipids, covariate
LDL
Low density, lipids, covariate
HDL
High density, lipids, covariate
Age
Age
Source
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
References
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Examples
data(blood.plasma)
plot(Total~Age,data=blood.plasma)
boxplot(Total~Obese,data=blood.plasma)
Basic Summaries of Boxscores for the Milwaukee Brewers 1982 Season
Description
Basic summaries of boxscores for the 1982 season of the Major League Baseball team the Milwaukee (WI) Brewers. The Brewers won the American League championship that year, and Brewers player Robin Yount won the Most Valuable Player (MVP) award.
Usage
data("brewers1982")
Format
A data frame with 163 observations on the following 8 variables.
Date
a character vector
Opp
a character vector
R
a numeric vector
RA
a numeric vector
Time
a character vector
Attendance
a numeric vector
home
a logical vector
win
a logical vector
Examples
data(brewers1982)
# proportion of wins for a given number of runs scored
pwin <- with(brewers1982,tapply(win,R,mean))
pwin
# graphical display of the above
plot(names(pwin),pwin,xlab='Runs', ylab='Proportion of Wins',main='Brewers 1982')
Survival time based on two treatments
Description
Survival times (in days) for subjects undergoing a standard treatment (S) or a new treatment (N).
Usage
data("cancertrt")
Format
A data frame with 17 observations on the following 3 variables.
time
Survival time in days
event
Indicator for event
trt
a factor with levels N and S
References
Higgins (2004), Introduction to Modern Nonparametric Statistics, Pacific Grove, CA: Brooks/Cole–Thomson Learning.
Examples
data(cancertrt)
with(cancertrt,gehan.test(time,event,trt))
Center Matrix
Description
Centers a matrix.
Usage
centerx(x)
Arguments
x |
a matrix |
Details
Returns a centered matrix, i.e., each column of the matrix is replaced by deviations from its column mean.
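The centering operation described above can be sketched in base R. The function name center_sketch is hypothetical and is not part of the package; the comparison with scale() is only a sanity check.

```r
# A minimal sketch of column centering (hypothetical name, base R only).
center_sketch <- function(x) {
  x <- as.matrix(x)
  sweep(x, 2, colMeans(x), "-")   # subtract each column's mean from that column
}

x <- cbind(seq(1, 5, length = 5), seq(10, 20, length = 5))
xc <- center_sketch(x)
colMeans(xc)                      # column means of the centered matrix

# Same result as base R's scale() with centering only
all.equal(xc, scale(x, center = TRUE, scale = FALSE), check.attributes = FALSE)
```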
Value
The centered matrix.
Author(s)
John Kloke, Joseph McKean
See Also
scale
Examples
x <- cbind(seq(1,5,length=5),seq(10,20,length=5))
xc <- centerx(x)
apply(xc,2,mean)  # column means of the centered matrix should be zero
Cloud Dewpoint
Description
A regression example with response cloud point of a liquid and predictor the percent of Iodine 8 added to the liquid; see Chapter 3 of Hettmansperger and McKean (2011) or Exercise 4.9.10 of Kloke and McKean (2014)/Exercise 4.7.7 of Kloke and McKean (2024).
Usage
data(cloud)
Format
Nineteen observations on two variables.
cloud.point
Cloud point of the liquid
I8
Percent Iodine 8 added
Source
Draper, N.R. and Smith, H. (1966), Applied Regression Analysis, New York: John Wiley and Sons.
References
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
rfit(cloud.point ~ I8,data=cloud)
Confidence interval for a correlation based on a bootstrap.
Description
Returns a bootstrap confidence interval for any of the correlations available in the base R cor function.
Usage
cor.boot.ci(x, y, method = "spearman", conf = 0.95, nbs = 3000)
Arguments
x |
n by 1 vector |
y |
n by 1 vector |
method |
Which correlation to use. Argument passed to cor. |
conf |
Confidence level. |
nbs |
number of bootstrap samples to base CI on. |
Details
Obtains a percentile bootstrap confidence interval. The bootstrap samples are obtained via the function boot.
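The percentile idea can be illustrated with a short base R sketch. Note the assumptions: the function name boot_cor_ci_sketch is hypothetical, and the resampling is done with sample.int rather than with boot, so this shows the technique rather than the package's implementation.

```r
# Percentile bootstrap CI for a correlation (illustrative sketch, base R only).
boot_cor_ci_sketch <- function(x, y, method = "spearman",
                               conf = 0.95, nbs = 3000) {
  n <- length(x)
  stats <- replicate(nbs, {
    idx <- sample.int(n, n, replace = TRUE)   # resample (x, y) pairs together
    cor(x[idx], y[idx], method = method)
  })
  alpha <- 1 - conf
  # The interval endpoints are the alpha/2 and 1 - alpha/2 sample quantiles
  quantile(stats, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
boot_cor_ci_sketch(x, y, nbs = 500)
```

Resampling the pairs jointly, rather than x and y separately, preserves the dependence that the correlation is meant to measure.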
Value
A confidence interval.
Author(s)
John Kloke, Joseph McKean
See Also
cor
Examples
library(boot)
with(bb2010,cor.boot.ci(ave,hr))
Energy as a Function of temperature difference.
Description
A regression example with response energy output in watts and predictor temperature difference in degrees Kelvin; see Devore (2012) and Exercise 4.9.11 of Kloke and McKean (2014)/Exercise 4.7.8 of Kloke and McKean (2024).
Usage
data(energy)
Format
Twenty-four observations on two variables.
output
Energy output in watts
temp.diff
Temperature difference in K
Source
Devore, J. (2012), Probability and statistics for engineering and the sciences, 8th ed., Boston: Brooks/Cole.
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
rfit(output ~ temp.diff,data=energy)
Rounding First Base.
Description
The amount of time it took 22 baseball players to round first base for each of three methods of rounding.
Usage
data(firstbase)
Format
A data frame with 22 observations on the following 3 variables.
round.out
Time when using round out method.
narrow.angle
Time when using narrow angle method.
wide.angle
Time when using wide angle method.
Details
Rounding methods are illustrated in Figure 7.1 of Hollander and Wolfe (1999).
Source
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
References
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Two-sample Fligner-Killeen test for homogeneous scales.
Description
Returns the Fligner-Killeen test for homogeneous scales for two samples. Estimates of the ratio of scales, based on the logs of folded median-aligned samples, and a corresponding confidence interval are also computed. fk.test computes the value of the statistic based on squared-normal scores, following the optimal (for normal errors) test described in Section 2.10 of Hettmansperger and McKean (2011). Hence, it will differ from the core R routine fligner.test; see the discussion in Section 3.3 of Kloke and McKean (2014)/Section 3.5 of Kloke and McKean (2024).
Usage
fk.test(x,y,alternative = c("two.sided", "less", "greater"),conf.level = 0.95)
Arguments
x |
vector of first sample responses |
y |
vector of second sample responses |
alternative |
alternative indicator for hypotheses |
conf.level |
confidence coefficient for the returned confidence intervals |
Details
Returns the Fligner-Killeen test for the two-sample scale problem.
Value
statistic |
chi-squared test statistic |
p.value |
p-value of the test |
estimate |
vector of estimates of ratio of scales |
conf.int |
table of confidence intervals |
Author(s)
John Kloke, Joseph McKean
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
See Also
fkk.test
Examples
x<-rnorm(18)
y<-rnorm(22)*3
fk.test(x,y)
k-Sample version of the Fligner-Killeen test for homogeneous scales.
Description
Returns the Fligner-Killeen test for homogeneous scales for k samples. Estimates of the ratios of scales, based on the logs of folded median-aligned samples, and corresponding confidence intervals are also computed. The first level (sample) is the reference level. See the discussion in Section 5.7 of Kloke and McKean (2014)/Section 5.8 of Kloke and McKean (2024).
Usage
fkk.test(y,ind,conf.level = 0.95)
Arguments
y |
vector of responses |
ind |
vector of corresponding levels |
conf.level |
confidence coefficient for the returned confidence intervals |
Details
Returns the Fligner-Killeen test for the k-sample scale problem.
Value
statistic |
chi-squared test statistic |
p.value |
p-value of the test |
estimate |
vector of estimates of ratio of scales |
conf.int |
table of confidence intervals |
cwts |
vector of weights based on the estimated differences in scales |
Author(s)
John Kloke, Joseph McKean
References
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
See Also
fk.test
Examples
y1 <- rnorm(10)
y2 <- rnorm(12)*3
y3 <- rnorm(15)*5
y<-c(y1,y2,y3)
ind<-rep(1:3,times=c(10,12,15))
fkk.test(y,ind)
Placement Test for the Behrens-Fisher problem.
Description
Returns the test based on placements for the Behrens-Fisher problem. This test was developed by Fligner and Policello (1981); see, also, Section 2.11 of Hettmansperger and McKean (2011) and Section 4.4 of Hollander and Wolfe (1999). The version computed by fp.test is discussed in Section 3.4 of Kloke and McKean (2014)/Section 3.6 of Kloke and McKean (2024).
Usage
fp.test(x,y,delta0=0,alternative = "two.sided")
Arguments
x |
vector of first sample responses |
y |
vector of second sample responses |
delta0 |
null value tested |
alternative |
alternative indicator for hypotheses |
Details
Returns the Placement Test for the Behrens-Fisher problem.
Value
statistic |
chi-squared test statistic |
p.value |
p-value of the test |
numerator |
numerator of test statistic |
denominator |
denominator of test statistic |
Author(s)
John Kloke, Joseph McKean
References
Fligner, M.A. and Policello, G.E. (1981), Robust rank procedures for the Behrens-Fisher problem, Journal of the American Statistical Association, 76, 162–168.
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Hollander, M. and Wolfe, D.A. (1999), Nonparametric statistical methods, 2nd Edition, New York: John Wiley and Sons.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Gehan generalization of the Wilcoxon two-sample test
Description
Generalization of the Wilcoxon rank sum which allows for censored data.
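To illustrate how censoring enters the comparison, the following base R sketch computes a Gehan-type pairwise score sum. It is a sketch under assumptions: the function name gehan_stat_sketch is hypothetical, event is taken to be 1 for an observed event and 0 for a censored time, and only the raw statistic is computed (no standardization or p-value), so it should not be read as the package's implementation.

```r
# Gehan-type score sum over all between-group pairs (illustrative sketch).
# A pair contributes +1 or -1 only when the ordering of the two survival
# times is known despite censoring; otherwise it contributes 0.
gehan_stat_sketch <- function(time, event, trt) {
  grp <- sort(unique(trt))
  a <- which(trt == grp[1])
  b <- which(trt == grp[2])
  U <- 0
  for (i in a) for (j in b) {
    if ((time[i] > time[j] && event[j] == 1) ||
        (time[i] == time[j] && event[i] == 0 && event[j] == 1)) {
      U <- U + 1          # subject i is known to have outlived subject j
    } else if ((time[i] < time[j] && event[i] == 1) ||
               (time[i] == time[j] && event[i] == 1 && event[j] == 0)) {
      U <- U - 1          # subject j is known to have outlived subject i
    }
  }
  U
}

time  <- c(5, 8, 3, 6)
event <- c(1, 0, 1, 1)   # the second time is censored
trt   <- c(0, 0, 1, 1)
gehan_stat_sketch(time, event, trt)
```

With no censoring and no ties, every pair contributes, and the sum is a linear function of the Wilcoxon rank-sum statistic.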
Usage
gehan.test(time, event, trt)
Arguments
time |
Time of event or of censoring |
event |
Indicator variable for whether the event occurred or not (i.e., the time is censored) |
trt |
Variable indicating treatment group. |
Value
statistic |
Value of the test statistic |
p.value |
p-value |
Author(s)
John Kloke
References
Higgins (2004), Introduction to Modern Nonparametric Statistics, Pacific Grove, CA: Brooks/Cole–Thomson Learning.
Examples
n<-76
y<-rexp(n)
event<-rbinom(n,1,0.7) # about 30% censored
trt<-sample(c(0,1),n,replace=TRUE)
gehan.test(y,event,trt)
Design Function for Robust Analysis of Covariance
Description
Returns the heterogeneous slopes design matrix used in ANCOVA. It references the first level.
Usage
getxact(amat,bmat)
Arguments
amat |
cell mean design matrix of factor. |
bmat |
matrix of covariates. |
Details
Returns the heterogeneous slopes analysis of covariance matrix.
Value
cmat |
heterogeneous slopes analysis of covariance matrix |
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Design Function for Robust Analysis of Covariance
Description
Returns the heterogeneous slopes design matrix used in ANCOVA. It references the first level. Also, column names are supplied.
Usage
getxact2(amat,bmat)
Arguments
amat |
cell mean design matrix of factor. |
bmat |
matrix of covariates. |
Details
Returns the heterogeneous slopes analysis of covariance matrix.
Value
cmat |
heterogeneous slopes analysis of covariance matrix with columns named |
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Hemorrhage data from Dupont.
Description
Hemorrhage data from Dupont.
Usage
data(hemorrhage)
Format
A data frame with 71 observations on the following 3 variables.
genotype
a numeric vector
time
a numeric vector
recur
a numeric vector
References
Dupont
Examples
data(hemorrhage)
str(hemorrhage)
Hodges-Lehmann type estimation and confidence intervals.
Description
Hodges-Lehmann type estimation and confidence intervals.
Usage
hodges_lehmann.ci(x, y, var.equal = FALSE, conf.level = 0.95, ...)
Arguments
x |
numeric vector. |
y |
numeric vector. |
var.equal |
logical. Assume scales are equal (TRUE) or not (FALSE). |
conf.level |
confidence level to be used for the confidence interval. |
... |
optional arguments. currently unused. |
Details
Currently implements 2-sample estimation and confidence intervals based on methods proposed by Hodges and Lehmann.
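For orientation, the classical two-sample Hodges-Lehmann point estimate of a location shift is the median of all pairwise differences between the samples. The sketch below shows only that point estimate; the function name hl_shift_sketch and the direction of the difference (y minus x) are assumptions, and the standard error and confidence interval returned by the package are not reproduced here.

```r
# Hodges-Lehmann shift estimate: median of all pairwise differences
# (illustrative sketch; direction y - x is an assumption).
hl_shift_sketch <- function(x, y) {
  diffs <- outer(y, x, "-")   # matrix of all length(y) * length(x) differences
  median(diffs)
}

x <- c(1, 2, 3, 4)
y <- c(3, 5, 6)
hl_shift_sketch(x, y)   # 2
```

Base R's wilcox.test(y, x, conf.int = TRUE) also reports an estimate of this type alongside the Wilcoxon test.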
Value
estimate |
parameter point estimate |
stderr |
estimated standard error of point estimate |
conf.int |
estimated confidence interval |
Author(s)
John Kloke, Joseph McKean
References
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
See Also
Examples
zoo<-c(390,258,298,255,324,240,416,319,225,284)
rh <- c(187,186,179,269,382,264,353 ,38,350,267,229,383,254,302,195, 43,337,390)
hodges_lehmann.ci(zoo,rh)
Relapse-Free Survival Times for Hodgkin's Disease Patients
Description
These data are described in Example 11.7 of Hollander and Wolfe (1999). They are results from a clinical trial in early Hodgkin's disease. Subjects received one of two treatments: radiation of affected node (AN) or total nodal radiation (TN).
Usage
data("hodgkins")
Format
A data frame with 49 observations on the following 3 variables.
time
Survival time
relapse
Indicator variable for relapse
trt
treatment: a factor with levels AN and TN
References
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Hogg's Adaptive Test
Description
Based on the selector statistics (Q1 and Q2), one of four score functions is chosen. A rank test and p-value are then calculated based on it.
Usage
hogg.test(x, y, ...)
Arguments
x |
n by 1 vector |
y |
m by 1 vector |
... |
additional arguments. currently not used |
Value
statistic |
Value of the test statistic. |
p.value |
p-value based on a normal approximation. |
scores |
Which of the score functions was chosen. |
Author(s)
John Kloke, Patrick Kimes
References
Hogg, R., McKean, J., and Craig, A. (2013), Introduction to Mathematical Statistics, 7th ed., Boston: Pearson.
Examples
hogg.test(rt(20,1),rt(22,1)+0.2)
Analysis of Covariance Data Set
Description
A data set presented on page 496 of Huitema (2011). The design is a 2 by 2 with one covariate.
Usage
data(huitema496)
Format
A 16 by 4 array with the following 4 columns:
y
number of novel responses.
i
type of reinforcement (2 levels).
j
type of program (2 levels).
x
covariate, a measure of verbal fluency.
Details
Discussion can be found in both references listed below.
Source
Huitema, B.E. (2011), The analysis of covariance and alternatives, 2nd ed., New York: Wiley.
References
Huitema, B.E. (2011), The analysis of covariance and alternatives, 2nd ed., New York: Wiley.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Examples
huitema496 <- data.frame(huitema496)
fit <- rfit(y~factor(i)+factor(j)+x,data=huitema496)
summary(fit)
Insulating Fluid Data
Description
Study the breakdown time of an electrical insulating fluid subject to seven different levels of voltage stress.
Usage
data("insulation")
Format
A data frame with 76 observations on the following 2 variables.
log.stress
log of voltage stress
log.time
log of failure time
Source
Nelson, W. (1982), Applied lifetime data analysis, New York: John Wiley and Sons.
Lawless, J.F. (1982), Statistical models and methods for lifetime data, New York: John Wiley and Sons.
References
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Examples
myscores <- logGFscores
myscores@param <- c(1,5)
fit <- rfit(log.time ~ log.stress,scores=myscores,data=insulation)
summary(fit)
fit$tauhat
Internal Functions
Description
Internal functions not intended for general use. Used in calculation of Hogg's Qs.
Usage
lmean(z, p)
Arguments
z |
n by 1 vector |
p |
scalar |
Value
Returns the calculated value as a numeric scalar.
Author(s)
John Kloke, Joseph McKean
See Also
Jonckheere's Test for Ordered Alternatives
Description
Computes Jonckheere's Test for Ordered Alternatives; see Section 5.6 of Kloke and McKean (2014)/Section 5.7 of Kloke and McKean (2024).
Usage
jonckheere(y, groups)
Arguments
y |
vector of responses |
groups |
vector of associated groups (levels) |
Details
Computes Jonckheere's Test for Ordered Alternatives. The main source was downloaded from the site:
smtp.biostat.wustl.edu/sympa/biostat/arc/s-news/2000-10/msg00126.html
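The quantities listed under Value can be illustrated with a short base R sketch. This is a sketch under assumptions, not the downloaded source: the function name jonckheere_sketch is hypothetical, groups are ordered by their sorted labels, the variance is the no-ties formula, and the p-value is a one-sided normal approximation for an increasing trend, which may differ from what the package reports.

```r
# Jonckheere-type statistic with its null mean and variance (illustrative).
jonckheere_sketch <- function(y, groups) {
  g <- as.integer(factor(groups))      # groups taken in sorted label order
  k <- max(g)
  J <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      d <- outer(y[g == j], y[g == i], "-")
      # count pairs where the later group exceeds the earlier one
      J <- J + sum(d > 0) + 0.5 * sum(d == 0)   # ties count one half
    }
  }
  n <- tabulate(g, k)
  N <- sum(n)
  EJ <- (N^2 - sum(n^2)) / 4
  VJ <- (N^2 * (2 * N + 3) - sum(n^2 * (2 * n + 3))) / 72   # assumes no ties
  z <- (J - EJ) / sqrt(VJ)
  list(J = J, EJ = EJ, VJ = VJ, p = pnorm(z, lower.tail = FALSE))
}

res <- jonckheere_sketch(1:6, c(1, 1, 2, 2, 3, 3))
res$J    # every cross-group pair is in increasing order
res$p
```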
Value
Jonckheere |
test statistic |
ExpJ |
null expectation |
VarJ |
null variance |
p |
p-value |
Author(s)
John Kloke, Joseph McKean
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
smtp.biostat.wustl.edu/sympa/biostat/arc/s-news/2000-10/msg00126.html
Examples
r<-rnorm(30)
gp<-c(rep(1,10),rep(2,10),rep(3,10))
jonckheere(r,gp)
Robust Analysis of Covariance under Heterogeneous Slopes for a k-way layout
Description
Returns a robust rank-based analysis of covariance for a k-way layout assuming heterogeneous slopes; see Section 5.4 of Kloke and McKean (2014)/Sections 5.6 and 7.3 of Kloke and McKean (2024). Currently only Wilcoxon scores are used.
Usage
kancova(levs,data,xcov,print.table=TRUE)
Arguments
levs |
vector of levels corresponding to the factors A, B, C, etc. |
data |
matrix with response in column 1 and level in column 2 |
xcov |
matrix of covariates |
print.table |
logical indicating a table should be printed |
Details
Returns the analysis of covariance table assuming heterogeneous slopes for a k-way layout.
Value
tab2 |
analysis of covariance |
fint |
rank-based full model (heterogeneous slopes) |
fithomog |
rank-based full model (homogeneous slopes) |
Author(s)
John Kloke, Joseph McKean
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
levels <- c(2,2)
y.group <- huitema496[,c('y','i','j')]
xcov <- huitema496[,'x']
kancova(levels,y.group,xcov)
Routine used in the ANCOVA table obtained by kancova
Description
Routine used in making the display of the ANCOVA table obtained by kancova.
Usage
kancovarown(vec)
Arguments
vec |
vector to be labeled. |
Details
Returns the labels.
Value
nm |
vector of labels |
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Train a k nearest neighbors (knn) classifier via cross validation (cv).
Description
Train a k nearest neighbors (knn) classifier via cross validation (cv). The number of folds and the set of candidate numbers of neighbors may be specified.
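The general procedure can be sketched with knn from the class package. The function name knn_cv_sketch, the random assignment of rows to folds, and the toy data are assumptions made for illustration; the package's own fold construction and tie handling may differ.

```r
library(class)   # provides knn(); a recommended package shipped with R

# Cross-validated choice of the number of neighbors (illustrative sketch).
knn_cv_sketch <- function(xy, k.cv = 5, kvec = seq(1, 47, by = 2)) {
  p <- ncol(xy)
  x <- xy[, -p, drop = FALSE]          # predictors: all but the last column
  y <- factor(xy[, p])                 # class labels: the last column
  fold <- sample(rep_len(seq_len(k.cv), nrow(xy)))   # random fold labels
  error <- sapply(kvec, function(k) {
    wrong <- 0
    for (f in seq_len(k.cv)) {
      test <- fold == f
      pred <- knn(x[!test, , drop = FALSE], x[test, , drop = FALSE],
                  y[!test], k = k)
      wrong <- wrong + sum(pred != y[test])
    }
    wrong / nrow(xy)                   # overall misclassification rate
  })
  list(kvec = kvec, error = error,
       k.best = kvec[which.min(error)], k.cv = k.cv)
}

set.seed(1)
toy <- data.frame(x1 = c(rnorm(40), rnorm(40, 3)),
                  x2 = c(rnorm(40), rnorm(40, 3)),
                  y  = rep(c("a", "b"), each = 40))
knn_cv_sketch(toy, k.cv = 5, kvec = seq(1, 15, by = 2))$k.best
```

Each candidate k must be smaller than the size of the training folds, which is why the toy call uses a shorter kvec than the default.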
Usage
knn_cv(xy, k.cv = 5, kvec = seq(1, 47, by = 2))
Arguments
xy |
Data frame with the data matrix x as the first set of columns and the vector y as the last column. |
k.cv |
scalar. number of folds to use. default is 5. |
kvec |
vector. set of neighbors to consider. default is odd integers between 1 and 47 (inclusive). |
Value
kvec |
set of neighbors considered |
error |
vector of misclassification error rates corresponding to kvec |
k.best |
number of neighbors with lowest error rate |
k.cv |
number of folds used |
Author(s)
John Kloke
References
Hastie, T., Tibshirani, R., and Friedman, J. (2017), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, New York: Springer.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013), An Introduction to Statistical Learning with Applications in R, New York: Springer.
Venables, W. N. and Ripley, B. D. (2002), Modern Applied Statistics with S, Fourth Edition, New York: Springer.
See Also
knn
Examples
train_set <- sim_class2[sim_class2$train==1,-1]
set.seed(19180511)
fit_cv <- knn_cv(train_set,k.cv=10)
fit_cv
Chateau Latour Wine Data
Description
The response variable is the quality of a vintage based on a scale of 1 to 5 over the years 1961 to 2004. The predictor is end of harvest, days between August 31st and the end of harvest for that year, and the factor of interest is whether or not it rained at harvest time.
Usage
data(latour)
Format
A data frame with 44 rows and 4 columns.
year
Year of harvest
quality
Rating on a scale of 1-5
end.of.harvest
Days between August 31 and the end of harvest
rain
indicator variable for rain
References
Sheather, SJ (2009), A Modern Approach to Regression with R, New York: Springer.
Examples
data(latour)
plot(quality~end.of.harvest,pch='',data=latour)
points(quality~end.of.harvest,data=latour[latour$rain==0,],pch=3)
points(quality~end.of.harvest,data=latour[latour$rain==1,],pch=4)
Mood Median Confidence Interval
Description
Mood's classical nonparametric method for calculating a confidence interval for a difference in population medians.
Usage
mood.ci(x, y, var.equal = FALSE, conf.level = 0.95, ...)
Arguments
x |
n x 1 vector |
y |
m x 1 vector |
var.equal |
Logical. Assume the scales of the two populations are equal. |
conf.level |
numeric value. confidence level for the confidence interval. |
... |
not currently implemented |
Value
A vector of length 2 containing the lower and upper endpoints of the confidence interval.
Author(s)
John Kloke, Joseph McKean
References
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
See Also
Examples
x <- rt(101,9)
y <- rt(108,9)+0.3
mood.ci(x,y)
Robust Analysis of Covariance under Heterogeneous Slopes
Description
Returns a test for homogeneous slopes and, assuming homogeneous slopes, a test for differences in level. Currently only Wilcoxon scores are used.
Usage
onecova(levs,data,xcov,print.table=TRUE)
Arguments
levs |
Number of levels of the one-way design |
data |
matrix with response in column 1 and level in column 2 |
xcov |
matrix of covariates |
print.table |
logical indicating a table should be printed |
Details
Returns the analysis of covariance table.
Value
tab |
analysis of covariance |
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Examples
data=latour[,c('quality','rain')]
xcov<-cbind(latour['end.of.harvest'])
onecova(2,data,xcov,print.table=TRUE)
Robust Analysis of Covariance under Heterogeneous Slopes
Description
Returns a robust rank-based analysis of covariance for a one-way layout assuming heterogeneous slopes; see Section 5.4 of Kloke and McKean (2014)/Sections 5.6 and 7.3 of Kloke and McKean (2024). Currently only Wilcoxon scores are used.
Usage
onecovaheter(levs,data,xcov,print.table=TRUE)
Arguments
levs |
Number of levels of the one-way design |
data |
matrix with response in column 1 and level in column 2 |
xcov |
matrix of covariates |
print.table |
logical indicating a table should be printed |
Details
Returns the analysis of covariance table assuming heterogeneous slopes.
Value
tab |
analysis of covariance |
fit |
rank-based full model (heterogeneous slopes) |
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
data=latour[,c('quality','rain')]
xcov<-cbind(latour['end.of.harvest'])
onecovaheter(2,data,xcov,print.table=TRUE)
Robust Analysis of Covariance under Homogeneous Slopes
Description
Returns a robust rank-based analysis of covariance for a one-way layout assuming homogeneous slopes; see Section 5.4 of Kloke and McKean (2014)/Sections 5.6 and 7.3 of Kloke and McKean (2024). Currently only Wilcoxon scores are used.
Usage
onecovahomog(levs,data,xcov,print.table=TRUE)
Arguments
levs |
Number of levels of the one-way design |
data |
matrix with response in column 1 and level in column 2 |
xcov |
matrix of covariates |
print.table |
logical indicating a table should be printed |
Details
Returns the analysis of covariance table assuming homogeneous slopes.
Value
tab |
analysis of covariance |
fit |
rank-based full model (homogeneous slopes) |
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
data=latour[,c('quality','rain')]
xcov<-cbind(latour['end.of.harvest'])
onecovahomog(2,data,xcov,print.table=TRUE)
Placements.
Description
Returns the placements of the first vector in terms of the second vector, as used by the R function fp.test; see Section 2.11 of Hettmansperger and McKean (2011) and Section 4.4 of Hollander and Wolfe (1999). The version computed by fp.test is discussed in Section 3.4 of Kloke and McKean (2014)/Section 3.6 of Kloke and McKean (2024).
Usage
place(x,y)
Arguments
x |
first vector |
y |
second vector (second-sample responses) |
Details
Returns the placements for the routine fp.test.
Value
ic |
vector of placements. |
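As a rough illustration of what a placement is, the sketch below counts, for each element of x, how many elements of y fall strictly below it. This assumes the usual textbook definition; the helper name place_sketch is hypothetical, and the treatment of ties in the package's place function may differ.

```r
# Hypothetical helper: placement of each x[i] among the values in y,
# taken here as the number of y values strictly less than x[i].
place_sketch <- function(x, y) {
  vapply(x, function(xi) sum(y < xi), numeric(1))
}

x <- c(1, 5, 10)
y <- c(2, 4, 6, 8)
place_sketch(x, y)
```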
Author(s)
John Kloke, Joseph McKean
References
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Hollander, M. and Wolfe, D. A. (1999), Nonparametric statistical methods, 2nd Edition, New York: John Wiley and Sons.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Plank data
Description
Abebe et al. (2001) discuss a dataset resulting from a three-way layout for a neurological experiment in which the time required for a mouse to exit a narrow elevated wooden plank is measured. The response is the log of time (in seconds) to exit. Interest lies in assessing the effects of three factors: the Mouse Strain (Tg+, Tg-), the mouse's Gender (female, male), and the mouse's Age (Aged, Middle, Young). The design is a 2 by 2 by 3 factorial design.
Usage
data(plank)
Format
A data frame with 64 observations on the following 4 variables.
response
a numeric vector
strain
a factor with levels
1
2
gender
a factor with levels
1
2
age
a factor with levels
1
2
3
References
Abebe, A., Crimin, K., McKean, J. W., Vidmar, T. J., and Haas, J. V. (2001), "Rank-Based Procedures for Linear Models: Applications to Pharmaceutical Science Data," Drug Information Journal.
Examples
data(plank)
boxplot(response~strain,data=plank)
raov(response~strain:gender:age,data=plank)
plot function for knn_cv
Description
Plots the misclassification error rate versus the number of neighbors, based on a call to knn_cv.
Usage
## S3 method for class 'knn_cv'
plot(x, ...)
Arguments
x |
object of class knn_cv. |
... |
additional arguments; currently not used. |
Details
The list x is assumed to have attributes kvec and error representing the number of neighbors and the corresponding misclassification rate, respectively.
Value
No return value, called for side effects of creating plot.
Author(s)
John Kloke
References
Hastie, T., Tibshirani, R., and Friedman, J. (2017), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, New York: Springer.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013), An Introduction to Statistical Learning with Applications in R, New York: Springer.
Venables, W. N. and Ripley, B. D. (2002), Modern Applied Statistics with S, Fourth edition, Springer.
A Simulated Polynomial Data Set.
Description
A simulated polynomial (3rd degree) model discussed in Section 4.7.1 of Kloke and McKean (2014)/4.6.1 of Kloke and McKean (2024).
Usage
data(poly)
Format
One-hundred observations on two variables.
y
response variable
x
predictor
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
plot(y ~ x,data=poly)
Degree of Polynomial Determination
Description
Tests for the degree of a polynomial. This test was suggested by Graybill (1976) and is discussed from a robust point of view in Section 4.7.1 of Kloke and McKean (2014)/4.6.1 of Kloke and McKean (2024).
Usage
polydeg(y, x, P, alpha = 0.05)
Arguments
y |
vector of responses |
x |
Predictor |
P |
Super degree of polynomial which provides a satisfactory fit |
alpha |
Level of the testing |
Details
Returns the degree of the polynomial based on the algorithm.
Value
deg |
The determined degree |
coll |
Matrix of step information |
fitf |
Fit of the polynomial based on the determined degree |
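A step-down search of this kind can be sketched with least squares standing in for the rank-based fit. The sketch below is an analogue under stated assumptions, not the package's code: polydeg_ls is a hypothetical helper, it uses lm and the ordinary t test on the highest-order coefficient, and it starts at the super degree P and lowers the degree until that coefficient is significant at level alpha.

```r
# Hypothetical least-squares analogue of a step-down degree search.
polydeg_ls <- function(y, x, P, alpha = 0.05) {
  for (p in P:1) {
    # Orthogonal polynomials keep the fit numerically stable; the test on
    # the highest-order term is the same as with raw powers of x.
    fit <- lm(y ~ poly(x, p))
    pval <- summary(fit)$coefficients[p + 1, "Pr(>|t|)"]
    if (pval < alpha) {
      return(list(deg = p, fit = fit))   # highest-order term is needed
    }
  }
  list(deg = 0, fit = lm(y ~ 1))         # no polynomial term was significant
}

# Usage on simulated cubic data (illustrative only).
set.seed(7)
x <- 1:20
xc <- x - mean(x)
y <- 0.2 * xc + xc^3 + rnorm(20, sd = 50)
polydeg_ls(y, xc, 6)$deg
```

Because each step is a level-alpha test, the search can occasionally stop at a degree higher than the true one.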
References
Graybill, F.A. (1976), Theory and application of the linear model, North Scituate, MA: Duxbury Press.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
x <- 1:20
xc <- x - mean(x)
y<- .2*xc + xc^3 +rt(20,3)*90
plot(y~x)
polydeg(y,xc,6)
Internal print functions
Description
Internal print functions
Usage
## S3 method for class 'hogg.test'
print(x, digits = max(5, .Options$digits - 2), ...)
## S3 method for class 'rank.test'
print(x,...)
## S3 method for class 'fkk.test'
print(x,...)
## S3 method for class 'knn_cv'
print(x,...)
## S3 method for class 'npsm.ci'
print(x, estimate=FALSE,stderr=FALSE,digits = max(5, .Options$digits - 2),...)
Arguments
x |
Object to be printed. |
digits |
Number of digits to present. Passed to print function. |
... |
Additional arguments. |
estimate |
not currently implemented. |
stderr |
not currently implemented. |
Value
No return value, called for side effects
Author(s)
John Kloke, Joseph McKean
DES for treatment of prostate cancer.
Description
Under investigation in this clinical trial was the pharmaceutical agent diethylstilbestrol (DES); subjects were assigned to 1.0 mg DES (treatment = 2) or to placebo (treatment = 1).
Usage
data(prostate)
Format
A data frame with 38 observations on the following 8 variables.
patient
a numeric vector
treatment
a numeric vector
time
a numeric vector
status
a numeric vector
age
a numeric vector
shb
a numeric vector
size
a numeric vector
index
a numeric vector
Source
http://www.crcpress.com/product/isbn/9781584883258
References
Collett, D. (2003), Modeling survival data in medical research, CRC Press.
Examples
data(prostate)
boxplot(size~treatment,data=prostate)
qhic
Description
A regression example in which the response is the yearly upkeep of a home and the predictor is the value of the home; see Bowerman et al. (2005) and Exercise 4.9.8 of Kloke and McKean (2014)/Exercise 7.6.2 of Kloke and McKean (2024).
Usage
data(qhic)
Format
Forty observations on two variables.
upkeep
annual upkeep expenditure of home (y)
value
value of the home (x)
References
Bowerman, B.L., O'Connell, R.T., and Koehler, A.B. (2005), Forecasting, time series, and regression: An applied approach, Australia: Thomson.
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
plot(upkeep~value,data=qhic,xlab='Value (in $1000s)',ylab='Annual upkeep (in $10s)')
Quail from a two-factor experiment.
Description
Two sample quail data.
Usage
data(quail2)
Format
A data frame with 30 observations on the following 2 variables.
treat
indicator variable for treatment
ldl
ldl measurement
References
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
McKean J.W., Vidmar, T.J., and Sievers, G.L. (1989), A robust two stage multiple comparison procedure with application to a random drug screen, Biometrics, 45, 1281–1297.
Examples
data(quail2)
boxplot(ldl~treat,data=quail2)
General scores rank test for two sample problem
Description
A generalization of the Wilcoxon rank-sum test in which a score function is applied to the ranks. Any scores from Rfit can be used, as well as user-defined scores. The default is to perform a Wilcoxon analysis.
Usage
rank.test(x, y, alternative = "two.sided", scores = Rfit::wscores,
conf.int = FALSE, conf.level = 0.95)
Arguments
x |
m x 1 vector |
y |
n x 1 vector |
alternative |
one of 'two.sided', 'less', or 'greater' |
scores |
an object of class scores |
conf.int |
logical indicating if a confidence interval should be estimated |
conf.level |
desired level of confidence for interval |
Details
The test is based on $T = \sum_i a(R(y_i))$, where $R(y_i)$ is the rank of $y_i$ in the combined sample and $a(t) = \varphi(t/(N+1))$, with $N = m + n$ the combined sample size. The confidence interval, if requested, is based on a call to Rfit.
Value
statistic |
Standardized value of the test statistic |
Sphi |
Test statistic |
p.value |
p-value |
conf.int |
confidence interval for shift in location |
estimate |
point estimate for shift in location |
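The score statistic described in Details can be sketched in base R for the Wilcoxon case. This is a minimal sketch under assumptions, not the package's implementation: rank_test_sketch is a hypothetical helper, it hard-codes Wilcoxon scores phi(u) = sqrt(12)(u - 1/2), uses a normal approximation for a two-sided p-value, and assumes there are no ties.

```r
# Hypothetical helper: two-sample rank test with Wilcoxon scores.
rank_test_sketch <- function(x, y) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  phi <- function(u) sqrt(12) * (u - 0.5)    # Wilcoxon score function
  a <- phi(seq_len(N) / (N + 1))             # scores a(1), ..., a(N)
  r <- rank(c(x, y))                         # ranks in the combined sample
  Sphi <- sum(a[r[(m + 1):N]])               # sum of scores over the y sample
  # Null variance of the statistic; the scores sum to zero, so its mean is 0.
  v <- m * n / (N * (N - 1)) * sum(a^2)
  z <- Sphi / sqrt(v)
  list(statistic = z, Sphi = Sphi, p.value = 2 * pnorm(-abs(z)))
}

x <- c(1.1, 2.3, 3.8, 4.4, 5.9)
y <- c(2.9, 4.6, 6.2, 7.5, 8.1, 9.3)
rank_test_sketch(x, y)$p.value
```

With Wilcoxon scores and no ties, the standardized statistic agrees with the normal approximation to the Wilcoxon rank-sum test without continuity correction.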
Author(s)
John Kloke, Joseph McKean
References
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Examples
rank.test(rt(20,1),rt(22,1)+0.2)
random contaminated normal deviates
Description
Generate a random sample from a contaminated normal distribution.
Usage
rcn(n, eps, sigmac)
rcn_5_5(n)
Arguments
n |
sample size |
eps |
proportion of contamination |
sigmac |
standard deviation of contaminated component |
Details
With probability (1-eps), deviates are drawn from a standard normal distribution. With probability eps, deviates are drawn from a normal distribution with mean 0 and standard deviation sigmac. rcn_5_5 is a special case where eps=0.05 and sigmac=5.
Value
n x 1 numeric vector containing the random deviates.
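The mixture described in Details can be sketched directly in base R. The helper rcn_sketch below is hypothetical and is only meant to make the sampling scheme concrete; it is not expected to reproduce the package's random number stream for a given seed.

```r
# Hypothetical helper: contaminated normal deviates.
rcn_sketch <- function(n, eps, sigmac) {
  contaminated <- rbinom(n, 1, eps) == 1   # TRUE with probability eps
  z <- rnorm(n)                            # standard normal deviates
  # Scaling a N(0, 1) deviate by sigmac gives a N(0, sigmac^2) deviate.
  z * ifelse(contaminated, sigmac, 1)
}

set.seed(101)
rcn_sketch(10, 0.05, 5)
```

Under this scheme the variance of a single deviate is (1 - eps) + eps * sigmac^2, which is 2.2 for eps = 0.05 and sigmac = 5.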
Author(s)
John Kloke, Joseph McKean
References
Hogg, R., McKean, J., and Craig, A. (2013), Introduction to Mathematical Statistics, 7th Ed., Boston: Pearson.
Examples
qqnorm(rcn(100,.25,10))
set.seed(101); rcn(10,0.05,5)
set.seed(101); rcn_5_5(10)
Fat-Finger Error Contaminated Normal Deviates
Description
Generate random data from a contaminated normal distribution where the contamination is a multiplicative factor. This arises, for example, when data are recorded in incorrect units or with a misplaced decimal point.
Usage
rcnx100(n,eps=0.001,x=100,mu=0,sigma=1,...)
rcnx(...)
rcnx_01_100(n)
Arguments
n |
sample size to be drawn. |
eps |
amount (probability) of contaminated observations |
x |
multiplier for the contaminated observations |
mu |
mean of uncontaminated samples |
sigma |
standard deviation of uncontaminated samples |
... |
optional arguments. |
Details
Samples are drawn from a normal distribution with mean mu and standard deviation sigma. A fraction of the observations (eps) are multiplied by the factor x. rcnx is an alias for rcnx100. rcnx_01_100 is a special case where the observations are drawn from a standard normal distribution (i.e., mu=0 and sigma=1, the defaults in rcnx100) and eps and x are specified as 0.01 and 100, respectively.
Value
Numeric vector of length n is returned.
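The multiplicative contamination can be sketched in a few lines of base R. The helper rcnx_sketch is hypothetical; it treats eps as the per-observation probability of contamination, which is one reading of the argument description, and it will not generally match the package's output for the same seed.

```r
# Hypothetical helper: normal deviates with fat-finger style contamination.
rcnx_sketch <- function(n, eps = 0.001, x = 100, mu = 0, sigma = 1) {
  z <- rnorm(n, mean = mu, sd = sigma)   # uncontaminated draws
  hit <- runif(n) < eps                  # which observations are contaminated
  z[hit] <- x * z[hit]                   # multiply those by the factor x
  z
}

set.seed(101)
rcnx_sketch(10, eps = 0.2, x = 100)
```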
Author(s)
John Kloke
References
https://en.wikipedia.org/wiki/Fat-finger_error
Examples
set.seed(101); x1 <- rcnx100(10)
set.seed(101); x2 <- rcnx(10)
set.seed(101); x3 <- rcnx_01_100(10)
qqnorm(rcnx(10000,eps=0.005,x=10))
qqnorm(rcnx(1000,eps=0.05,x=1/100))
Random Laplace.
Description
Random generation for the Laplace (double exponential) distribution with location 0 and scale 1.
Usage
rlaplace(n)
Arguments
n |
scalar. number of random draws. |
Details
A Laplace or double exponential distribution has heavier tails than a normal distribution, so a sample will tend to have additional outliers.
Value
A vector of length n is returned containing the random data.
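One standard way to generate such draws is the inverse-CDF method, sketched below in base R. The helper names qlaplace_sketch and rlaplace_sketch are hypothetical, and the package's rlaplace may use a different construction.

```r
# Hypothetical helper: quantile function of the Laplace(0, 1) distribution.
qlaplace_sketch <- function(u) {
  -sign(u - 0.5) * log(1 - 2 * abs(u - 0.5))
}

# Hypothetical helper: Laplace(0, 1) draws by inverting uniform deviates.
rlaplace_sketch <- function(n) {
  qlaplace_sketch(runif(n))
}

set.seed(101)
x <- rlaplace_sketch(100)
qqnorm(x)   # heavier tails than the normal show up as curvature at the ends
```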
Author(s)
John Kloke, Joseph McKean
References
Hogg, R., McKean, J., and Craig, A. (2005), Introduction to Mathematical Statistics, 6th Ed.
Examples
x <- rlaplace(100)
qqnorm(x)
Simulated Regression Model
Description
A simulated regression model with one response and one predictor. It is discussed in Exercise 6.5.6 of Kloke and McKean (2014)/Exercise 8.11.23 of Kloke and McKean (2024).
Usage
data(rs)
Format
Fifty observations on two variables.
y
simulated response
x
simulated predictor
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
rfit(y ~ x,data=rs)
Seinfeld — the sitcom — viewership counts by episode
Description
Counts of viewers for 9 seasons of Seinfeld
Usage
data("seinfeld")
Format
A data frame with 180 observations on the following 4 variables.
episodeNumberOverall
a numeric vector
season
a numeric vector
episodeNumberSeason
a numeric vector
viewers
a numeric vector
Source
Wikipedia https://en.wikipedia.org/wiki/List_of_Seinfeld_episodes (date unknown).
Examples
data(seinfeld)
# Comparison boxplots of viewers by season
boxplot(viewers~season,data=seinfeld,ylab='Number of Viewers (in millions)',xlab='Season')
# Normal q-q plots for selected seasons.
oldpar_mfrow <- par()$mfrow
par(mfrow=c(2,2))
seasons2display <- c(4,5,6,9)
for( s in seasons2display) {
v <- seinfeld[seinfeld$season==s,'viewers']
qqnorm(v,main=paste("Season",s))
abline(a=median(v),b=mad(v))
}
par(mfrow=oldpar_mfrow)
# Normal q-q plots for selected seasons
# using centered and scaled residuals.
oldpar_mfrow <- par()$mfrow
par(mfrow=c(2,2))
seasons2display <- c(4,5,6,9)
for( s in seasons2display) {
v0 <- seinfeld[seinfeld$season==s,'viewers']
v1 <- (v0 - median(v0))/mad(v0)
qqnorm(v1,main=paste("Season",s))
abline(a=0,b=1)
}
par(mfrow=oldpar_mfrow)
Doksum and Sievers rat data
Description
Doksum and Sievers (1976) describe an experiment involving the effect of ozone on weight gain of rats. The experimental group consisted of 22 rats which were placed in an ozone environment for seven days, while the control group contained 21 rats which were placed in an ozone-free environment for the same amount of time. The response was the weight gain in a rat over the time period.
Usage
data(sievers)
Format
A data frame with 45 observations on the following 2 variables.
group
indicator variable for treatment
weight.gain
response variable of weight gain
References
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Doksum, K. A. and Sievers, G. L. (1976), Plotting with confidence: Graphical comparisons of two populations, Biometrika, 63, 421-434.
Examples
data(sievers)
boxplot(weight.gain~group,data=sievers)
p-value for a one sample sign test
Description
p-value for a one sample sign test based on the binomial distribution.
Usage
signtest_pvalue(x, alternative = "two.sided", theta0 = 0, ...)
Arguments
x |
numeric vector. |
alternative |
type of alternative hypothesis |
theta0 |
null value of the parameter |
... |
optional arguments. currently ignored. |
Details
Returns p-value using the binomial distribution.
Value
a numeric scalar — the p-value — is returned
Author(s)
John Kloke, Joseph McKean
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Examples
x <- round(rt(19,9) + 2,1)
signtest_pvalue(x,alternative='greater')
S <- sum(x > 0)
M <- sum(x != 0)
1-pbinom(S-1,M,0.5)
x <- round(rt(19,9) + 0,1)
signtest_pvalue(x)
S <- sum(x > 0)
M <- sum(x != 0)
2*min(pbinom(S,M,0.5), 1-pbinom(S-1,M,0.5))
A simulated classification example with two variables and two classes (labels).
Description
A simulated classification example with two variables and two classes (labels).
Usage
data("sim_class2")
Format
A data frame with 1000 observations on the following 4 variables.
train
an indicator for training and test sets
x1
an explanatory variable
x2
an explanatory variable
y
response variable - a factor with levels
0
1
Details
Random points in the x1,x2 plane were generated. Class labels are based on location relative to two circles in the x1,x2 plane, with some random variation in the labels simulated.
Examples
data(sim_class2)
dim(sim_class2)
train_set <- sim_class2[sim_class2$train==1,]
dim(train_set)
with(train_set,plot(x1,x2,main='Training Set',cex=0.625))
with(train_set,points(x1,x2,pch=20,col=y,cex=0.625))
Simon (the memory game) dataset
Description
An experiment in which the members of two groups of students each played the game Simon twice.
Usage
data("simon")
Format
A data frame with 31 observations on the following 3 variables.
game1
score on first trial
game2
score on second trial
class
group variable
Details
Demonstrates the concept of regression toward the mean. Simulated data to represent a realistic realization of the experiment. See Problem 4.9.20 of Kloke and McKean (2014)/Problem 4.7.17 of Kloke and McKean (2024).
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
data(simon)
plot(game2~game1,data=simon)
rfit(game2~game1,data=simon)
Sine Cosine Model
Description
Simulated dataset
Usage
data("sincos")
Format
A data frame with 197 observations on the following 2 variables.
x
independent variable
y
dependent variable
Details
The data were generated using
x <- seq(1,50,by=.25) ; y <- 5*sin(3*x) + 6*cos(x/4)+rnorm(length(x),0,10)
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Examples
data(sincos)
plot(y~x,sincos)
### code to create Figure 4.9 of Kloke & McKean 2014 ###
my.sincos<-sincos
my.sincos$y3<-my.sincos$y
my.sincos$y3[137] <- 800
plot(y3~x,ylim=c(-50,50),data=my.sincos)
fit4 <- loess(y3 ~ x,data=my.sincos)
# lines(fit4$x,fit4$fitted,lty=2)
with(fit4,lines(x,fitted,lty=2))
fit5 <- loess(y3 ~ x,family="symmetric",data=my.sincos)
with(fit5,lines(x,fitted,lty=1))
legend('bottomleft',legend=c('Local Robust Fit','Local LS Fit'),lty=1:2)
title("loess Fits of Sine-Cosine Data")
Predict top speed based on miles per gallon
Description
A sample of 82 cars with variables speed and miles per gallon collected.
Usage
data("speed")
Format
A data frame with 82 observations on the following 2 variables.
mpg
Miles per gallon
sp
a numeric vector
Source
Higgins (2003), Introduction to modern nonparametric statistics.
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall.
Examples
data(speed)
plot(sp~mpg,data=speed)
rfit(sp~mpg+I(mpg^2),data=speed)
Turtle Data
Description
A data frame containing measurements of 48 turtles. The first three columns are the Length, Width, and Height measurements of the carapace of the turtle. The fourth column is a categorical variable sex with values of female and male. Data are drawn from Johnson and Wichern (2007).
Usage
data(turtle)
Format
48 observations on four variables.
- Length
numeric vector.
- Width
numeric vector.
- Height
numeric vector.
- sex
character vector.
References
Johnson, R.A. and Wichern, D.W. (2007), Applied Multivariate Statistical Analysis, 6th ed., Upper Saddle River, NJ: Pearson.
Examples
with(turtle,boxplot(Length~sex))
with(turtle,boxplot(Length~sex,ylab='Length (units)'))
van Elteren test for stratified analysis
Description
Performs the van Elteren extension of the Wilcoxon rank-sum test for stratified experiments.
Usage
vanElteren.test(g, y, b)
Arguments
g |
n x 1 vector: treatment/group indicator |
y |
n x 1 vector: responses |
b |
n x 1 vector: denotes strata |
Value
statistic |
Value of the test statistic. |
p.value |
p-value based on a normal approximation. |
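A stratified rank-sum statistic of this kind can be sketched in base R. The helper van_elteren_sketch below is hypothetical and rests on several assumptions: two treatment groups, no ties, stratum weights of 1/(N_h + 1) applied to the within-stratum rank sums, and a two-sided p-value from the normal approximation. The form in which the package reports its statistic may differ.

```r
# Hypothetical helper: stratified Wilcoxon rank-sum test for two groups.
van_elteren_sketch <- function(g, y, b) {
  g <- as.factor(g)
  stopifnot(nlevels(g) == 2)
  num <- 0   # weighted sum of centered within-stratum rank sums
  v <- 0     # null variance of that sum
  for (s in unique(b)) {
    idx <- b == s
    r <- rank(y[idx])                    # ranks within the stratum
    in2 <- g[idx] == levels(g)[2]
    n2 <- sum(in2)
    n1 <- sum(!in2)
    N <- n1 + n2
    # Rank sum of group 2, centered at its null mean, weighted by 1/(N + 1).
    num <- num + (sum(r[in2]) - n2 * (N + 1) / 2) / (N + 1)
    v <- v + n1 * n2 / (12 * (N + 1))
  }
  z <- num / sqrt(v)
  list(statistic = z, p.value = 2 * pnorm(-abs(z)))
}

# Usage on a small made-up example with two strata (illustrative only).
g <- rep(c(1, 2, 1, 2), times = c(3, 3, 4, 4))
y <- c(1.2, 2.5, 3.1, 2.9, 4.4, 5.0, 0.7, 1.9, 2.2, 3.6, 2.8, 4.1, 4.9, 5.3)
b <- rep(c("A", "B"), times = c(6, 8))
van_elteren_sketch(g, y, b)
```

With a single stratum the sketch reduces to the normal approximation to the Wilcoxon rank-sum test without continuity correction.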
January Weather Data for Kalamazoo
Description
January weather data for Kalamazoo, MI for the years 1900 to 1995. It is discussed in Example 4.7.4, pp. 105-106, of Kloke and McKean (2014)/Example 4.6.4, pp. 177-178, of Kloke and McKean (2024).
Usage
data(weather)
Format
Ninety-six observations (1900-1995) for twelve weather variables.
avemax
avemax
avemin
avemin
coldestmax
coldestmax
hihest
hihest
lowest
lowest
maxdayprec
maxdayprec
maxdaysnowfall
maxdaysnowfall
meantmp
meantmp
totalprec
totalprec
totalsnow
totalsnow
warmest
warmest
year
year
Source
http://weather-warehouse.com/WeatherHistory/
References
Kloke, J. and McKean, J.W. (2014), Nonparametric statistical methods using R, Boca Raton, FL: Chapman-Hall. Kloke, J. and McKean, J.W. (2024), Nonparametric statistical methods using R, Second Edition, Boca Raton, FL: Chapman-Hall.
Examples
plot(avemax ~ year,data=weather)
Wilson (score) confidence interval for a population proportion.
Description
Wilson (score) confidence interval for a population proportion.
Usage
wilson.ci(x, n, conf.level = 0.95)
Arguments
x |
number of events |
n |
number of samples |
conf.level |
confidence level |
Details
Uses the definition in Agresti (2002).
Value
conf.int |
estimated confidence interval |
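The Wilson score interval has a closed form that can be sketched in base R. The helper wilson_sketch is hypothetical and is not the package's code; it takes the number of events x and the number of trials n in that order, matching the Usage above.

```r
# Hypothetical helper: Wilson score interval for a proportion.
wilson_sketch <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # standard normal critical value
  phat <- x / n                          # sample proportion
  denom <- 1 + z^2 / n
  center <- (phat + z^2 / (2 * n)) / denom
  half <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2)) / denom
  c(lower = center - half, upper = center + half)
}

wilson_sketch(33, 100)
```

For comparison, prop.test(33, 100, correct = FALSE)$conf.int in base R reports the same score interval.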
Author(s)
John Kloke, Joseph McKean
References
Agresti, A. (2002), Categorical data analysis, New York: John Wiley & Sons, Inc.
Examples
n <- 100
x <- rbinom(1,n,0.33)
wilson.ci(x,n)