The HotellingEllipse package helps draw the Hotelling ellipse on a PCA or PLS score scatterplot. It computes Hotelling’s T\(^2\) value, the semi-major axis (denoted a), and the semi-minor axis (denoted b), along with the x-y coordinates for drawing a confidence ellipse based on Hotelling’s T\(^2\). Two functions are available:
ellipseParam() calculates Hotelling’s T\(^2\) and the semi-axes of an ellipse at the 99% and 95% confidence levels.
ellipseCoord() returns the x and y coordinates of a confidence ellipse at a user-defined confidence level, which defaults to 95%.
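To make these quantities concrete, here is a minimal base-R sketch of how a T\(^2\) cutoff and the matching semi-axes can be obtained. It assumes the textbook F-distribution limit, T\(^2\)_lim = k(n - 1)/(n - k) · F(conf; k, n - k), and semi-axes of the form sqrt(T\(^2\)_lim · variance of the component). The package's internals may differ in detail. The data are simulated, and the names `scores`, `t2_cutoff` and `semi_axis` are hypothetical.

```r
# Simulated scores standing in for two PCA components
# (hypothetical data, not the specData set).
set.seed(1)
n <- 171
scores <- data.frame(
  pc1 = rnorm(n, sd = 100),
  pc2 = rnorm(n, sd = 30)
)

# Textbook T-squared limit for k components and n observations (assumed form).
t2_cutoff <- function(n, k, conf = 0.95) {
  k * (n - 1) / (n - k) * qf(conf, df1 = k, df2 = n - k)
}

# Semi-axis along one component: sqrt(limit * variance of that component).
semi_axis <- function(x, limit) sqrt(limit * var(x))

k <- 2
lim95 <- t2_cutoff(n, k, conf = 0.95)
lim99 <- t2_cutoff(n, k, conf = 0.99)

a95 <- semi_axis(scores$pc1, lim95)  # along the high-variance component
b95 <- semi_axis(scores$pc2, lim95)  # along the low-variance component

round(c(lim95 = lim95, lim99 = lim99), 2)
#> lim95 lim99 
#>  6.14  9.52
```

With n = 171 and k = 2, the same dimensions as the example that follows, this formula gives limits that agree with the cutoff.95pct and cutoff.99pct values reported by ellipseParam() there.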
library(HotellingEllipse)
library(dplyr)       # %>% and select()
library(FactoMineR)  # PCA()
library(purrr)       # pluck()
library(tibble)      # as_tibble()

data("specData")

In this example, we use FactoMineR::PCA() to run a principal component analysis (PCA) on specData, a LIBS spectral dataset, and convert the PCA scores to a tibble with tibble::as_tibble().
set.seed(123)
pca_mod <- specData %>%
  select(where(is.numeric)) %>%
  PCA(scale.unit = FALSE, graph = FALSE)

pca_scores <- pca_mod %>%
  pluck("ind", "coord") %>%
  as_tibble()

pca_scores
#> # A tibble: 171 x 5
#> Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 144168. -36399. 2228. -670. 13805.
#> 2 118520. -31465. 16300. -20686. -13872.
#> 3 90303. -28356. 31340. -60615. 15157.
#> 4 107107. -38209. 24897. -60366. 19449.
#> 5 74350. -2148. 29814. -8351. 494.
#> 6 97511. -17932. 22254. -15406. -4195.
#> 7 82142. 19297. -34299. -12498. -648.
#> 8 76261. 16566. -34382. -16293. 137.
#> 9 73705. 31091. -22577. -17182. 2438.
#> 10 68042. 25124. -26063. -19389. 6051.
#> # … with 161 more rows

To add a confidence ellipse, we use ellipseParam() to compute the lengths of the ellipse semi-axes for bivariate data in the PC1-PC2 subspace. To do this, we set the number of components, k, to 2 and set pcx and pcy to 1 and 2, respectively.
res <- ellipseParam(data = pca_scores, k = 2, pcx = 1, pcy = 2)

str(res)
#> List of 4
#> $ Tsquare : tibble[,1] [171 × 1] (S3: tbl_df/tbl/data.frame)
#> ..$ value: num [1:171] 2.28 2.65 8 8.63 1.05 ...
#> $ Ellipse : tibble[,4] [1 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ a.99pct: num 319536
#> ..$ b.99pct: num 91816
#> ..$ a.95pct: num 256487
#> ..$ b.95pct: num 73699
#> $ cutoff.99pct: num 9.52
#> $ cutoff.95pct: num 6.14

We can extract parameters for further use:
a1 <- pluck(res, "Ellipse", "a.99pct")
b1 <- pluck(res, "Ellipse", "b.99pct")

a2 <- pluck(res, "Ellipse", "a.95pct")
b2 <- pluck(res, "Ellipse", "b.95pct")

Tsq <- pluck(res, "Tsquare", "value")

Another way to add a Hotelling ellipse is to use ellipseCoord(), which returns the x and y coordinates of the confidence ellipse at a user-defined confidence level. The conf.limit argument defaults to 0.95. Below, the x-y coordinates are estimated from data projected onto the PC1-PC3 subspace.
xy_coord <- ellipseCoord(data = pca_scores, pcx = 1, pcy = 3, conf.limit = 0.95, pts = 500)

str(xy_coord)
#> tibble[,2] [500 × 2] (S3: tbl_df/tbl/data.frame)
#> $ x: num [1:500] 256487 256466 256405 256304 256161 ...
#> $ y: num [1:500] -1.73e-12 7.93e+02 1.59e+03 2.38e+03 3.17e+03 ...
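
The coordinates above start at x ≈ 256487 (the a.95pct value) with y ≈ 0. That is what a standard parametric ellipse centred at the origin gives at angle zero. The base-R sketch below shows that parametrisation. Whether ellipseCoord() uses exactly this form is an assumption, and the semi-axes `a` and `b` are hypothetical placeholders, not values from the package.

```r
# Hypothetical semi-axes; in practice these could come from ellipseParam().
a <- 3
b <- 1
pts <- 500

# Parametric form of an axis-aligned ellipse centred at the origin
# (PCA scores are mean-centred, so the origin is the natural centre).
theta <- seq(0, 2 * pi, length.out = pts)
ellipse_xy <- data.frame(
  x = a * cos(theta),
  y = b * sin(theta)
)

# Every generated point should satisfy (x / a)^2 + (y / b)^2 = 1.
on_ellipse <- (ellipse_xy$x / a)^2 + (ellipse_xy$y / b)^2
all.equal(on_ellipse, rep(1, pts))
```

A data frame like `ellipse_xy` can then be drawn as a closed path over the score scatterplot, for example with base `lines()` or `ggplot2::geom_path()`.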