Type: | Package |
Title: | Bayesian Cluster Validity Index |
Version: | 1.0.2 |
Imports: | e1071, mclust, ggplot2, UniversalCVI |
Description: | Algorithms for computing the Bayesian cluster validity index (BCVI) and generating its plots, with and without error bars (O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. <doi:10.1016/j.csda.2024.108053>), based on several underlying cluster validity indexes (CVIs) including Calinski-Harabasz, Chou-Su-Lai, Davies-Bouldin, Dunn, Pakhira-Bandyopadhyay-Maulik, Point biserial correlation, the score function, Starczewski, and Wiroonsri indices for hard clustering, and Correlation Cluster Validity, the generalized C, HF, KWON, KWON2, Modified Pakhira-Bandyopadhyay-Maulik, Pakhira-Bandyopadhyay-Maulik, Tang, Wiroonsri-Preedasawakul, Wu-Li, and Xie-Beni indices for soft clustering. The package is compatible with K-means, fuzzy C means, EM clustering, and hierarchical clustering (single, average, and complete linkage). Although BCVI is compatible with any existing underlying CVI, we recommend using either WI or WP as the underlying CVI. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2025-07-09 04:43:28 UTC; lenovo |
Author: | Nathakhun Wiroonsri |
Maintainer: | Onthada Preedasawakul <o.preedasawakul@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-07-09 07:50:10 UTC |
B1 Artificial Dataset
Description
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 1 Gaussian and 1 Uniform distribution, labeled 1-2.
Usage
B1_data
Format
A data frame with 5500 data points and 3 variables
x
Numeric values generated from Gaussian and Uniform distributions
y
Numeric values generated from Gaussian and Uniform distributions
label
Categorical labels 1,2
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B3_data, B_WP.IDX, B_Wvalid, B_XB.IDX
B2 Artificial Dataset
Description
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 5 different Gaussian distributions, labeled 1-5.
Usage
B2_data
Format
A data frame with 850 data points and 3 variables
x
Numeric values generated from Gaussian distributions
y
Numeric values generated from Gaussian distributions
label
Categorical labels 1,2,3,4,5
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B1_data, B3_data, B_WP.IDX, B_Wvalid, B_XB.IDX
B3 Artificial Dataset
Description
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 5 different Gaussian distributions, labeled 1-5.
Usage
B3_data
Format
A data frame with 2300 data points and 3 variables
x
Numeric values generated from Gaussian distributions
y
Numeric values generated from Gaussian distributions
label
Categorical labels 1,2,3,4,5
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B4_data, B_WP.IDX, B_Wvalid, B_XB.IDX
B4 Artificial Dataset
Description
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 6 different Gaussian distributions, labeled 1-6.
Usage
B4_data
Format
A data frame with 740 data points and 3 variables
x
Numeric values generated from Gaussian distributions
y
Numeric values generated from Gaussian distributions
label
Categorical labels 1,2,3,4,5,6
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B3_data, B5_data, B_WP.IDX, B_Wvalid, B_XB.IDX
B5 Artificial Dataset
Description
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 7 different Gaussian and 2 Uniform distributions, labeled 1-9.
Usage
B5_data
Format
A data frame with 1820 data points and 3 variables
x
Numeric values generated from Gaussian and Uniform distributions
y
Numeric values generated from Gaussian and Uniform distributions
label
Categorical labels 1,2,3,4,5,6,7,8,9
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B4_data, B6_data, B_WP.IDX, B_Wvalid, B_XB.IDX
B6 Artificial Dataset
Description
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 3 different Gaussian and 2 Uniform distributions, labeled 1-5.
Usage
B6_data
Format
A data frame with 1000 data points and 3 variables
x
Numeric values generated from Gaussian and Uniform distributions
y
Numeric values generated from Gaussian and Uniform distributions
label
Categorical labels 1,2,3,4,5
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B5_data, B7_data, B_WP.IDX, B_Wvalid, B_XB.IDX
B7 Artificial Dataset
Description
A 2-dimensional dataset from Wiroonsri and Preedasawakul (2024) generated from 3 different Gaussian and 2 Uniform distributions, labeled 1-5.
Usage
B7_data
Format
A data frame with 800 data points and 3 variables
x
Numeric values generated from Gaussian and Uniform distributions
y
Numeric values generated from Gaussian and Uniform distributions
label
Categorical labels 1,2,3,4,5
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B6_data, B1_data, B_WP.IDX, B_Wvalid, B_XB.IDX
BCVI-Correlation Cluster Validity (CCV) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using the Pearson correlation cluster validity (CCVP) and/or the Spearman's (rho) correlation cluster validity (CCVS) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_CCV.IDX(x, kmax, indexlist = "all", method = "FCM", fzm = 2,
iter = 100, nstart = 20, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
indexlist |
a character string indicating which CCV index is to be computed ("
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
iter |
a maximum number of iterations for |
nstart |
a maximum number of initial random sets for FCM for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-CCV is defined as follows.
Let
r_k({\bf x}) = \dfrac{CVI(k)-\min_j CVI(j)}{\sum_{i=2}^K (CVI(i)-\min_j CVI(j))},
where CVI is either the CCVP or the CCVS index.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
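As a rough illustration of the formulas above, the following base-R sketch computes r_k and BCVI(k) by hand for an index where larger values are better. The CVI values, n, and the flat prior are made-up assumptions, not package output, and the code does not call B_CCV.IDX.

```r
# Hypothetical CVI values for k = 2, ..., 6 (larger is better); not package output.
k   <- 2:6
cvi <- c(0.40, 0.55, 0.90, 0.60, 0.45)
n     <- 100                      # assumed number of data points
alpha <- rep(1, length(cvi))      # assumed flat Dirichlet prior (alpha_2, ..., alpha_6)

# r_k(x): shift by the minimum, then rescale so the values sum to 1.
r <- (cvi - min(cvi)) / sum(cvi - min(cvi))

# BCVI(k) = (alpha_k + n * r_k) / (alpha_0 + n), the posterior mean of p_k.
bcvi <- (alpha + n * r) / (sum(alpha) + n)

print(data.frame(k = k, CVI = cvi, r = round(r, 3), BCVI = round(bcvi, 3)))
best_k <- k[which.max(bcvi)]      # k with the largest BCVI
```

Because the r_k sum to 1, the BCVI values also sum to 1, so they can be read as posterior weights over the candidate numbers of groups.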
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
M. Popescu, J. C. Bezdek, T. C. Havens and J. M. Keller (2013). "A Cluster Validity Framework Based on Induced Partition Dissimilarity." https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6246717&isnumber=6340245
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# Dirichlet prior parameters alpha_2, ..., alpha_10 (one per k = 2, ..., kmax)
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.CCV = B_CCV.IDX(x = scale(data), kmax = 10, indexlist = "CCVP", method = "FCM",
                  fzm = 2, iter = 100, nstart = 20, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI-CCVP
pplot = plot_BCVI(B.CCV$CCVP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Calinski–Harabasz (CH) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Calinski–Harabasz (CH) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_CH.IDX(x, kmax, method = "kmeans", nstart = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-CH is defined as follows.
Let
r_k({\bf x}) = \dfrac{CH(k)-\min_j CH(j)}{\sum_{i=2}^K (CH(i)-\min_j CH(j))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
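The denominator \alpha_0 + n in the formulas above follows from the standard Dirichlet moments, because the r_k({\bf x}) sum to 1 by construction. As a short check, write \beta_k for the posterior parameters (a symbol introduced here only for brevity, not used elsewhere in this manual):

```latex
\begin{align*}
\beta_k &= \alpha_k + n r_k({\bf x}), \\
\beta_0 &= \sum_{k=2}^K \beta_k = \alpha_0 + n \sum_{k=2}^K r_k({\bf x}) = \alpha_0 + n, \\
E[p_k|{\bf x}] &= \frac{\beta_k}{\beta_0} = \frac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n}, \\
Var(p_k|{\bf x}) &= \frac{\beta_k(\beta_0 - \beta_k)}{\beta_0^2(\beta_0 + 1)}.
\end{align*}
```

Substituting \beta_k and \beta_0 into the last line gives the variance expression stated above.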
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
T. Calinski, J. Harabasz, "A dendrite method for cluster analysis," Communications in Statistics, 3, 1-27 (1974).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
# Dirichlet prior parameters alpha_2, ..., alpha_10 (one per k = 2, ..., kmax)
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.CH = B_CH.IDX(x = scale(data), kmax = 10, method = "kmeans",
                nstart = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.CH)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Chou-Su-Lai (CSL) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Chou-Su-Lai (CSL) as the underlying cluster validity index (CVI) and Dirichlet prior parameters of the user's choice. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_CSL.IDX(x, kmax, method = "kmeans", nstart = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-CSL is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j CSL(j)- CSL(k)}{\sum_{i=2}^K (\max_j CSL(j) - CSL(i))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
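For an index such as CSL, where smaller values are better, r_k measures the distance from the maximum rather than from the minimum. A minimal base-R sketch of that normalization follows; bcvi_from_cvi is a hypothetical helper written for this illustration (it is not part of BayesCVI), and the index values, n, and the flat prior are made up.

```r
# Hypothetical helper (not part of BayesCVI): BCVI from a vector of CVI values.
bcvi_from_cvi <- function(cvi, n, alpha, larger_is_better = TRUE) {
  # Evidence for each k: distance from the worst value of the index.
  gap <- if (larger_is_better) cvi - min(cvi) else max(cvi) - cvi
  r <- gap / sum(gap)
  (alpha + n * r) / (sum(alpha) + n)
}

# Made-up values for k = 2, ..., 6 of an index where smaller is better.
k   <- 2:6
csl <- c(1.8, 1.2, 0.6, 1.0, 1.5)
bcvi <- bcvi_from_cvi(csl, n = 100, alpha = rep(1, 5), larger_is_better = FALSE)
k[which.max(bcvi)]   # same k as the one minimizing csl
```

With a flat prior the largest BCVI falls on the k that minimizes the underlying index, so the Bayesian version agrees with the raw index when no prior preference is expressed.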
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
C. H. Chou, M. C. Su, E. Lai, "A new cluster validity measure and its application to image compression," Pattern Anal Applic, 7, 205-220 (2004).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
# Dirichlet prior parameters alpha_2, ..., alpha_10 (one per k = 2, ..., kmax)
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.CSL = B_CSL.IDX(x = scale(data), kmax = 10, method = "kmeans",
                  nstart = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.CSL)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Davies–Bouldin (DB) and DB* (DBs) indexes
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using DB and/or DBs as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_DB.IDX(x, kmax, method = "kmeans", indexlist = "all", p = 2, q = 2,
nstart = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
indexlist |
a character string indicating which cluster validity indexes to be computed ( |
p |
the power of the Minkowski distance between centroids of clusters. The default is |
q |
the power of dispersion measure of a cluster. The default is |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-DB is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j CVI(j)- CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))},
where CVI denotes the DB or the DBs index.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
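The variance formula above can be turned into a simple uncertainty band around each BCVI value. The base-R sketch below uses made-up index values, n, and prior, and takes one posterior standard deviation on each side as the band; whether plot_BCVI draws its error bars this way is an assumption, not something stated in this manual.

```r
# Made-up DB-type values for k = 2, ..., 6 (smaller is better); not package output.
k   <- 2:6
cvi <- c(2.1, 1.4, 0.9, 1.3, 1.9)
n     <- 200              # assumed number of data points
alpha <- rep(2, 5)        # assumed Dirichlet prior parameters

r    <- (max(cvi) - cvi) / sum(max(cvi) - cvi)
post <- alpha + n * r     # posterior Dirichlet parameters alpha_k + n * r_k
a0n  <- sum(post)         # equals alpha_0 + n because the r_k sum to 1

bcvi <- post / a0n
v    <- post * (a0n - post) / (a0n^2 * (a0n + 1))   # Var(p_k | x)
sdv  <- sqrt(v)

print(data.frame(k = k, BCVI = bcvi, lower = bcvi - sdv, upper = bcvi + sdv))
```

Non-overlapping bands between the best and second-best k suggest a clear preference; overlapping bands suggest that more than one number of groups is plausible.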
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
D. L. Davies, D. W. Bouldin, "A cluster separation measure," IEEE Trans Pattern Anal Machine Intell, 1, 224-227 (1979).
M. Kim, R. S. Ramakrishna, "New indices for cluster validity assessment," Pattern Recognition Letters, 26, 2353-2363 (2005).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DI.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
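# Dirichlet prior parameters alpha_2, ..., alpha_10 (one per k = 2, ..., kmax);
# defined here for reference, but the call below uses alpha = "default"
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.DB = B_DB.IDX(x = scale(data), kmax = 10, method = "kmeans", indexlist = "all",
                p = 2, q = 2, nstart = 100, alpha = "default", mult.alpha = 1/2)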
# plot the BCVI-DB
pplot = plot_BCVI(B.DB$DB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
# plot the BCVI-DBs
pplot = plot_BCVI(B.DB$DBs)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Dunn index (DI)
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using the Dunn index (DI) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_DI.IDX(x, kmax, method = "kmeans", nstart = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-DI is defined as follows.
Let
r_k({\bf x}) = \dfrac{DI(k)-\min_j DI(j)}{\sum_{i=2}^K (DI(i)-\min_j DI(j))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
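To see how the Dirichlet prior enters the formula above, the following base-R sketch compares a flat prior with a prior vector of the shape c(5,5,5,20,20,20,0.5,0.5,0.5), which puts more weight on k = 5, 6, 7 when kmax = 10. The index values and n are made up, and n is deliberately small so the prior's effect is visible; the code applies the BCVI formula directly and does not reproduce how B_DI.IDX handles alpha = "default" or mult.alpha.

```r
k  <- 2:10
# Made-up Dunn-type values (larger is better) with close peaks at k = 3 and k = 5.
di <- c(0.10, 0.30, 0.12, 0.28, 0.11, 0.09, 0.08, 0.07, 0.06)
n  <- 50   # assumed number of data points, kept small so the prior is visible

r <- (di - min(di)) / sum(di - min(di))
bcvi <- function(alpha) (alpha + n * r) / (sum(alpha) + n)

flat     <- bcvi(rep(1, length(k)))
weighted <- bcvi(c(5, 5, 5, 20, 20, 20, 0.5, 0.5, 0.5))  # favors k = 5, 6, 7

k[which.max(flat)]
k[which.max(weighted)]
```

With the flat prior the largest BCVI follows the raw index (k = 3); with the weighted prior it moves to k = 5, the near-tied peak that the prior favors.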
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," J Cybern, 3(3), 32-57 (1973).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
# Dirichlet prior parameters alpha_2, ..., alpha_10 (one per k = 2, ..., kmax)
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.DI = B_DI.IDX(x = scale(data), kmax = 10, method = "kmeans",
                nstart = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.DI)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-The generalized C (GC) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using all or some of GC1, GC2, GC3, and GC4 as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_GC.IDX(x, kmax, indexlist = "all", method = "FCM", fzm = 2, iter = 100,
nstart = 20, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
indexlist |
a character string indicating which generalized C index is to be computed ("
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
iter |
a maximum number of iterations for |
nstart |
a maximum number of initial random sets for FCM for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-GC is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j CVI(j)- CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))},
where CVI is one of the GC1, GC2, GC3, or GC4 indexes.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
J. C. Bezdek, M. Moshtaghi, T. Runkler, and C. Leckie, “The generalized c index for internal fuzzy cluster validity,” IEEE Transactions on Fuzzy Systems, vol. 24, no. 6, pp. 1500–1512, 2016. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7429723&isnumber=7797168
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# Dirichlet prior parameters alpha_2, ..., alpha_10 (one per k = 2, ..., kmax)
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.GC = B_GC.IDX(x = scale(data), kmax = 10, indexlist = "GC1",
                method = "FCM", fzm = 2, iter = 100,
                nstart = 20, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI-GC1
pplot = plot_BCVI(B.GC$GC1)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-HF index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using HF as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_HF.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-HF is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j HF(j)- HF(k)}{\sum_{i=2}^K (\max_j HF(j) - HF(i))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
F. Haouas, Z. Ben Dhiaf, A. Hammouda and B. Solaiman, "A new efficient fuzzy cluster validity index: Application to images clustering," 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Naples, Italy, 2017, pp. 1-6. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8015651&isnumber=8015374
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# Dirichlet prior parameters alpha_2, ..., alpha_10 (one per k = 2, ..., kmax)
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.HF = B_HF.IDX(x = scale(data), kmax = 10, method = "FCM", fzm = 2,
                nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.HF)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Modified Kernel form of Pakhira-Bandyopadhyay-Maulik (KPBM) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using the Modified Kernel form of Pakhira-Bandyopadhyay-Maulik (KPBM) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_KPBM.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-KPBM is defined as follows.
Let
r_k({\bf x}) = \dfrac{KPBM(k)-\min_j KPBM(j)}{\sum_{i=2}^K (KPBM(i)-\min_j KPBM(j))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
C. Alok. (2010). "An investigation of clustering algorithms and soft computing approaches for pattern recognition," Department of Computer Science, Assam University. http://hdl.handle.net/10603/93443
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# Dirichlet prior parameters alpha_2, ..., alpha_10 (one per k = 2, ..., kmax)
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.KPBM = B_KPBM.IDX(x = scale(data), kmax = 10, method = "FCM", fzm = 2, nstart = 20,
                    iter = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.KPBM)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-KWON index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using KWON as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_KWON.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-KWON is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j KWON(j) - KWON(k)}{\sum_{i=2}^K (\max_j KWON(j) - KWON(i))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
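The definitions above translate directly into a few lines of base R. The following is a minimal sketch, not the package's implementation: the KWON values, n, and the flat prior are made-up inputs, and the helper name bcvi_smaller_is_better is hypothetical.

```r
# Hypothetical KWON values for k = 2, ..., 6 (a smaller value is better).
kwon  <- c(4.0, 2.5, 1.0, 3.0, 3.5)
n     <- 100                    # assumed number of data points
alpha <- rep(1, length(kwon))   # assumed flat Dirichlet prior (alpha_2, ..., alpha_K)

# Hypothetical helper, not part of BayesCVI.
bcvi_smaller_is_better <- function(cvi, n, alpha) {
  r    <- (max(cvi) - cvi) / sum(max(cvi) - cvi)  # r_k(x)
  post <- alpha + n * r                           # posterior Dirichlet parameters
  a0n  <- sum(alpha) + n                          # alpha_0 + n
  list(r    = r,
       bcvi = post / a0n,
       var  = post * (a0n - post) / (a0n^2 * (a0n + 1)))
}

res   <- bcvi_smaller_is_better(kwon, n, alpha)
k_opt <- which.max(res$bcvi) + 1  # the first entry corresponds to k = 2
```

Because the r_k sum to one, the BCVI values also sum to one, and the largest BCVI marks the preferred number of groups.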
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
S. H. Kwon, “Cluster validity index for fuzzy clustering,” Electronics Letters, vol. 34, no. 22, pp. 2176–2177, 1998. doi:10.1049/el:19981523
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.KWON = B_KWON.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.KWON)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-KWON2 index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using KWON2 as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_KWON2.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-KWON2 is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j KWON2(j) - KWON2(k)}{\sum_{i=2}^K (\max_j KWON2(j) - KWON2(i))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
S. H. Kwon, J. Kim, and S. H. Son, “Improved cluster validity index for fuzzy clustering,” Electronics Letters, vol. 57, no. 21, pp. 792–794, 2021.
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.KWON2 = B_KWON2.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2,
nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.KWON2)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Point biserial correlation (PB)
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Point biserial correlation (PB) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_PB.IDX(x, kmax, method = "kmeans", corr = "pearson", nstart = 100,
alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
corr |
a character string indicating which correlation coefficient is to be computed ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-PB is defined as follows.
Let
r_k({\bf x}) = \dfrac{PB(k) - \min_j PB(j)}{\sum_{i=2}^K (PB(i) - \min_j PB(j))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
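To see how the Dirichlet prior can change the selected number of groups, the sketch below evaluates the BCVI formula twice on the same made-up PB values, once with a flat prior and once with a prior concentrated on k = 3. All numbers and the helper name bcvi are illustrative assumptions, not package output.

```r
# Hypothetical PB values for k = 2, ..., 6 (a larger value is better).
pb <- c(0.30, 0.55, 0.70, 0.50, 0.40)
n  <- 50                                   # assumed number of data points
r  <- (pb - min(pb)) / sum(pb - min(pb))   # r_k(x)

# BCVI as the posterior mean of p_k for a given prior vector.
bcvi <- function(alpha) (alpha + n * r) / (sum(alpha) + n)

flat_prior     <- rep(1, 5)            # no preference among k = 2, ..., 6
informed_prior <- c(1, 60, 1, 1, 1)    # strong prior belief in k = 3

k_flat     <- which.max(bcvi(flat_prior)) + 1      # k favoured by the data alone
k_informed <- which.max(bcvi(informed_prior)) + 1  # k favoured once the prior is added
```

With the flat prior the BCVI follows the ranking of PB itself, while a sufficiently strong prior relative to n can move the maximum to a different k.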
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
G. W. Milligan, "An examination of the effect of six types of error perturbation on fifteen clustering algorithms," Psychometrika, 45, 325-342 (1980).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.PB = B_PB.IDX(x = scale(data), kmax=10, method = "kmeans", corr = "pearson", nstart = 100,
alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.PB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Pakhira-Bandyopadhyay-Maulik (PBM) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Pakhira-Bandyopadhyay-Maulik (PBM) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_PBM.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-PBM is defined as follows.
Let
r_k({\bf x}) = \dfrac{PBM(k) - \min_j PBM(j)}{\sum_{i=2}^K (PBM(i) - \min_j PBM(j))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, “Validity index for crisp and fuzzy clusters,” Pattern Recognition, vol. 37, no. 3, pp. 487–501, 2004.
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.PBM = B_PBM.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.PBM)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-The score function
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using the score function (SF) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_SF.IDX(x, kmax, method = "kmeans", nstart = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-SF is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j SF(j) - SF(k)}{\sum_{i=2}^K (\max_j SF(j) - SF(i))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
S. Saitta, B. Raphael, I. Smith, "A bounded index for cluster validity," In Perner, P.: Machine Learning and Data Mining in Pattern Recognition, Lecture Notes in Computer Science, 4571, Springer (2007).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.SF = B_SF.IDX(x = scale(data), kmax=10, method = "kmeans",
nstart = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.SF)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Starczewski and Pakhira-Bandyopadhyay-Maulik for crisp clustering indexes
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Starczewski (STR) and/or Pakhira-Bandyopadhyay-Maulik (PBM) as the underlying cluster validity index (CVI) and Dirichlet prior parameters of the user's choice. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_STRPBM.IDX(x, kmax, method = "kmeans", indexlist = "all",
nstart = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
indexlist |
a character string indicating which cluster validity indexes to be computed ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-STRPBM is defined as follows.
Let
r_k({\bf x}) = \dfrac{CVI(k) - \min_j CVI(j)}{\sum_{i=2}^K (CVI(i) - \min_j CVI(j))},
where CVI is either the STR or the PBM index.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
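Since the same formula is applied separately to STR and to PBM, the computation can be sketched as one helper mapped over a named list of index values. The values below are made up, the helper bcvi_larger_is_better and the column names k and BCVI are hypothetical, and the list names STR and PBM simply mirror the components accessed in the Examples.

```r
# Hypothetical STR and PBM values for k = 2, ..., 5 (larger is better for both).
cvis <- list(STR = c(0.8, 1.9, 1.2, 0.9),
             PBM = c(2.0, 6.5, 7.5, 3.0))
n     <- 150        # assumed number of data points
alpha <- rep(1, 4)  # assumed flat prior

# Hypothetical helper: BCVI for an index whose largest value is optimal.
bcvi_larger_is_better <- function(cvi, n, alpha) {
  r <- (cvi - min(cvi)) / sum(cvi - min(cvi))
  data.frame(k    = seq_along(cvi) + 1,
             BCVI = (alpha + n * r) / (sum(alpha) + n))
}

out <- lapply(cvis, bcvi_larger_is_better, n = n, alpha = alpha)
k_str <- out$STR$k[which.max(out$STR$BCVI)]  # preferred k under STR
k_pbm <- out$PBM$k[which.max(out$PBM$BCVI)]  # preferred k under PBM
```

In this toy input the two indices prefer different k, which is why the results are kept as separate components rather than merged.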
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
M. K. Pakhira, S. Bandyopadhyay and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recogn 37(3):487–501 (2004).
A. Starczewski, "A new validity index for crisp clusters," Pattern Anal Applic 20, 687–700 (2017).
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.STRPBM = B_STRPBM.IDX(x = scale(data), kmax=10, method = "kmeans",
indexlist = "all", nstart = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI-STR
pplot = plot_BCVI(B.STRPBM$STR)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
# plot the BCVI-PBM
pplot = plot_BCVI(B.STRPBM$PBM)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Tang index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Tang as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_TANG.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-TANG is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j TANG(j) - TANG(k)}{\sum_{i=2}^K (\max_j TANG(j) - TANG(i))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
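The variance formula gives a natural measure of uncertainty around each BCVI value. The sketch below builds one possible error bar, the posterior mean plus or minus one posterior standard deviation, from made-up TANG values; plot_BCVI may construct its error bars differently, and the column names used here are assumptions.

```r
# Hypothetical TANG values for k = 2, ..., 6 (a smaller value is better).
tang  <- c(0.90, 0.35, 0.40, 0.75, 0.80)
n     <- 80          # assumed number of data points
alpha <- rep(2, 5)   # assumed prior parameters

r    <- (max(tang) - tang) / sum(max(tang) - tang)
post <- alpha + n * r
a0n  <- sum(alpha) + n

summary_df <- data.frame(
  k    = 2:6,
  BCVI = post / a0n,
  SD   = sqrt(post * (a0n - post) / (a0n^2 * (a0n + 1)))
)
# One possible error bar: posterior mean plus/minus one standard deviation.
summary_df$lower <- summary_df$BCVI - summary_df$SD
summary_df$upper <- summary_df$BCVI + summary_df$SD

# Do the one-SD bars of the two best candidates overlap?
top2 <- summary_df[order(-summary_df$BCVI)[1:2], ]
bars_overlap <- top2$lower[1] <= top2$upper[2]
```

Overlapping bars for the two leading candidates suggest that the second choice of k deserves attention as well as the first.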
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
Y. Tang, F. Sun, and Z. Sun, “Improved validation index for fuzzy clustering,” in Proceedings of the 2005 American Control Conference, vol. 2, pp. 1120–1125, 2005. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1470111&isnumber=31519
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_DI.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.TANG = B_TANG.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2,
nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.TANG)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Wu and Li (WL) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Wu and Li (WL) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_WL.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-WL is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j WL(j) - WL(k)}{\sum_{i=2}^K (\max_j WL(j) - WL(i))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
C. H. Wu, C. S. Ouyang, L. W. Chen, and L. W. Lu, “A new fuzzy clustering validity index with a median factor for centroid-based clustering,” IEEE Transactions on Fuzzy Systems, vol. 23, no. 3, pp. 701–718, 2015. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6811211&isnumber=7115244
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.WL = B_WL.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2,
nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.WL)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Wiroonsri and Preedasawakul (WP) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Wiroonsri and Preedasawakul (WP) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_WP.IDX(x, kmax, corr = "pearson", method = "FCM", fzm = 2,
gamma = (fzm^2 * 7)/4, sampling = 1, iter = 100, nstart = 20,
NCstart = TRUE, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
corr |
a character string indicating which correlation coefficient is to be computed ( |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
gamma |
adjusted fuzziness parameter for |
sampling |
a number greater than 0 and less than or equal to 1 indicating the undersampling proportion of data to be used. This argument is intended for handling a large dataset. The default is |
iter |
a maximum number of iterations for |
nstart |
a maximum number of initial random sets for FCM for |
NCstart |
logical for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-WP is defined as follows.
Let
r_k({\bf x}) = \dfrac{WP(k) - \min_j WP(j)}{\sum_{i=2}^K (WP(i) - \min_j WP(j))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
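The closed-form mean and variance can be checked numerically by sampling from the Dirichlet posterior. The sketch below uses the standard construction of a Dirichlet draw as independent gamma variables divided by their sum; the WP values, n, the prior, and the number of draws are all made-up inputs, and nothing here calls the package.

```r
set.seed(1)
wp    <- c(0.2, 0.6, 0.9, 0.5)  # hypothetical WP values for k = 2, ..., 5
n     <- 100                    # assumed number of data points
alpha <- rep(2, 4)              # assumed prior parameters

r    <- (wp - min(wp)) / sum(wp - min(wp))
post <- alpha + n * r           # posterior Dirichlet parameters
a0n  <- sum(alpha) + n

# Closed-form posterior mean (the BCVI) and variance.
bcvi_exact <- post / a0n
var_exact  <- post * (a0n - post) / (a0n^2 * (a0n + 1))

# Sample from the Dirichlet posterior by normalizing independent gamma draws.
# Column j of g holds draws with shape post[j].
m <- 20000
g <- matrix(rgamma(m * length(post), shape = rep(post, each = m)), nrow = m)
p <- g / rowSums(g)

bcvi_mc <- colMeans(p)       # Monte Carlo estimate of E[p_k | x]
var_mc  <- apply(p, 2, var)  # Monte Carlo estimate of Var(p_k | x)
```

The Monte Carlo estimates should agree with the closed-form values up to simulation error, which gives a quick sanity check when adapting the formulas.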
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
N. Wiroonsri, O. Preedasawakul, "A correlation-based fuzzy cluster validity index with secondary options detector". doi:10.48550/arXiv.2308.14785
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# alpha
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.WP = B_WP.IDX(x = scale(data), kmax =10, corr = "pearson", method = "FCM",
fzm = 2, sampling = 1, iter = 100, nstart = 20, NCstart = TRUE,
alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.WP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Wiroonsri (WI) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Wiroonsri (WI) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_Wvalid(x, kmax, method = "kmeans", corr = "pearson", nstart = 100,
sampling = 1, NCstart = TRUE, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
corr |
a character string indicating which correlation coefficient is to be computed ( |
nstart |
a maximum number of initial random sets for kmeans for |
sampling |
a number greater than 0 and less than or equal to 1 indicating the undersampling proportion of data to be used. This argument is intended for handling a large dataset. The default is |
NCstart |
logical for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-WI is defined as follows.
Let
r_k({\bf x}) = \dfrac{WI(k) - \min_j WI(j)}{\sum_{i=2}^K (WI(i) - \min_j WI(j))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
N. Wiroonsri, "Clustering performance analysis using a new correlation based cluster validity index," Pattern Recognition, 145, 109910, 2024. doi:10.1016/j.patcog.2023.109910
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B_TANG.IDX, B_WP.IDX, B_STRPBM.IDX, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.WI = B_Wvalid(x = scale(data), kmax = 10, method = "kmeans", corr = "pearson",
nstart = 100, sampling = 1, NCstart = TRUE, alpha = aalpha,
mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.WI)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
BCVI-Xie and Beni (XB) index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using Xie and Beni (XB) as the underlying cluster validity index (CVI) with the user's selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_XB.IDX(x, kmax, method = "FCM", fzm = 2, nstart = 20,
iter = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
nstart |
a maximum number of initial random sets for FCM for |
iter |
a maximum number of iterations for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-XB is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j XB(j) - XB(k)}{\sum_{i=2}^K (\max_j XB(j) - XB(i))}.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2 + n r_2({\bf x}),\ldots,\alpha_K + n r_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + n r_k({\bf x})}{\alpha_0 + n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + n r_k({\bf x}))(\alpha_0 + n - \alpha_k - n r_k({\bf x}))}{(\alpha_0 + n)^2 (\alpha_0 + n + 1)}.
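By simple algebra, the BCVI formula can be rewritten as a weighted average of the prior mean \alpha_k/\alpha_0 and the data term r_k, with weight \alpha_0/(\alpha_0 + n) on the prior. The sketch below checks this identity numerically using the prior vector from the Examples section; the XB values and n are made up for illustration.

```r
# Prior vector used in the Examples section (K - 1 = 9 entries for kmax = 10).
alpha <- c(5, 5, 5, 20, 20, 20, 0.5, 0.5, 0.5)
# Hypothetical XB values for k = 2, ..., 10 (a smaller value is better).
xb <- c(0.60, 0.42, 0.15, 0.30, 0.38, 0.45, 0.52, 0.58, 0.65)
n  <- 500   # assumed number of data points

r  <- (max(xb) - xb) / sum(max(xb) - xb)  # r_k(x)
a0 <- sum(alpha)                          # alpha_0
w  <- a0 / (a0 + n)                       # weight carried by the prior

bcvi_direct  <- (alpha + n * r) / (a0 + n)       # formula as stated above
bcvi_mixture <- w * (alpha / a0) + (1 - w) * r   # prior/data weighted average
```

The weight w shrinks as n grows, so for large datasets the BCVI is driven mostly by the underlying index, while for small n the prior parameters matter more.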
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
X. Xie and G. Beni, “A validity measure for fuzzy clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B7_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.XB = B_XB.IDX(x = scale(data), kmax =10, method = "FCM",
fzm = 2, nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.XB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Bayesian cluster validity index
Description
Compute the Bayesian cluster validity index (BCVI) for two to kmax groups using an underlying cluster validity index (CVI) and Dirichlet prior parameters of the user's choice. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
BayesCVIs(CVI, n, kmax, opt.pt, alpha = "default", mult.alpha = 1/2)
Arguments
CVI |
the CVI values for |
n |
the number of data points. |
kmax |
the maximum number of clusters to be considered. |
opt.pt |
a character string indicating whether the maximum or the minimum of |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j CVI(j)- CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))}
for a CVI such that the smallest value indicates the optimal number of clusters and
r_k({\bf x}) = \dfrac{CVI(k)-\min_j CVI(j)}{\sum_{i=2}^K (CVI(i)-\min_j CVI(j))}
for a CVI such that the largest value indicates the optimal number of clusters.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^Kp_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given \bf p
, where C({\bf p})
is the normalizing constant. Assume further that {\bf p}
follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K)
. The posterior distribution of \bf p
still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x}))
.
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n}
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k
can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
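The formulas above can be transcribed into a few lines of base R. The sketch below is illustrative only: bcvi_sketch is a hypothetical helper, not a function of this package, and it may differ from what BayesCVIs does internally. In particular it uses the supplied alpha as is and does not model the mult.alpha argument. The CVI values are made up.

```r
# Hypothetical helper (not part of BayesCVI): a direct transcription of the
# Details formulas. `cvi` holds CVI(2), ..., CVI(K); `alpha` has the same length.
bcvi_sketch <- function(cvi, n, alpha, opt.pt = c("max", "min")) {
  opt.pt <- match.arg(opt.pt)
  stopifnot(length(cvi) == length(alpha))
  # r_k: rescaled CVI values that sum to one
  # (undefined when all CVI values are equal, since the sum is then zero)
  w <- if (opt.pt == "max") cvi - min(cvi) else max(cvi) - cvi
  r <- w / sum(w)
  a_post <- alpha + n * r    # posterior Dirichlet parameters
  a_sum  <- sum(alpha) + n   # alpha_0 + n
  data.frame(
    k    = seq_along(cvi) + 1,  # k = 2, ..., K
    BCVI = a_post / a_sum,
    VAR  = a_post * (a_sum - a_post) / (a_sum^2 * (a_sum + 1))
  )
}

cvi <- c(0.2, 0.5, 0.9, 0.4)  # made-up CVI values for k = 2, ..., 5
out <- bcvi_sketch(cvi, n = 100, alpha = rep(1, 4), opt.pt = "max")
out
```

The BCVI column sums to one by construction, so its entries can be read as posterior weights over the candidate numbers of groups.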
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
opt.pt |
a character string indicating whether the maximum or the minimum of |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
# install a package for computing an underlying CVI
# install.packages("UniversalCVI")
library(UniversalCVI)
library(BayesCVI)
data = R1_data[,-3]
# Compute WP index by WP.IDX using default gamma
FCM.WP = WP.IDX(scale(data), cmax = 10, cmin = 2, corr = 'pearson', method = 'FCM', fzm = 2,
iter = 100, nstart = 20, NCstart = TRUE)
# WP.IDX values
result = FCM.WP$WP$WPI
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.WP = BayesCVIs(CVI = result,
n = nrow(data),
kmax = 10,
opt.pt = "max",
alpha = aalpha,
mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.WP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Plots for visualizing BCVI
Description
Plot the Bayesian cluster validity index (BCVI), with and without standard deviation error bars, together with the underlying index.
Usage
plot_BCVI(B.result, mult.err.bar = 2)
Arguments
B.result |
a result from one of the functions |
mult.err.bar |
a multiplier of the standard deviations to be used for plotting error bars |
Details
BCVI is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j CVI(j)- CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))}
for a cluster validity index (CVI) such that the smallest value indicates the optimal number of clusters and
r_k({\bf x}) = \dfrac{CVI(k)-\min_j CVI(j)}{\sum_{i=2}^K (CVI(i)-\min_j CVI(j))}
for a CVI such that the largest value indicates the optimal number of clusters.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^Kp_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given \bf p
, where C({\bf p})
is the normalizing constant. Assume further that {\bf p}
follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K)
. The posterior distribution of \bf p
still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x}))
.
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n}
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k
can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
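To connect the variance formula to the error bars, the sketch below computes bar end points under the assumption that each bar spans BCVI(k) plus or minus mult.err.bar posterior standard deviations. This reading is inferred from the argument description. error_bounds is a hypothetical helper that is not part of this package, and the numbers are made up.

```r
# Hypothetical helper (not part of BayesCVI). Assumes error bars are drawn at
# BCVI +/- mult.err.bar * sqrt(VAR).
error_bounds <- function(bcvi, var, mult.err.bar = 2) {
  s <- sqrt(var)  # posterior standard deviation of p_k
  data.frame(lower = bcvi - mult.err.bar * s,
             upper = bcvi + mult.err.bar * s)
}

bcvi  <- c(0.10, 0.60, 0.30)  # made-up BCVI values that sum to one
a_sum <- 99                   # made-up value of alpha_0 + n
# The Dirichlet variance can be rewritten as BCVI * (1 - BCVI) / (alpha_0 + n + 1)
vars <- bcvi * (1 - bcvi) / (a_sum + 1)
eb <- error_bounds(bcvi, vars, mult.err.bar = 2)
eb
```

Because the variance shrinks as \alpha_0 + n grows, the bars narrow for larger datasets or stronger priors.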
Value
plot_index |
a plot of the underlying index for the number of groups from |
plot_BCVI |
a plot of BCVI for the number of groups from |
error_bar_plot |
a plot of BCVI with error bars for the number of groups from |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
O. Preedasawakul, and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
See Also
B_STRPBM.IDX, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_WP.IDX, B_DB.IDX
Examples
library(BayesCVI)
library(UniversalCVI)
## Soft clustering
# The data included in this package.
data = B7_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.XB = B_XB.IDX(x = scale(data), kmax =10, method = "FCM", fzm = 2,
nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.XB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
## Hard clustering
# The data included in this package.
data = B2_data[,1:2]
K.STR = STRPBM.IDX(scale(data), kmax = 10, kmin = 2, method = "kmeans",
indexlist = "STR", nstart = 100)
# STR index values
result = K.STR$STR$STR
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.STR = BayesCVIs(CVI = result,
n = nrow(data),
kmax = 10,
opt.pt = "max",
alpha = aalpha,
mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.STR)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot