Type: Package
Title: Techniques for Evaluating Clustering
Description: This package runs algorithms from different clustering packages and compares their results, to determine which algorithm behaves best for the data provided. See Martos, L.A.P., García-Vico, Á.M., González, P. et al. (2023) <doi:10.1007/s13748-022-00294-2> "Clustering: an R library to facilitate the analysis and comparison of cluster algorithms", Martos, L.A.P., García-Vico, Á.M., González, P. et al. "A Multiclustering Evolutionary Hyperrectangle-Based Algorithm" <doi:10.1007/s44196-023-00341-3> and Martos, L.A.P., García-Vico, Á.M., González, P. et al. "An Evolutionary Fuzzy System for Multiclustering in Data Streaming" <doi:10.1016/j.procs.2023.12.058>.
Version: 1.7.10
Date: 2024-04-20
Author: Luis Alfonso Perez Martos [aut, cre] (<https://orcid.org/0000-0002-5154-6105>)
Maintainer: Luis Alfonso Perez Martos <lapm0001@gmail.com>
URL: https://github.com/laperez/clustering
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.1
Repository: CRAN
Imports: amap, apcluster, cluster, ClusterR, data.table, dplyr, foreach, future, ggplot2, gmp, methods, pracma, pvclust, shiny, sqldf, stats, tools, utils, xtable, toOrdinal
Suggests: DT, shinyalert, shinyFiles, shinyjs, shinythemes, shinyWidgets, tidyverse, shinycssloaders
NeedsCompilation: no
Packaged: 2024-04-22 18:47:54 UTC; luis
Depends: R (≥ 3.5.0)
Date/Publication: 2024-04-22 19:10:11 UTC

Filter metrics in a clustering object returning a new clustering object.

Description

Generates a new filtered clustering object.

Usage

## S3 method for class 'clustering'
clustering[condition = TRUE]

Arguments

clustering

The clustering object to filter.

condition

Expression to filter the clustering object.

Details

This function allows you to filter the data set for a given evaluation metric. The evaluation metrics available are: Algorithm, Distance, Clusters, Data, Var, Time, Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette and TimeAtt.

Value

A clustering object filtered from the input parameters.

Examples



result <- Clustering::clustering(df = Clustering::basketball, algorithm = 'clara',
min=3, max=4, metrics = c('Precision','Recall'))

result[Precision > 0.14 & Recall > 0.11]
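The filter condition is an ordinary logical expression over the metric columns. As a rough illustration only, the same idea can be sketched with base R on a plain data frame; the table `res` and its values are hypothetical, and `subset()` stands in for the package's own `[` method rather than reproducing it.

```r
# Hypothetical results table: the column names mirror the metrics listed in
# Details, the values are made up for illustration.
res <- data.frame(
  Algorithm = c("clara", "clara", "pam"),
  Clusters  = c(3, 4, 3),
  Precision = c(0.10, 0.20, 0.30),
  Recall    = c(0.15, 0.12, 0.05)
)

# The condition is evaluated with the columns in scope, in the same spirit as
# result[Precision > 0.14 & Recall > 0.11] on a clustering object.
filtered <- subset(res, Precision > 0.14 & Recall > 0.11)
print(filtered)
```

Only rows that satisfy both conditions are kept; combining conditions with `|` instead of `&` would keep rows that satisfy either one.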


Method that runs the aggExCluster algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the aggExCluster algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

aggExCluster_euclidean(dt, clusters, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the agnes algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the agnes algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

agnes_euclidean_method(dt, clusters, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the agnes algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Description

Method that runs the agnes algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Usage

agnes_manhattan_method(dt, clusters, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Returns a list with both the internal and external evaluation of the grouping.


amap package algorithms

Description

amap package algorithms

Usage

algorithm_amap()

Value

list with the algorithms


apcluster package algorithms

Description

apcluster package algorithms

Usage

algorithm_apcluster()

Value

list with the algorithms


cluster package algorithms

Description

cluster package algorithms

Usage

algorithm_cluster()

Value

list with the algorithms


ClusterR package algorithms

Description

ClusterR package algorithms

Usage

algorithm_clusterr()

Value

list with the algorithms


pvclust package algorithms

Description

pvclust package algorithms

Usage

algorithm_pvclust()

Value

list with the algorithms


Method that returns the list of used algorithms

Description

Method that returns the list of used algorithms

Usage

algorithms()

Value

Array listing the algorithms.


Method that returns all the algorithms executed by the package

Description

Method that returns all the algorithms executed by the package

Usage

algorithms_package(packages)

Arguments

packages

Array of package names.

Value

Array with the algorithms that are going to be run.


Method that runs the apclusterK algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the apclusterK algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

apclusterK_euclidean(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the apclusterK algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Description

Method that runs the apclusterK algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Usage

apclusterK_manhattan(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the apclusterK algorithm using the Minkowski metric to make an external or internal validation of the cluster.

Description

Method that runs the apclusterK algorithm using the Minkowski metric to make an external or internal validation of the cluster.

Usage

apclusterK_minkowski(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Clustering GUI.

Description

Method that allows us to execute the main algorithm in graphic interface mode instead of through the console.

Usage

appClustering()

Details

This method generates a graphical user interface so that the clustering algorithm can be run without knowing its parameters in advance. Its operation is very simple: we can change the values and quickly see the resulting behavior.

Value

GUI with the parameters of the algorithm and their representation in tables and graphs.


This data set contains a series of statistics (5 attributes) about 96 basketball players.

Description

This data set contains a series of statistics about basketball players.

Usage

data(basketball)

Format

A data frame with 96 observations on 5 variables:

assists_per_minuteReal

average number of assists per minute

heightInteger

height of the player

time_playedReal

time played by the player

ageInteger

age of the player in years

points_per_minuteReal

average number of points per minute

Source

KEEL, <http://www.keel.es/>


Best rated external metrics.

Description

Method in charge of searching, for each algorithm, for the attributes that have the best external classification.

Method that looks for the external attributes that are best classified, making use of the Var column. In this way we discard attributes and only work with those that give the best response to the algorithm in question.

Usage

best_ranked_external_metrics(df)

Arguments

df

Matrix or data frame with the result of running the clustering algorithm.

Value

Returns a data.frame with the best classified external attributes.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 4,
               algorithm='clara',
               metrics=c("Recall")
         )

Clustering::best_ranked_external_metrics(df = result)


Best rated internal metrics.

Description

Method in charge of searching, for each algorithm, for the attributes that have the best internal classification.

Method that looks for those internal attributes that are better classified, making use of the Var column. In this way we discard the attributes and only work with those that give the best response to the algorithm in question.

Usage

best_ranked_internal_metrics(df)

Arguments

df

Matrix or data frame with the result of running the clustering algorithm.

Value

Returns a data.frame with the best classified internal attributes.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 5,
               algorithm='gmm',
               metrics=c("Recall")
         )

Clustering::best_ranked_internal_metrics(df = result)



Data from an experiment on the effects of machine adjustments on the time to count bolts.

Description

A manufacturer of automotive accessories provides hardware, e.g. nuts, bolts, washers and screws, to fasten the accessory to the car or truck. Hardware is counted and packaged automatically. Specifically, bolts are dumped into a large metal dish. A plate that forms the bottom of the dish rotates counterclockwise. This rotation forces bolts to the outside of the dish and up along a narrow ledge. Due to the vibration of the dish caused by the spinning bottom plate, some bolts fall off the ledge and back into the dish. The ledge spirals up to a point where the bolts are allowed to drop into a pan on a conveyor belt. As a bolt drops, it passes by an electronic eye that counts it. When the electronic counter reaches the preset number of bolts, the rotation is stopped and the conveyor belt is moved forward.

Usage

data(bolts)

Format

A data frame with 40 observations on 8 variables:

RUNInteger

is the order in which the data were collected

SPEED1Integer

a speed setting that controls the speed of rotation of the plate at the bottom of the dish

TOTALInteger

total number of bolts (TOTAL) to be counted

SPEED2Integer

a second speed setting that is used to change the speed of rotation (usually slowing it down) for the last few bolts

NUMBER2Integer

the number of bolts to be counted at this second speed

SENSInteger

the sensitivity of the electronic eye

TIMEReal

The measured response is the time, in seconds

T20BOLTReal

in order to put times on an equal footing, the response to be analyzed is the time to count 20 bolts

Details

There are several adjustments on the machine that affect its operation. These include: a speed setting that controls the speed of rotation (SPEED1Integer) of the plate at the bottom of the dish, a total number of bolts (TOTAL) to be counted, a second speed setting (SPEED2Integer) that is used to change the speed of rotation (usually slowing it down) for the last few bolts, the number of bolts to be counted at this second speed (NUMBER2Integer), and the sensitivity of the electronic eye (SENSInteger). The sensitivity setting is to ensure that the correct number of bolts is counted. Too few bolts packaged causes customer complaints. Too many bolts packaged increases costs. For each run conducted in this experiment the correct number of bolts was counted. From an engineering standpoint, if the correct number of bolts is counted, the sensitivity should not affect the time to count bolts. The measured response is the time (TIMEReal), in seconds, it takes to count the desired number of bolts. In order to put times on an equal footing, the response to be analyzed is the time to count 20 bolts (T20BOLTReal). The data cover 40 combinations of settings. RUNInteger is the order in which the data were collected.

Source

KEEL, <http://www.keel.es/>


Method that calculates the best rated external metrics.

Description

Method that calculates the best rated external metrics.

Usage

calculate_best_external_variables_by_metrics(df)

Arguments

df

Data matrix or data frame.

Value

Returns a table with the external metrics that have the best rating.


Method that calculates the best rated internal metrics.

Description

Method that calculates the best rated internal metrics.

Usage

calculate_best_internal_variables_by_metrics(df)

Arguments

df

Data matrix or data frame.

Value

Returns a table with the internal metrics that have the best rating.


Method that calculates which algorithm and which metric behaves best for the datasets provided.

Description

Method that calculates which algorithm and which metric behaves best for the datasets provided.

Usage

calculate_best_validation_external_by_metrics(df, metric)

Arguments

df

Data matrix or data frame.

metric

String with the metric.

Value

Return a table with the algorithm and the best performing metric for the datasets.


Method that calculates which algorithm and which metric behaves best for the datasets provided.

Description

Method that calculates which algorithm and which metric behaves best for the datasets provided.

Usage

calculate_best_validation_internal_by_metrics(df, metric)

Arguments

df

Data matrix or data frame.

metric

String with the metric.

Value

Return a table with the algorithm and the best performing metric for the datasets.


Method to calculate the Connectivity

Description

Method to calculate the Connectivity

Usage

calculate_connectivity(
  distance = NULL,
  clusters,
  datadf = NULL,
  neighbSize = 12,
  method = "euclidean"
)

Arguments

distance

Dissimilarity matrix.

clusters

Array that contains the data grouped into clusters.

datadf

Dataframe with original data.

neighbSize

Number of neighbours.

method

Indicates the method for calculating distance between points. Default is euclidean.

Value

Return a double with the result of the connectivity calculation.
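For orientation, the connectivity index is commonly defined as a sum over every observation of a penalty 1/j each time its j-th nearest neighbour falls in a different cluster, so lower values are better. The sketch below follows that common definition using base R only; the function name `connectivity_sketch` and its arguments are hypothetical, and the package's own implementation may differ in details such as tie handling.

```r
# Minimal sketch of the connectivity index, assuming the usual definition
# (penalty 1/j when the j-th nearest neighbour is in another cluster).
# connectivity_sketch is a hypothetical helper, not part of the package.
connectivity_sketch <- function(x, clusters, neighb_size = 2,
                                method = "euclidean") {
  d <- as.matrix(dist(x, method = method))
  n <- nrow(d)
  total <- 0
  for (i in seq_len(n)) {
    # Neighbours of i ordered by distance, excluding i itself
    nn <- order(d[i, ])
    nn <- nn[nn != i][seq_len(min(neighb_size, n - 1))]
    # Add 1/j for each j-th neighbour assigned to a different cluster
    total <- total + sum((clusters[nn] != clusters[i]) / seq_along(nn))
  }
  total
}

x <- matrix(c(0, 1, 10, 11), ncol = 1)
good <- connectivity_sketch(x, c(1, 1, 2, 2), neighb_size = 1)
bad  <- connectivity_sketch(x, c(1, 2, 1, 2), neighb_size = 1)
```

With one neighbour per point, the well-separated labelling incurs no penalty, while the interleaved labelling is penalised once per observation.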


Method to calculate the Dunn index.

Description

Method to calculate the Dunn index.

Usage

calculate_dunn(distance = NULL, clusters, datadf = NULL, method = "euclidean")

Arguments

distance

Dissimilarity matrix.

clusters

Array that contains the data grouped into clusters.

datadf

Dataframe with original data.

method

Indicates the method for calculating distance between points. Default is euclidean.

Value

Return a double with the result of the Dunn index calculation.
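As background, the Dunn index is usually defined as the smallest distance between observations in different clusters divided by the largest distance between observations in the same cluster, so higher values indicate compact, well-separated clusters. The following base R sketch implements that textbook definition; `dunn_sketch` is a hypothetical name and the code is not taken from the package.

```r
# Minimal sketch of the Dunn index under the textbook definition:
# min between-cluster distance / max within-cluster distance.
# dunn_sketch is a hypothetical helper; it assumes at least two clusters
# and at least one cluster with two or more observations.
dunn_sketch <- function(x, clusters, method = "euclidean") {
  d <- as.matrix(dist(x, method = method))
  same <- outer(clusters, clusters, "==")   # TRUE where both points share a cluster
  off_diag <- row(d) != col(d)              # ignore zero self-distances
  min_between <- min(d[!same])
  max_within <- max(d[same & off_diag])
  min_between / max_within
}

x <- matrix(c(0, 1, 10, 11), ncol = 1)
dunn_value <- dunn_sketch(x, c(1, 1, 2, 2))
```

In this toy case the closest points from different clusters are 9 apart and the widest cluster has diameter 1, so the ratio is 9.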


Method that returns the value or variable depending on where it is in the calculated metrics.

Description

Method that returns the value or variable depending on where it is in the calculated metrics.

Usage

calculate_result(
  algorith,
  distance,
  cluster,
  dataset,
  ranking,
  timeExternal,
  entropy,
  variation_information,
  precision,
  recall,
  fowlkes_mallows_index,
  f_measure,
  timeInternal,
  dunn,
  connectivity,
  silhouette,
  variables
)

Arguments

algorith

Algorithm name.

distance

Name of the metric used to calculate the distance between points.

cluster

Number of clusters.

dataset

Name of dataset.

ranking

Position we want to obtain from the list of variables.

timeExternal

Array with the external validation calculation times of the clustering.

entropy

Array with the calculation of the entropy for each of the variables.

variation_information

Array with the calculation of the variation_information for each of the variables.

precision

Array with the calculation of the precision for each of the variables.

recall

Array with the calculation of the recall for each of the variables.

fowlkes_mallows_index

Array with the calculation of the fowlkes_mallows_index for each of the variables.

f_measure

Array with the calculation of the f_measure for each of the variables.

timeInternal

Array with the internal validation calculation times of the clustering.

dunn

Array with the calculation of the dunn for each of the variables.

connectivity

Array with the calculation of the connectivity for each of the variables.

silhouette

Array with the calculation of the silhouette for each of the variables.

variables

True if we want to show the value of the metric calculation and false if we want to show the variable.

Value

Returns an array with the calculation of each metric based on the indicated position.


Method that returns the value or variable depending on where it is in the calculated metrics.

Description

Method that returns the value or variable depending on where it is in the calculated metrics.

Usage

calculate_result_internal(
  algorith,
  distance,
  cluster,
  dataset,
  ranking,
  timeInternal,
  dunn,
  connectivity,
  silhouette,
  variables
)

Arguments

algorith

Algorithm name.

distance

Name of the metric used to calculate the distance between points.

cluster

Number of clusters.

dataset

Name of dataset.

ranking

Position we want to obtain from the list of variables.

timeInternal

Array with the internal validation calculation times of the clustering.

dunn

Array with the calculation of the dunn for each of the variables.

connectivity

Array with the calculation of the connectivity for each of the variables.

silhouette

Array with the calculation of the silhouette for each of the variables.

variables

True if we want to show the value of the metric calculation and false if we want to show the variable.

Value

Returns an array with the calculation of each metric based on the indicated position.


Method that calculates which algorithm behaves best for the datasets provided.

Description

Method that calculates which algorithm behaves best for the datasets provided.

Usage

calculate_validation_external_by_metrics(df)

Arguments

df

Data matrix or data frame.

Value

Return a table with the best performing algorithm for the provided datasets.


Method that calculates which algorithm behaves best for the datasets provided.

Description

Method that calculates which algorithm behaves best for the datasets provided.

Usage

calculate_validation_internal_by_metrics(df)

Arguments

df

Data matrix or data frame.

Value

Return a table with the best performing algorithm for the provided datasets.


Method that runs the clara algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the clara algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

clara_euclidean_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the clara algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Description

Method that runs the clara algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Usage

clara_manhattan_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Clustering algorithm.

Description

Discovering the behavior of attributes in a set of clustering packages based on evaluation metrics.

Usage

clustering(
  path = NULL,
  df = NULL,
  packages = NULL,
  algorithm = NULL,
  min = 3,
  max = 4,
  metrics = NULL
)

Arguments

path

The path of the file. Defaults to NULL. Only one of path or df may be given, not both at the same time. Only files in .dat, .csv or .arff format are allowed.

df

Data matrix or data frame, or dissimilarity matrix. Use NULL (the default) if you want to use training and test basketball attributes.

packages

Character vector with the packages that run the algorithm. Defaults to NULL, in which case all packages are run. The five packages implemented are: cluster, ClusterR, amap, apcluster and pvclust.

algorithm

Character vector with the algorithms implemented within the package. Defaults to NULL. The algorithms implemented are: hclust, apclusterK, agnes, clara, daisy, diana, fanny, mona, pam, gmm, kmeans_arma, kmeans_rcpp, mini_kmeans and pvclust.

min

An integer with the minimum number of clusters. This data is necessary to indicate the minimum number of clusters when grouping the data. The default value is 3.

max

An integer with the maximum number of clusters. This data is necessary to indicate the maximum number of clusters when grouping the data. The default value is 4.

metrics

Character vector with the metrics implemented to evaluate the distribution of the data in clusters. Defaults to NULL. The nine metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Details

This algorithm evaluates how the attributes of a dataset or a set of datasets behave in different clustering algorithms. To do this, it is necessary to indicate the type of evaluation you want to make on the distribution of the data. To execute the algorithm it is necessary to indicate the number of clusters (min and max), the algorithms (algorithm) or packages (packages) that we want to cluster with, and the metrics (metrics).

Value

A matrix with the result of running all the metrics of the algorithms contained in the packages indicated. We also obtain information with the types of metrics, algorithms and packages executed.

Examples


Clustering::clustering(
     df = cluster::agriculture,
     min = 3,
     max = 3,
     algorithm='clara',
     metrics=c('Precision')
)
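To picture what a sweep from min to max clusters involves, the sketch below runs a single algorithm from the cluster package for each cluster count and records one internal metric. It is only an illustration of the idea under stated assumptions: it calls `cluster::pam()` directly and reads the average silhouette width it reports, and it does not reproduce how `clustering()` evaluates each attribute or assembles its result.

```r
# Illustrative sweep over the number of clusters, assuming the cluster
# package is installed (it is listed in this package's Imports).
min_k <- 3
max_k <- 4

sweep <- do.call(rbind, lapply(min_k:max_k, function(k) {
  fit <- cluster::pam(cluster::agriculture, k = k)
  # pam() stores silhouette information; avg.width is the mean silhouette
  data.frame(Clusters = k, Silhouette = fit$silinfo$avg.width)
}))
print(sweep)
```

Each row corresponds to one value between min and max, which is the same granularity at which the package reports its metrics.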




Method to calculate the connectivity.

Description

Method to calculate the connectivity.

Usage

connectivity_metric(distance, clusters_vector, dt, method)

Arguments

distance

Dissimilarity matrix.

clusters_vector

Array that contains the data grouped into clusters.

dt

Dataframe with original data.

method

Indicates the method for calculating distance between points.

Value

Return a double with the result of the connectivity calculation.


Method that converts a matrix into numerical format.

Description

Method that converts a matrix into numerical format.

Usage

convert_numeric_matrix(datas)

Arguments

datas

Information matrix.

Value

Returns a matrix in numeric format.


Method in charge of creating a table from an array with the values of the variable used as a sample and another with the classification of the values.

Description

Method in charge of creating a table from an array with the values of the variable used as a sample and another with the classification of the values.

Usage

convert_table(clusters_vector, column_dataset_label)

Arguments

clusters_vector

Array of the variable used for the classification.

column_dataset_label

Array with the grouping of the values.

Value

Return a table with the grouping of both arrays.
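Cross-tabulating two arrays of equal length is what base R's `table()` does, so the idea can be sketched as follows. The vectors `cluster_labels` and `attribute_values` are hypothetical stand-ins for the two arguments, and the sketch makes no claim about how `convert_table()` itself builds its result.

```r
# Hypothetical inputs: a cluster assignment and a labelled attribute
# for the same five observations.
cluster_labels   <- c(1, 1, 2, 2, 2)
attribute_values <- c("a", "a", "a", "b", "b")

# Contingency table counting how often each pair of values co-occurs
tab <- table(cluster = cluster_labels, label = attribute_values)
print(tab)
```

Tables of this shape are the usual starting point for external metrics such as entropy, precision and recall.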


Method to convert columns to ordinal.

Description

Method to convert columns to ordinal.

Usage

convert_toOrdinal(df)

Arguments

df

Data frame with the results.

Value

Returns the data frame with its columns converted to ordinal.


Method that runs the daisy algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the daisy algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

daisy_euclidean_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the daisy algorithm using the Gower metric to make an external or internal validation of the cluster.

Description

Method that runs the daisy algorithm using the Gower metric to make an external or internal validation of the cluster.

Usage

daisy_gower_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the daisy algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Description

Method that runs the daisy algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Usage

daisy_manhattan_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method to filter only the external measurement columns

Description

Method to filter only the external measurement columns

Usage

dataframe_by_metrics_evaluation(data, external = TRUE)

Arguments

data

Information matrix.

external

Boolean indicating whether it is an external measurement.

Value

Returns a data frame with the filtered columns.
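The split between external and internal measurements can be sketched with plain column selection. The grouping used here (Entropy, Variation_information, Precision, Recall, F_measure and Fowlkes_mallows_index as external; Connectivity, Dunn and Silhouette as internal) is inferred from the argument lists elsewhere in this manual, and `select_metrics`, `res` and the choice of identifying columns are hypothetical.

```r
# Assumed grouping of metric columns; not read from the package itself.
external_metrics <- c("Entropy", "Variation_information", "Precision",
                      "Recall", "F_measure", "Fowlkes_mallows_index")
internal_metrics <- c("Connectivity", "Dunn", "Silhouette")
id_cols <- c("Algorithm", "Distance", "Clusters")

# Hypothetical helper: keep identifying columns plus one family of metrics
select_metrics <- function(data, external = TRUE) {
  wanted <- if (external) external_metrics else internal_metrics
  data[, intersect(names(data), c(id_cols, wanted)), drop = FALSE]
}

# Made-up one-row result table for illustration
res <- data.frame(Algorithm = "clara", Distance = "euclidean", Clusters = 3,
                  Precision = 0.2, Recall = 0.1, Dunn = 0.5, Silhouette = 0.3)
ext <- select_metrics(res, external = TRUE)
int <- select_metrics(res, external = FALSE)
```

Using `intersect()` keeps the original column order and silently skips metrics that were not computed in a given run.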


Method in charge of detecting the limit of a dataset header.

Description

Method in charge of detecting the limit of a dataset header.

Usage

detect_definition_attribute(path)

Arguments

path

Path of the dataset.

Value

The row where the dataset attributes definition ends.
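KEEL and ARFF files place their attribute declarations in a header that ends with an `@data` line, after which the comma-separated values begin. Assuming that layout, one way to locate the end of the header is sketched below; the file contents are invented for the example and the package's own detection logic may work differently.

```r
# Write a tiny KEEL-style file to a temporary path (contents are made up)
path <- tempfile(fileext = ".dat")
writeLines(c(
  "@relation demo",
  "@attribute height integer [160, 203]",
  "@attribute age integer [22, 37]",
  "@data",
  "201,28",
  "198,30"
), path)

lines <- readLines(path)
# Row of the @data marker, taken here as the last row of the header
header_end <- grep("^@data", lines, ignore.case = TRUE)[1]
# Everything after the header is parsed as comma-separated values
values <- read.csv(text = lines[-seq_len(header_end)], header = FALSE)
```

Plain .csv files have no such marker, so a real reader would need a separate branch for them.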


Method that runs the diana algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the diana algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

diana_euclidean_method(dt, clusters, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

An integer giving the number of clusters to create.

metric

A character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method to calculate the Dunn index.

Description

Method to calculate the Dunn index.

Usage

dunn_metric(dist, clusters_vector, dt, me)

Arguments

dist

Dissimilarity matrix.

clusters_vector

Array that contains the data grouped into clusters.

dt

Dataframe with original data.

me

Indicates the method for calculating distance between points.

Value

Return a double with the result of the Dunn index calculation.


Method for calculating entropy.

Description

Method for calculating entropy.

Usage

entropy_formula(x_vec)

Arguments

x_vec

Array with the data used to calculate the entropy.

Value

An array with the calculated values.
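For reference, Shannon entropy of a set of labels is the negative sum of p * log(p) over the observed label proportions. The sketch below computes it in base R with a base-2 logarithm; both the logarithm base and the function name `entropy_sketch` are assumptions, since this page does not state which convention `entropy_formula()` follows.

```r
# Minimal sketch of Shannon entropy over label proportions.
# entropy_sketch is a hypothetical helper; log base 2 is an assumption.
entropy_sketch <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]              # drop empty levels to avoid 0 * log(0)
  -sum(p * log2(p))
}

balanced <- entropy_sketch(c("a", "a", "b", "b"))
pure     <- entropy_sketch(rep("a", 4))
```

Two equally frequent labels give one bit of entropy, while a single repeated label gives zero, which is why lower entropy indicates purer clusters.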


Method to calculate the entropy.

Description

Method to calculate the entropy.

Usage

entropy_metric(conversion_data_frame, table_convert, column_dataset_label)

Arguments

conversion_data_frame

A double with the result of the entropy calculation.

table_convert

Table conversion (variable - cluster).

column_dataset_label

Array with the calculation of the clustering algorithm.

Value

Return a double with the result of the entropy calculation.


Method in charge of calculating the average for all datasets using all the algorithms defined in the application.

Description

Method in charge of calculating the average for all datasets using all the algorithms defined in the application.

Usage

evaluate_all_column_dataset(datas, method, cluster, nameDataset, metrics)

Arguments

datas

A data frame or matrix.

method

Describes the metrics used by each of the algorithms.

cluster

Number of clusters.

nameDataset

Name of the dataset, for information purposes.

metrics

Array with internal or external metrics.

Value

A list with the results of the external and internal validation applied to the algorithms.


Evaluates algorithms by measures of dissimilarity based on a metric.

Description

Method that calculates which algorithm and which metric behaves best for the datasets provided.

Usage

evaluate_best_validation_external_by_metrics(df, metric)

Arguments

df

Data matrix or data frame with the result of running the clustering algorithm.

metric

String with the metric.

Details

This method groups the data by algorithm and distance measure, instead of obtaining the best attribute from the data set.

Value

A data.frame with the algorithms classified by measures of dissimilarity.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 5,
               algorithm='kmeans_rcpp',
               metrics=c("F_measure"))

Clustering::evaluate_best_validation_external_by_metrics(result,'F_measure')
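
The grouping described in Details can be pictured with base R alone. The sketch below is not the package's implementation: the toy table and its values are invented, and only the column names Algorithm, Distance and Clusters follow the result columns documented for the clustering object.

```r
# Toy results table; the values are made up for illustration only.
results <- data.frame(
  Algorithm = c("gmm", "gmm", "gmm", "pam", "pam", "pam"),
  Distance  = c("euclidean", "euclidean", "manhattan",
                "euclidean", "manhattan", "manhattan"),
  Clusters  = c(4, 5, 4, 4, 4, 5),
  F_measure = c(0.61, 0.70, 0.55, 0.48, 0.52, 0.66)
)

# Best F_measure for every combination of algorithm and distance measure.
best <- stats::aggregate(F_measure ~ Algorithm + Distance,
                         data = results, FUN = max)

# Show the strongest combinations first.
best[order(-best$F_measure), ]
```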


Evaluates algorithms by measures of dissimilarity based on a metric.

Description

Method that calculates which algorithm and which metric behaves best for the datasets provided.

Usage

evaluate_best_validation_internal_by_metrics(df, metric)

Arguments

df

Data matrix or data frame with the result of running the clustering algorithm.

metric

It's a string with the metric to evaluate.

Details

This method groups the data by algorithm and distance measure, instead of obtaining the best attribute from the data set.

Value

A data.frame with the algorithms classified by measures of dissimilarity.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 5,
               algorithm='gmm',
               metrics=c("Precision","Connectivity")
         )

Clustering::evaluate_best_validation_internal_by_metrics(result,"Connectivity")


Evaluate external validations by algorithm.

Description

Method that calculates which algorithm behaves best for the datasets provided.

Usage

evaluate_validation_external_by_metrics(df)

Arguments

df

data matrix or data frame with the result of running the clustering algorithm.

Details

It groups the results of the execution by algorithms.

Value

A data.frame with all the algorithms that obtain the best results regardless of the dissimilarity measure used.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 4,
               algorithm='kmeans_arma',
               metrics=c("Precision")
         )

Clustering::evaluate_validation_external_by_metrics(result)



Evaluate internal validations by algorithm.

Description

Method that calculates which algorithm behaves best for the datasets provided.

Usage

evaluate_validation_internal_by_metrics(df)

Arguments

df

data matrix or data frame with the result of running the clustering algorithm.

Details

It groups the results of the execution by algorithms.

Value

A data.frame with all the algorithms that obtain the best results regardless of the dissimilarity measure used.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 5,
               algorithm='kmeans_rcpp',
               metrics=c("Recall","Silhouette")
         )

Clustering::evaluate_validation_internal_by_metrics(result)


Clustering::evaluate_validation_internal_by_metrics(result$result)



Evaluation clustering algorithm.

Description

Method that performs the information processing.

Usage

execute_datasets(
  path,
  df,
  packages,
  algorithm,
  cluster_min,
  cluster_max,
  metrics,
  attributes,
  name_dataframe
)

Arguments

path

Path where the datasets are located.

df

Data matrix or data frame, or dissimilarity matrix, depending on the value of the argument.

packages

Array defining the clustering packages. The packages implemented are: cluster, ClusterR, amap, apcluster and pvclust. By default all packages are run.

algorithm

Array with the algorithms provided by the packages. The algorithms implemented are: hclust, apclusterK, agnes, clara, daisy, diana, fanny, mona, pam, gmm, kmeans_arma, kmeans_rcpp, mini_kmeans and pvclust.

cluster_min

Minimum number of clusters. Must be at least 1.

cluster_max

Maximum number of clusters. cluster_max must be greater than or equal to cluster_min.

metrics

Array defining the metrics available in the package. The nine metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

name_dataframe

Name of the data.frame when df is provided.

Value

Returns a matrix with the result of running all the metrics of the algorithms contained in the packages we indicated.
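
The constraints on cluster_min and cluster_max stated above can be checked before any algorithm is run. The following is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper: validate the cluster range and expand it into the
# sequence of cluster counts to evaluate.
cluster_range <- function(cluster_min, cluster_max) {
  stopifnot(is.numeric(cluster_min), length(cluster_min) == 1,
            is.numeric(cluster_max), length(cluster_max) == 1)
  if (cluster_min < 1) {
    stop("cluster_min must be at least 1")
  }
  if (cluster_max < cluster_min) {
    stop("cluster_max must be greater than or equal to cluster_min")
  }
  seq(cluster_min, cluster_max)
}

cluster_range(4, 5)  # cluster counts 4 and 5
```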


Evaluation clustering algorithm.

Description

Method that evaluates clustering algorithm from a file directory or dataframe.

Usage

execute_package_parallel(
  directory_files,
  df,
  algorithms_execute,
  measures_execute,
  cluster_min,
  cluster_max,
  metrics_execute,
  attributes,
  number_algorithms,
  numberClusters,
  numberDataSets,
  is_metric_external,
  is_metric_internal,
  name_dataframe
)

Arguments

directory_files

String with the path where the datasets are located.

df

Data matrix or data frame, or dissimilarity matrix, depending on the value of the argument.

algorithms_execute

Character vector with the algorithms to be executed. The algorithms implemented are: hclust, apclusterK, agnes, clara, daisy, diana, fanny, mona, pam, gmm, kmeans_arma, kmeans_rcpp, mini_kmeans and pvclust.

measures_execute

Character array with the dissimilarity measures to be executed. Which measures are available depends on the algorithm. Examples include Euclidean and Manhattan.

cluster_min

Minimum number of clusters.

cluster_max

Maximum number of clusters. cluster_max must be greater than or equal to cluster_min.

metrics_execute

Character array defining the metrics to be executed. The nine metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

number_algorithms

It's a numeric field with the number of algorithms.

numberClusters

It's a numeric field with the difference between clusters.

numberDataSets

It's a numeric field with the number of datasets.

is_metric_external

Boolean field to indicate whether to run external metrics.

is_metric_internal

Boolean field to indicate whether to run internal metrics.

name_dataframe

Name of the data.frame when df is provided.

Value

Returns a list with the result matrix of evaluating the data from the indicated algorithms, metrics and number of clusters.


Export the results of the external metrics to LaTeX.

Description

Method that exports the results of the external metrics to a file in LaTeX format.

Usage

export_file_external(df, path = NULL)

Arguments

df

Data frame containing a table in LaTeX format with the results of the external validations.

path

String with the path to the directory where the LaTeX file is to be stored.

Details

When writing in LaTeX, a table is often needed to report results. This method exports the results of the clustering algorithm as a LaTeX table.

Value

A file in LaTeX format with the results of the external metrics.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 5,
               algorithm='gmm',
               metrics=c("Precision")
         )

Clustering::export_file_external(result)
file.remove("external_data.tex")


Export the results of the internal metrics to LaTeX.

Description

Method that exports the results of the internal metrics to a file in LaTeX format.

Usage

export_file_internal(df, path = NULL)

Arguments

df

Data frame containing a table in LaTeX format with the results of the internal validations.

path

String with the path to the directory where the LaTeX file is to be stored.

Details

When writing in LaTeX, a table is often needed to report results. This method exports the results of the clustering algorithm as a LaTeX table.

Value

A file in LaTeX format with the results of the internal metrics.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 5,
               algorithm='gmm',
               metrics=c("Recall","Dunn")
         )

Clustering::export_file_internal(result)
file.remove("internal_data.tex")


Method that returns the extension of a file.

Description

Method that returns the extension of a file.

Usage

extension_file(path)

Arguments

path

dataset directory

Value

Returns the extension of the file.
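
A comparable result can be obtained with tools::file_ext from the tools package that ships with R. Whether extension_file relies on this function internally is an assumption, and the file names below are invented for the example.

```r
# tools::file_ext returns the text after the last dot, without the dot.
tools::file_ext("datasets/weather.dat")
tools::file_ext("datasets/iris.csv")

# A file name without an extension yields an empty string.
tools::file_ext("datasets/README")
```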


Method that applies different external metrics to a data frame or matrix, for example precision, recall, etc.

Description

Method that applies different external metrics to a data frame or matrix, for example precision, recall, etc.

Usage

external_validation(column_dataset_label, clusters_vector, metric = CONST_NULL)

Arguments

column_dataset_label

Array containing the distribution of the data in the cluster.

clusters_vector

Array that contains the data grouped into clusters.

metric

Array with external metric types.

Value

Return a list of the external results initialized to zero.
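
External metrics such as precision and recall are commonly derived from pair counts: every pair of observations is classified according to whether it shares a reference label and whether it shares a cluster. The sketch below shows that common pair-counting scheme. It is an assumption that the package counts pairs in this way, and pair_counts is a hypothetical helper.

```r
# Hypothetical helper: classify every pair of observations.
pair_counts <- function(truth, clusters) {
  idx <- utils::combn(length(truth), 2)              # all unordered pairs
  same_truth <- truth[idx[1, ]] == truth[idx[2, ]]
  same_cluster <- clusters[idx[1, ]] == clusters[idx[2, ]]
  c(
    true_positive  = sum(same_truth & same_cluster),   # same label, same cluster
    false_positive = sum(!same_truth & same_cluster),  # different label, same cluster
    false_negative = sum(same_truth & !same_cluster)   # same label, different cluster
  )
}

truth    <- c("a", "a", "a", "b", "b")
clusters <- c(1, 1, 2, 2, 2)
pair_counts(truth, clusters)
```

In this toy case the third observation is placed in the wrong cluster, which produces two false positives and two false negatives alongside two true positives.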


Method that runs the fanny algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the fanny algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

fanny_euclidean_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the fanny algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Description

Method that runs the fanny algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Usage

fanny_manhattan_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that fills a vector.

Description

Method that fills a vector.

Usage

fill_cluster_vector(data, appcluster)

Arguments

data

Matrix or data frame with the dataset.

appcluster

data with the information of the appcluster object

Value

A vector filled with information.


Method to calculate the f_measure.

Description

Method to calculate the f_measure.

Usage

fmeasure_metric(true_positive, false_positive, false_negative)

Arguments

true_positive

Array with the matching elements of B that are in the same cluster.

false_positive

Array with the non-matching elements of B that are in the same cluster.

false_negative

Array with the matching elements of B that are not in the same cluster.

Value

Returns a double with the f_measure calculation.
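
For reference, the usual F-measure is the harmonic mean of precision and recall. The sketch below uses that textbook definition with scalar counts. The function name f_measure_sketch and the convention of returning 0 when there are no true positives are assumptions, not documented package behaviour.

```r
# Textbook F-measure from pair counts (assumed helper).
f_measure_sketch <- function(true_positive, false_positive, false_negative) {
  if (true_positive == 0) {
    return(0)  # convention chosen for this sketch to avoid dividing by zero
  }
  precision <- true_positive / (true_positive + false_positive)
  recall <- true_positive / (true_positive + false_negative)
  2 * precision * recall / (precision + recall)
}

f_measure_sketch(8, 2, 8)  # precision 0.8, recall 0.5
```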


Method to calculate the Fowlkes-Mallows index.

Description

Method to calculate the Fowlkes-Mallows index.

Usage

fowlkes_mallows_index_metric(true_positive, false_positive, false_negative)

Arguments

true_positive

Array with the matching elements of B that are in the same cluster.

false_positive

Array with the non-matching elements of B that are in the same cluster.

false_negative

Array with the matching elements of B that are not in the same cluster.

Value

Returns a double with the fowlkes_mallows_index calculation.
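
The Fowlkes-Mallows index is conventionally defined as the geometric mean of precision and recall. The following sketch applies that definition to scalar counts. The function name and the zero-count convention are assumptions made for this example.

```r
# Textbook Fowlkes-Mallows index from pair counts (assumed helper).
fowlkes_mallows_sketch <- function(true_positive, false_positive, false_negative) {
  if (true_positive == 0) {
    return(0)  # convention chosen for this sketch to avoid dividing by zero
  }
  precision <- true_positive / (true_positive + false_positive)
  recall <- true_positive / (true_positive + false_negative)
  sqrt(precision * recall)
}

fowlkes_mallows_sketch(8, 2, 8)  # sqrt(0.8 * 0.5)
```

Because it is a geometric rather than a harmonic mean, this index is never smaller than the F-measure computed from the same counts.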


Method that runs the gmm algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the gmm algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

gmm_euclidean_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the gmm algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Description

Method that runs the gmm algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Usage

gmm_manhattan_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the hcluster algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the hcluster algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

hclust_euclidean(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that returns an array with the external information of the cluster

Description

Method that returns an array with the external information of the cluster

Usage

information_external(metrics, information, size, variables)

Arguments

metrics

array with the metrics used in the execution of the package

information

list with external clustering information

size

external number of columns

variables

Null returns the position of the variable, otherwise it returns the value of the variable

Value

array with the information from the calculation of the external evaluation of the clustering


Method that returns an array with the internal information of the cluster

Description

Method that returns an array with the internal information of the cluster

Usage

information_internal(metrics, information, size, variables)

Arguments

metrics

array with the metrics used in the execution of the package

information

list with internal clustering information

size

internal number of columns

variables

Null returns the position of the variable, otherwise it returns the value of the variable

Value

array with the information from the calculation of the internal evaluation of the clustering


Method that returns a list of external validation values initialized to zero.

Description

Method that returns a list of external validation values initialized to zero.

Usage

initializeExternalValidation()

Value

A list of all values set to zero.


Method that returns a list of internal validation values initialized to zero.

Description

Method that returns a list of internal validation values initialized to zero.

Usage

initializeInternalValidation()

Value

A list of all values set to zero.
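
A zero-initialized list of this kind can be built in one line of base R. The sketch below is illustrative only: zero_metrics is a hypothetical helper, and the real lists may hold additional fields beyond the metric names shown here.

```r
# Hypothetical helper: named list with one zero per metric.
zero_metrics <- function(metric_names) {
  stats::setNames(as.list(rep(0, length(metric_names))), metric_names)
}

internal_zero <- zero_metrics(c("Connectivity", "Dunn", "Silhouette"))
str(internal_zero)
```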


Method that applies different internal metrics to a data frame or matrix, for example Dunn, connectivity, etc.

Description

Method that applies different internal metrics to a data frame or matrix, for example Dunn, connectivity, etc.

Usage

internal_validation(
  distance = NULL,
  clusters_vector,
  dataf = NULL,
  method = CONST_EUCLIDEAN,
  metric = NULL
)

Arguments

distance

Dissimilarity matrix.

clusters_vector

Array that contains the data grouped into clusters.

dataf

Dataframe with original data.

method

Indicates the method for calculating distance between points.

metric

Array with internal metric types.

Value

Return a list of the internal results initialized to zero.


Method that checks for external metrics

Description

Method that checks for external metrics

Usage

is_External_Metrics(metrics)

Arguments

metrics

array with the metrics used in the execution of the package

Value

true if it exists and false otherwise


Method that checks for internal metrics

Description

Method that checks for internal metrics

Usage

is_Internal_Metrics(metrics)

Arguments

metrics

array with the metrics used in the execution of the package

Value

true if it exists and false otherwise
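
The two checks above amount to a membership test against a fixed set of metric names. The sketch below assumes the split suggested by the examples in this manual: Connectivity, Dunn and Silhouette are treated as internal metrics and the remaining six as external. The helper names has_external and has_internal are hypothetical.

```r
# Assumed grouping of the nine metric names.
external_metrics <- c("Entropy", "Variation_information", "Precision",
                      "Recall", "F_measure", "Fowlkes_mallows_index")
internal_metrics <- c("Connectivity", "Dunn", "Silhouette")

# Hypothetical helpers: TRUE if at least one requested metric is in the group.
has_external <- function(metrics) {
  any(tolower(metrics) %in% tolower(external_metrics))
}
has_internal <- function(metrics) {
  any(tolower(metrics) %in% tolower(internal_metrics))
}

has_external(c("Recall", "Silhouette"))
has_internal(c("Precision"))
```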


Method that runs the kmeans_arma algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the kmeans_arma algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

kmeans_arma_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the kmeans_rcpp algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the kmeans_rcpp algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

kmeans_rcpp_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that returns the maximum value of a metric.

Description

Method that returns the maximum value of a metric.

Usage

max_value_metric(df, metric, isExternalMetric)

Arguments

df

Data matrix or data frame.

metric

Metric to evaluate.

Value

The maximum value of the metric column.


Metrics of the amap algorithm

Description

Metrics of the amap algorithm

Usage

measure_amap()

Value

list with the metrics


Metrics of the apcluster algorithm

Description

Metrics of the apcluster algorithm

Usage

measure_apcluster()

Value

list with the metrics


Method that returns all the measures executed by the package from the indicated algorithms

Description

Method that returns all the measures executed by the package from the indicated algorithms

Usage

measure_calculate(algorithm)

Arguments

algorithm

algorithms array

Value

array with the measures we're going to run


Metrics of the cluster algorithm

Description

Metrics of the cluster algorithm

Usage

measure_cluster()

Value

list with the metrics


Metrics of the ClusterR algorithm

Description

Metrics of the ClusterR algorithm

Usage

measure_clusterr()

Value

list with the metrics


Method that returns all the measures executed by the package

Description

Method that returns all the measures executed by the package

Usage

measure_package(package)

Arguments

package

package array

Value

array with the measures we're going to run


Metrics of the pvclust algorithm

Description

Metrics of the pvclust algorithm

Usage

measure_pvclust()

Value

list with the metrics


Method in charge of verifying the implemented metrics

Description

Method in charge of verifying the implemented metrics

Usage

metrics_calculate(metrics, variables, internal, external)

Arguments

metrics

array with the metrics used in the execution of the package

variables

boolean field that indicates if it should show the results of the variables

Value

list of metrics


Method that returns the list of used external metrics

Description

Method that returns the list of used external metrics

Usage

metrics_external()

Value

external metrics listing array


Method that returns the list of used internal metrics

Description

Method that returns the list of used internal metrics

Usage

metrics_internal()

Value

internal metrics listing array


Method that returns the list of used metrics

Description

Method that returns the list of used metrics

Usage

metrics_validate()

Value

metrics listing array


Method that runs the mini_kmeans algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the mini_kmeans algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

mini_kmeans_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the mona algorithm using external or internal validation of the cluster.

Description

Method that runs the mona algorithm using external or internal validation of the cluster.

Usage

mona_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that returns how many external metrics there are in the array of metrics used in the calculation

Description

Method that returns how many external metrics there are in the array of metrics used in the calculation

Usage

number_columnas_external(metrics)

Arguments

metrics

array with the metrics used in the execution of the package

Value

returns the number of occurrences


Method that returns how many internal metrics there are in the array of metrics used in the calculation

Description

Method that returns how many internal metrics there are in the array of metrics used in the calculation

Usage

number_columnas_internal(metrics)

Arguments

metrics

array with the metrics used in the execution of the package

Value

returns the number of occurrences


Method that returns the number of variables in a dataset directory

Description

Method that returns the number of variables in a dataset directory

Usage

number_variables_dataset(path)

Arguments

path

dataset directory

Value

returns the number of variables in a dataset directory


Method that returns the list of used packages

Description

Method that returns the list of used packages

Usage

packages()

Value

package listing array


Method that runs the pam algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the pam algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

pam_euclidean_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the pam algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Description

Method that runs the pam algorithm using the Manhattan metric to make an external or internal validation of the cluster.

Usage

pam_manhattan_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that returns a list of the files that exist in a directory.

Description

Method that returns a list of the files that exist in a directory.

Usage

path_dataset(directory)

Arguments

directory

Path of the directory.

Value

A vector with the files that exist in the directory.
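
In base R, a directory listing of this kind is normally obtained with list.files. The sketch below creates a temporary directory with two empty files and lists them. The directory and file names are invented, and it is an assumption that path_dataset works in a similar way.

```r
# Build a throwaway directory with two empty dataset files.
demo_dir <- file.path(tempdir(), "datasets_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.csv", "b.dat"))))

# List the files with their full paths, sorted alphabetically.
files <- list.files(demo_dir, full.names = TRUE)
basename(files)

# Clean up the temporary directory.
unlink(demo_dir, recursive = TRUE)
```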


Graphic representation of the evaluation measures.

Description

Graphical representation of the evaluation measures grouped by cluster.

Usage

plot_clustering(df, metric)

Arguments

df

data matrix or data frame with the result of running the clustering algorithm.

metric

String with the name of the metric selected for evaluation.

Details

In some cases the results must be reviewed or filtered before a selection can be made, and a graphical representation makes this task much easier. This method filters the results by metric and displays them graphically.

Value

Generate an image with the distribution of the clusters by metrics.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 5,
               algorithm='gmm',
               metrics=c("Precision")
         )

Clustering::plot_clustering(result,c("Precision"))


Method to calculate the precision.

Description

Method to calculate the precision.

Usage

precision_metric(true_positive, false_positive)

Arguments

true_positive

Array with the matching elements of B that are in the same cluster.

false_positive

Array with the non-matching elements of B that are in the same cluster.

Value

Returns a double with the precision calculation.
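
Precision is conventionally the share of true positives among all pairs placed in the same cluster. The sketch below applies that definition element by element, so it also accepts vectors of counts. The function name and the choice to return 0 for an empty denominator are assumptions.

```r
# Textbook precision, vectorized over the counts (assumed helper).
precision_sketch <- function(true_positive, false_positive) {
  total <- true_positive + false_positive
  ifelse(total == 0, 0, true_positive / total)
}

precision_sketch(8, 2)
precision_sketch(c(8, 0, 3), c(2, 0, 1))
```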


Method that runs the pvclust algorithm using the Correlation metric to make an external or internal validation of the cluster.

Description

Method that runs the pvclust algorithm using the Correlation metric to make an external or internal validation of the cluster.

Usage

pvclust_correlation_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the pvclust algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Description

Method that runs the pvclust algorithm using the Euclidean metric to make an external or internal validation of the cluster.

Usage

pvclust_euclidean_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that runs the pvpick algorithm using an external or internal validation of the cluster.

Description

Method that runs the pvpick algorithm using an external or internal validation of the cluster.

Usage

pvpick_method(dt, clusters, columnClass, metric)

Arguments

dt

Matrix or data frame with the set of values to be applied to the algorithm.

clusters

It's an integer that indexes the number of clusters we want to create.

metric

Character vector with the metrics available in the package. The metrics implemented are: Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn and Silhouette.

Value

Return a list with both the internal and external evaluation of the grouping.


Method that converts a dataset into a matrix

Description

Method that converts a dataset into a matrix

Usage

read_file(path)

Arguments

path

dataset directory

Value

returns a matrix whose content is the dataset received as a parameter


Method to calculate the recall.

Description

Method to calculate the recall.

Usage

recall_metric(true_positive, false_negative)

Arguments

true_positive

Array with the matching elements of B that are in the same cluster.

false_negative

Array with the matching elements of B that are not in the same cluster.

Value

Returns a double with the recall calculation.
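
Recall is conventionally the share of true positives among all pairs that share a reference label. The sketch below mirrors that definition. The function name and the handling of an empty denominator are assumptions made for this example.

```r
# Textbook recall, vectorized over the counts (assumed helper).
recall_sketch <- function(true_positive, false_negative) {
  total <- true_positive + false_negative
  ifelse(total == 0, 0, true_positive / total)
}

recall_sketch(8, 8)
recall_sketch(c(6, 0), c(2, 0))
```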


Method for refactoring the distance measurement name.

Description

Method for refactoring the distance measurement name.

Usage

refactorName(nameMeasure)

Arguments

nameMeasure

name of the distance measure

Value

a string with the refactored measure name


Method for filtering clustering results.

Description

Method for filtering clustering results.

Usage

resultClustering(result)

Arguments

result

data.frame with clustering results.

Value

a matrix with the filtered columns.


External results by algorithm.

Description

It is used for obtaining the results of an algorithm indicated as a parameter grouped by number of clusters.

Usage

result_external_algorithm_by_metric(df, metric)

Arguments

df

data matrix or data frame with the result of running the clustering algorithm.

metric

It's a string with the metric to evaluate.

Value

A data.frame with the results of the algorithm indicated as parameter.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 5,
               algorithm='gmm',
               metrics=c("Precision")
         )

Clustering::result_external_algorithm_by_metric(result,'Precision')


Internal results by algorithm

Description

It is used for obtaining the results of an algorithm indicated as a parameter grouped by number of clusters.

Usage

result_internal_algorithm_by_metric(df, metric)

Arguments

df

data matrix or data frame with the result of running the clustering algorithm.

metric

String with the metric to evaluate.

Value

A data.frame with the results of the algorithm indicated as parameter.

Examples


result = Clustering::clustering(
               df = cluster::agriculture,
               min = 4,
               max = 5,
               algorithm='gmm',
               metrics=c("Recall","Silhouette")
         )

Clustering::result_internal_algorithm_by_metric(result,'Silhouette')


Method in charge of obtaining those metrics that are external from those indicated.

Description

Method in charge of obtaining those metrics that are external from those indicated.

Usage

row_name_df_external(metrics)

Arguments

metrics

Array with the metrics used in the calculation.

Value

Return an array with the metrics that are external.


Method in charge of obtaining those metrics that are internal from those indicated.

Description

Method in charge of obtaining those metrics that are internal from those indicated.

Usage

row_name_df_internal(metrics)

Arguments

metrics

Array with the metrics used in the calculation.

Value

Return an array with the metrics that are internal.
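
The split between external and internal metrics can be sketched in base R as shown below. The grouping used here (Connectivity, Dunn and Silhouette as internal, the remaining evaluation metrics as external) follows the usual cluster-validation convention and is an assumption of this sketch; the variable names are hypothetical.

```r
# Assumed grouping of the evaluation metrics (not taken from the package source).
external_names <- c("Entropy", "Variation_information", "Precision",
                    "Recall", "F_measure", "Fowlkes_mallows_index")
internal_names <- c("Connectivity", "Dunn", "Silhouette")

# Metrics requested by the user, in the order given.
metrics <- c("Recall", "Silhouette", "Precision", "Dunn")

# Keep only the names that belong to each group, preserving the order.
external <- metrics[metrics %in% external_names]
internal <- metrics[metrics %in% internal_names]

external
internal
```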


Method that returns a table with the algorithm and the metric indicated as parameters.

Description

Method that returns a table with the algorithm and the metric indicated as parameters.

Usage

show_result_external_algorithm_by_metric(df, metric)

Arguments

df

Data matrix or data frame.

metric

String with the metric.

Value

Return a table with the algorithm and the metric indicated as parameter.


Method in charge of obtaining a table with the results of the algorithms grouped by clusters, calculating the maximum value of each external metric.

Description

Method in charge of obtaining a table with the results of the algorithms grouped by clusters, calculating the maximum value of each external metric.

Usage

show_result_external_algorithm_group_by_clustering(df)

Arguments

df

Data matrix or data frame.

Value

Return a table with the algorithms and the clusters.
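
The grouping described above can be sketched with stats::aggregate. The data frame below holds made-up values for illustration only, and the sketch is not the package's implementation; only the column names (Algorithm, Distance, Clusters, Precision, Recall) are taken from the list of evaluation measures.

```r
# Made-up results: two algorithms, two distances, two cluster counts.
results <- data.frame(
  Algorithm = rep(c("gmm", "kmeans"), each = 4),
  Distance  = rep(c("euclidean", "manhattan"), times = 4),
  Clusters  = rep(c(4, 4, 5, 5), times = 2),
  Precision = c(0.31, 0.42, 0.38, 0.35, 0.28, 0.25, 0.45, 0.40),
  Recall    = c(0.55, 0.50, 0.47, 0.52, 0.60, 0.58, 0.41, 0.49)
)

# Maximum of each external metric per algorithm and number of clusters.
best <- aggregate(cbind(Precision, Recall) ~ Algorithm + Clusters,
                  data = results, FUN = max)
best
```

Each maximum is taken column by column, so the Precision and Recall shown in one row of the result may come from different runs.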


Method that returns a table with the algorithm and the metric indicated as parameters.

Description

Method that returns a table with the algorithm and the metric indicated as parameters.

Usage

show_result_internal_algorithm_by_metric(df, metric)

Arguments

df

Data matrix or data frame.

metric

String with the metric for which the results are calculated.

Value

Return a table with the algorithm and the metric indicated as parameter.


Method in charge of obtaining a table with the results of the algorithms grouped by clusters, calculating the maximum value of each internal metric.

Description

Method in charge of obtaining a table with the results of the algorithms grouped by clusters, calculating the maximum value of each internal metric.

Usage

show_result_internal_algorithm_group_by_clustering(df)

Arguments

df

Data matrix or data frame.

Value

Return a table with the algorithms and the clusters.


Method to calculate the silhouette.

Description

Method to calculate the silhouette.

Usage

silhouette_metric(clusters_vector, distance)

Arguments

clusters_vector

Array that contains the data grouped into clusters.

distance

Dissimilarity matrix.

Value

Return a double with the result of the silhouette calculation.
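
A minimal sketch of an average silhouette width computed from a cluster vector and a dissimilarity matrix is shown below. It uses cluster::silhouette from the cluster package and assumes that the result is the mean silhouette width; whether silhouette_metric is implemented this way is not confirmed by this manual.

```r
# Dissimilarities and a simple two-cluster partition of a small data set.
data <- cluster::agriculture
distance <- stats::dist(data)
clusters_vector <- stats::cutree(stats::hclust(distance), k = 2)

# One silhouette width per observation, then the average over all of them.
sil <- cluster::silhouette(clusters_vector, distance)
avg_width <- mean(sil[, "sil_width"])
avg_width
```

Values close to 1 indicate well-separated clusters, and values near 0 or below indicate overlapping or misassigned observations.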


Returns the clustering result sorted by a set of metrics.

Description

This function receives a clustering object and sorts the columns by parameter. By default it performs sorting by the algorithm field.

Usage

## S3 method for class 'clustering'
sort(x, decreasing = TRUE, ...)

Arguments

x

A clustering object.

decreasing

A logical indicating if the sort should be increasing or decreasing. By default, decreasing.

...

Additional parameters such as "by", a string with the name of the evaluation measure to order by. Valid values are: Algorithm, Distance, Clusters, Data, Var, Time, Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette and TimeAtt.

Details

The additional argument in "..." is the 'by' argument, which is an array with the names of the evaluation measures to order by. Valid values are: Algorithm, Distance, Clusters, Data, Var, Time, Entropy, Variation_information, Precision, Recall, F_measure, Fowlkes_mallows_index, Connectivity, Dunn, Silhouette and TimeAtt.

Value

Another clustering object with the evaluation measures sorted

Examples



result <- Clustering::clustering(df = cluster::agriculture, min = 4, max = 4,
                                 algorithm = 'gmm', metrics = 'Recall')

sort(result, FALSE, 'Recall')


Method that formats a number with four digits.

Description

Method that formats a number with four digits.

Usage

specify_decimal(x, k)

Arguments

x

number

k

number of decimals

Value

A number converted to a string with four digits.
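
This kind of formatting can be sketched in base R as shown below, assuming the number is rounded to k decimals and padded with trailing zeros. specify_decimal_sketch is a hypothetical name and is not the package's implementation.

```r
# Hypothetical helper: round x to k decimals and return it as a string,
# keeping trailing zeros so the result always shows k decimals.
specify_decimal_sketch <- function(x, k) {
  trimws(format(round(x, k), nsmall = k))
}

specify_decimal_sketch(3.14159, 4)
specify_decimal_sketch(2, 4)
```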


The data provided are daily stock prices from January 1988 through October 1991, for ten aerospace companies.

Description

The data provided are daily stock prices from January 1988 through October 1991, for ten aerospace companies.

Usage

data(stock)

Format

A data frame with 950 observations on 10 variables:

Company1

company1 details

Company2

company2 details

Company3

company3 details

Company4

company4 details

Company5

company5 details

Company6

company6 details

Company7

company7 details

Company8

company8 details

Company9

company9 details

Company10

company10 details

Source

KEEL, <http://www.keel.es/>


The study was performed at the 2nd Department of Medicine, 1st Faculty of Medicine of Charles University and Charles University Hospital. The data were transferred to electronic form by the European Centre of Medical Informatics, Statistics and Epidemiology of Charles University and Academy of Sciences.

Description

The study was performed at the 2nd Department of Medicine, 1st Faculty of Medicine of Charles University and Charles University Hospital. The data were transferred to electronic form by the European Centre of Medical Informatics, Statistics and Epidemiology of Charles University and Academy of Sciences.

Usage

data(stulong)

Format

A data frame with 1417 observations on 5 variables.

a1

Height

a2

Weight

a3

Blood pressure I systolic (mm Hg)

a4

Blood pressure I diastolic (mm Hg)

a5

Percentage Cholesterol in mg

Source

KEEL, <http://www.keel.es/>


Method for filtering external columns of a dataset.

Description

Method for filtering external columns of a dataset.

Usage

transform_dataset(df)

Arguments

df

Data frame with clustering results.

Value

Data frame filtered with the columns of the external measurements.

Exists internal measure


Method for filtering internal columns of a dataset.

Description

Method for filtering internal columns of a dataset.

Usage

transform_dataset_internal(df)

Arguments

df

data frame with clustering results.

Value

Data frame filtered with the columns of the internal measurements.

Exists internal measure


Method to calculate the variation of information.

Description

Method to calculate the variation of information.

Usage

variation_information_metric(conversion_data_frame, table_convert)

Arguments

conversion_data_frame

Conversion data frame.

table_convert

Table conversion (variable - cluster).

Value

Returns a double with the result of the variation of information calculation.
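
For reference, the variation of information between two partitions A and B is VI(A, B) = H(A) + H(B) - 2 I(A; B), where H is the entropy and I the mutual information. The sketch below computes it from two label vectors using natural logarithms. It is a hypothetical helper with a different interface from variation_information_metric and does not reproduce the package's implementation.

```r
# Hypothetical helper: variation of information between two labelings.
variation_information_sketch <- function(labels_a, labels_b) {
  # Joint distribution of the two labelings and its marginals.
  joint <- table(labels_a, labels_b) / length(labels_a)
  p_a <- rowSums(joint)
  p_b <- colSums(joint)

  # Shannon entropy, skipping empty cells.
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  # Mutual information over the non-empty cells of the joint table.
  nonzero <- joint > 0
  expected <- outer(p_a, p_b)
  mutual <- sum(joint[nonzero] * log(joint[nonzero] / expected[nonzero]))

  entropy(p_a) + entropy(p_b) - 2 * mutual
}

variation_information_sketch(c(1, 1, 2, 2), c(1, 1, 2, 2))
variation_information_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Identical partitions give a value of 0, and the value grows as the two partitions share less information.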


One of the best-known test data sets in machine learning. This data set describes several situations in which the weather is or is not suitable for playing sports, depending on the current outlook, temperature, humidity and wind.

Description

One of the best-known test data sets in machine learning. This data set describes several situations in which the weather is or is not suitable for playing sports, depending on the current outlook, temperature, humidity and wind.

Usage

data(weather)

Format

A data frame with 14 observations on 5 variables:

Outlook

sunny, overcast, rainy

Temperature

hot, mild, cool

Humidity

high, normal

Windy

true, false

Play

yes, no

Source

KEEL, <http://www.keel.es/>
