Help for package SKFCPD

Type:

Package

Title:

Fast Online Changepoint Detection for Temporally Correlated Data

Version:

0.2.4

Date:

2024-02-15

Maintainer:

Hanmo Li <hanmo@pstat.ucsb.edu>

Author:

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Description:

Sequential Kalman filter for scalable online changepoint detection for temporally correlated data. It enables fast detection of single and multiple changepoints in data with missing values. See the reference: Hanmo Li, Yuedong Wang, Mengyang Gu (2023), <doi:10.48550/arXiv.2310.18611>.

License:

GPL (≥ 3)

Depends:

R (≥ 3.5.0), methods (≥ 4.2.2), rlang (≥ 1.0.6), ggplot2 (≥ 3.4.0), ggpubr (≥ 0.5.0), reshape2 (≥ 1.4.4), FastGaSP (≥ 0.5.2)

Imports:

Rcpp (≥ 1.0.9)

LinkingTo:

Rcpp, RcppEigen

NeedsCompilation:

yes

Encoding:

UTF-8

Packaged:

2024-02-16 05:00:08 UTC; lihan

Repository:

CRAN

Date/Publication:

2024-02-17 23:30:12 UTC

Dynamic Linear Model for Online Changepoint Detection

Description

The 'SKFCPD' package provides estimation of changepoint locations using the Dynamic Linear Model (DLM) within the Bayesian Online Changepoint Detection (BOCPD) framework. The efficient computation is achieved through implementation of the Sequential Kalman filter. The range parameter and noise-to-signal ratio are estimated from training samples via a Gaussian process model. This package is capable of handling multidimensional data with temporal correlations and random missing patterns.

Details

The DESCRIPTION file:

Package:	SKFCPD
Type:	Package
Title:	Fast Online Changepoint Detection for Temporally Correlated Data
Version:	0.2.4
Date:	2024-02-15
Authors@R:	c(person(given="Hanmo",family="Li",role=c("aut", "cre"), email="hanmo@pstat.ucsb.edu"), person(given="Yuedong",family="Wang", role=c("aut"), email="yuedong@pstat.ucsb.edu"), person(given="Mengyang",family="Gu", role=c("aut"), email="mengyang@pstat.ucsb.edu"))
Maintainer:	Hanmo Li <hanmo@pstat.ucsb.edu>
Author:	Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]
Description:	Sequential Kalman filter for scalable online changepoint detection by temporally correlated data. It enables fast single and multiple change points with missing values. See the reference: Hanmo Li, Yuedong Wang, Mengyang Gu (2023), <arXiv:2310.18611>.
License:	GPL (>= 3)
Depends:	R (>= 3.5.0), methods (>= 4.2.2), rlang (>= 1.0.6), ggplot2 (>= 3.4.0), ggpubr (>= 0.5.0), reshape2 (>= 1.4.4), FastGaSP (>= 0.5.2)
Imports:	Rcpp (>= 1.0.9)
LinkingTo:	Rcpp, RcppEigen
NeedsCompilation:	yes
Encoding:	UTF-8
Packaged:	2024-02-15 11:15:56 UTC; lihan
Archs:	x64

Index of help topics:

Estimate_GP_params      Estimate parameters from fast computation of
                        GaSP model
SKFCPD                  Getting the results of the SKFCPD model
SKFCPD-class            Class '"SKFCPD"'
SKFCPD-package          Dynamic Linear Model for Online Changepoint
                        Detection
plot_SKFCPD             Plot for SKFCPD model

Implements a fast online changepoint detection algorithm using a dynamic linear model based on the Sequential Kalman filter. It is designed for temporally correlated data and accepts multidimensional datasets with missing values.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Li, Hanmo, Yuedong Wang, and Mengyang Gu. Sequential Kalman filter for fast online changepoint detection in longitudinal health records. arXiv preprint arXiv:2310.18611 (2023).

Fearnhead, Paul, and Zhen Liu. On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society Series B: Statistical Methodology 69, no. 4 (2007): 589-605.

Adams, Ryan Prescott, and David JC MacKay. Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742 (2007).

Hartikainen, Jouni, and Simo Sarkka. Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In 2010 IEEE international workshop on machine learning for signal processing, pp. 379-384. IEEE, 2010.

Gu, Mengyang, and Yanxun Xu. Fast nonseparable Gaussian stochastic process with application to methylation level interpolation. Journal of Computational and Graphical Statistics 29, no. 2 (2020): 250-260.

Gu, Mengyang, and Weining Shen. Generalized probabilistic principal component analysis of correlated data. The Journal of Machine Learning Research 21, no. 1 (2020): 428-468.

Gu, Mengyang, Xiaojing Wang, and James O. Berger. Robust Gaussian stochastic process emulation. The Annals of Statistics 46, no. 6A (2018): 3038-3066.

Examples

  library(SKFCPD)
  
  #------------------------------------------------------------------------------
  # Example: fast online changepoint detection with DEPENDENT data.
  # 
  # Data generation: Data follows a multidimensional Gaussian process with Matern 2.5 kernel.
  #------------------------------------------------------------------------------
  # Data Generation
  set.seed(1)
  
  n_obs = 150
  n_dim = 2
  seg_len = c(70, 30, 20,30)
  mean_each_seg = c(0,1,-1,0)
  
  x_mat=matrix(1:n_obs)
  y_mat=matrix(NA, nrow=n_obs, ncol=n_dim)
  
  gamma = rep(5, n_dim) # range parameter of the covariance matrix
  
  # compute the matern 2.5 kernel
  construct_cor_matrix = function(input, gamma){
    n = length(input)
    R0=abs(outer(input,(input),'-'))
    matrix_one = matrix(1, n, n)
    const = sqrt(5) * R0 / gamma
    Sigma = (matrix_one + const + const^2/3) * (exp(-const))
    return(Sigma)
  }
  
  for(j in 1:n_dim){
    y_each_dim = c()
    for(i in 1:length(seg_len)){
      nobs_per_seg = seg_len[i]
      Sigma = construct_cor_matrix(1:nobs_per_seg, gamma[j])
      L=t(chol(Sigma))
      theta=rep(mean_each_seg[i],nobs_per_seg)+L%*%rnorm(nobs_per_seg)
      y_each_dim = c(y_each_dim, theta+0.1*rnorm(nobs_per_seg))
    }
    y_mat[,j] = y_each_dim
  }
  
  ## Detect changepoints by SKFCPD
  Online_CPD_1 = SKFCPD(design = x_mat,
                        response = y_mat,
                        train_prop = 1/3)
  
  ## visualize the results
  plot_SKFCPD(Online_CPD_1)

Setting up the CPD_DLM model

Description

Implementing the robust GaSP model for estimating the changepoint locations. The range parameter and noise-to-signal ratio are estimated from the training samples by a Gaussian process model.

Usage

  CPD_DLM(design, response, gamma, model_type, mu, sigma_2, eta,
          kernel_type, stop_at_first_cp, hazard_vec,
          truncate_at_prev_cp)

Arguments

design

A matrix with dimension n x p. The design of the experiment.

response

A matrix with dimension n x q. The observations.

gamma

A numeric variable of the range parameter for the covariance matrix. The default value of gamma is 1.

model_type

A numeric variable that can take values of 0, 1 and 2. model_type = 0 stands for a GP model with unknown mean and known variance. model_type = 1 stands for a GP model with known mean and unknown variance. model_type = 2 stands for a GP model with unknown mean and unknown variance. The default value of model_type is 2.

mu

A vector of the mean parameter at each coordinate. Ignored when model_type = 0 or 2.

sigma_2

A vector of the variance parameter at each coordinate.

eta

A vector of the noise-to-signal ratio at each coordinate.

kernel_type

A character string specifying the kernel type. matern_5_2 is the Matern correlation with roughness parameter 5/2; exp is the power exponential correlation with roughness parameter alpha=2. The default choice is matern_5_2.

stop_at_first_cp

A numeric variable that decides if the SKFCPD method stops when it detects the first changepoint. The default value of stop_at_first_cp is FALSE.

hazard_vec

The hazard vector in the SKFCPD method. 1/hazard_vec is the prior probability that a changepoint occurs at each time point.

truncate_at_prev_cp

If TRUE, truncate the run length at the most recently detected changepoint. The default value of truncate_at_prev_cp is FALSE.

Value

An S4 object of class SKFCPD (see SKFCPD-class).
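The role of the hazard value can be illustrated with a few lines of base R. This is a minimal sketch, assuming that a constant hazard value h corresponds to a prior changepoint probability of 1/h at every time point, as stated for hazard_vec above. The variable names and the time-varying values are hypothetical and are not part of the package.

```r
# Assumption: a constant hazard value h gives prior changepoint probability 1/h
hazard_const <- 100
prior_cp_prob <- 1 / hazard_const

# Under a constant hazard the prior run length is geometric, so the prior
# probability of seeing no changepoint in k consecutive time points is (1 - p)^k
k <- c(10, 50, 100)
prob_no_cp <- (1 - prior_cp_prob)^k

# Prior expected number of time points between changepoints
expected_gap <- 1 / prior_cp_prob

# A time-varying hazard has one entry per time point (hypothetical values)
n_obs <- 150
hazard_vec <- c(rep(200, 50), rep(50, 100))
prior_cp_prob_vec <- 1 / hazard_vec
```

A larger hazard value makes changepoints less likely a priori, so the detector needs stronger evidence in the data before it declares one.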

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Li, Hanmo, Yuedong Wang, and Mengyang Gu. Sequential Kalman filter for fast online changepoint detection in longitudinal health records. arXiv preprint arXiv:2310.18611 (2023).

Fearnhead, Paul, and Zhen Liu. On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society Series B: Statistical Methodology 69, no. 4 (2007): 589-605.

Adams, Ryan Prescott, and David JC MacKay. Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742 (2007).

Generating coefficient and conditional matrices

Description

Generating coefficient and conditional matrices for the Gaussian process (GP) model with Matern 2.5 or power exponential kernels.

Usage

Construct_G_W_W0_V(d, gamma, eta, kernel_type, is_initial)

Arguments

d

A value of the distance between the sorted input.

gamma

A value of the range parameter for the covariance matrix.

eta

The noise-to-signal ratio.

kernel_type

A character specifying the type of kernel.

is_initial

A boolean variable. is_initial=TRUE means the matrices generated are for the initial state.

Value

A list containing the GG, W, W0 and VV matrices.
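For orientation, the state-space form of the Matern 2.5 kernel from Hartikainen and Sarkka (2010) can be written out in base R. This is a minimal sketch and not the package's internal code: the function name matern52_state_space is hypothetical, the variance is fixed at 1, and the transformed range parameter is assumed to be lambda = sqrt(5)/gamma, matching the kernel used in the package examples.

```r
# Sketch of the Matern 2.5 state-space matrices for a gap d between inputs.
# Assumptions: unit variance, lambda = sqrt(5) / gamma, state = (f, f', f'').
matern52_state_space <- function(d, gamma) {
  lambda <- sqrt(5) / gamma
  # Drift matrix of the stochastic differential equation
  F_mat <- matrix(c(0, 1, 0,
                    0, 0, 1,
                    -lambda^3, -3 * lambda^2, -3 * lambda),
                  nrow = 3, byrow = TRUE)
  # F_mat has the triple eigenvalue -lambda, so N is nilpotent (N^3 = 0)
  # and the matrix exponential exp(F_mat * d) has a finite series
  N <- F_mat + lambda * diag(3)
  G <- exp(-lambda * d) * (diag(3) + N * d + (N %*% N) * d^2 / 2)
  # Covariance of the stationary distribution of the state
  W0 <- matrix(c(1, 0, -lambda^2 / 3,
                 0, lambda^2 / 3, 0,
                 -lambda^2 / 3, 0, lambda^4),
               nrow = 3, byrow = TRUE)
  # Conditional covariance of the state given the previous state
  W <- W0 - G %*% W0 %*% t(G)
  list(G = G, W0 = W0, W = W)
}

# The first entry of G %*% W0 reproduces the Matern 2.5 correlation at lag d
ss <- matern52_state_space(d = 1, gamma = 5)
k_state <- (ss$G %*% ss$W0)[1, 1]
const <- sqrt(5) * 1 / 5
k_direct <- (1 + const + const^2 / 3) * exp(-const)
```

The comparison of k_state with k_direct is a quick consistency check that the transition matrix and stationary covariance encode the same kernel as the dense correlation matrix.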

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Hartikainen, J. and Sarkka, S. (2010). Kalman filtering and smoothing solutions to temporal Gaussian process regression models. Machine Learning for Signal Processing (MLSP), 2010 IEEE International Workshop, 379-384.

M. Gu, Y. Xu (2019), Fast nonseparable Gaussian stochastic process with application to methylation level interpolation. Journal of Computational and Graphical Statistics, In Press, arXiv:1711.11501.

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

The coefficient matrix in the dynamic linear model when kernel is the exponential covariance

Description

The coefficient matrix in the dynamic linear model when kernel is the exponential covariance.

Usage

Construct_G_exp_fastGP(delta_x,lambda)

Arguments

delta_x

the distance between the sorted input.

lambda

the transformed range parameter.

Value

GG matrix.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

The coefficient matrix in the dynamic linear model when kernel is the Matern covariance with roughness parameter 2.5.

Description

The coefficient matrix in the dynamic linear model when kernel is the Matern covariance with roughness parameter 2.5.

Usage

Construct_G_matern_5_2_fastGP(delta_x,lambda)

Arguments

delta_x

A vector of the distance between the sorted input.

lambda

the transformed range parameter.

Value

GG matrix.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

The coefficient matrix in the dynamic linear model when kernel is the Matern covariance with roughness parameter 2.5.

Description

The coefficient matrix in the dynamic linear model when kernel is the Matern covariance with roughness parameter 2.5.

Usage

Construct_G_matern_5_2_one_dim(delta_x,lambda)

Arguments

delta_x

A value of the distance between the sorted input.

lambda

the transformed range parameter.

Value

GG matrix.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

Covariance of the stationary distribution of the state when kernel is the exponential covariance

Description

This function computes the covariance of the stationary distribution of the state when kernel is the exponential covariance.

Usage

Construct_W0_exp_one_dim(lambda)

Arguments

lambda

the transformed range parameter.

Value

W0 matrix.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

Covariance of the stationary distribution of the state when kernel is the Matern covariance with roughness parameter 2.5

Description

This function computes covariance of the stationary distribution of the state when kernel is the Matern covariance with roughness parameter 2.5.

Usage

Construct_W0_matern_5_2_one_dim(lambda)

Arguments

lambda

the transformed range parameter.

Value

W0 matrix.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

The conditional covariance matrix of the state in the dynamic linear model when kernel is the exponential covariance

Description

The conditional covariance matrix of the state in the dynamic linear model when kernel is the exponential covariance.

Usage

Construct_W_exp_fastGP(delta_x,lambda,W0)

Arguments

delta_x

the distance between the sorted input.

lambda

the transformed range parameter.

W0

the covariance matrix of the stationary distribution of the state.

Value

W matrix.
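For the exponential kernel the state is one-dimensional, so GG, W0 and W reduce to scalars. The sketch below is an illustration under stated assumptions, not the package's implementation: the function name exp_state_space is hypothetical, the variance is fixed at 1, and the transformed range parameter is assumed to be lambda = 1/gamma.

```r
# Sketch of the scalar state-space quantities for the exponential kernel.
# Assumptions: unit variance and lambda = 1 / gamma.
exp_state_space <- function(delta_x, gamma) {
  lambda <- 1 / gamma
  G <- exp(-lambda * delta_x)  # transition coefficient between adjacent inputs
  W0 <- 1                      # stationary variance of the state
  W <- W0 - G * W0 * G         # conditional variance given the previous state
  list(G = G, W0 = W0, W = W)
}

ss_exp <- exp_state_space(delta_x = c(0.5, 1, 2), gamma = 2)

# Stationarity check: propagating the stationary variance one step returns it
var_next <- ss_exp$G^2 * ss_exp$W0 + ss_exp$W
```

Because G depends only on the gap delta_x, unequally spaced inputs are handled by recomputing G and W for each gap.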

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

The conditional covariance matrix for Matern covariance with roughness parameter 2.5

Description

The conditional covariance matrix of the state in the dynamic linear model when kernel is the Matern covariance with roughness parameter 2.5.

Usage

Construct_W_matern_5_2_fastGP(delta_x,lambda, W0)

Arguments

delta_x

a vector of the distance between the sorted input.

lambda

the transformed range parameter.

W0

the covariance matrix of the stationary distribution of the state.

Value

W matrix.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

The conditional covariance matrix for Matern covariance with roughness parameter 2.5

Description

The conditional covariance matrix of the state in the dynamic linear model when kernel is the Matern covariance with roughness parameter 2.5.

Usage

Construct_W_matern_5_2_one_dim(delta_x,lambda)

Arguments

delta_x

a value of the distance between the sorted input.

lambda

the transformed range parameter.

Value

W matrix.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

Estimate parameters from fast computation of GaSP model

Description

Getting the estimated parameters from fast computation of the Gaussian stochastic process (GaSP) model with the Matern kernel function and noise.

Usage

  Estimate_GP_params(input, output, kernel_type='matern_5_2')

Arguments

input

a vector with dimension num_obs x 1 for the sorted input locations.

output

a vector with dimension num_obs x 1 for the observations at the sorted input locations.

kernel_type

a character string specifying the type of kernel to use. The current version supports kernel_type values "matern_5_2" and "exp", meaning the Matern kernel with roughness parameter 2.5 or 0.5 (power exponential kernel), respectively.

Value

Estimate_GP_params returns an S4 object of class Estimated_GP_params with estimated parameters including

beta

the inverse range parameter, i.e. beta=1/gamma

eta

the noise-to-signal ratio

sigma_2

the variance parameter
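Estimate_GP_params reports the inverse range parameter beta, whereas the init_params argument of SKFCPD is described in terms of the range parameter gamma. The conversion can be sketched as follows. The numeric values and the plain list est are hypothetical stand-ins for the slots of an estimated object, so the snippet runs without fitting a model.

```r
# Hypothetical estimates standing in for the slots beta, eta and sigma_2
est <- list(beta = 0.2, eta = 0.05, sigma_2 = 1.3)

# Convert the inverse range parameter to the range parameter: gamma = 1 / beta
init_params <- list(gamma = 1 / est$beta,
                    sigma_2 = est$sigma_2,
                    eta = est$eta)
```

With a fitted object, the same conversion would read the values from the slots with the @ operator instead of from a list.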

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Gu, Mengyang, and Weining Shen. Generalized probabilistic principal component analysis of correlated data. The Journal of Machine Learning Research 21, no. 1 (2020): 428-468.

Gu, Mengyang, Xiaojing Wang, and James O. Berger. Robust Gaussian stochastic process emulation. The Annals of Statistics 46, no. 6A (2018): 3038-3066.

Examples


  library(SKFCPD)

  #------------------------------------------------------------------------------
  # simple example with noise
  #------------------------------------------------------------------------------
  
  y_R<-function(x){
    cos(2*pi*x)
  }
  ###let's test for 100 observations
  set.seed(1)
  num_obs=100
  input=runif(num_obs)
  output=y_R(input)+rnorm(num_obs,mean=0,sd=1)
  
  ## run Estimate_GP_params to get estimated parameters
  params_est = Estimate_GP_params(input, output)
  print(params_est@beta) ## inverse of range parameter
  print(params_est@eta) ## noise-to-signal ratio
  print(params_est@sigma_2) ## variance

Estimated GaSP parameters class

Description

S4 class for fast parameter estimation of the Gaussian stochastic process (GaSP) model with the Matern kernel function with or without noise.

Objects from the Class

Objects of this class are created with the function Estimate_GP_params, which performs the calculations needed for setting up the estimation and prediction.

Slots

beta: object of class numeric for the inverse of the range parameter, i.e. beta = 1/gamma.
eta: object of class numeric for the estimated noise-to-signal parameter.
sigma_2: object of class numeric for the estimated variance parameter.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Hartikainen, J. and Sarkka, S. (2010). Kalman filtering and smoothing solutions to temporal Gaussian process regression models, Machine Learning for Signal Processing (MLSP), 2010 IEEE International Workshop, 379-384.

M. Gu, Y. Xu (2017), Nonseparable Gaussian stochastic process: a unified view and computational strategy, arXiv:1711.11501.

M. Gu, X. Wang and J.O. Berger (2018), Robust Gaussian Stochastic Process Emulation, Annals of Statistics, 46, 3038-3066.

Computing the predictive distribution in the online fashion

Description

This function computes the predictive distribution of the run length in the online fashion.

Usage

  GaSP_CPD_pred_dist_objective_prior_KF_online(KF_params, prev_L_params, cur_point,
  d, gamma, model_type, mu, sigma_2, eta, kernel_type, G_W_W0_V_ini, G_W_W0_V)

Arguments

KF_params

A list of current Kalman filter parameters.

prev_L_params

A list of previous Kalman filter parameters.

cur_point

A value of current observation.

d

A value of the distance between the sorted input.

gamma

A numeric variable of the range parameter for the covariance matrix. The default value of gamma is 1.

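model_type

A numeric variable that can take values of 0, 1 and 2. model_type = 0 stands for a GP model with unknown mean and known variance. model_type = 1 stands for a GP model with known mean and unknown variance. model_type = 2 stands for a GP model with unknown mean and unknown variance.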

mu

A vector of the mean parameter at each coordinate. Ignored when model_type = 0 or 2.

sigma_2

A vector of the variance parameter at each coordinate.

eta

A vector of the noise-to-signal ratio at each coordinate.

kernel_type

A character specifying the type of kernel.

G_W_W0_V_ini

A list of the initial coefficient and conditional matrices for the Gaussian process (GP) model. It is the output from the function Construct_G_W_W0_V.

G_W_W0_V

A list of the coefficient and conditional matrices for the Gaussian process (GP) model. It is the output from the function Construct_G_W_W0_V.

Value

GaSP_CPD_pred_dist_objective_prior_KF_online returns a list that contains 3 items: (1) the current Kalman filter parameters; (2) the previous Kalman filter parameters and (3) the vector of the logarithm of the current predictive distribution for different run lengths.
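The idea of computing predictive densities sequentially with a Kalman filter can be sketched for the simplest case. The code below is an illustration only, under several assumptions: a one-dimensional output, the exponential kernel with correlation exp(-d/gamma), and known mean mu and variance sigma_2. The function name kf_log_pred is hypothetical, and the objective-prior variants used by the package are not reproduced here.

```r
# Sketch: sequential log predictive densities via a scalar Kalman filter.
# Assumptions: exponential kernel, known mean mu and variance sigma_2,
# noise variance sigma_2 * eta, sorted inputs x.
kf_log_pred <- function(y, x, gamma, eta, mu = 0, sigma_2 = 1) {
  n <- length(y)
  log_pred <- numeric(n)
  a <- 0          # prior mean of the latent state at the first input
  R <- sigma_2    # prior variance of the latent state at the first input
  for (i in seq_len(n)) {
    if (i > 1) {
      rho <- exp(-(x[i] - x[i - 1]) / gamma)
      a <- rho * m                                # predicted state mean
      R <- rho^2 * C + sigma_2 * (1 - rho^2)      # predicted state variance
    }
    Q <- R + sigma_2 * eta                        # one-step predictive variance
    log_pred[i] <- dnorm(y[i], mean = mu + a, sd = sqrt(Q), log = TRUE)
    K <- R / Q                                    # Kalman gain
    m <- a + K * (y[i] - mu - a)                  # filtered state mean
    C <- R * (1 - K)                              # filtered state variance
  }
  log_pred
}

# Compare the summed predictive densities with the dense Gaussian log density
set.seed(1)
x <- sort(runif(20))
y <- rnorm(20)
lp <- kf_log_pred(y, x, gamma = 0.3, eta = 0.1)
Sigma <- exp(-abs(outer(x, x, "-")) / 0.3) + 0.1 * diag(20)
dense <- -0.5 * (20 * log(2 * pi) +
                 as.numeric(determinant(Sigma)$modulus) +
                 sum(y * solve(Sigma, y)))
```

Each new observation costs a constant amount of work in the filter, whereas the dense computation has to factorize a covariance matrix that grows with the number of observations.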

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Fearnhead, P., & Liu, Z. (2007). On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4), 589-605.

Adams, R. P., & MacKay, D. J. (2007). Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742.

Computing the predictive distribution directly in the online fashion

Description

This function directly computes the predictive distribution of the run length in the online fashion. The direct computation includes the inversion of the covariance matrix, which has computational complexity $O(n^3)$, with $n$ being the number of observations.

Usage

  GaSP_CPD_pred_dist_objective_prior_direct_online(cur_seq, d, gamma, eta, mu, sigma_2)

Arguments

cur_seq

A vector of sequence of observations.

d

A value of the distance between the sorted input.

gamma

A numeric variable of the range parameter for the covariance matrix. The default value of gamma is 1.

eta

A vector of the noise-to-signal ratio at each coordinate.

mu

A vector of the mean parameter at each coordinate. Ignored when model_type = 0 or 2.

sigma_2

A vector of the variance parameter at each coordinate.

Value

GaSP_CPD_pred_dist_objective_prior_direct_online returns the log likelihood of observations that follow a Gaussian process with the exponential kernel.
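The cubic cost of the direct approach comes from factorizing the full covariance matrix. A minimal base R sketch of such a computation is given below. It assumes a one-dimensional output, known mean and variance, and the exponential kernel; the function name direct_log_lik is hypothetical and the objective prior used by the package is not included.

```r
# Sketch: dense Gaussian log likelihood with an exponential kernel.
# Assumptions: known mean mu and variance sigma_2, noise variance sigma_2 * eta.
direct_log_lik <- function(y, x, gamma, eta, mu = 0, sigma_2 = 1) {
  n <- length(y)
  R0 <- abs(outer(x, x, "-"))
  Sigma <- sigma_2 * (exp(-R0 / gamma) + eta * diag(n))
  U <- chol(Sigma)                              # Sigma = t(U) %*% U, O(n^3)
  z <- backsolve(U, y - mu, transpose = TRUE)   # solves t(U) %*% z = y - mu
  -0.5 * n * log(2 * pi) - sum(log(diag(U))) - 0.5 * sum(z^2)
}

# With a single observation the result is a univariate normal log density
ll_one <- direct_log_lik(y = 0.3, x = 1, gamma = 2, eta = 0.5)

# A longer sequence for comparison with a formula based on solve()
x_seq <- 1:5
y_seq <- c(0.1, -0.4, 0.3, 0.8, -0.2)
ll_seq <- direct_log_lik(y_seq, x_seq, gamma = 2, eta = 0.5)
```

The Cholesky factor gives both the log determinant and the quadratic form, which avoids forming the inverse covariance matrix explicitly.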

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Williams, C. K., & Rasmussen, C. E. (2006). Gaussian processes for machine learning (Vol. 2, No. 3, p. 4). Cambridge, MA: MIT press.

Matrices and vectors for the inverse covariance in the predictive distribution

Description

This function computes the required values for the inverse covariance matrix.

Usage

Get_Q_K(GG,W,C0,VV)

Arguments

GG

a list of matrices defined in the dynamic linear model.

W

a list of matrices defined in the dynamic linear model.

C0

a matrix defined in the dynamic linear model.

VV

a numerical value for the nugget.

Value

A list of 2 items for Q and K.
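As a rough guide to what Q and K represent, the following sketch runs the covariance part of a Kalman filter and records the one-step predictive variances and the gains. The conventions are assumptions made for illustration and may differ from the package: the function name kf_q_k is hypothetical, the observation is taken to be the first state component plus noise, C0 is treated as the prior covariance of the state at the first input, and GG[[i]] and W[[i]] describe the transition from input i to input i + 1.

```r
# Sketch: predictive variances Q and Kalman gains K from the covariance recursion.
# Assumptions: observation = first state component + noise with variance VV;
# C0 = prior state covariance at the first input; GG and W have length n - 1.
kf_q_k <- function(GG, W, C0, VV) {
  n <- length(GG) + 1
  p <- nrow(C0)
  Q <- numeric(n)
  K <- matrix(0, nrow = n, ncol = p)
  R <- C0
  for (i in seq_len(n)) {
    if (i > 1) {
      R <- GG[[i - 1]] %*% C %*% t(GG[[i - 1]]) + W[[i - 1]]  # predicted covariance
    }
    Q[i] <- R[1, 1] + VV                  # predictive variance of observation i
    K[i, ] <- R[, 1] / Q[i]               # Kalman gain
    C <- R - tcrossprod(K[i, ]) * Q[i]    # filtered state covariance
  }
  list(Q = Q, K = K)
}

# Scalar example with the exponential kernel and equal gaps
rho <- exp(-0.5)
GG <- list(matrix(rho), matrix(rho))
W <- list(matrix(1 - rho^2), matrix(1 - rho^2))
qk <- kf_q_k(GG, W, C0 = matrix(1), VV = 0.2)
```

These quantities do not depend on the observed values, which is why they can be computed once per input gap and reused across observation sequences.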

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

M. Gu, Y. Xu (2019), Fast nonseparable Gaussian stochastic process with application to methylation level interpolation. Journal of Computational and Graphical Statistics, In Press, arXiv:1711.11501.

Campagnoli P, Petris G, Petrone S. (2009), Dynamic linear model with R. Springer-Verlag New York.

The natural logarithm of the determinant of the correlation matrix and the estimated sum of squares in the exponent of the profile likelihood

Description

This function computes the natural logarithm of the determinant of the correlation matrix and the estimated sum of squares for computing the profile likelihood.

Usage

Get_log_det_S2_one_dim(param,have_noise,delta_x,output,kernel_type)

Arguments

param

a vector of parameters. The first parameter is the natural logarithm of the inverse range parameter in the kernel function. If the data contain noise, the second parameter is the logarithm of the nugget-variance ratio parameter.

have_noise

a boolean value. If TRUE, the model contains noise.

delta_x

a vector with dimension (num_obs-1) x 1 for the differences between the sorted input locations.

output

a vector with dimension num_obs x 1 for the observations at the sorted input locations.

kernel_type

A character specifying the type of kernel.

Value

A list where the first value is the natural logarithm of the determinant of the correlation matrix and the second value is the estimated sum of squares.
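The two returned quantities can be computed densely for a small example, which helps when checking results. The sketch below makes several assumptions: the exponential kernel, a zero mean, and the parameterization described for param above (log inverse range, then log nugget-variance ratio). The function name log_det_S2_direct is hypothetical, and the dense computation is a swapped-in substitute for the fast algorithm used by the package.

```r
# Sketch: dense computation of the log determinant and the sum of squares.
# Assumptions: exponential kernel, zero mean, param = c(log(beta), log(eta)).
log_det_S2_direct <- function(param, have_noise, delta_x, output) {
  beta <- exp(param[1])
  eta <- if (have_noise) exp(param[2]) else 0
  x <- c(0, cumsum(delta_x))            # rebuild sorted inputs from the gaps
  n <- length(output)
  R_tilde <- exp(-beta * abs(outer(x, x, "-"))) + eta * diag(n)
  U <- chol(R_tilde)
  z <- backsolve(U, output, transpose = TRUE)
  list(log_det = 2 * sum(log(diag(U))), S2 = sum(z^2))
}

y_obs <- c(0.2, -0.1, 0.4)
gaps <- c(0.5, 1)
out <- log_det_S2_direct(c(log(2), log(0.1)), TRUE, gaps, y_obs)

# One common form of the profile log likelihood, up to an additive constant
profile_loglik <- -0.5 * out$log_det - length(y_obs) / 2 * log(out$S2)
```

The variance parameter is profiled out in this form, so only the range and the nugget-variance ratio remain to be optimized.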

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

M. Gu, Y. Xu (2017), Nonseparable Gaussian stochastic process: a unified view and computational strategy, arXiv:1711.11501.

M. Gu, X. Wang and J.O. Berger (2018), Robust Gaussian Stochastic Process Emulation, Annals of Statistics, 46, 3038-3066.

Getting initial Kalman filter parameters

Description

Initialize the Kalman filter parameters for Gaussian Process model with Matern 2.5 or power exponential kernels.

Usage

  KF_ini(cur_input, d, gamma, eta, kernel_type, G_W_W0_V)

Arguments

cur_input

A value of current observation.

d

A value of the distance between the sorted input.

gamma

A value of the range parameter for the covariance matrix.

eta

The noise-to-signal ratio.

kernel_type

A character string specifying the kernel type. matern_5_2 is the Matern correlation with roughness parameter 5/2; exp is the power exponential correlation with roughness parameter alpha=2.

G_W_W0_V

A list of the coefficient and conditional matrices for the Gaussian process (GP) model. It is the output from the function Construct_G_W_W0_V.

Value

KF_ini returns a list of Kalman filter parameters.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Fearnhead, P., & Liu, Z. (2007). On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4), 589-605.

Adams, R. P., & MacKay, D. J. (2007). Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742.

Getting initial Kalman filter parameters for different observation sequences

Description

Initialize the Kalman filter parameters for Gaussian Process model with Matern 2.5 or power exponential kernels with different observation sequences.

Usage

  KF_ini_for_profile_like(cur_input, d, gamma, eta, kernel_type, G_W_W0_V)

Arguments

cur_input

A value of current observation.

d

A value of the distance between the sorted input.

gamma

A value of the range parameter for the covariance matrix.

eta

The noise-to-signal ratio.

kernel_type

A character string specifying the kernel type. matern_5_2 is the Matern correlation with roughness parameter 5/2; exp is the power exponential correlation with roughness parameter alpha=2.

G_W_W0_V

A list of the coefficient and conditional matrices for the Gaussian process (GP) model. It is the output from the function Construct_G_W_W0_V.

Value

KF_ini_for_profile_like returns a list of Kalman filter parameters with different observation sequences.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Fearnhead, P., & Liu, Z. (2007). On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4), 589-605.

Adams, R. P., & MacKay, D. J. (2007). Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742.

Updating Kalman filter parameters

Description

Updating the Kalman filter parameters for Gaussian Process model with Matern 2.5 or power exponential kernels with different observation sequences.

Usage

  KF_param_update_for_profile_like(cur_input, cur_num_obs,
  prev_param, d, gamma, eta, kernel_type, G_W_W0_V)

Arguments

cur_input

A value of current observation.

cur_num_obs

A value of index for the current observation.

prev_param

A list of previous Kalman filter parameters.

d

A value of the distance between the sorted input.

gamma

A value of the range parameter for the covariance matrix.

eta

The noise-to-signal ratio.

kernel_type

A character string specifying the kernel type. matern_5_2 is the Matern correlation with roughness parameter 5/2; exp is the power exponential correlation with roughness parameter alpha=2.

G_W_W0_V

A list of the coefficient and conditional matrices for the Gaussian process (GP) model. It is the output from the function Construct_G_W_W0_V.

Value

KF_param_update_for_profile_like returns a list of updated Kalman filter parameters with different observation sequences.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Fearnhead, P., & Liu, Z. (2007). On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4), 589-605.

Adams, R. P., & MacKay, D. J. (2007). Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742.

Getting the results of the SKFCPD model

Description

Estimating changepoint locations using the Dynamic Linear Model (DLM) within the Bayesian Online Changepoint Detection (BOCPD) framework. The efficient computation is achieved through implementation of the Kalman filter. The range parameter and noise-to-signal ratio are estimated from training samples via a Gaussian process model. This function is capable of handling multidimensional data with temporal correlations and random missing patterns.

Usage

  SKFCPD(design = NULL, response = NULL, FCPD = NULL, 
  init_params = list(gamma = 1, sigma_2 = 1, eta = 1), 
  train_prop = NULL, kernel_type = "matern_5_2", 
  hazard_vec=100, print_info = TRUE, truncate_at_prev_cp = FALSE)

Arguments

design

A vector with the length of n. The design of the experiment.

response

A matrix with dimension n x q. The observations.

FCPD

An object of the class SKFCPD computed in the previous run of the algorithm.

init_params

A list with estimated range parameter gamma, noise-to-signal parameter eta and variance parameter sigma_2. The default values are gamma=1, eta=1, and sigma_2=1.

train_prop

A numerical value between 0 and 1. The proportion of training samples used for parameter estimation. When train_prop=NULL, the training process is skipped and the parameter values are taken from the argument init_params.

kernel_type

A character string specifying the kernel type. matern_5_2 is the Matern correlation with roughness parameter 5/2; exp is the power exponential correlation with roughness parameter alpha=2. The default choice is matern_5_2.

hazard_vec

Either a constant or a vector of length n. The hazard vector in the SKFCPD method: 1/hazard_vec is the prior probability that a changepoint occurs at each time point. The default value of hazard_vec is 100.

print_info

This setting prints out updates on the progress of the algorithm if set to TRUE.

truncate_at_prev_cp

If TRUE, truncate the run length at the most recently detected changepoint. The default value of truncate_at_prev_cp is FALSE.

Value

SKFCPD returns an S4 object of class SKFCPD (see SKFCPD-class).
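The run-length recursion of the BOCPD framework (Adams and MacKay, 2007) can be sketched in base R to show how the hazard and the predictive distribution interact. This is a deliberately simplified illustration and not the SKFCPD algorithm: the Gaussian process predictive distribution is replaced by an independent Gaussian model with unknown mean, known variance obs_var and a N(0, prior_var) prior on the mean. The function name bocpd_sketch and its arguments are hypothetical.

```r
# Sketch of the BOCPD run-length recursion with a constant hazard.
# Swapped-in model: i.i.d. Gaussian observations with unknown mean per segment
# (prior N(0, prior_var)) and known variance obs_var.
bocpd_sketch <- function(y, hazard = 100, obs_var = 1, prior_var = 1) {
  n <- length(y)
  H <- 1 / hazard                         # prior changepoint probability
  # post[t + 1, r + 1] = P(run length r after t observations | y_1, ..., y_t)
  post <- matrix(0, n + 1, n + 1)
  post[1, 1] <- 1
  s <- 0                                  # sums of observations per run length
  k <- 0                                  # counts of observations per run length
  for (t in seq_len(n)) {
    # Predictive density of y[t] under every current run length
    v <- 1 / (1 / prior_var + k / obs_var)
    m <- v * s / obs_var
    pred <- dnorm(y[t], mean = m, sd = sqrt(v + obs_var))
    prev <- post[t, seq_len(t)]
    growth <- prev * pred * (1 - H)       # the run continues
    cp <- sum(prev * pred * H)            # the run is reset to zero
    joint <- c(cp, growth)
    post[t + 1, seq_len(t + 1)] <- joint / sum(joint)
    # Update sufficient statistics; run length zero starts with no data
    s <- c(0, s + y[t])
    k <- c(0, k + 1)
  }
  post
}

# Deterministic toy sequence with a mean shift after the 50th observation
y_toy <- c(rep(0, 50), rep(5, 50)) + 0.1 * sin(1:100)
post <- bocpd_sketch(y_toy, hazard = 100)
map_run_length <- which.max(post[length(y_toy) + 1, ]) - 1
```

The most probable run length at the final time point counts the observations since the most recent changepoint, so subtracting it from the sequence length gives an estimate of where the last segment started.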

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Li, Hanmo, Yuedong Wang, and Mengyang Gu. Sequential Kalman filter for fast online changepoint detection in longitudinal health records. arXiv preprint arXiv:2310.18611 (2023).

Fearnhead, Paul, and Zhen Liu. On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society Series B: Statistical Methodology 69, no. 4 (2007): 589-605.

Adams, Ryan Prescott, and David JC MacKay. Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742 (2007).

Examples

  library(SKFCPD)
  
  #------------------------------------------------------------------------------
  # Example: fast online changepoint detection with DEPENDENT data.
  # 
  # Data generation: Data follows a multidimensional Gaussian process with Matern 2.5 kernel.
  #------------------------------------------------------------------------------
  # Data Generation
  set.seed(1)
  
  n_obs = 150
  n_dim = 2
  seg_len = c(70, 30, 20,30)
  mean_each_seg = c(0,1,-1,0)
  
  x_mat=matrix(1:n_obs)
  y_mat=matrix(NA, nrow=n_obs, ncol=n_dim)
  
  gamma = rep(5, n_dim) # range parameter of the covariance matrix
  
  # compute the matern 2.5 kernel
  construct_cor_matrix = function(input, gamma){
    n = length(input)
    R0=abs(outer(input,(input),'-'))
    matrix_one = matrix(1, n, n)
    const = sqrt(5) * R0 / gamma
    Sigma = (matrix_one + const + const^2/3) * (exp(-const))
    return(Sigma)
  }
  
  for(j in 1:n_dim){
    y_each_dim = c()
    for(i in 1:length(seg_len)){
      nobs_per_seg = seg_len[i]
      Sigma = construct_cor_matrix(1:nobs_per_seg, gamma[j])
      L=t(chol(Sigma))
      theta=rep(mean_each_seg[i],nobs_per_seg)+L%*%rnorm(nobs_per_seg)
      y_each_dim = c(y_each_dim, theta+0.1*rnorm(nobs_per_seg))
    }
    y_mat[,j] = y_each_dim
  }
  
  ## Detect changepoints by SKFCPD
  Online_CPD_1 = SKFCPD(design = x_mat,
                        response = y_mat,
                        train_prop = 1/3)
  
  ## visualize the results
  plot_SKFCPD(Online_CPD_1)

Class `"SKFCPD"`

Description

S4 class for SKFCPD, where the range and noise-to-signal ratio parameters are estimated from the training samples.

Objects from the Class

Objects of this class are created and initialized with the function SKFCPD, which performs the calculations needed for setting up the analysis.

Slots

design: Object of class "matrix" with dimension n x p. The design of the experiment.
response: Object of class "matrix" with dimension n x q. The observations.
test_start: Object of class "numeric". The starting index of the test period.
kernel_type: Object of class "character" to specify the type of kernel to use.
gamma: Object of class "vector" with dimension q x 1. The range parameters.
eta: Object of class "vector" with dimension q x 1. The noise-to-signal ratios.
sigma_2: Object of class "vector" with dimension q x 1. The variance parameters.
hazard_vec: Object of class "numeric". The n x 1 hazard vector in the SKFCPD method.
KF_params_list: Object of class "list". The list of Kalman filter parameters from the previous run of the algorithm.
prev_L_params_list: Object of class "list". The list of parameters for calculating the quadratic form of the inverse covariance matrix from the previous run of the algorithm.
run_length_posterior_mat: Object of class "matrix" with dimension n x n. The posterior distribution of the run length.
run_length_joint_mat: Object of class "matrix" with dimension n x n. The joint distribution of the run length and the observations.
log_pred_dist_mat: Object of class "matrix" with dimension n x n. The logarithm of the predictive distribution of observations.
cp: Object of class "vector" with length m. The locations of the estimated changepoints.
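
To make the run-length slots more concrete, the following is a minimal sketch of the generic Bayesian online changepoint detection recursion of Adams and MacKay (2007), which relates a joint distribution, a predictive density and a run-length posterior. It is not the SKFCPD implementation: it assumes independent Gaussian observations with unit noise variance and a N(0, prior_var) prior on each segment mean, and every name in it (bocpd_sketch, hazard, prior_var, post) is illustrative.

```r
# Generic BOCPD recursion for i.i.d. Gaussian data (illustrative assumption,
# not the temporally correlated model used by the package).
bocpd_sketch <- function(y, hazard = 1 / 100, prior_var = 10) {
  n <- length(y)
  post <- matrix(0, n, n + 1)   # row t, column r + 1: P(run length = r | y_1..t)
  prob <- 1                     # run-length distribution before any data (r = 0)
  sum_y <- 0                    # sum of observations for each run length
  cnt <- 0                      # number of observations for each run length
  for (t in seq_len(n)) {
    # Posterior of the segment mean per run length, then predictive density
    post_var <- 1 / (1 / prior_var + cnt)
    post_mean <- post_var * sum_y
    pred <- dnorm(y[t], mean = post_mean, sd = sqrt(post_var + 1))
    growth <- prob * pred * (1 - hazard)   # run continues: r -> r + 1
    change <- sum(prob * pred * hazard)    # changepoint: r -> 0
    joint <- c(change, growth)
    prob <- joint / sum(joint)             # normalise to obtain the posterior
    post[t, seq_along(prob)] <- prob
    # Run length 0 restarts from the prior; the others absorb y[t]
    sum_y <- c(0, sum_y + y[t])
    cnt <- c(0, cnt + 1)
  }
  post
}

# Usage: one mean shift after the 60th observation
set.seed(1)
y <- c(rnorm(60, mean = 0), rnorm(40, mean = 4))
post <- bocpd_sketch(y)
map_run_length <- apply(post, 1, which.max) - 1   # most probable run length
```

Each row of post is a probability distribution over run lengths. A sharp drop in map_run_length suggests a changepoint near that time.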

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Li, Hanmo, Yuedong Wang, and Mengyang Gu. Sequential Kalman filter for fast online changepoint detection in longitudinal health records. arXiv preprint arXiv:2310.18611 (2023).

Fearnhead, Paul, and Zhen Liu. On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society Series B: Statistical Methodology 69, no. 4 (2007): 589-605.

Adams, Ryan Prescott, and David JC MacKay. Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742 (2007).

Natural logarithm of profile likelihood by the fast computing algorithm

Description

This function computes the natural logarithm of the profile likelihood for the range and nugget parameters after plugging in the closed-form maximum likelihood estimator of the variance parameter.

Usage

compute_log_lik(param, design, response, kernel_type)

Arguments

param

design

A matrix with dimension n x p. The design of the experiment.

response

A matrix with dimension n x q. The observations.

kernel_type

Value

The numerical value of natural logarithm of the profile likelihood.
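
For reference, the quantity described above can be sketched with a dense O(n^3) computation in base R. The sketch assumes a zero-mean Gaussian process with a Matern 2.5 correlation and works with the range parameter gamma and nugget eta directly. The function names and the parameterization are assumptions made for illustration and may differ from how param is encoded in the package.

```r
# Dense reference computation; illustrative names, zero-mean assumption.
matern_5_2 <- function(x, gamma) {
  d <- sqrt(5) * abs(outer(x, x, "-")) / gamma
  (1 + d + d^2 / 3) * exp(-d)
}

profile_log_lik_dense <- function(gamma, eta, x, y) {
  n <- length(y)
  R_tilde <- matern_5_2(x, gamma) + eta * diag(n)  # correlation plus nugget
  U <- chol(R_tilde)                               # R_tilde = t(U) %*% U
  z <- backsolve(U, y, transpose = TRUE)           # solves t(U) %*% z = y
  sigma2_hat <- sum(z^2) / n                       # closed-form MLE of the variance
  log_det <- 2 * sum(log(diag(U)))                 # log determinant of R_tilde
  # Log-likelihood with sigma2_hat plugged in
  -0.5 * n * log(2 * pi * sigma2_hat) - 0.5 * log_det - 0.5 * n
}
```

Maximizing this function over gamma and eta, for example with optim, gives estimates of the range and nugget under the stated assumptions.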

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

M. Gu, Y. Xu (2017), Nonseparable Gaussian stochastic process: a unified view and computational strategy, arXiv:1711.11501.

M. Gu, X. Wang and J.O. Berger (2018), Robust Gaussian Stochastic Process Emulation, Annals of Statistics, 46, 3038-3066.

Updating Kalman filter parameters

Description

Updates the Kalman filter parameters for a Gaussian process model with Matern 2.5 or power exponential kernels.

Usage

  get_LY_online(cur_input, prev_param, eta, G_W_W0_V)

Arguments

cur_input

The value of the current observation.

prev_param

A list of previous Kalman filter parameters.

eta

The noise-to-signal ratio.

G_W_W0_V

A list of the coefficient and conditional matrices for the Gaussian process (GP) model. It is the output of the function Construct_G_W_W0_V.

Value

get_LY_online returns a list of updated Kalman filter parameters.
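
The idea of a sequential Kalman filter update can be sketched for the simplest case, a Gaussian process with exponential kernel exp(-d/gamma) observed at equally spaced inputs, which has a scalar state. The functions kf_step_exp and kf_log_lik_exp and the layout of the state list are hypothetical and do not mirror the arguments or return value of get_LY_online.

```r
# One Kalman filter step, with variances expressed in units of sigma^2.
kf_step_exp <- function(y_t, state, rho, eta) {
  # Predict: theta_t | y_1..t-1 ~ N(a, R)
  a <- rho * state$m
  R <- rho^2 * state$C + (1 - rho^2)
  # One-step-ahead predictive of y_t: N(a, Q)
  Q <- R + eta
  # Update: theta_t | y_1..t ~ N(m, C)
  K <- R / Q
  list(m = a + K * (y_t - a), C = R - K * R, f = a, Q = Q)
}

# Log-likelihood accumulated from the one-step-ahead predictive densities
kf_log_lik_exp <- function(y, gamma, eta, sigma_2 = 1, d = 1) {
  rho <- exp(-d / gamma)
  state <- list(m = 0, C = 1)
  ll <- 0
  for (t in seq_along(y)) {
    # Using rho = 0 on the first step yields the stationary prior N(0, 1)
    state <- kf_step_exp(y[t], state, if (t == 1) 0 else rho, eta)
    ll <- ll + dnorm(y[t], state$f, sqrt(sigma_2 * state$Q), log = TRUE)
  }
  ll
}
```

Each step costs O(1), so the log-likelihood of n observations is obtained in O(n) operations instead of the O(n^3) needed to factorize the full covariance matrix.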

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Fearnhead, P., & Liu, Z. (2007). On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4), 589-605.

Adams, R. P., & MacKay, D. J. (2007). Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742.

Calculate the mean and variance parameters through fast computation

Description

This function computes estimates of the mean and variance parameters through Kalman filters for fast computation.

Usage

get_mu_sigma_hat(param, design, response, kernel_type)

Arguments

param

design

A matrix with dimension n x p. The design of the experiment.

response

A matrix with dimension n x q. The observations.

kernel_type

Value

A list with the estimates of the mean and variance parameters.
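
As a point of comparison, closed-form estimates of a constant mean and the variance can be written with dense linear algebra. The sketch below assumes a Matern 2.5 correlation with nugget eta, a generalized least squares estimate of the mean, and a variance estimate with divisor n. The function name mu_sigma_hat_dense and these choices are assumptions and may not match the estimator used by the package.

```r
# Dense O(n^3) sketch; illustrative names and estimator choices.
mu_sigma_hat_dense <- function(gamma, eta, x, y) {
  n <- length(y)
  d <- sqrt(5) * abs(outer(x, x, "-")) / gamma
  R_tilde <- (1 + d + d^2 / 3) * exp(-d) + eta * diag(n)
  Ri_y <- solve(R_tilde, y)            # R_tilde^{-1} y
  Ri_1 <- solve(R_tilde, rep(1, n))    # R_tilde^{-1} 1
  mu_hat <- sum(Ri_y) / sum(Ri_1)      # generalized least squares mean
  resid <- y - mu_hat
  sigma2_hat <- sum(resid * solve(R_tilde, resid)) / n
  list(mu_hat = mu_hat, sigma2_hat = sigma2_hat)
}
```

Shifting the observations by a constant shifts mu_hat by the same amount, and rescaling them by a factor a rescales sigma2_hat by a^2.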

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

M. Gu, Y. Xu (2017), Nonseparable Gaussian stochastic process: a unified view and computational strategy, arXiv:1711.11501.

M. Gu, X. Wang and J.O. Berger (2018), Robust Gaussian Stochastic Process Emulation, Annals of Statistics, 46, 3038-3066.

Updating the predictive distribution

Description

Updating the predictive distribution of the run length under the objective prior.

Usage

  get_predictive_dist_KF_objective_prior(cur_input, cur_num_obs,
  params, prev_L, d, gamma, model_type, mu, sigma_2, eta, kernel_type)

Arguments

cur_input

The value of the current observation.

cur_num_obs

The index of the current observation.

params

A list of current Kalman filter parameters.

prev_L

A list of previous Kalman filter parameters.

d

The distance between the sorted inputs.

gamma

A numeric variable of the range parameter for the covariance matrix.

model_type

mu

A vector of the mean parameter at each coordinate. Ignored when model_type = 0 or 2.

sigma_2

A vector of the variance parameter at each coordinate.

eta

A vector of the noise-to-signal ratio at each coordinate.

kernel_type

A character string specifying the kernel type. matern_5_2 is the Matern correlation with roughness parameter 5/2. exp is the power exponential correlation with roughness parameter alpha=2.

Value

get_predictive_dist_KF_objective_prior returns a list containing the updated predictive distribution of the run length under the objective prior.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Fearnhead, P., & Liu, Z. (2007). On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4), 589-605.

Adams, R. P., & MacKay, D. J. (2007). Bayesian online changepoint detection. arXiv preprint arXiv:0710.3742.

Updating the predictive distribution

Description

Updating the predictive distribution of the run length under the objective prior directly.

Usage

  get_predictive_dist_direct_objective_prior(cur_input_seq, d, gamma, mu, sigma_2, eta)

Arguments

cur_input_seq

A vector containing the sequence of observations.

d

The distance between the sorted inputs.

gamma

A numeric variable of the range parameter for the covariance matrix. The default value of gamma is 1.

eta

A vector of the noise-to-signal ratio at each coordinate.

mu

A vector of the mean parameter at each coordinate. Ignored when model_type = 0 or 2.

sigma_2

A vector of the variance parameter at each coordinate.

Value

get_predictive_dist_direct_objective_prior returns the log likelihood of observations that follow a Gaussian process with an exponential kernel.
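
A direct evaluation of such a log likelihood can be sketched with a Cholesky factorization. The sketch assumes a single coordinate, equally spaced inputs with spacing d, scalar mu, sigma_2 and eta, and covariance sigma_2 * (exp(-lag/gamma) + eta * I). The function name direct_log_lik_exp is hypothetical, and the package function may differ in its treatment of the prior and of multiple coordinates.

```r
# Dense Gaussian log-likelihood under an exponential kernel (illustrative).
direct_log_lik_exp <- function(y, d, gamma, mu, sigma_2, eta) {
  n <- length(y)
  lag <- abs(outer(seq_len(n), seq_len(n), "-")) * d   # equally spaced inputs
  Sigma <- sigma_2 * (exp(-lag / gamma) + eta * diag(n))
  U <- chol(Sigma)                                     # Sigma = t(U) %*% U
  z <- backsolve(U, y - mu, transpose = TRUE)          # solves t(U) %*% z = y - mu
  # -n/2 log(2 pi) - 1/2 log|Sigma| - 1/2 (y - mu)' Sigma^{-1} (y - mu)
  -0.5 * n * log(2 * pi) - sum(log(diag(U))) - 0.5 * sum(z^2)
}
```

This direct route costs O(n^3) per evaluation, so it is mainly useful as a check on faster sequential computations for short sequences.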

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.

Plot for SKFCPD model

Description

Function to make plots on SKFCPD models after the SKFCPD model has been constructed.

Usage

plot_SKFCPD(x, type = "cp")

Arguments

x

An object of class SKFCPD.

type

A character specifying the type of plot. cp plots the data with estimated changepoints marked in red crossings. run_length_posterior plots the matrix of run length posterior distribution.

Value

Two plots: (1) a plot of the data with red dashed lines marking the estimated changepoint locations, and (2) a plot of the run length posterior distribution matrix. For multidimensional data, only the first dimension is plotted.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

References

Li, Hanmo, Yuedong Wang, and Mengyang Gu. Sequential Kalman filter for fast online changepoint detection in longitudinal health records. arXiv preprint arXiv:2310.18611 (2023).

Examples

  library(SKFCPD)
  
  #------------------------------------------------------------------------------
  # Example: fast online changepoint detection with DEPENDENT data.
  # 
  # Data generation: Data follows a multidimensional Gaussian process with Matern 2.5 kernel.
  #------------------------------------------------------------------------------
  # Data Generation
  set.seed(1)
  
  n_obs = 150
  n_dim = 2
  seg_len = c(70, 30, 20,30)
  mean_each_seg = c(0,1,-1,0)
  
  x_mat=matrix(1:n_obs)
  y_mat=matrix(NA, nrow=n_obs, ncol=n_dim)
  
  gamma = rep(5, n_dim) # range parameter of the covariance matrix
  
  # compute the matern 2.5 kernel
  construct_cor_matrix = function(input, gamma){
    n = length(input)
    R0=abs(outer(input,(input),'-'))
    matrix_one = matrix(1, n, n)
    const = sqrt(5) * R0 / gamma
    Sigma = (matrix_one + const + const^2/3) * (exp(-const))
    return(Sigma)
  }
  
  for(j in 1:n_dim){
    y_each_dim = c()
    for(i in 1:length(seg_len)){
      nobs_per_seg = seg_len[i]
      Sigma = construct_cor_matrix(1:nobs_per_seg, gamma[j])
      L=t(chol(Sigma))
      theta=rep(mean_each_seg[i],nobs_per_seg)+L%*%rnorm(nobs_per_seg)
      y_each_dim = c(y_each_dim, theta+0.1*rnorm(nobs_per_seg))
    }
    y_mat[,j] = y_each_dim
  }
  
  ## Detect changepoints by SKFCPD
  Online_CPD_1 = SKFCPD(design = x_mat,
                        response = y_mat,
                        train_prop = 1/3)
  
  ## visualize the results
  plot_SKFCPD(Online_CPD_1)

Square root of the product of all elements in a vector

Description

This function computes the square root of the product of all elements in a vector.

Usage

productPowerMinusHalf(vec)

Arguments

vec

An input vector.

Value

The square root of the product of all elements in the vector.

Author(s)

Hanmo Li [aut, cre], Yuedong Wang [aut], Mengyang Gu [aut]

Maintainer: Hanmo Li <hanmo@pstat.ucsb.edu>

Estimate parameters from fast computation of GaSP model

Description

Usage