NMF and NMTF)In this vignette we consider approximating a non-negative matrix as a product of multiple non-negative low-rank matrices (a.k.a., factor matrices).
Test data is available from toyModel.
library("nnTensor")
X <- toyModel("NMF")You will see that there are five blocks in the data matrix as follows.
image(X, main="Original Data")Here, we consider the approximation of the non-negative data matrix \(X\) (\(N \times M\)) as the matrix product of \(U\) (\(N \times J\)) and \(V\) (\(M \times J\)):
\[ X \approx U V' \ \mathrm{s.t.}\ U \geq 0, V \geq 0 \]
This is known as non-negative matrix factorization (NMF (Lee and Seung 1999; CICHOCK 2009)) and multiplicative update (MU) rule often used to achieve this factorization.
In NMF, the rank parameter \(J\)
(\(\leq \min(N,M)\)) is needed to be
set in advance. Other settings such as the number of MU iterations
(num.iter) or factorization algorithm
(algorithm) are also available. For the details of
arguments of NMF, see ?NMF. After the calculation, various
objects are returned by NMF.
set.seed(123456)
out_NMF <- NMF(X, J=5)
str(out_NMF, 2)## List of 10
##  $ U            : num [1:100, 1:5] 46.4 47.2 47.4 47.7 46.3 ...
##  $ V            : num [1:300, 1:5] 1.97 2.01 2 2.03 2.01 ...
##  $ J            : num 5
##  $ RecError     : Named num [1:101] 1.00e-09 8.35e+03 7.65e+03 7.42e+03 7.21e+03 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ TrainRecError: Named num [1:101] 1.00e-09 8.35e+03 7.65e+03 7.42e+03 7.21e+03 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ TestRecError : Named num [1:101] 1e-09 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ RelChange    : Named num [1:101] 1.00e-09 5.43e-01 9.12e-02 3.12e-02 2.95e-02 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ Trial        : NULL
##  $ Runtime      : NULL
##  $ RankMethod   : NULLThe reconstruction error (RecError) and relative error
(RelChange, the amount of change from the reconstruction
error in the previous step) can be used to diagnose whether the
calculation is converging or not.
layout(t(1:2))
plot(log10(out_NMF$RecError[-1]), type="b", main="Reconstruction Error")
plot(log10(out_NMF$RelChange[-1]), type="b", main="Relative Change")The product of \(U\) and \(V\) shows that the original data can be
well-recovered by NMF.
recX <- out_NMF$U %*% t(out_NMF$V)
layout(t(1:2))
image(X, main="Original Data")
image(recX, main="Reconstructed Data (NMF)")NMF requires the rank paramter \(J\)
in advance. However, setting optimal values without prior knowledge and
external measures is a basically difficult problem. In
nnTensor, \(14\) of
evaluation scores (Brunet 2004; Han 2007;
Frigyesi 2008; Park 2019; Shao 2017; Fogel 2013; Kim 2003; Hutchins
2008; Hoyer 2004) to estimate rank parameter have been
implemented in the NMF. If multiple rank parameters are
set, the evaluation score is calculated within that range, and we can
estimate the optimal value from the large or small values and rapidly
changing slopes. For the details, see (Brunet
2004; Han 2007; Frigyesi 2008; Park 2019; Shao 2017; Fogel 2013; Kim
2003; Hutchins 2008; Hoyer 2004).
Note that here we run with a small num.iter to
demonstrate the rank estimation with the minimum computation time. When
users try it on their own data, this option should be removed.
set.seed(123456)
out_NMF2 <- NMF(X, J=1:10, num.iter=1)## Each rank, multiple NMF runs are performed
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |======================================================================| 100%
## Each rank estimation method
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=======                                                               |  10%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## condition
## 
  |                                                                            
  |==============                                                        |  20%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## condition
## 
  |                                                                            
  |=====================                                                 |  30%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## condition
## 
  |                                                                            
  |============================                                          |  40%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## condition
## 
  |                                                                            
  |===================================                                   |  50%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## condition
## 
  |                                                                            
  |==========================================                            |  60%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## condition
## 
  |                                                                            
  |=================================================                     |  70%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## condition
## 
  |                                                                            
  |========================================================              |  80%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## condition
## 
  |                                                                            
  |===============================================================       |  90%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## condition
## 
  |                                                                            
  |======================================================================| 100%ccc
## dispersion
## rss
## evar
## residuals
## sparseness.basis
## sparseness.coef
## sparseness2.basis
## sparseness2.coef
## norm.info.gain.basis
## norm.info.gain.coef
## singular
## volume
## conditionstr(out_NMF2, 2)## List of 10
##  $ U            : NULL
##  $ V            : NULL
##  $ J            : int [1:10] 1 2 3 4 5 6 7 8 9 10
##  $ RecError     : NULL
##  $ TrainRecError: NULL
##  $ TestRecError : NULL
##  $ RelChange    : NULL
##  $ Trial        :List of 10
##   ..$ Rank1 :List of 14
##   ..$ Rank2 :List of 14
##   ..$ Rank3 :List of 14
##   ..$ Rank4 :List of 14
##   ..$ Rank5 :List of 14
##   ..$ Rank6 :List of 14
##   ..$ Rank7 :List of 14
##   ..$ Rank8 :List of 14
##   ..$ Rank9 :List of 14
##   ..$ Rank10:List of 14
##  $ Runtime      : num 30
##  $ RankMethod   : chr "all"
##  - attr(*, "class")= chr "NMF"Scores in the data matrix and random matrices are plotted at once. Red and green lines are plotted by the original matrix data and the randomly permutated matrix from the original data matrix, respectively.
plot(out_NMF2)## Warning: Ignoring unknown parameters: linewidthAs a different factorization form from NMF, non-negative tri-factrozation (NMTF (Copar e2019; Long 2005; Ding 2006)), which decomposes a non-negative matrix to three factor matrices, can be considered. NMTF approximates the non-negative data matrix \(X\) (\(N \times M\)) as the matrix product of \(U\) (\(N \times J1\)), \(S\) (\(J1 \times J2\)), and \(V\) (\(M \times J2\)) as follows.
\[ X \approx U S V' \ \mathrm{s.t.}\ U \geq 0, S \geq 0, V \geq 0 \]
As same as NMF, these factor matrices are also optimized by MU rule (Copar e2019; Long 2005; Ding 2006). Note that \(S\) is not necessarily a diagonal matrix and often contains non-diagonal elements with non-zero elements.
Unlike NMF, NMTF sets two rank parameters \(J1\) (\(\leq N\)) and \(J2\) (\(\leq M\)) for \(U\) and \(V\), respectively. These values are separately set in advance. For example, here \(4\) and \(5\) are set as follows.
set.seed(123456)
out_NMTF <- NMTF(X, rank=c(4,5))
str(out_NMTF, 2)## List of 6
##  $ U        : num [1:100, 1:4] 2.66e-18 8.52e-18 1.77e-18 8.38e-19 1.05e-18 ...
##  $ S        : num [1:4, 1:5] 0.7598 0.04967 0.62901 0.00506 1.5182 ...
##  $ V        : num [1:300, 1:5] 1.67e-21 1.93e-20 1.34e-20 1.99e-20 1.20e-20 ...
##  $ rank     : num [1:2] 4 5
##  $ RecError : Named num [1:101] 1.00e-09 7.99e+03 7.41e+03 7.37e+03 7.37e+03 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ RelChange: Named num [1:101] 1.00e-09 5.91e-01 7.87e-02 5.25e-03 6.02e-04 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...As same as NMF, the values of reconstruction error and relative error indicate that the optimization is converged.
layout(t(1:2))
plot(log10(out_NMTF$RecError[-1]), type="b", main="Reconstruction Error")
plot(log10(out_NMTF$RelChange[-1]), type="b", main="Relative Change")The reconstructed matrix (\(USV'\)) shows that the features of the data matrix are well captured by NMTF.
recX2 <- out_NMTF$U %*% out_NMTF$S %*% t(out_NMTF$V)
layout(t(1:2))
image(X, main="Original Data")
image(recX2, main="Reconstructed Data (NMTF)")## R version 3.6.3 (2020-02-29)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS/LAPACK: /home/koki/miniconda3/lib/libopenblasp-r0.3.17.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] nnTensor_1.1.13
## 
## loaded via a namespace (and not attached):
##  [1] spam_2.8-0         tidyselect_1.1.1   xfun_0.29          bslib_0.3.1       
##  [5] purrr_0.3.4        tcltk_3.6.3        colorspace_2.0-3   vctrs_0.3.8       
##  [9] generics_0.1.2     htmltools_0.5.2    viridisLite_0.4.0  rTensor_1.4.8     
## [13] yaml_2.3.5         utf8_1.2.2         rlang_0.4.11       jquerylib_0.1.4   
## [17] pillar_1.7.0       glue_1.4.2         DBI_1.1.2          plot3D_1.4        
## [21] RColorBrewer_1.1-2 lifecycle_1.0.1    stringr_1.4.0      fields_13.3       
## [25] dotCall64_1.0-1    munsell_0.5.0      gtable_0.3.0       evaluate_0.15     
## [29] labeling_0.4.2     misc3d_0.9-1       knitr_1.37         fastmap_1.1.0     
## [33] fansi_1.0.2        highr_0.9          Rcpp_1.0.8         scales_1.1.1      
## [37] jsonlite_1.8.0     farver_2.1.0       gridExtra_2.3      ggplot2_3.3.5     
## [41] digest_0.6.29      stringi_1.7.6      tagcloud_0.6       dplyr_1.0.6       
## [45] grid_3.6.3         tools_3.6.3        magrittr_2.0.2     maps_3.4.0        
## [49] sass_0.4.0         tibble_3.1.2       crayon_1.5.0       pkgconfig_2.0.3   
## [53] ellipsis_0.3.2     MASS_7.3-55        assertthat_0.2.1   rmarkdown_2.11    
## [57] viridis_0.6.2      R6_2.5.1           compiler_3.6.3