Harmony is an algorithm for integrating single-cell genomics datasets. Please check out our latest manuscript in Nature Methods.
Install Harmony from CRAN with standard commands.
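The standard CRAN installation is a single call to `install.packages()`. This assumes a CRAN mirror is already configured for your R session.

```r
# Install the released version of harmony from CRAN.
install.packages("harmony")
```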
Once Harmony is installed, load it up!
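Loading the package attaches its functions and example data to the session:

```r
# Attach harmony so that RunHarmony() and the bundled example data are available.
library(harmony)
```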
The example below follows Figure 2 in the manuscript.
We downloaded three cell line datasets from the 10X website. The first two (jurkat and 293t) come from pure cell lines, while the third (half) is a 50:50 mixture of Jurkat and HEK293T cells. We inferred cell type with the canonical marker XIST, since the two cell lines come from one male and one female donor.
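As an illustration of marker-based labeling, the sketch below assigns a cell type from XIST counts. XIST is expressed in female cells, and HEK293T is the female-derived line while Jurkat is male-derived. The toy counts and the threshold of zero are assumptions for illustration, not the values used to label the packaged data.

```r
# Toy XIST counts for six hypothetical cells (not the 10X data).
xist_counts <- c(cell1 = 0, cell2 = 9, cell3 = 0, cell4 = 14, cell5 = 3, cell6 = 0)

# Assumed rule: any XIST expression marks the female-derived line (HEK293T),
# and its absence marks the male-derived line (Jurkat).
cell_type <- ifelse(xist_counts > 0, "293t", "jurkat")

# Count how many cells were assigned to each line.
table(cell_type)
```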
We library-normalized the cells, log-transformed the counts, and scaled the genes. Then we performed PCA and kept the top 20 PCs. The PCA embeddings and metadata are available as part of this package.
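The preprocessing steps can be sketched in base R on a simulated count matrix. This is a minimal illustration, not the exact pipeline used to produce the packaged embeddings: the toy data, the scale factor of 10,000, and the pseudocount of 1 are all assumptions.

```r
# Toy count matrix: 200 genes x 50 cells (simulated, not the 10X data).
set.seed(1)
counts <- matrix(
  rpois(200 * 50, lambda = 5), nrow = 200, ncol = 50,
  dimnames = list(paste0("gene", 1:200), paste0("cell", 1:50))
)

# 1. Library normalization: rescale every cell to 10,000 total counts
#    (the scale factor is an assumed, commonly used value).
norm <- sweep(counts, 2, colSums(counts), "/") * 1e4

# 2. Log transform with a pseudocount of 1.
log_norm <- log1p(norm)

# 3. Scale each gene to mean 0 and unit variance across cells.
#    scale() works on columns, so transpose to cells x genes first.
scaled <- scale(t(log_norm))

# 4. PCA on the cells x genes matrix, keeping the top 20 PCs.
#    The data are already centered and scaled, so prcomp() does neither.
pca <- prcomp(scaled, center = FALSE, scale. = FALSE, rank. = 20)
pcs <- pca$x

dim(pcs)
```

The resulting `pcs` matrix has one row per cell and one column per PC, the same layout as the embeddings shipped with the package.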
Initially, the cells cluster by both dataset (left) and cell type (right).
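One way to draw such a comparison is sketched below with ggplot2. It loads the packaged `cell_lines` data and plots the first two PCs, colored once by dataset and once by cell type. The choice of PC1 and PC2, the point size, and the variable names `plot_df`, `p_dataset`, and `p_cell_type` are illustrative assumptions, not the plotting helper used to produce the original figures.

```r
library(harmony)
library(ggplot2)

# Load the packaged PCA embeddings (cells x PCs) and per-cell metadata.
data(cell_lines)
V <- cell_lines$scaled_pcs
meta_data <- cell_lines$meta_data

# Combine the first two PCs with the two labels of interest.
plot_df <- data.frame(
  PC1 = V[, 1],
  PC2 = V[, 2],
  dataset = meta_data$dataset,
  cell_type = meta_data$cell_type
)

# Same embedding, colored by dataset of origin and by inferred cell type.
p_dataset <- ggplot(plot_df, aes(PC1, PC2, color = dataset)) +
  geom_point(size = 0.3) +
  labs(title = "Colored by dataset")

p_cell_type <- ggplot(plot_df, aes(PC1, PC2, color = cell_type)) +
  geom_point(size = 0.3) +
  labs(title = "Colored by cell type")

print(p_dataset)
print(p_cell_type)
```

The two plots can be placed side by side with a layout package such as patchwork or cowplot.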
Let’s run Harmony to remove the influence of dataset-of-origin from the cell embeddings.
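A minimal call is sketched below. It passes the PCA embeddings, the metadata, and the name of the metadata column to correct for (`dataset`). The variable name `harmony_embeddings` is a choice made here, and `verbose = FALSE` only silences progress messages.

```r
library(harmony)

# Load the packaged PCA embeddings (cells x PCs) and per-cell metadata.
data(cell_lines)
V <- cell_lines$scaled_pcs
meta_data <- cell_lines$meta_data

# Correct the embeddings for the "dataset" column of the metadata.
harmony_embeddings <- RunHarmony(V, meta_data, "dataset", verbose = FALSE)

# The corrected embeddings keep the cells x PCs layout of the input.
dim(harmony_embeddings)
```

The corrected matrix can be used anywhere the original PCs would be, for example as input to clustering or a 2D visualization.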
After Harmony, the datasets are now mixed (left) and the cell types are still separate (right).
You can also run Harmony as part of an established pipeline in several packages, such as Seurat and SingleCellExperiment. For these vignettes, please visit our website.
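As a rough sketch of the Seurat route, the helper below wraps `RunHarmony()` for a Seurat object. The helper name `integrate_seurat` and its arguments are hypothetical, and it assumes the object already has a `pca` reduction and a metadata column holding the batch label. The dedicated vignettes remain the reference for a full workflow.

```r
# Hypothetical helper: run Harmony on a Seurat object and return the
# corrected embeddings. Assumes `seurat_obj` already contains a "pca"
# reduction and a metadata column named by `batch_col`.
integrate_seurat <- function(seurat_obj, batch_col = "dataset") {
  # RunHarmony() stores its result as a new reduction named "harmony".
  seurat_obj <- harmony::RunHarmony(seurat_obj, group.by.vars = batch_col)

  # Extract the corrected cell embeddings (cells x dimensions).
  Seurat::Embeddings(seurat_obj, reduction = "harmony")
}
```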
For details on how each part of Harmony works, consult our in-depth vignette “Detailed Walkthrough of Harmony Algorithm”.
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas.so.3.11.0 
## LAPACK: /usr/lib/liblapack.so.3.11.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.1.2         ggrepel_0.9.3           ggthemes_4.2.4         
##  [4] lubridate_1.9.2         forcats_1.0.0           stringr_1.5.0          
##  [7] purrr_1.0.2             readr_2.1.4             tidyr_1.3.0            
## [10] tibble_3.2.1            ggplot2_3.4.3           tidyverse_2.0.0        
## [13] data.table_1.14.8       cowplot_1.1.1           dplyr_1.1.3            
## [16] Seurat_4.9.9.9067       SeuratObject_4.9.9.9091 sp_1.6-0               
## [19] harmony_1.2.0           Rcpp_1.0.11            
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3      jsonlite_1.8.4          magrittr_2.0.3         
##   [4] spatstat.utils_3.0-3    farver_2.1.1            rmarkdown_2.21         
##   [7] zlibbioc_1.46.0         vctrs_0.6.3             ROCR_1.0-11            
##  [10] spatstat.explore_3.1-0  RCurl_1.98-1.12         htmltools_0.5.5        
##  [13] sass_0.4.5              sctransform_0.4.0       parallelly_1.35.0      
##  [16] KernSmooth_2.23-22      bslib_0.4.2             htmlwidgets_1.6.2      
##  [19] ica_1.0-3               plyr_1.8.8              plotly_4.10.1          
##  [22] zoo_1.8-12              cachem_1.0.7            igraph_1.4.2           
##  [25] mime_0.12               lifecycle_1.0.3         pkgconfig_2.0.3        
##  [28] Matrix_1.6-1.1          R6_2.5.1                fastmap_1.1.1          
##  [31] GenomeInfoDbData_1.2.10 fitdistrplus_1.1-11     future_1.32.0          
##  [34] shiny_1.7.4             digest_0.6.31           colorspace_2.1-0       
##  [37] S4Vectors_0.38.1        tensor_1.5              RSpectra_0.16-1        
##  [40] irlba_2.3.5.1           GenomicRanges_1.52.0    labeling_0.4.3         
##  [43] progressr_0.13.0        timechange_0.2.0        fansi_1.0.5            
##  [46] spatstat.sparse_3.0-1   httr_1.4.5              polyclip_1.10-4        
##  [49] abind_1.4-5             compiler_4.3.2          withr_2.5.1            
##  [52] fastDummies_1.7.3       highr_0.10              MASS_7.3-60            
##  [55] tools_4.3.2             lmtest_0.9-40           httpuv_1.6.9           
##  [58] future.apply_1.10.0     goftest_1.2-3           glue_1.6.2             
##  [61] nlme_3.1-163            promises_1.2.0.1        grid_4.3.2             
##  [64] Rtsne_0.16              cluster_2.1.4           reshape2_1.4.4         
##  [67] generics_0.1.3          gtable_0.3.4            spatstat.data_3.0-1    
##  [70] tzdb_0.3.0              hms_1.1.3               utf8_1.2.3             
##  [73] XVector_0.40.0          BiocGenerics_0.46.0     BPCells_0.1.0          
##  [76] spatstat.geom_3.2-1     RcppAnnoy_0.0.20        RANN_2.6.1             
##  [79] pillar_1.9.0            spam_2.9-1              RcppHNSW_0.5.0         
##  [82] later_1.3.0             splines_4.3.2           lattice_0.21-9         
##  [85] survival_3.5-7          deldir_1.0-6            tidyselect_1.2.0       
##  [88] miniUI_0.1.1.1          pbapply_1.7-0           knitr_1.42             
##  [91] gridExtra_2.3           IRanges_2.34.0          scattermore_1.2        
##  [94] RhpcBLASctl_0.23-42     stats4_4.3.2            xfun_0.39              
##  [97] matrixStats_1.0.0       stringi_1.7.12          lazyeval_0.2.2         
## [100] yaml_2.3.7              evaluate_0.20           codetools_0.2-19       
## [103] cli_3.6.1               uwot_0.1.14             xtable_1.8-4           
## [106] reticulate_1.28         munsell_0.5.0           jquerylib_0.1.4        
## [109] GenomeInfoDb_1.36.0     globals_0.16.2          spatstat.random_3.1-4  
## [112] png_0.1-8               parallel_4.3.2          ellipsis_0.3.2         
## [115] dotCall64_1.0-2         bitops_1.0-7            listenv_0.9.0          
## [118] viridisLite_0.4.2       scales_1.2.1            ggridges_0.5.4         
## [121] crayon_1.5.2            leiden_0.4.3            rlang_1.1.1