Single-Cell Immune Repertoire and Gene Expression Analysis



Documentation for package ‘Platypus’ version 3.3.2

Help Pages

A B C E G H I M N O P S T U V

-- A --

AbForests_AntibodyForest Infer and draw B cell evolutionary networks
AbForests_CompareForests Comparison of distinct B cell repertoires
AbForests_ConvertStructure Extract transcriptome/isotype information and B cell receptor sequences from single cell immune repertoire formatted as list of data.frames
AbForests_CsvToDf Convert a list of CSV files to a nested list of data.frames
AbForests_ForestMetrics Calculate metrics for networks
AbForests_PlotGraphs Plot igraph and ggplot objects
AbForests_PlyloToMatrix Conversion of phylogenetic tree to distance matrix
AbForests_RemoveNets Filter sub-repertoires with fewer than N unique sequences or with fewer than C unique cells
AbForests_SubRepertoiresByCells Split single cell immune repertoire into sub-repertoires by isotype based on number of B cells
AbForests_SubRepertoiresByUniqueSeq Split single cell immune repertoire into sub-repertoires by isotype based on number of unique sequences
AbForests_UniqueAntibodyVariants Count the number of unique antibody variants per clonal lineage
automate_GEX Automates the transcriptional analysis of the gene expression libraries from cellranger. This function will integrate multiple samples.

-- B --

Bcell_sequences_example_tree Example csv file 1
Bcell_tree_2 Example csv file 2

-- C --

call_MIXCR Extracts information on the VDJRegion level using MiXCR. This function assumes the user can run an executable instance of MiXCR and is eligible to use MiXCR as determined by license agreements. The VDJRegion corresponds to the recombined heavy and light chain loci starting from framework region 1 (FR1) and extending to framework region 4 (FR4). This can be useful for extracting full-length sequences ready to clone and for further calculating somatic hypermutation occurrences.
class_switch_prob_hum The probability matrix of class switching for human B cells. The row names of the matrix are the isotypes the cell is switching from, the column names are the isotypes the cell is switching to. All B cells start from IGHM, and switch to one of the other isotypes or remain the same.
class_switch_prob_mus The probability matrix of class switching for mouse B cells. The row names of the matrix are the isotypes the cell is switching from, the column names are the isotypes the cell is switching to. All B cells start from IGHM, and switch to one of the other isotypes or remain the same.
clonofreq Plot clonal frequency barplot of the output simulated data
clonofreq.isotype.data Get information about the clonotype counts grouped by isotype.
clonofreq.isotype.plot Get information about the clonotype counts grouped by isotype.
clonofreq.trans.data Get information about the clonotype counts grouped by transcriptome state (cell type).
clonofreq.trans.plot Get information about the clonotype counts grouped by transcriptome state (cell type).
cluster.id.igraph Get clone network igraphs colored by Seurat cluster id.
colors A vector of characters specifying colors used in igraph phylogenetic tree. Default colors: "#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854"
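
The row/column convention described for class_switch_prob_hum and class_switch_prob_mus can be sketched as follows. This is a minimal illustration only: the three isotypes, the probabilities, and the helper next_isotype() are assumptions made up for the example and are not the values or functions shipped with the package.

```r
# Toy class-switch matrix. Isotype names and probabilities are illustrative
# assumptions, NOT the contents of class_switch_prob_hum / class_switch_prob_mus.
isotypes <- c("IGHM", "IGHG1", "IGHA")
switch_prob <- matrix(
  c(0.90, 0.07, 0.03,   # switching from IGHM
    0.00, 0.98, 0.02,   # switching from IGHG1
    0.00, 0.00, 1.00),  # switching from IGHA
  nrow = 3, byrow = TRUE,
  dimnames = list(from = isotypes, to = isotypes)
)

# Each row is a probability distribution over the target isotypes.
stopifnot(all(abs(rowSums(switch_prob) - 1) < 1e-9))

# Hypothetical helper: draw the next isotype of a cell given its current one.
next_isotype <- function(current, prob = switch_prob) {
  sample(colnames(prob), size = 1, prob = prob[current, ])
}

set.seed(1)
cell_isotype <- "IGHM"  # all B cells start from IGHM
for (step in 1:5) {
  cell_isotype <- next_isotype(cell_isotype)
}
cell_isotype
```

Rows index the isotype a cell switches from and columns the isotype it switches to, so a cell's next state is drawn from the row matching its current isotype.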

-- E --

Echidna_simulate_repertoire Simulate immune repertoire and transcriptome data
Echidna_vae_generate Simulate B or T cell receptor sequences by variational autoencoders (VAEs) trained with experimental data.

-- G --

get.avr.mut.data Get information about somatic hypermutation in the simulation. This function returns a barplot showing the average mutation.
get.avr.mut.plot Get information about somatic hypermutation in the simulation. This function returns a barplot showing the average mutation.
get.barplot.errorbar Return a barplot of mean and standard error bar of certain value of each clone.
get.elbow Get the Seurat object from simulated transcriptome output.
get.n.node.data Get the number of unique variants in each clone in a vector. The output is the vector representing the numbers of unique variants.
get.n.node.plot Get the number of unique variants in each clone in a vector and the barplot. The first item in the output is the vector representing the numbers of unique variants, the second item is the barplot.
get.seq.distance Computing sequence distance according to the number of unmatched bases.
get.umap Further process the Seurat object from simulated transcriptome output and make UMAP ready for plotting.
get.vgu.matrix Get paired V gene heavy chain and light chain matrix on clonotype level. A V gene usage pheatmap can be obtained by p <- pheatmap::pheatmap(vgu_matrix, show_colnames = TRUE, main = "V Gene Usage"), where vgu_matrix is the output of this function.
GEX_automate Automates the transcriptional analysis of the gene expression libraries from cellranger. This function will integrate multiple samples.
GEX_clonotype Platypus V2: Integrates VDJ and gene expression libraries by providing cluster membership seq_per_vdj object and the index of the cell in the Seurat RNA-seq object.
GEX_cluster_genes Extracts the differentially expressed genes between two samples. This function uses the FindMarkers function from the Seurat package. Further parameter control can be accomplished by calling the function directly on the output of automate_GEX or VDJ_GEX_matrix.
GEX_cluster_genes_heatmap Produces a heatmap displaying the expression of the top genes that define each cluster in the Seurat object. The output heatmap is derived from DoHeatmap from Seurat and thereby can be edited using typical ggplot interactions. The number of genes per cluster and the number of cells to display can be specified by the user. Either the log fold change or the p value can be used to select the top n genes.
GEX_cluster_membership Plots the cluster membership for each of the distinct samples in the Seurat object from the automate_GEX function. The distinct samples are determined by "sample_id" field in the Seurat object.
GEX_coexpression_coefficient Returns either a plot or numeric data of coexpression levels of selected genes. Coexpression % is calculated as the quotient of double positive cells (counts > 0) and the sum of total cells positive for either gene.
GEX_DEgenes Extracts the differentially expressed genes between two groups of cells. These groups are defined as cells having either of two entries (group1, group2) in the grouping.column of the input Seurat object metadata. This function uses the FindMarkers function from the Seurat package.
GEX_DEgenes_persample !Only for Platypus version v2. For more flexibility and Platypus v3, please refer to GEX_DEgenes. Extracts the differentially expressed genes between two samples. This function uses the FindMarkers function from the Seurat package. Further parameter control can be accomplished by calling the function directly on the output of automate_GEX and further extracting sample information from the "sample_id" component of the Seurat object.
GEX_dottile_plot Outputs a dotplot for gene expression, where the color of each dot is scaled by the gene expression level and the size is scaled by the % of cells positive for the gene.
GEX_GOterm Runs a GO term analysis on a submitted list of genes. Works with the output of GEX_topN_DE_genes_per_cluster or a custom list of genes to obtain GOterms.
GEX_GSEA Conducts a Gene Set Enrichment Analysis (GSEA) on a set of genes submitted in a data frame, each with a metric. Works with the output of GEX_cluster_genes or a custom data frame containing the gene symbols either in a column "symbols" or as rownames and a metric for each gene. The name of the column containing the metric has to be declared via the input metric.colname.
GEX_heatmap Produces a heatmap containing gene expression information at the clonotype level. The rows correspond to different genes that can either be determined by pre-made sets of B or T cell markers, or can be customized by the user. The columns correspond to individual cells and the colors correspond to the different clonotype families.
GEX_pairwise_DEGs Produces and saves a list of volcano plots, each showing differentially expressed genes between a pair of groups. If, e.g., seurat_clusters is used as group.by, a plot will be generated for every pairwise comparison of clusters. For large numbers of groups this may take longer to run. Only available for Platypus v3.
GEX_phenotype Integrates VDJ and gene expression libraries by providing cluster membership seq_per_vdj object and the index of the cell in the Seurat RNA-seq object.
GEX_phenotype_per_clone Integrates VDJ and gene expression libraries by providing cluster membership seq_per_vdj object and the index of the cell in the Seurat RNA-seq object. ! For platypus.version == "v3" and VDJ_GEX_matrix output the function will iterate over entries in the sample_id column of the GEX by default.
GEX_proportions_barplot Plots proportions of a group of cells within a secondary group of cells, e.g. the proportions of samples in Seurat clusters, or the proportions of samples in defined cell subtypes.
GEX_scatter_coexpression Clonal frequency plot displaying clonal expansion for either T or B cells with Platypus v3 input.
GEX_topN_DE_genes_per_cluster Organizes the top N genes that define each Seurat cluster and converts them into a single dataframe. This can be useful for obtaining insight into cluster-specific phenotypes.
GEX_visualize_clones !Only for Platypus version v2. For Platypus v3 refer to VDJ_GEX_overlay_clones(). Visualize selected clonotypes on the tSNE or UMAP projection.
GEX_volcano Plots a volcano plot from the output of the FindMarkers function from the Seurat package or, alternatively, the GEX_cluster_genes function.
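
The coexpression percentage described under GEX_coexpression_coefficient can be illustrated with a small sketch. It assumes that "the sum of total cells positive for either gene" means the number of cells expressing at least one of the two genes; the helper coexpression_pct() and the toy counts are hypothetical and not part of Platypus.

```r
# Toy counts for two genes across six cells (illustrative values only).
counts <- rbind(
  GeneA = c(3, 0, 1, 0, 2, 0),
  GeneB = c(1, 0, 0, 4, 5, 0)
)

# Hypothetical helper, not a Platypus function.
# Assumption: the denominator is the number of cells positive for at least
# one of the two genes.
coexpression_pct <- function(x, y) {
  double_pos <- sum(x > 0 & y > 0)  # cells expressing both genes
  either_pos <- sum(x > 0 | y > 0)  # cells expressing at least one gene
  if (either_pos == 0) {
    return(NA_real_)                # undefined when no cell is positive
  }
  100 * double_pos / either_pos
}

pct <- coexpression_pct(counts["GeneA", ], counts["GeneB", ])
pct
```

With these toy counts, two cells are double positive and four cells are positive for at least one gene, so the sketch yields 50.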

-- H --

hotspot_df Hotspot mutations taken from Yaari et al., Frontiers in Immunology, 2013. This contains transition probabilities for all 5mer combinations based on high throughput sequencing data. The transition probabilities are for the middle nucleotide in each 5mer set. This can be customized by changing the genes and sequences. Custom mutation hotspots can be supplied by modifying this dataframe. Repeating particular hotspot entries allows for the hotspot to mutate more than one time per SHM event.
hum_b_h hum_b_h
hum_b_l hum_b_l
hum_t_h hum_t_h
hum_t_l hum_t_l

-- I --

iso_SHM_prob A probability dataframe specifying SHM.nuc.prob for cells of different isotypes. The first column holds the isotype names, while the second column holds the SHM.nuc.prob of cells of that isotype. Users can define different SHM.nuc.prob values for isotypes.

-- M --

mus_b_h mus_b_h
mus_b_l mus_b_l
mus_b_trans A data frame containing mouse B cell average gene expression for multiple cell types, with the rows representing the gene names and the column names representing the cell type names. The original single cell sequencing data is retrieved from 10xgenomics and combined with experimental data. The expression levels for different cell types are obtained by calculating the average expression after sorting the original data by the following markers: NaiveBcell (Cd19+;Cd27-;Cd38-), GerminalcenterBcell (Fas+;Cd19+), Plasmacell (Sdc1+), MemoryBcell (Cd38+;Fas-).
mus_t_h mus_t_h
mus_t_l mus_t_l

-- N --

no.empty.node Get clone network igraphs without empty nodes. An empty node represents an 'extinct' sequence that once existed but is no longer present in any living cell.

-- O --

one_spot_df one_spot_df

-- P --

pheno_SHM_prob A probability dataframe specifying SHM.nuc.prob for cells of different phenotypes. The first column holds the phenotype names, while the second column holds the SHM.nuc.prob of cells of that phenotype. Users can define different SHM.nuc.prob values for phenotypes.
PlatypusDB_AIRR_to_VGM Loads in and converts input AIRR-compatible tsv file(s) into the Platypus VGM object format.
PlatypusDB_fetch Loads and saves RData objects from the PlatypusDB
PlatypusDB_find_CDR3s Queries for the occurrence of CDR3 sequences in public datasets on PlatypusDB.
PlatypusDB_list_projects Lists metadata tables of available projects on PlatypusDB
PlatypusDB_load_from_disk Utility function for loading in local datasets as VDJ_GEX_matrix and PlatypusDB compatible R objects. Especially useful when wanting to integrate local and public datasets. This function only imports and does not make changes to format, row and column names. Exception: filtered_contig.fasta sequences are appended to the filtered_contig_annotations.csv as a column for easy access.
PlatypusDB_VGM_to_AIRR Exports AIRR compatible tables supplemented with VDJ and GEX information from the Platypus VGM object and the cellranger output airr_rearrangements.tsv

-- S --

select.top.clone Get the index of top ranking clones.
small_vgm Small VDJ GEX matrix (VGM) for function testing purposes
special_v A dataframe of heavy and light chain V gene combinations and their probability of being selected for expansion.

-- T --

trans_switch_prob_b The probability matrix for B cell transcriptome state switching. The row names of the matrix are the cell states the cell is switching from, the column names are the cell states the cell is switching to.
trans_switch_prob_t The probability matrix for T cell transcriptome state switching. The row names of the matrix are the cell states the cell is switching from, the column names are the cell states the cell is switching to.

-- U --

umap.top.highlight Set idents for the top abundant clones in a Seurat object, in preparation for highlighting them in a UMAP.

-- V --

VDJ_abundances Calculate abundances/counts of specific features for a VDJ dataframe
VDJ_alpha_beta_Vgene_circos Produces a Circos plot from the VDJ_analyze output. Connects the V-alpha with the corresponding V-beta gene for each clonotype.
VDJ_analyze Platypus V2. Processes and organizes the repertoire sequencing data from cellranger vdj and returns a list of dataframes, where each dataframe corresponds to an individual repertoire. The function will return split CDR3 sequences and germline gene information, and will filter out those clones with either incomplete information or doublets (multiple CDR3 sequences for a given chain). This function should be called once per desired integrated repertoire and transcriptome. For example, if there are 3 VDJ libraries and 3 GEX libraries and the goal is to analyze all three GEX libraries together (e.g. one UMAP/tSNE reduction), then this function should be called one time and the three VDJ directories should be provided as input to the single function call.
VDJ_antigen_integrate Integrates antigen-specific information into the VDJ/VDJ.GEX.matrix[[1]] object
VDJ_assemble_for_PnP Assembles sequences from MIXCR output into inserts for expression in PnP cells. For details check https://doi.org/10.1038/ncomms12535 ! ALWAYS VALIDATE INDIVIDUAL SEQUENCE IN GENEIOUS OR OTHER SOFTWARE BEFORE ORDERING SEQUENCES FOR EXPRESSION ! Check notes on column content below ! Only cells with 1 VDJ and 1 VJ sequence are considered. Warnings are issued if sequences do not pass necessary checks.
VDJ_call_MIXCR Extracts information on the VDJRegion level using MiXCR on WINDOWS, MAC and UNIX systems for input from either Platypus v2 (VDJ.per.clone) or v3 (output of VDJ_GEX_matrix). This function assumes the user can run an executable instance of MiXCR and is eligible to use MiXCR as determined by license agreements. ! FOR WINDOWS USERS THE EXECUTABLE MIXCR.JAR HAS TO BE PRESENT IN THE CURRENT WORKING DIRECTORY ! The VDJRegion corresponds to the recombined heavy and light chain loci starting from framework region 1 (FR1) and extending to framework region 4 (FR4). This can be useful for extracting full-length sequences ready to clone and for further calculating somatic hypermutation occurrences.
VDJ_call_recon Calls the Kaplinsky/RECON tool
VDJ_circos Plots a Circos diagram from an adjacency matrix. Uses the Circlize chordDiagram function. Is called by VDJ_clonotype_clusters_circos(), VDJ_alpha_beta_Vgene_circos() and VDJ_VJ_usage_circos() functions or works on its own when supplied with an adjacency matrix.
VDJ_clonal_donut Generate circular plots of clonal expansion per repertoire directly from the VDJ matrix of the VDJ_GEX_matrix function
VDJ_clonal_expansion Clonal frequency plot displaying clonal expansion for either T or B cells with Platypus v3 input. Only available for Platypus "v3". For v2 plotting of B cell clonotype expansion and isotypes please refer to VDJ_isotypes_per_clone.
VDJ_clonal_expansion_abundances Wrapper function for VDJ_abundances to obtain ranked clonotype barplots
VDJ_clonal_lineages Only Platypus V2. Organizes and extracts full-length sequences for clonal lineage inference. The output sequence can either contain the germline sequence as determined by cellranger or can just contain the sequences contained in each clonal family.
VDJ_clonotype Deprecated function for Platypus V2 with options for Platypus V3. For revised hierarchical clonotyping please use VDJ_clonotype_v3() Returns a list of clonotype dataframes following additional clonotyping. This function works best following filtering to ensure that each clone only has one heavy chain and one light chain.
VDJ_clonotype_clusters_circos Makes a Circos plot from the VDJ_GEX_integrate output. Connects the clonotypes with the corresponding clusters.
VDJ_clonotype_v3 Updated clonotyping function based on implications for cells with different chain numbers than 1 VDJ 1 VJ chains.
VDJ_contigs_to_vgm Formats "VDJ_contigs_annotations.csv" files from cell ranger to match the VDJ_GEX_matrix output using only cells with 1VDJ and 1VJ chain
VDJ_db_annotate Wrapper function of VDJ_antigen_integrate function
VDJ_db_load Load and preprocess a list of antigen-specific databases
VDJ_diversity Calculates and plots common diversity and overlap measures for repertoires and alike. Requires the vegan package.
VDJ_dublets Only Platypus v2. Produces a matrix indicating either the number of cells or clones which contain multiple heavy or light chains (or alpha/beta in the case of T cells).
VDJ_dynamics Tracks a specific VDJ column across multiple samples/timepoints.
VDJ_expand_aberrants Expand the aberrant cells in a VDJ dataframe by converting them into additional rows
VDJ_extract_germline Only Platypus v2. Extracts the full-length germline sequence as determined by cellranger. This function returns an object that now contains the reference germline for each of the clones. If multiple clones (as determined by cellranger) have been merged using the VDJ_clonotype function then these sequences may have distinct germline sequences despite being in the same clonal family (nested list). This is particularly possible when homology thresholds were used to determine the clonotypes.
VDJ_get_public Function to get shared/public elements across multiple repertoires
VDJ_GEX_clonal_lineage_clusters Only Platypus v2. Integrates the transcriptional cluster information into the clonal lineages. This requires that automate_GEX, VDJ_clonal_lineages, and VDJ_GEX_integrate have already been run. The transcriptional cluster will be added to the end of the Name for each sequence.
VDJ_GEX_expansion Only Platypus v2. Integrates VDJ and gene expression libraries by providing cluster membership seq_per_vdj object. Output will plot which transcriptional cluster (GEX) the cells of a given clonotype are found in.
VDJ_GEX_integrate Only Platypus v2. Integrates VDJ and gene expression libraries by providing cluster membership seq_per_vdj object and the index of the cell in the Seurat RNA-seq object.
VDJ_GEX_matrix Processes both raw VDJ and GEX Cellranger output to compile a single cell level table containing all available information for each cell. If using Feature Barcodes please note the [FB] paragraph in the description and all "FB." parameters.
VDJ_GEX_overlay_clones Highlights the cells belonging to any number of top clonotypes or of specifically selected clonotypes from one or more samples or groups in a GEX dimensional reduction.
VDJ_GEX_stats Gives stats on number and quality of reads.
VDJ_isotypes_per_clone Only for Platypus v2. Clonal frequency plot displaying the isotype usage of each clone. ! For Platypus v3 use VDJ_clonal_expansion
vdj_length_prob A list dataframe specifying lengths and probabilities of bases deleted or inserted at each junction site of VDJ recombination event. Elements: v3_deletion (length and probability of deleted bases at 3' end of V segment); d5_deletion (length and probability of deleted bases at 5' end of D segment); d3_deletion (length and probability of deleted bases at 3' end of D segment); j5_deletion (length and probability of deleted bases at 5' end of J segment); dj_insertion (length and probability of inserted bases between D-J segment); vj_insertion (length and probability of inserted bases between V-J segment for light or alpha chains).
VDJ_logoplot_vector Plots a logoplot of the CDR3 amino acid region
VDJ_network Creates a similarity network where clones with similar CDR3s are connected.
VDJ_overlap_heatmap Yields overlap heatmap and datatable of features or combined features for different samples or groups
VDJ_per_clone VDJ_per_clone
VDJ_phylogenetic_trees Creates phylogenetic trees from a VDJ dataframe
VDJ_phylogenetic_trees_plot Function to plot phylogenetic trees obtained from VDJ_phylogenetic_trees
VDJ_plot_SHM Plots for SHM based on MIXCR output generated using the VDJ_call_MIXCR function and appended to the VDJ.GEX.matrix.output
VDJ_reclonotype_list_arrange Only Platypus v2. Organizes the top N genes that define each Seurat cluster and converts them into a single dataframe. This can be useful for obtaining insight into cluster-specific phenotypes.
VDJ_tree Only Platypus v2. Produces neighbor-joining phylogenetic trees from the output of VDJ_clonal_lineages
VDJ_variants_per_clone Returns statistics and plots to examine diversity of any sequence or metadata item within clones on a by sample level or global level
VDJ_Vgene_usage Produces a matrix counting the number of occurrences of each VDJ and VJ V gene combination for each list entry in VDJ.clonotype.output or for each sample_id in VDJ.matrix
VDJ_Vgene_usage_barplot Produces a barplot with the most frequently used IgH and IgK/L Vgenes.
VDJ_Vgene_usage_stacked_barplot Produces a stacked barplot with the fraction of the most frequently used IgH and IgK/L V genes. This function can be used in combination with VDJ_Vgene_usage_barplot to visualize V gene usage per sample and among samples.
VDJ_VJ_usage_circos Makes a Circos plot from the VDJ_analyze output. Connects the V gene with the corresponding J gene for each clonotype.
VGM_expand_featurebarcodes Replaces the original sample_id column of a vgm object with a pasted version of the original sample_id and the last digits of the feature barcode.
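
The kind of matrix described under VDJ_Vgene_usage (counts of each heavy/light V gene combination) can be sketched in base R. This is only an illustration of the counting idea, not the function's implementation: the toy table, its gene labels, and the column names VDJ_vgene and VJ_vgene are assumptions made for the example.

```r
# Toy per-cell table. Column names and gene labels are assumptions for
# illustration and are not taken from a real Platypus object.
vdj <- data.frame(
  VDJ_vgene = c("IGHV1-1", "IGHV1-1", "IGHV2-3", "IGHV1-1"),
  VJ_vgene  = c("IGKV4-2", "IGKV4-2", "IGKV4-2", "IGLV1-5"),
  stringsAsFactors = FALSE
)

# Cross-tabulate heavy (VDJ) against light (VJ) V genes: each cell of the
# resulting matrix counts how often that combination occurs.
usage <- table(VDJ = vdj$VDJ_vgene, VJ = vdj$VJ_vgene)
usage_matrix <- unclass(usage)  # plain integer matrix with dimnames

usage_matrix
```

To get one matrix per sample, the same cross-tabulation could be applied to each subset of rows sharing a sample identifier.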