Type: | Package |
Title: | Genetic Linkage Maps in Autopolyploids |
Version: | 0.4.1 |
Maintainer: | Marcelo Mollinari <mmollin@ncsu.edu> |
Description: | Construction of genetic maps in autopolyploid full-sib populations. Uses pairwise recombination fraction estimation as the first source of information to sequentially position allelic variants in specific homologous chromosomes. For situations where pairwise analysis has limited power, the algorithm relies on the multilocus likelihood obtained through a hidden Markov model (HMM). For more detail, please see Mollinari and Garcia (2019) <doi:10.1534/g3.119.400378> and Mollinari et al. (2020) <doi:10.1534/g3.119.400620>. |
License: | GPL-3 |
LazyData: | TRUE |
LazyDataCompression: | xz |
Depends: | R (≥ 4.0.0) |
Imports: | Rcpp (≥ 0.12.6), RcppParallel, RCurl, fields, ggpubr, ggsci, rstudioapi, plot3D, dplyr, crayon, cli, magrittr, reshape2, ggplot2, smacof, princurve, dendextend, vcfR, zoo, plotly |
LinkingTo: | Rcpp, RcppParallel |
RoxygenNote: | 7.3.1 |
SystemRequirements: | GNU make |
Encoding: | UTF-8 |
Suggests: | testthat, updog, fitPoly, polymapR, AGHmatrix, gatepoints, knitr, rmarkdown, stringr |
URL: | https://github.com/mmollina/MAPpoly |
BugReports: | https://github.com/mmollina/MAPpoly/issues |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2024-03-06 03:10:56 UTC; mmollin |
Author: | Marcelo Mollinari |
Repository: | CRAN |
Date/Publication: | 2024-03-06 17:20:02 UTC |
Add a single marker to a map
Description
Creates a new map by adding a marker in a given position in a pre-built map.
Usage
add_marker(
input.map,
mrk,
pos,
rf.matrix,
genoprob = NULL,
phase.config = "best",
tol = 0.001,
extend.tail = NULL,
r.test = NULL,
verbose = TRUE
)
Arguments
input.map |
an object of class |
mrk |
the name of the marker to be inserted |
pos |
the name of the marker after which the new marker should be added.
One also can inform the numeric position (between markers) were the
new marker should be added. To insert a marker at the beginning of a
map, use |
rf.matrix |
an object of class |
genoprob |
an object of class |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood configuration |
tol |
the desired accuracy (default = 10e-04) |
extend.tail |
the length of the chain's tail that should
be used to calculate the likelihood of the map. If |
r.test |
for internal use only |
verbose |
if |
Details
add_marker
splits the input map into two sub-maps to the left and the
right of the given position. Using the genotype probabilities, it computes
the log-likelihood of all possible linkage phases under a two-point threshold
inherited from function rf_list_to_matrix
.
Value
A list of class mappoly.map
with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs
as informed in the input file. If not available,
|
genome.pos |
physical position (usually in megabase) of the markers into the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Examples
sub.map <- get_submap(maps.hexafake[[1]], 1:20, reestimate.rf = FALSE)
plot(sub.map, mrk.names = TRUE)
s <- make_seq_mappoly(hexafake, sub.map$info$mrk.names)
tpt <- est_pairwise_rf(s)
rf.matrix <- rf_list_to_matrix(input.twopt = tpt,
thresh.LOD.ph = 3,
thresh.LOD.rf = 3,
shared.alleles = TRUE)
###### Removing marker "M_1" (first) #######
mrk.to.remove <- "M_1"
input.map <- drop_marker(sub.map, mrk.to.remove)
plot(input.map, mrk.names = TRUE)
## Computing conditional probabilities using the resulting map
genoprob <- calc_genoprob(input.map)
res.add.M_1 <- add_marker(input.map = input.map,
mrk = "M_1",
pos = 0,
rf.matrix = rf.matrix,
genoprob = genoprob,
tol = 10e-4)
plot(res.add.M_1, mrk.names = TRUE)
best.phase <- res.add.M_1$maps[[1]]$seq.ph
names.id <- names(best.phase$P)
plot_compare_haplotypes(ploidy = 6,
hom.allele.p1 = best.phase$P[names.id],
hom.allele.q1 = best.phase$Q[names.id],
hom.allele.p2 = sub.map$maps[[1]]$seq.ph$P[names.id],
hom.allele.q2 = sub.map$maps[[1]]$seq.ph$Q[names.id])
###### Removing marker "M_10" (middle or last) #######
mrk.to.remove <- "M_10"
input.map <- drop_marker(sub.map, mrk.to.remove)
plot(input.map, mrk.names = TRUE)
# Computing conditional probabilities using the resulting map
genoprob <- calc_genoprob(input.map)
res.add.M_10 <- add_marker(input.map = input.map,
mrk = "M_10",
pos = "M_9",
rf.matrix = rf.matrix,
genoprob = genoprob,
tol = 10e-4)
plot(res.add.M_10, mrk.names = TRUE)
best.phase <- res.add.M_10$maps[[1]]$seq.ph
names.id <- names(best.phase$P)
plot_compare_haplotypes(ploidy = 6,
hom.allele.p1 = best.phase$P[names.id],
hom.allele.q1 = best.phase$Q[names.id],
hom.allele.p2 = sub.map$maps[[1]]$seq.ph$P[names.id],
hom.allele.q2 = sub.map$maps[[1]]$seq.ph$Q[names.id])
Add markers to a pre-existing sequence using HMM analysis and evaluating difference in LOD
Description
Add markers to a pre-existing sequence using HMM analysis and evaluating difference in LOD
Usage
add_md_markers(
input.map,
mrk.to.include,
input.seq,
input.matrix,
input.genoprob,
input.data,
input.mds = NULL,
thresh = 500,
extend.tail = 50,
method = c("hmm", "wMDS_to_1D_pc"),
verbose = TRUE
)
Arguments
input.map |
An object of class |
mrk.to.include |
vector for marker names to be included |
input.seq |
an object of class |
input.matrix |
object of class |
input.genoprob |
an object of class |
input.data |
an object of class |
input.mds |
An object of class |
thresh |
the LOD threshold used to determine if the marker will be included or not after hmm analysis (default = 30) |
extend.tail |
the length of the chain's tail that should be used to calculate the likelihood of the map. If NULL (default), the function uses all markers positioned. Even if info.tail = TRUE, it uses at least extend.tail as the tail length |
method |
indicates whether to use 'hmm' (Hidden Markov Models), 'ols' (Ordinary Least Squares) or 'wMDS_to_1D_pc' (weighted MDS followed by fitting a one dimensional principal curve) to re-estimate the recombination fractions after adding markers |
verbose |
If TRUE (default), current progress is shown; if FALSE, no output is produced |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu with documentation and minor modifications by Cristiane Taniguti chtaniguti@tamu.edu
add a single marker at the tail of a linkage phase list
Description
add a single marker at the tail of a linkage phase list
Usage
add_mrk_at_tail_ph_list(ph.list.1, ph.list.2, cor.index)
Aggregate matrix cells (lower the resolution by a factor)
Description
Aggregate matrix cells (lower the resolution by a factor)
Usage
aggregate_matrix(M, fact)
Frequency of genotypes for two-point recombination fraction estimation
Description
Returns the frequency of each genotype for two-point reduction of dimensionality. The frequency is calculated for all pairwise combinations and for all possible linkage phase configurations.
Usage
cache_counts_twopt(
input.seq,
cached = FALSE,
cache.prev = NULL,
ncpus = 1L,
verbose = TRUE,
joint.prob = FALSE
)
Arguments
input.seq |
an object of class |
cached |
If |
cache.prev |
an object of class |
ncpus |
Number of parallel processes to spawn (default = 1) |
verbose |
If |
joint.prob |
If |
Value
An object of class cache.info
which contains one (conditional probabilities)
or two (both conditional and joint probabilities) lists. Each list
contains all pairs of dosages between parents for all markers
in the sequence. The names in each list are of the form 'A-B-C-D', where: A
represents the dosage in parent 1, marker k; B represents the dosage in parent
1, marker k+1; C represents the dosage in parent 2, marker k;
and D represents the dosage in parent 2, marker k+1. For each
list, the frequencies were computed for all possible linkage
phase configurations. The frequencies for each linkage phase
configuration are distributed in matrices whose names
represents the number of homologous chromosomes that share
alleles. The rows on these matrices represents the dosages in markers k
and k+1 for an individual in the offspring. See Table 3 of
S3 Appendix in Mollinari and Garcia (2019) for an example.
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu with updates by Gabriel Gesteira, gdesiqu@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
all.mrk <- make_seq_mappoly(tetra.solcap, 1:20)
## local computation
counts <- cache_counts_twopt(all.mrk, ncpus = 1)
## load from internal file or web-stored counts (especially important for high ploidy levels)
counts.cached <- cache_counts_twopt(all.mrk, cached = TRUE)
Compute conditional probabilities of the genotypes
Description
Conditional genotype probabilities are calculated for each marker position and each individual given a map.
Usage
calc_genoprob(input.map, step = 0, phase.config = "best", verbose = TRUE)
Arguments
input.map |
An object of class |
step |
Maximum distance (in cM) between positions at which the genotype probabilities are calculated, though for step = 0, probabilities are calculated only at the marker locations. |
phase.config |
which phase configuration should be used. "best" (default) will choose the phase configuration associated with the maximum likelihood |
verbose |
if |
Value
An object of class 'mappoly.genoprob' which has two elements: a tridimensional array containing the probabilities of all possible genotypes for each individual in each marker position; and the marker sequence with it's recombination frequencies
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## tetraploid example
probs.t <- calc_genoprob(input.map = solcap.dose.map[[1]],
verbose = TRUE)
probs.t
## displaying individual 1, 36 genotypic states
## (rows) across linkage group 1 (columns)
image(t(probs.t$probs[,,1]))
Compute conditional probabilities of the genotypes using probability distribution of dosages
Description
Conditional genotype probabilities are calculated for each marker position and each individual given a map. In this function, the probabilities are not calculated between markers.
Usage
calc_genoprob_dist(
input.map,
dat.prob = NULL,
phase.config = "best",
verbose = TRUE
)
Arguments
input.map |
An object of class |
dat.prob |
an object of class |
phase.config |
which phase configuration should be used. "best" (default) will choose the phase configuration with the maximum likelihood |
verbose |
if |
Value
An object of class 'mappoly.genoprob' which has two elements: a tridimensional array containing the probabilities of all possible genotypes for each individual in each marker position; and the marker sequence with it's recombination frequencies
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## tetraploid example
probs.t <- calc_genoprob_dist(input.map = solcap.prior.map[[1]],
dat.prob = tetra.solcap.geno.dist,
verbose = TRUE)
probs.t
## displaying individual 1, 36 genotypic states
## (rows) across linkage group 1 (columns)
image(t(probs.t$probs[,,1]))
Compute conditional probabilities of the genotypes using global error
Description
Conditional genotype probabilities are calculated for each marker position and each individual given a map.
Usage
calc_genoprob_error(
input.map,
step = 0,
phase.config = "best",
error = 0.01,
th.prob = 0.95,
restricted = TRUE,
verbose = TRUE
)
Arguments
input.map |
An object of class |
step |
Maximum distance (in cM) between positions at which the genotype probabilities are calculated, though for step = 0, probabilities are calculated only at the marker locations. |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood configuration |
error |
the assumed global error rate (default = 0.01) |
th.prob |
the threshold for using global error or genotype probability distribution contained in the dataset (default = 0.95) |
restricted |
if |
verbose |
if |
Value
An object of class 'mappoly.genoprob' which has two elements: a tridimensional array containing the probabilities of all possible genotypes for each individual in each marker position; and the marker sequence with it's recombination frequencies
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
probs.error <- calc_genoprob_error(input.map = solcap.err.map[[1]],
error = 0.05,
verbose = TRUE)
Compute conditional probabilities of the genotypes given a sequence of block markers
Description
Compute conditional probabilities of the genotypes given a sequence of block markers
Usage
calc_genoprob_haplo(
ploidy,
n.mrk,
n.ind,
haplo,
emit = NULL,
rf_vec,
ind.names,
verbose = TRUE,
highprec = FALSE
)
Compute conditional probabilities of the genotype (one informative parent)
Description
Conditional genotype probabilities are calculated for each marker position and each individual given a map
Usage
calc_genoprob_single_parent(
input.map,
step = 0,
info.parent = 1,
uninfo.parent = 2,
global.err = 0,
phase.config = "best",
verbose = TRUE
)
Arguments
input.map |
An object of class |
step |
Maximum distance (in cM) between positions at which the genotype probabilities are calculated, though for step = 0, probabilities are calculated only at the marker locations. |
info.parent |
index for informative parent |
uninfo.parent |
index for uninformative parent |
global.err |
the assumed global error rate (default = 0.0) |
phase.config |
which phase configuration should be used. "best" (default) will choose the phase configuration associated with the maximum likelihood |
verbose |
if |
Value
An object of class 'mappoly.genoprob' which has two elements: a tridimensional array containing the probabilities of all possible genotypes for each individual in each marker position; and the marker sequence with it's recombination frequencies
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## tetraploid example
s <- make_seq_mappoly(tetra.solcap, 'seq12', info.parent = "p1")
tpt <- est_pairwise_rf(s)
map <- est_rf_hmm_sequential(input.seq = s,
twopt = tpt,
start.set = 10,
thres.twopt = 10,
thres.hmm = 10,
extend.tail = 4,
info.tail = TRUE,
sub.map.size.diff.limit = 8,
phase.number.limit = 4,
reestimate.single.ph.configuration = TRUE,
tol = 10e-2,
tol.final = 10e-3)
plot(map)
probs <- calc_genoprob_single_parent(input.map = map,
info.parent = 1,
uninfo.parent = 2,
step = 1)
probs
## displaying individual 1, 6 genotypic states
## (rows) across linkage group 1 (columns)
image(t(probs$probs[,,2]))
Homolog probabilities
Description
Compute homolog probabilities for all individuals in the full-sib population given a map and conditional genotype probabilities.
Usage
calc_homologprob(input.genoprobs, verbose = TRUE)
Arguments
input.genoprobs |
an object of class |
verbose |
if |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Examples
## tetraploid example
w1 <- calc_genoprob(solcap.dose.map[[1]])
h.prob <- calc_homologprob(w1)
print(h.prob)
plot(h.prob, ind = 5, use.plotly = FALSE)
## using error modeling (removing noise)
w2 <- calc_genoprob_error(solcap.err.map[[1]])
h.prob2 <- calc_homologprob(w2)
print(h.prob2)
plot(h.prob2, ind = 5, use.plotly = FALSE)
Preferential pairing profiles
Description
Given the genotype conditional probabilities for a map, this function computes the probability profiles for all possible homolog pairing configurations in both parents.
Usage
calc_prefpair_profiles(input.genoprobs, verbose = TRUE)
Arguments
input.genoprobs |
an object of class |
verbose |
if |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu and Guilherme Pereira, g.pereira@cgiar.org
References
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Examples
## tetraploid example
w1 <- lapply(solcap.dose.map[1:12], calc_genoprob)
x1 <- calc_prefpair_profiles(w1)
print(x1)
plot(x1)
cat for phase information
Description
cat for phase information
Usage
cat_phase(
input.seq,
input.ph,
all.ph,
ct,
seq.num,
twopt.phase.number,
hmm.phase.number
)
Checks the consistency of dataset (probability distribution)
Description
Checks the consistency of dataset (probability distribution)
Usage
check_data_dist_sanity(x)
Checks the consistency of dataset (dosage)
Description
Checks the consistency of dataset (dosage)
Usage
check_data_dose_sanity(x)
Data sanity check
Description
Checks the consistency of a dataset
Usage
check_data_sanity(x)
Arguments
x |
an object of class |
Value
if consistent, returns 0. If not consistent, returns a
vector with a number of tests, where TRUE
indicates
a failed test.
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
check_data_sanity(tetra.solcap)
Check if it is possible to estimate the recombination fraction between neighbor markers using two-point estimation
Description
Check if it is possible to estimate the recombination fraction between neighbor markers using two-point estimation
Usage
check_if_rf_is_possible(input.seq)
Compare a list of linkage phases and return the markers for which they are different.
Description
Compare a list of linkage phases and return the markers for which they are different.
Usage
check_ls_phase(ph)
Check if all pairwise combinations of elements of input.seq
are contained in twopt
Description
Check if all pairwise combinations of elements of input.seq
are contained in twopt
Usage
check_pairwise(input.seq, twopt)
Arguments
input.seq |
An object of class |
twopt |
An object of class |
Value
If all pairwise combinations of elements of
input.seq
are contained in twopt
, the function
returns 0. Otherwise, returns the missing pairs.
Compare two polyploid haplotypes stored in list format
Description
Compare two polyploid haplotypes stored in list format
Usage
compare_haplotypes(ploidy, h1, h2)
Arguments
ploidy |
ploidy level |
h1 |
homology group 1 |
h2 |
homology group 2 |
Value
a numeric vector of size ploidy
indicating which
homolog in h2 represents the homolog in h1. If there is
no correspondence, i.e. different homolog, it returns NA for
that homolog.
Compare a list of maps
Description
Compare lengths, density, maximum gaps and log likelihoods in a list of maps. In order to make the maps comparable, the function uses the intersection of markers among maps.
Usage
compare_maps(...)
Arguments
... |
a list of objects of class |
Value
A data frame where the lines correspond to the maps in the order provided in input list list
Concatenate new marker
Description
Inserts a new marker at the end of the sequence, taking into account the two-point information
Usage
concatenate_new_marker(X = NULL, d, sh = NULL, seq.num = NULL, ploidy, mrk = 1)
Arguments
X |
a list of matrices whose columns represent homologous chromosomes and the rows represent markers |
d |
the dosage of the inserted marker |
sh |
a list of shared alleles between all markers in the sequence |
seq.num |
a vector of integers containing the number of each marker in the raw data file |
ploidy |
the ploidy level |
mrk |
the marker to be inserted |
Value
a unique list of matrices representing linkage phases
concatenate two linkage phase lists
Description
concatenate two linkage phase lists
Usage
concatenate_ph_list(ph.list.1, ph.list.2)
Create a map with pseudomarkers at a given step
Description
Create a map with pseudomarkers at a given step
Usage
create_map(input.map, step = 0, phase.config = "best")
Simulate an autopolyploid full-sib population
Description
Simulate an autopolyploid full-sib population with one or two informative parents under random chromosome segregation.
Usage
cross_simulate(
parental.phases,
map.length,
n.ind,
draw = FALSE,
file = "output.pdf",
prefix = NULL,
seed = NULL,
width = 12,
height = 6,
prob.P = NULL,
prob.Q = NULL
)
Arguments
parental.phases |
a list containing the linkage phase information for both parents |
map.length |
the map length |
n.ind |
number of individuals in the offspring |
draw |
if |
file |
name of the output file. It is ignored if
|
prefix |
prefix used in all marker names. |
seed |
random number generator seed (default = NULL) |
width |
the width of the graphics region in inches (default = 12) |
height |
the height of the graphics region in inches (default = 6) |
prob.P |
a vector indicating the proportion of preferential pairing in parent P (currently ignored) |
prob.Q |
a vector indicating the proportion of preferential pairing in parent Q (currently ignored) |
Details
parental.phases.p
and parental.phases.q
are lists of vectors
containing linkage phase configurations. Each vector contains the
numbers of the homologous chromosomes in which the alleles are
located. For instance, a vector containing (1,3,4)
means that
the marker has three doses located in the chromosomes 1, 3 and 4. For
zero doses, use 0.
For more sophisticated simulations, we strongly recommend using PedigreeSim V2.0
https://github.com/PBR/pedigreeSim
Value
an object of class mappoly.data
. See
read_geno
for more information
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
h.temp <- sim_homologous(ploidy = 6, n.mrk = 20)
fake.poly.dat <- cross_simulate(h.temp, map.length = 100, n.ind = 200)
plot(fake.poly.dat)
Detects which parent is informative
Description
Detects which parent is informative
Usage
detect_info_par(x)
Arguments
x |
an object of class |
Returns the class with the highest probability in a genotype probability distribution
Description
Returns the class with the highest probability in a genotype probability distribution
Usage
dist_prob_to_class(geno, prob.thres = 0.9)
Arguments
geno |
the probabilistic genotypes contained in the object
|
prob.thres |
probability threshold to select the genotype. Values below this genotype are assumed as missing data |
Value
a matrix containing the doses of each genotype and marker. Markers are disposed in rows and individuals are disposed in columns. Missing data are represented by NAs
Examples
geno.dose <- dist_prob_to_class(hexafake.geno.dist$geno)
geno.dose$geno.dose[1:10,1:10]
Draw simple parental linkage phase configurations
Description
This function draws the parental map (including the linkage phase configuration) in a pdf output. This function is not to be directly called by the user
Usage
draw_cross(
ploidy,
dist.vec = NULL,
hom.allele.p,
hom.allele.q,
file = NULL,
width = 12,
height = 6
)
Plot the linkage phase configuration given a list of homologous chromosomes
Description
Plot the linkage phase configuration given a list of homologous chromosomes
Usage
draw_phases(ploidy, hom.allele.p, hom.allele.q)
Arguments
ploidy |
ploidy level |
hom.allele.p |
a |
hom.allele.q |
same for parent Q |
Remove markers from a map
Description
This function creates a new map by removing markers from an existing one.
Usage
drop_marker(input.map, mrk, verbose = TRUE)
Arguments
input.map |
an object of class |
mrk |
a vector containing markers to be removed from the input map, identified by their names or positions |
verbose |
if |
Value
an object of class mappoly.map
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Examples
sub.map <- get_submap(maps.hexafake[[1]], 1:50, reestimate.rf = FALSE)
plot(sub.map, mrk.names = TRUE)
mrk.to.remove <- c("M_1", "M_23", "M_34")
red.map <- drop_marker(sub.map, mrk.to.remove)
plot(red.map, mrk.names = TRUE)
Edit sequence ordered by reference genome positions comparing to another set order
Description
Edit sequence ordered by reference genome positions comparing to another set order
Usage
edit_order(input.seq, invert = NULL, remove = NULL)
Arguments
input.seq |
object of class mappoly.sequence with alternative order (not genomic order) |
invert |
vector of marker names to be inverted |
remove |
vector of marker names to be removed |
Value
object of class mappoly.edit.order
: a list containing
vector of marker names ordered according to editions ('edited_order');
vector of removed markers names ('removed');
vector of inverted markers names ('inverted').
Author(s)
Cristiane Taniguti, chtaniguti@tamu.edu
Examples
dat <- filter_segregation(tetra.solcap, inter = FALSE)
seq_dat <- make_seq_mappoly(dat)
seq_chr <- make_seq_mappoly(seq_dat, arg = seq_dat$seq.mrk.names[which(seq_dat$chrom=="1")])
tpt <- est_pairwise_rf(seq_chr)
seq.filt <- rf_snp_filter(tpt, probs = c(0.05, 0.95))
mat <- rf_list_to_matrix(tpt)
mat2 <- make_mat_mappoly(mat, seq.filt)
seq_test_mds <- mds_mappoly(mat2)
seq_mds <- make_seq_mappoly(seq_test_mds)
edit_seq <- edit_order(input.seq = seq_mds)
Eliminate configurations using two-point information
Description
Drops unlikely configuration phases given the two-point information and a LOD threshold
Usage
elim_conf_using_two_pts(input.seq, twopt, thres)
Arguments
input.seq |
an object of class |
twopt |
an object of class |
thres |
threshold from which the linkage phases can be discarded (if abs(ph_LOD) > thres) |
Value
a unique list of matrices representing linkage phases
Eliminates equivalent linkage phase configurations
Description
Drop equivalent linkage phase configurations, i.e. the ones which have permuted homologous chromosomes
Usage
elim_equiv(Z)
Arguments
Z |
a list of matrices whose columns represent homologous chromosomes and the rows represent markers |
Value
a unique list of matrices
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Eliminate redundant markers
Description
Eliminate markers with identical dosage information for all individuals.
Usage
elim_redundant(input.seq, data = NULL)
Arguments
input.seq |
an object of class |
data |
name of the dataset that contains sequence markers (optional, default = NULL) |
Value
An object of class mappoly.unique.seq
which
is a list containing the following components:
unique.seq |
an object of class |
kept |
a vector containing the name of the informative markers |
eliminated |
a vector containing the name of the non-informative (eliminated) markers |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu, with minor modifications by Gabriel Gesteira, gdesiqu@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
all.mrk <- make_seq_mappoly(hexafake, 'all')
red.mrk <- elim_redundant(all.mrk)
plot(red.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
Re-estimate genetic map given a global genotyping error
Description
This function considers a global error when re-estimating a genetic map using Hidden Markov models. Since this function uses the whole transition space in the HMM, its computation can take a while, especially for hexaploid maps.
Usage
est_full_hmm_with_global_error(
input.map,
error = NULL,
tol = 0.001,
restricted = TRUE,
th.prob = 0.95,
verbose = FALSE
)
Arguments
input.map |
an object of class |
error |
the assumed global error rate (default = NULL) |
tol |
the desired accuracy (default = 10e-04) |
restricted |
if |
th.prob |
the threshold for using global error or genotype probability distribution if present in the dataset (default = 0.95) |
verbose |
if |
Value
A list of class mappoly.map
with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs
as informed in the input file. If not available,
|
genome.pos |
physical position (usually in megabase) of the markers into the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
submap <- get_submap(solcap.dose.map[[1]], mrk.pos = 1:20, verbose = FALSE)
err.submap <- est_full_hmm_with_global_error(submap,
error = 0.01,
tol = 10e-4,
verbose = TRUE)
err.submap
plot_map_list(list(dose = submap, err = err.submap),
title = "estimation procedure")
Re-estimate genetic map using dosage prior probability distribution
Description
This function considers dosage prior distribution when re-estimating a genetic map using Hidden Markov models
Usage
est_full_hmm_with_prior_prob(
input.map,
dat.prob = NULL,
phase.config = "best",
tol = 0.001,
verbose = FALSE
)
Arguments
input.map |
an object of class |
dat.prob |
an object of class |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood configuration |
tol |
the desired accuracy (default = 10e-04) |
verbose |
if |
Value
A list of class mappoly.map
with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs
as informed in the input file. If not available,
|
genome.pos |
physical position (usually in megabase) of the markers into the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
submap <- get_submap(solcap.dose.map[[1]], mrk.pos = 1:20, verbose = FALSE)
prob.submap <- est_full_hmm_with_prior_prob(submap,
dat.prob = tetra.solcap.geno.dist,
tol = 10e-4,
verbose = TRUE)
prob.submap
plot_map_list(list(dose = submap, prob = prob.submap),
title = "estimation procedure")
Estimate a genetic map given a sequence of block markers
Description
Estimate a genetic map given a sequence of block markers
Usage
est_haplo_hmm(
ploidy,
n.mrk,
n.ind,
haplo,
emit = NULL,
rf_vec,
verbose = TRUE,
use_H0 = FALSE,
highprec = FALSE,
tol = 0.001
)
Estimate a genetic map given a sequence of block markers given the conditional probabilities of the genotypes
Description
Estimate a genetic map given a sequence of block markers given the conditional probabilities of the genotypes
Usage
est_map_haplo_given_genoprob(map.list, genoprob.list, tol = 1e-04)
Pairwise two-point analysis
Description
Performs the two-point pairwise analysis between all markers in a sequence. For each pair, the function estimates the recombination fraction for all possible linkage phase configurations and associated LOD Scores.
Usage
est_pairwise_rf(
input.seq,
count.cache = NULL,
count.matrix = NULL,
ncpus = 1L,
mrk.pairs = NULL,
n.batches = 1L,
est.type = c("disc", "prob"),
verbose = TRUE,
memory.warning = TRUE,
parallelization.type = c("PSOCK", "FORK"),
tol = .Machine$double.eps^0.25,
ll = FALSE
)
Arguments
input.seq |
an object of class |
count.cache |
an object of class |
count.matrix |
similar to |
ncpus |
Number of parallel processes (cores) to spawn (default = 1) |
mrk.pairs |
a matrix of dimensions 2*N, containing N
pairs of markers to be analyzed. If |
n.batches |
deprecated. Not available on MAPpoly 0.3.0 or higher |
est.type |
Indicates whether to use the discrete ("disc") or the probabilistic ("prob") dosage scoring when estimating the two-point recombination fractions. |
verbose |
If |
memory.warning |
if |
parallelization.type |
one of the supported cluster types. This should be either PSOCK (default) or FORK. |
tol |
the desired accuracy. See |
ll |
will return log-likelihood instead of LOD scores. (for internal use) |
Value
An object of class mappoly.twopt
which is a list containing the following components:
data.name
Name of the object of class
mappoly.data
containing the raw data.n.mrk
Number of markers in the sequence.
seq.num
A
vector
containing the (ordered) indices of markers in the sequence, according to the input file.pairwise
A list of size
choose(length(input.seq$seq.num), 2)
, where each element is a matrix. The rows are named in the format x-y, where x and y indicate how many homologues share the same allelic variant in parents P and Q, respectively (see Mollinari and Garcia, 2019 for notation). The first column indicates the LOD Score for the most likely linkage phase configuration. The second column shows the estimated recombination fraction for each configuration, and the third column indicates the LOD Score for comparing the likelihood under no linkage (r = 0.5) with the estimated recombination fraction (evidence of linkage).chisq.pval.thres
Threshold used to perform the segregation tests.
chisq.pval
P-values associated with the performed segregation tests.
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## Tetraploid example (first 50 markers)
all.mrk <- make_seq_mappoly(tetra.solcap, 1:50)
red.mrk <- elim_redundant(all.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
all.pairs <- est_pairwise_rf(input.seq = unique.mrks,
ncpus = 1,
verbose = TRUE)
all.pairs
plot(all.pairs, 20, 21)
mat <- rf_list_to_matrix(all.pairs)
plot(mat)
Pairwise two-point analysis - RcppParallel version
Description
Performs the two-point pairwise analysis between all markers in a sequence. For each pair, the function estimates the recombination fraction for all possible linkage phase configurations and associated LOD Scores.
Usage
est_pairwise_rf2(
input.seq,
ncpus = 1L,
mrk.pairs = NULL,
verbose = TRUE,
tol = .Machine$double.eps^0.25
)
Arguments
input.seq |
an object of class |
ncpus |
Number of parallel processes (cores) to spawn (default = 1) |
mrk.pairs |
a matrix of dimensions 2*N, containing N
pairs of markers to be analyzed. If |
verbose |
If |
tol |
the desired accuracy. See |
Details
Differently from est_pairwise_rf this function returns only the values associated to the best linkage phase configuration.
Value
An object of class mappoly.twopt2
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## Tetraploid example
all.mrk <- make_seq_mappoly(tetra.solcap, 100:200)
all.pairs <- est_pairwise_rf2(input.seq = all.mrk, ncpus = 2)
m <- rf_list_to_matrix(all.pairs)
plot(m, fact = 2)
Multipoint analysis using Hidden Markov Models in autopolyploids
Description
Performs the multipoint analysis proposed by Mollinari and Garcia (2019) in a sequence of markers
Usage
est_rf_hmm(
input.seq,
input.ph = NULL,
thres = 0.5,
twopt = NULL,
verbose = FALSE,
tol = 1e-04,
est.given.0.rf = FALSE,
reestimate.single.ph.configuration = TRUE,
high.prec = TRUE
)
## S3 method for class 'mappoly.map'
print(x, detailed = FALSE, ...)
## S3 method for class 'mappoly.map'
plot(
x,
left.lim = 0,
right.lim = Inf,
phase = TRUE,
mrk.names = FALSE,
cex = 1,
config = "best",
P = "Parent 1",
Q = "Parent 2",
xlim = NULL,
...
)
Arguments
input.seq |
an object of class |
input.ph |
an object of class |
thres |
LOD Score threshold used to determine if the linkage phases compared via two-point analysis should be considered. Smaller values will result in smaller number of linkage phase configurations to be evaluated by the multipoint algorithm. |
twopt |
an object of class |
verbose |
if |
tol |
the desired accuracy (default = 1e-04) |
est.given.0.rf |
logical. If TRUE returns a map forcing all recombination fractions equals to 0 (1e-5, for internal use only. Default = FALSE) |
reestimate.single.ph.configuration |
logical. If |
high.prec |
logical. If |
x |
an object of the class |
detailed |
logical. if TRUE, prints the linkage phase configuration and the marker position for all maps. If FALSE (default), prints a map summary |
... |
currently ignored |
left.lim |
the left limit of the plot (in cM, default = 0). |
right.lim |
the right limit of the plot (in cM, default = Inf, i.e., will print the entire map) |
phase |
logical. If |
mrk.names |
if TRUE, marker names are displayed (default = FALSE) |
cex |
The magnification to be used for marker names |
config |
should be |
P |
a string containing the name of parent P |
Q |
a string containing the name of parent Q |
xlim |
range of the x-axis. If |
Details
This function first enumerates a set of linkage phase configurations
based on two-point recombination fraction information using a threshold
provided by the user (argument thresh
). After that, for each
configuration, it reconstructs the genetic map using the
HMM approach described in Mollinari and Garcia (2019). As result, it returns
the multipoint likelihood for each configuration in form of LOD Score comparing
each configuration to the most likely one. It is recommended to use a small number
of markers (e.g. 50 markers for hexaploids) since the possible linkage
phase combinations bounded only by the two-point information can be huge.
Also, it can be quite sensible to small changes in 'thresh'
.
For a large number of markers, please see est_rf_hmm_sequential
.
Value
A list of class mappoly.map
with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs
as informed in the input file. If not available,
|
genome.pos |
physical position (usually in megabase) of the markers into the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. https://doi.org/10.1534/g3.119.400378
Examples
mrk.subset <- make_seq_mappoly(hexafake, 1:10)
red.mrk <- elim_redundant(mrk.subset)
unique.mrks <- make_seq_mappoly(red.mrk)
subset.pairs <- est_pairwise_rf(input.seq = unique.mrks,
ncpus = 1,
verbose = TRUE)
## Estimating subset map with a low tolerance for the E.M. procedure
## for CRAN testing purposes
subset.map <- est_rf_hmm(input.seq = unique.mrks,
thres = 2,
twopt = subset.pairs,
verbose = TRUE,
tol = 0.1,
est.given.0.rf = FALSE)
subset.map
## linkage phase configuration with highest likelihood
plot(subset.map, mrk.names = TRUE, config = "best")
## the second one
plot(subset.map, mrk.names = TRUE, config = 2)
Multipoint analysis using Hidden Markov Models: Sequential phase elimination
Description
Performs the multipoint analysis proposed by Mollinari and Garcia (2019) in a sequence of markers removing unlikely phases using sequential multipoint information.
Usage
est_rf_hmm_sequential(
input.seq,
twopt,
start.set = 4,
thres.twopt = 5,
thres.hmm = 50,
extend.tail = NULL,
phase.number.limit = 20,
sub.map.size.diff.limit = Inf,
info.tail = TRUE,
reestimate.single.ph.configuration = FALSE,
tol = 0.1,
tol.final = 0.001,
verbose = TRUE,
detailed.verbose = FALSE,
high.prec = FALSE
)
Arguments
input.seq |
an object of class |
twopt |
an object of class |
start.set |
number of markers to start the phasing procedure (default = 4) |
thres.twopt |
the LOD threshold used to determine if the linkage
phases compared via two-point analysis should be considered
for the search space reduction (A.K.A. |
thres.hmm |
the LOD threshold used to determine if the linkage phases compared via hmm analysis should be evaluated in the next round of marker inclusion (default = 50) |
extend.tail |
the length of the chain's tail that should
be used to calculate the likelihood of the map. If |
phase.number.limit |
the maximum number of linkage phases of the sub-maps defined
by arguments |
sub.map.size.diff.limit |
the maximum accepted length
difference between the current and the previous sub-map defined
by arguments |
info.tail |
if |
reestimate.single.ph.configuration |
logical. If |
tol |
the desired accuracy during the sequential phase (default = 10e-02) |
tol.final |
the desired accuracy for the final map (default = 10e-04) |
verbose |
If |
detailed.verbose |
If |
high.prec |
logical. If |
Details
This function sequentially includes markers into a map given an
ordered sequence. It uses two-point information to eliminate
unlikely linkage phase configurations given thres.twopt
. The
search is made within a window of size extend.tail
. For the
remaining configurations, the HMM-based likelihood is computed and
the ones that pass the HMM threshold (thres.hmm
) are eliminated.
Value
A list of class mappoly.map
with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs
as informed in the input file. If not available,
|
genome.pos |
physical position (usually in megabase) of the markers into the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
mrk.subset <- make_seq_mappoly(hexafake, 1:20)
red.mrk <- elim_redundant(mrk.subset)
unique.mrks <- make_seq_mappoly(red.mrk)
subset.pairs <- est_pairwise_rf(input.seq = unique.mrks,
ncpus = 1,
verbose = TRUE)
subset.map <- est_rf_hmm_sequential(input.seq = unique.mrks,
thres.twopt = 5,
thres.hmm = 10,
extend.tail = 10,
tol = 0.1,
tol.final = 10e-3,
phase.number.limit = 5,
twopt = subset.pairs,
verbose = TRUE)
print(subset.map, detailed = TRUE)
plot(subset.map)
plot(subset.map, left.lim = 0, right.lim = 1, mrk.names = TRUE)
plot(subset.map, phase = FALSE)
## Retrieving simulated linkage phase
ph.P <- maps.hexafake[[1]]$maps[[1]]$seq.ph$P
ph.Q <- maps.hexafake[[1]]$maps[[1]]$seq.ph$Q
## Estimated linkage phase
ph.P.est <- subset.map$maps[[1]]$seq.ph$P
ph.Q.est <- subset.map$maps[[1]]$seq.ph$Q
compare_haplotypes(ploidy = 6, h1 = ph.P[names(ph.P.est)], h2 = ph.P.est)
compare_haplotypes(ploidy = 6, h1 = ph.Q[names(ph.Q.est)], h2 = ph.Q.est)
Multipoint analysis using Hidden Markov Models (single phase)
Description
Multipoint analysis using Hidden Markov Models (single phase)
Usage
est_rf_hmm_single_phase(
input.seq,
input.ph.single,
rf.temp = NULL,
tol,
verbose = FALSE,
ret.map.no.rf.estimation = FALSE,
high.prec = TRUE,
max.rf.to.break.EM = 0.5
)
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Multilocus analysis using Hidden Markov Models (single parent, single phase)
Description
Multilocus analysis using Hidden Markov Models (single parent, single phase)
Usage
est_rf_hmm_single_phase_single_parent(
input.seq,
input.ph.single,
info.parent = 1,
uninfo.parent = 2,
rf.vec = NULL,
global.err = 0,
tol = 0.001,
verbose = FALSE,
ret.map.no.rf.estimation = FALSE
)
Export data to polymapR
Description
See examples at https://rpubs.com/mmollin/tetra_mappoly_vignette.
Usage
export_data_to_polymapR(data.in)
Arguments
data.in |
an object of class |
Value
a dosage matrix
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Export a genetic map to a CSV file
Description
Function to export genetic linkage map(s) generated by MAPpoly
.
The map(s) should be passed as a single object or a list of objects of class mappoly.map
.
Usage
export_map_list(map.list, file = "map_output.csv")
Arguments
map.list |
A list of objects or a single object of class |
file |
either a character string naming a file or a connection open for writing. "" indicates output to the console. |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
export_map_list(solcap.err.map[[1]], file = "")
Export to QTLpoly
Description
Compute homolog probabilities for all individuals in the full-sib population given a map and conditional genotype probabilities, and exports the results to be used for QTL mapping in the QTLpoly package.
Usage
export_qtlpoly(input.genoprobs, verbose = TRUE)
Arguments
input.genoprobs |
an object of class |
verbose |
if |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Examples
## tetraploid example
w1 <- calc_genoprob(solcap.dose.map[[1]])
h.prob <- export_qtlpoly(w1)
Extract the maker position from an object of class 'mappoly.map'
Description
Extract the maker position from an object of class 'mappoly.map'
Usage
extract_map(input.map, phase.config = "best")
Arguments
input.map |
An object of class |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood configuration |
Examples
x <- maps.hexafake[[1]]$info$genome.pos/1e6
y <- extract_map(maps.hexafake[[1]])
plot(y~x, ylab = "Map position (cM)", xlab = "Genome Position (Mbp)")
Filter aneuploid chromosomes from progeny individuals
Description
Filter aneuploid chromosomes from progeny individuals
Usage
filter_aneuploid(input.data, aneuploid.info, ploidy, rm_missing = TRUE)
Arguments
input.data |
name of input object (class |
aneuploid.info |
data.frame with ploidy information by chromosome (columns) for each individual in progeny (rows). The chromosome and individuals names must match the ones in the file used as input in mappoly. |
ploidy |
main ploidy |
rm_missing |
remove also genotype information from chromosomes with missing data (NA) in the aneuploid.info file |
Value
object of class mappoly.data
Author(s)
Cristiane Taniguti, chtaniguti@tamu.edu
Examples
aneuploid.info <- matrix(4, nrow=tetra.solcap$n.ind, ncol = 12)
set.seed(8080)
aneuploid.info[sample(1:length(aneuploid.info), round((4*length(aneuploid.info))/100),0)] <- 3
aneuploid.info[sample(1:length(aneuploid.info), round((4*length(aneuploid.info))/100),0)] <- 5
colnames(aneuploid.info) <- paste0(1:12)
aneuploid.info <- cbind(inds = tetra.solcap$ind.names, aneuploid.info)
filt.dat <- filter_aneuploid(input.data = tetra.solcap,
aneuploid.info = aneuploid.info, ploidy = 4)
Filter out individuals
Description
This function removes individuals from the data set. Individuals can be user-defined or can be accessed via interactive kinship analysis.
Usage
filter_individuals(
input.data,
ind.to.remove = NULL,
inter = TRUE,
type = c("Gmat", "PCA"),
verbose = TRUE
)
Arguments
input.data |
name of input object (class |
ind.to.remove |
individuals to be removed. If |
inter |
if |
type |
A character string specifying the procedure to be used for detecting outlier offspring. Options include "Gmat", which utilizes the genomic kinship matrix, and "PCA", which employs principal component analysis on the dosage matrix. coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated. |
verbose |
if |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Filter MAPpoly Map Configurations by Loglikelihood Threshold
Description
This function filters configurations within a '"mappoly.map"' object based on a specified log-likelihood threshold.
Usage
filter_map_at_hmm_thres(map, thres.hmm)
Arguments
map |
An object of class '"mappoly.map"', which may contain several maps with different linkage phase configurations and their respective log-likelihoods. |
thres.hmm |
The threshold for filtering configurations. |
Value
Returns the modified '"mappoly.map"' object with configurations filtered based on the log-likelihood threshold.
Filter missing genotypes
Description
Excludes markers or individuals based on their proportion of missing data.
Usage
filter_missing(
input.data,
type = c("marker", "individual"),
filter.thres = 0.2,
inter = TRUE
)
Arguments
input.data |
an object of class |
type |
one of the following options:
Please notice that removing individuals with certain amount of data can change some marker parameters (such as depth), and can also change the estimated genotypes for other individuals. So, be careful when removing individuals. |
filter.thres |
maximum percentage of missing data (default = 0.2). |
inter |
if |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu.
Examples
plot(tetra.solcap)
dat.filt.mrk <- filter_missing(input.data = tetra.solcap,
type = "marker",
filter.thres = 0.1,
inter = TRUE)
plot(dat.filt.mrk)
Filter individuals based on missing genotypes
Description
Filter individuals based on missing genotypes
Usage
filter_missing_ind(input.data, filter.thres = 0.2, inter = TRUE)
Arguments
input.data |
an object of class |
filter.thres |
maximum percentage of missing data |
inter |
if |
Filter markers based on missing genotypes
Description
Filter markers based on missing genotypes
Usage
filter_missing_mrk(input.data, filter.thres = 0.2, inter = TRUE)
Arguments
input.data |
an object of class |
filter.thres |
maximum percentage of missing data |
inter |
if |
Filter non-conforming classes in F1, non double reduced population.
Description
Filter non-conforming classes in F1, non double reduced population.
Usage
filter_non_conforming_classes(input.data, prob.thres = NULL)
Filter markers based on chi-square test
Description
This function filter markers based on p-values of a chi-square test. The chi-square test assumes that markers follow the expected segregation patterns under Mendelian inheritance, random chromosome bivalent pairing and no double reduction.
Usage
filter_segregation(input.obj, chisq.pval.thres = NULL, inter = TRUE)
Arguments
input.obj |
name of input object (class |
chisq.pval.thres |
p-value threshold used for chi-square tests (default = Bonferroni aproximation with global alpha of 0.05, i.e., 0.05/n.mrk) |
inter |
if TRUE (default), plots distorted vs. non-distorted markers |
Value
An object of class mappoly.chitest.seq
which contains a list with the following components:
keep |
markers that follow Mendelian segregation pattern |
exclude |
markers with distorted segregation |
chisq.pval.thres |
threshold p-value used for chi-square tests |
data.name |
input dataset used to perform the chi-square tests |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Examples
mrks.chi.filt <- filter_segregation(input.obj = tetra.solcap,
chisq.pval.thres = 0.05/tetra.solcap$n.mrk,
inter = TRUE)
seq.init <- make_seq_mappoly(mrks.chi.filt)
Allocate markers into linkage blocks
Description
Function to allocate markers into linkage blocks. This is an EXPERIMENTAL FUNCTION and should be used with caution.
Usage
find_blocks(
input.seq,
clustering.type = c("rf", "genome"),
rf.limit = 1e-04,
genome.block.threshold = 10000,
rf.mat = NULL,
ncpus = 1,
ph.thres = 3,
phase.number.limit = 10,
error = 0.05,
verbose = TRUE,
tol = 0.01,
tol.err = 0.001
)
Arguments
input.seq |
an object of class |
clustering.type |
if |
rf.limit |
the maximum value to consider linked markers in
case of |
genome.block.threshold |
the threshold to assume markers are in the same linkage block.
to be considered when allocating markers into blocks in case of |
rf.mat |
an object of class |
ncpus |
Number of parallel processes to spawn |
ph.thres |
the threshold used to sequentially phase markers.
Used in |
phase.number.limit |
the maximum number of linkage phases of the sub-maps.
The default is 10. See |
error |
the assumed global genotyping error rate. If |
verbose |
if |
tol |
tolerance for the C routine, i.e., the value used to evaluate convergence. |
tol.err |
tolerance for the C routine, i.e., the value used to evaluate convergence, including the global genotyping error in the model. |
Value
a list containing 1: a list of blocks in form of mappoly.map
objects;
2: a vector containing markers that were not included into blocks.
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Examples
## Not run:
## Selecting 50 markers in chromosome 5
s5 <- make_seq_mappoly(tetra.solcap, "seq5")
s5 <- make_seq_mappoly(tetra.solcap, s5$seq.mrk.names[1:50])
tpt5 <- est_pairwise_rf(s5)
m5 <- rf_list_to_matrix(tpt5, 3, 3)
fb.rf <- find_blocks(s5, rf.mat = m5, verbose = FALSE, ncpus = 2)
bl.rf <- fb.rf$blocks
plot_map_list(bl.rf)
## Merging resulting maps
map.merge <- merge_maps(bl.rf, tpt5)
plot(map.merge, mrk.names = T)
## Comparing linkage phases with pre assembled map
id <- na.omit(match(map.merge$info$mrk.names, solcap.err.map[[5]]$info$mrk.names))
map.orig <- get_submap(solcap.err.map[[5]], mrk.pos = id)
p1.m<-map.merge$maps[[1]]$seq.ph$P
p2.m<-map.merge$maps[[1]]$seq.ph$Q
names(p1.m) <- names(p2.m) <- map.merge$info$mrk.names
p1.o<-map.orig$maps[[1]]$seq.ph$P
p2.o<-map.orig$maps[[1]]$seq.ph$Q
names(p1.o) <- names(p2.o) <- map.orig$info$mrk.names
n <- intersect(names(p1.m), names(p1.o))
plot_compare_haplotypes(4, p1.o[n], p2.o[n], p1.m[n], p2.m[n])
### Using genome
fb.geno <- find_blocks(s5, clustering.type = "genome", genome.block.threshold = 10^4)
plot_map_list(fb.geno$blocks)
splt <- lapply(fb.geno$blocks, split_mappoly, 1)
plot_map_list(splt)
## End(Not run)
Format results from pairwise two-point estimation in C++
Description
Format results from pairwise two-point estimation in C++
Usage
format_rf(res)
Design linkage map framework in two steps: i) estimating the recombination fraction with HMM approach for each parent separately using only markers segregating individually (e.g. map 1 - P1:3 x P2:0, P1: 2x4; map 2 - P1:0 x P2:3, P1:4 x P2:2); ii) merging both maps and re-estimate recombination fractions.
Description
Design linkage map framework in two steps: i) estimating the recombination fraction with HMM approach for each parent separately using only markers segregating individually (e.g. map 1 - P1:3 x P2:0, P1: 2x4; map 2 - P1:0 x P2:3, P1:4 x P2:2); ii) merging both maps and re-estimate recombination fractions.
Usage
framework_map(
input.seq,
twopt,
start.set = 10,
thres.twopt = 10,
thres.hmm = 30,
extend.tail = 30,
inflation.lim.p1 = 5,
inflation.lim.p2 = 5,
phase.number.limit = 10,
tol = 0.01,
tol.final = 0.001,
verbose = TRUE,
method = "hmm"
)
Arguments
input.seq |
object of class |
twopt |
object of class |
start.set |
number of markers to start the phasing procedure (default = 4) |
thres.twopt |
the LOD threshold used to determine if the linkage phases compared via two-point analysis should be considered for the search space reduction (default = 5) |
thres.hmm |
the LOD threshold used to determine if the linkage phases compared via hmm analysis should be evaluated in the next round of marker inclusion (default = 50) |
extend.tail |
the length of the chain's tail that should be used to calculate the likelihood of the map. If NULL (default), the function uses all markers positioned. Even if info.tail = TRUE, it uses at least extend.tail as the tail length |
inflation.lim.p1 |
the maximum accepted length difference between the current and the previous parent 1 sub-map defined by arguments info.tail and extend.tail. If the size exceeds this limit, the marker will not be inserted. If NULL(default), then it will insert all markers. |
inflation.lim.p2 |
same as 'inflation.lim.p1' but for parent 2 sub-map. |
phase.number.limit |
the maximum number of linkage phases of the sub-maps defined by arguments info.tail and extend.tail. Default is 20. If the size exceeds this limit, the marker will not be inserted. If Inf, then it will insert all markers. |
tol |
the desired accuracy during the sequential phase of each parental map (default = 10e-02) |
tol.final |
the desired accuracy for the final parental map (default = 10e-04) |
verbose |
If TRUE (default), current progress is shown; if FALSE, no output is produced |
method |
indicates whether to use 'hmm' (Hidden Markov Models), 'ols' (Ordinary Least Squares) to re-estimate the recombination fractions while merging the parental maps (default:hmm) |
Value
list containing three mappoly.map
objects:1) map built with markers with segregation information from parent 1;
2) map built with markers with segregation information from parent 2; 3) maps in 1 and 2 merged
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu with documentation and minor modifications by Cristiane Taniguti chtaniguti@tamu.edu
Generate all possible linkage phases in matrix form given the dose and the number of shared alleles between a inserted marker and a pre-computed linkage configuration.
Description
Generate all possible linkage phases in matrix form given the dose and the number of shared alleles between a inserted marker and a pre-computed linkage configuration.
Usage
generate_all_link_phase_elim_equivalent(X, d, sh, ploidy, k1, k2)
Arguments
X |
a list of matrices whose columns represent homologous chromosomes and the rows represent markers. Each element of the list represents a linkage phase configuration. |
d |
the dosage of the inserted marker |
sh |
the number of shared alleles between k1 (marker already present on the sequence) and k2 (the inserted marker) |
ploidy |
the ploidy level |
k1 |
marker already present on the sequence |
k2 |
inserted marker |
Value
a unique list of matrices representing linkage phases
Eliminate equivalent linkage phases
Description
Generates all possible linkage phases between two blocks of markers (or a block and a marker), eliminating equivalent configurations, i.e. configurations with the same likelihood and also considering the two-point information (shared alleles)
Usage
generate_all_link_phases_elim_equivalent_haplo(
block1,
block2,
rf.matrix,
ploidy,
max.inc = NULL
)
Arguments
block1 |
submap with markers of the first block |
block2 |
submap with markers of the second block, or just a single marker identified by its name |
rf.matrix |
matrix obtained with the function |
ploidy |
ploidy level (i.e. 4, 6 and so on) |
max.inc |
maximum number of allowed inconsistencies (default = NULL: don't check inconsistencies) |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu and Gabriel Gesteira, gdesiqu@ncsu.edu
Genetic Mapping Functions
Description
These functions facilitate the conversion between recombination fractions (r) and genetic distances (d) using various mapping models. The functions starting with 'mf_' convert recombination fractions to genetic distances, while those starting with 'imf_' convert genetic distances back into recombination fractions.
Usage
mf_k(d)
mf_h(d)
mf_m(d)
imf_k(r)
imf_h(r)
imf_m(r)
Arguments
d |
Numeric or numeric vector, representing genetic distances in centiMorgans (cM) for direct functions (mf_k, mf_h, mf_m). |
r |
Numeric or numeric vector, representing recombination fractions for inverse functions (imf_k, imf_h, imf_m). |
Details
The 'mf_' prefixed functions apply different models to convert recombination fractions into genetic distances:
-
mf_k
: Kosambi mapping function. -
mf_h
: Haldane mapping function. -
mf_m
: Morgan mapping function.
The 'imf_' prefixed functions convert genetic distances back into recombination fractions:
-
imf_k
: Inverse Kosambi mapping function. -
imf_h
: Inverse Haldane mapping function. -
imf_m
: Inverse Morgan mapping function.
References
Kosambi, D.D. (1944). The estimation of map distances from recombination values. Ann Eugen., 12, 172-175. Haldane, J.B.S. (1919). The combination of linkage values, and the calculation of distances between the loci of linked factors. J Genet, 8, 299-309. Morgan, T.H. (1911). Random segregation versus coupling in Mendelian inheritance. Science, 34(873), 384.
Prior probability for genotyping error
Description
If restricted = TRUE
, it restricts the prior to the
possible classes under Mendelian, non double-reduced segregation
given dosage of the parents
Usage
genotyping_global_error(
x,
ploidy,
restricted = TRUE,
error = 0.01,
th.prob = 0.95
)
Extract the LOD Scores in a 'mappoly.map'
object
Description
Extract the LOD Scores in a 'mappoly.map'
object
Usage
get_LOD(x, sorted = TRUE)
Arguments
x |
an object of class |
sorted |
logical. if |
Value
a numeric vector containing the LOD Scores
Access a remote server to get Counts for recombinant classes
Description
Access a remote server to get Counts for recombinant classes
Usage
get_cache_two_pts_from_web(
ploidy,
url.address = NULL,
joint.prob = TRUE,
verbose = FALSE
)
Counts for recombinant classes
Description
Counts for recombinant classes
Usage
get_counts(
ploidy,
P.k = NULL,
P.k1 = NULL,
Q.k = NULL,
Q.k1 = NULL,
verbose = FALSE,
make.names = FALSE,
joint.prob = FALSE
)
Counts for recombinant classes
Description
return the counts of each recombinant class (for two loci) in polyploid cross. The results of this function contains several matrices each one corresponding to one possible linkage phase. The associated names in the matrices indicates the number of shared homologous chromosomes. The row names indicates the dosage in loci k and k+1 respectively
Usage
get_counts_all_phases(
x,
ploidy,
verbose = FALSE,
make.names = FALSE,
joint.prob = FALSE
)
Counts for recombinant classes in a polyploid parent.
Description
The conditional probability of a genotype at locus k+1
given the genotype at locus
k
is ...
Usage
get_counts_single_parent(
ploidy,
gen.par.mk1,
gen.par.mk2,
gen.prog.mk1,
gen.prog.mk2
)
Arguments
ploidy |
Ploidy level |
gen.par.mk1 |
Genotype of marker mk1 (vector |
gen.par.mk2 |
Genotype of marker mk2 (vector |
gen.prog.mk1 |
Dose of marker mk1 on progeny |
gen.prog.mk2 |
Dose of marker mk2 on progeny |
Value
S3 object; a list consisting of
counts |
counts for each one of the classes |
Counts for recombinant classes
Description
Counts for recombinant classes
Usage
get_counts_two_parents(
x = c(2, 2),
ploidy,
p.k,
p.k1,
q.k,
q.k1,
verbose = FALSE,
joint.prob = FALSE
)
Get Dosage Type in a Sequence
Description
Analyzes a genomic sequence object to categorize markers based on their dosage type. The function calculates the dosage type by comparing the dosage of two parental sequences (p1 and p2) against the ploidy level. It categorizes markers into simplex for parent 1 (simplex.p), simplex for parent 2 (simplex.q), double simplex (ds), and multiplex based on the calculated dosages.
Usage
get_dosage_type(input.seq)
Arguments
input.seq |
An object of class |
Value
A list with four components categorizing marker names into:
- simplex.p
Markers with a simplex dosage from parent 1.
- simplex.q
Markers with a simplex dosage from parent 2.
- double.simplex
Markers with a double simplex dosage.
- multiplex
Markers not fitting into the above categories, indicating a multiplex dosage.
Get the tail of a marker sequence up to the point where the markers provide no additional information.
Description
Get the tail of a marker sequence up to the point where the markers provide no additional information.
Usage
get_full_info_tail(input.obj, extend = NULL)
Get the genomic position of markers in a sequence
Description
This functions gets the genomic position of markers in a sequence and return an ordered data frame with the name and position of each marker
Usage
get_genomic_order(input.seq, verbose = TRUE)
## S3 method for class 'mappoly.geno.ord'
print(x, ...)
## S3 method for class 'mappoly.geno.ord'
plot(x, ...)
Arguments
input.seq |
a sequence object of class |
verbose |
if |
x |
an object of the class mappoly.geno.ord |
... |
currently ignored |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Examples
s1 <- make_seq_mappoly(tetra.solcap, "all")
o1 <- get_genomic_order(s1)
plot(o1)
s.geno.ord <- make_seq_mappoly(o1)
Given a pair of character indicating the numbers i and j : 'i-j', returns a numeric pair c(i,j)
Description
Given a pair of character indicating the numbers i and j : 'i-j', returns a numeric pair c(i,j)
Usage
get_ij(w)
Arguments
w |
a pair of characters 'i-j' |
Value
a numeric pair c(i,j)
Get the indices of selected linkage phases given a threshold
Description
Get the indices of selected linkage phases given a threshold
Usage
get_indices_from_selected_phases(x, thres)
Arguments
x |
a data frame containing information about two markers. In this data frame, the lines indicate the possible configuration phases and the columns indicate the LOD for configuration phase (ph_LOD), the recombination fraction (rf), and the LOD for recombination fraction (rf_LOD) |
thres |
a threshold from which the linkage phases can be discarded (if abs(ph_LOD) > thres) |
Value
a list of indices for both parents
Get weighted ordinary least squared map give a sequence and rf matrix
Description
Get weighted ordinary least squared map give a sequence and rf matrix
Usage
get_ols_map(input.seq, input.mat, weight = TRUE)
Given a homology group in matrix form, it returns the number shared homologous for all pairs of markers in this group
Description
Given a homology group in matrix form, it returns the number shared homologous for all pairs of markers in this group
Usage
get_ph_conf_ret_sh(M)
Arguments
M |
matrix whose columns represent homologous chromosomes and the rows represent markers |
Value
a vector containing the number of shared homologous for all pairs of markers
subset of a linkage phase list
Description
subset of a linkage phase list
Usage
get_ph_list_subset(ph.list, seq.num, conf)
Get the recombination fraction for a sequence of markers given an
object of class mappoly.twopt
and a list
containing the linkage phase configuration. This list can be found
in any object of class two.pts.linkage.phases
, in
x$config.to.test$'Conf-i', where x is the object of class
two.pts.linkage.phases
and i is one of the possible
configurations.
Description
Get the recombination fraction for a sequence of markers given an
object of class mappoly.twopt
and a list
containing the linkage phase configuration. This list can be found
in any object of class two.pts.linkage.phases
, in
x$config.to.test$'Conf-i', where x is the object of class
two.pts.linkage.phases
and i is one of the possible
configurations.
Usage
get_rf_from_list(twopt, ph.list)
Arguments
twopt |
an object of class |
ph.list |
a list containing the linkage phase configuration. This
list can be found in any object of class
|
Value
a vector with the recombination fraction between markers present in ph.list, for that specific order.
Get recombination fraction from a matrix
Description
Get recombination fraction from a matrix
Usage
get_rf_from_mat(M)
Get states and emission in one informative parent
Description
Get states and emission in one informative parent
Usage
get_states_and_emission_single_parent(ploidy, ph, global.err, D, dose.notinf.P)
Extract sub-map from map
Description
Given a pre-constructed map, it extracts a sub-map for a provided sequence of marker positions. Optionally, it can update the linkage phase configurations and respective recombination fractions.
Usage
get_submap(
input.map,
mrk.pos,
phase.config = "best",
reestimate.rf = TRUE,
reestimate.phase = FALSE,
thres.twopt = 5,
thres.hmm = 3,
extend.tail = 50,
tol = 0.1,
tol.final = 0.001,
use.high.precision = FALSE,
verbose = TRUE
)
Arguments
input.map |
An object of class |
mrk.pos |
positions of the markers that should be considered in the new map. This can be in any order |
phase.config |
which phase configuration should be used. "best" (default) will choose the configuration associated with the maximum likelihood |
reestimate.rf |
logical. If |
reestimate.phase |
logical. If |
thres.twopt |
the LOD threshold used to determine if the linkage phases compared via two-point analysis should be considered (default = 5) |
thres.hmm |
the threshold used to determine if the linkage phases compared via hmm analysis should be considered (default = 3) |
extend.tail |
the length of the tail of the chain that should
be used to calculate the likelihood of the linkage phases. If
|
tol |
the desired accuracy during the sequential phase (default = 0.1) |
tol.final |
the desired accuracy for the final map (default = 10e-04) |
use.high.precision |
logical. If |
verbose |
If |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## selecting the six first markers in linkage group 1
## re-estimating the recombination fractions and linkage phases
submap1.lg1 <- get_submap(input.map = maps.hexafake[[1]],
mrk.pos = 1:6, verbose = TRUE,
reestimate.phase = TRUE,
tol.final = 10e-3)
## no recombination fraction re-estimation: first 20 markers
submap2.lg1 <- get_submap(input.map = maps.hexafake[[1]],
mrk.pos = 1:20, reestimate.rf = FALSE,
verbose = TRUE,
tol.final = 10e-3)
plot(maps.hexafake[[1]])
plot(submap1.lg1, mrk.names = TRUE, cex = .8)
plot(submap2.lg1, mrk.names = TRUE, cex = .8)
Get table of dosage combinations
Description
Internal function
Usage
get_tab_mrks(x)
Arguments
x |
an object of class |
Author(s)
Gabriel Gesteira, gdesiqu@ncsu.edu
Get the number of bivalent configurations
Description
Get the number of bivalent configurations
Usage
get_w_m(ploidy)
Color pallet ggplot-like
Description
Color pallet ggplot-like
Usage
gg_color_hue(n)
Assign markers to linkage groups
Description
Identifies linkage groups of markers using the results of two-point (pairwise) analysis.
Usage
group_mappoly(
input.mat,
expected.groups = NULL,
inter = TRUE,
comp.mat = FALSE,
LODweight = FALSE,
verbose = TRUE
)
Arguments
input.mat |
an object of class |
expected.groups |
when available, inform the number of expected linkage groups (i.e. chromosomes) for the species |
inter |
if |
comp.mat |
if |
LODweight |
if |
verbose |
logical. If |
Value
Returns an object of class mappoly.group
, which is a list
containing the following components:
data.name |
the referred dataset name |
hc.snp |
a list containing information related to the UPGMA grouping method |
expected.groups |
the number of expected linkage groups |
groups.snp |
the groups to which each of the markers belong |
seq.vs.grouped.snp |
comparison between the genomic group information
(when available) and the groups provided by |
chisq.pval.thres |
the threshold used on the segregation test when reading the dataset |
chisq.pval |
the p-values associated with the segregation test for all markers in the sequence |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## Getting first 20 markers from two linkage groups
all.mrk <- make_seq_mappoly(hexafake, c(1:20,601:620))
red.mrk <- elim_redundant(all.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
counts <- cache_counts_twopt(unique.mrks, cached = TRUE)
all.pairs <- est_pairwise_rf(input.seq = unique.mrks,
count.cache = counts,
ncpus = 1,
verbose = TRUE)
## Full recombination fraction matrix
mat.full <- rf_list_to_matrix(input.twopt = all.pairs)
plot(mat.full, index = FALSE)
lgs <- group_mappoly(input.mat = mat.full,
expected.groups = 2,
inter = TRUE,
comp.mat = TRUE, #this data has physical information
verbose = TRUE)
lgs
plot(lgs)
Simulated autohexaploid dataset.
Description
A dataset of a hypothetical autohexaploid full-sib population containing three homology groups
Usage
hexafake
Format
An object of class mappoly.data
which contains a
list with the following components:
- plody
ploidy level = 6
- n.ind
number individuals = 300
- n.mrk
total number of markers = 1500
- ind.names
the names of the individuals
- mrk.names
the names of the markers
- dosage.p1
a vector containing the dosage in parent P for all
n.mrk
markers- dosage.p2
a vector containing the dosage in parent Q for all
n.mrk
markers- chrom
a vector indicating the chromosome each marker belongs. Zero indicates that the marker was not assigned to any chromosome
- genome.pos
Physical position of the markers into the sequence
- geno.dose
a matrix containing the dosage for each markers (rows) for each individual (columns). Missing data are represented by
ploidy_level + 1 = 7
- n.phen
There are no phenotypes in this simulation
- phen
There are no phenotypes in this simulation
- chisq.pval
vector containing p-values for all markers associated to the chi-square test for the expected segregation patterns under Mendelian segregation
Simulated autohexaploid dataset with genotype probabilities.
Description
A dataset of a hypothetical autohexaploid full-sib population
containing three homology groups. This dataset contains the
probability distribution of the genotypes and 2% of missing data,
but is essentially the same dataset found in hexafake
Usage
hexafake.geno.dist
Format
An object of class mappoly.data
which contains a
list with the following components:
- ploidy
ploidy level = 6
- n.ind
number individuals = 300
- n.mrk
total number of markers = 1500
- ind.names
the names of the individuals
- mrk.names
the names of the markers
- dosage.p1
a vector containing the dosage in parent P for all
n.mrk
markers- dosage.p2
a vector containing the dosage in parent Q for all
n.mrk
markers- chrom
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence
- genome.pos
Physical position of the markers into the sequence
- prob.thres = 0.95
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' are considered as missing data for the dosage calling purposes
- geno
a data.frame containing the probability distribution for each combination of marker and offspring. The first two columns represent the marker and the offspring, respectively. The remaining elements represent the probability associated to each one of the possible dosages
- geno.dose
a matrix containing the dosage for each markers (rows) for each individual (columns). Missing data are represented by
ploidy_level + 1 = 7
- n.phen
There are no phenotypes in this simulation
- phen
There are no phenotypes in this simulation
Import data from polymapR
Description
Function to import datasets from polymapR.
Usage
import_data_from_polymapR(
input.data,
ploidy,
parent1 = "P1",
parent2 = "P2",
input.type = c("discrete", "probabilistic"),
prob.thres = 0.95,
pardose = NULL,
offspring = NULL,
filter.non.conforming = TRUE,
verbose = TRUE
)
Arguments
input.data |
a |
ploidy |
the ploidy level |
parent1 |
a character string containing the name (or pattern of genotype IDs) of parent 1 |
parent2 |
a character string containing the name (or pattern of genotype IDs) of parent 2 |
input.type |
Indicates whether the input is discrete ("disc") or probabilistic ("prob") |
prob.thres |
threshold probability to assign a dosage to offspring. If the probability
is smaller than |
pardose |
matrix of dimensions (n.mrk x 3) containing the name of the markers in the first column, and the dosage of parents 1 and 2 in columns 2 and 3. (see polymapR vignette) |
offspring |
a character string containing the name (or pattern of genotype IDs) of the offspring
individuals. If |
filter.non.conforming |
if |
verbose |
if |
Details
See examples at https://rpubs.com/mmollin/tetra_mappoly_vignette.
Author(s)
Marcelo Mollinari mmollin@ncsu.edu
References
Bourke PM et al: (2019) PolymapR — linkage analysis and genetic map construction from F1 populations of outcrossing polyploids. _Bioinformatics_ 34:3496–3502. doi:10.1093/bioinformatics/bty1002
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Import from updog
Description
Read objects with information related to genotype calling in polyploids.
Currently this function supports output objects created with the
updog
(output of multidog
function) package.
This function creates an object of class mappoly.data
Usage
import_from_updog(
object,
prob.thres = 0.95,
filter.non.conforming = TRUE,
chrom = NULL,
genome.pos = NULL,
verbose = TRUE
)
Arguments
object |
the name of the object of class |
prob.thres |
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' are considered as missing data for the dosage calling purposes |
filter.non.conforming |
if |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
vector with physical position of the markers into the sequence |
verbose |
if |
Value
An object of class mappoly.data
which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
physical position of the markers into the sequence |
prob.thres |
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' were considered as missing data in the 'geno.dose' matrix |
geno.dose |
a matrix containing the dosage for each markers (rows)
for each individual (columns). Missing data are represented by
|
geno |
a data.frame
containing the probability distribution for each combination of
marker and offspring. The first two columns represent the marker
and the offspring, respectively. The remaining elements represent
the probability associated to each one of the possible
dosages. Missing data are converted from |
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers |
Author(s)
Gabriel Gesteira, gdesiqu@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
if(requireNamespace("updog", quietly = TRUE)){
library("updog")
data("uitdewilligen")
mout = multidog(refmat = t(uitdewilligen$refmat),
sizemat = t(uitdewilligen$sizemat),
ploidy = uitdewilligen$ploidy,
model = "f1",
p1_id = colnames(t(uitdewilligen$sizemat))[1],
p2_id = colnames(t(uitdewilligen$sizemat))[2],
nc = 2)
mydata = import_from_updog(mout)
mydata
plot(mydata)
}
Import phased map list from polymapR
Description
Function to import phased map lists from polymapR
Usage
import_phased_maplist_from_polymapR(maplist, mappoly.data, ploidy = NULL)
Arguments
maplist |
a list of phased maps obtained using function
|
mappoly.data |
a dataset used to obtain |
ploidy |
the ploidy level |
Details
See examples at https://rpubs.com/mmollin/tetra_mappoly_vignette.
Author(s)
Marcelo Mollinari mmollin@ncsu.edu
References
Bourke PM et al: (2019) PolymapR — linkage analysis and genetic map construction from F1 populations of outcrossing polyploids. _Bioinformatics_ 34:3496–3502. doi:10.1093/bioinformatics/bty1002
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Check if Object is a Probability Dataset in MAPpoly
Description
Determines whether the specified object is a probability dataset by checking for the existence of the 'geno' component within a '"mappoly.data"' object.
Usage
is.prob.data(x)
Arguments
x |
An object of class '"mappoly.data"' |
Value
A logical value: ‘TRUE' if the ’geno' component exists within 'x', indicating it is a valid probability dataset for genetic analysis; 'FALSE' otherwise.
Multipoint log-likelihood computation
Description
Update the multipoint log-likelihood of a given map using the method proposed by Mollinari and Garcia (2019).
Usage
loglike_hmm(input.map, input.data = NULL, verbose = FALSE)
Arguments
input.map |
An object of class |
input.data |
An object of class |
verbose |
If |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
hexa.map <- loglike_hmm(maps.hexafake[[1]])
hexa.map
List of linkage phases
Description
Returns a list of possible linkage phase configurations using
the two-point information contained in the object mappoly.twopt
as elimination criteria
Usage
ls_linkage_phases(input.seq, thres, twopt, mrk.to.add = NULL, prev.info = NULL)
## S3 method for class 'two.pts.linkage.phases'
print(x, ...)
## S3 method for class 'two.pts.linkage.phases'
plot(x, ...)
Arguments
input.seq |
an object of class |
thres |
the LOD threshold used to determine whether linkage phases compared via two-point analysis should be considered |
twopt |
an object of class |
mrk.to.add |
marker to be added to the end of the linkage
group. If |
prev.info |
(optional) an object of class |
x |
an object of the class |
... |
currently ignored |
Value
An object of class two.pts.linkage.phases
which
contains the following structure:
config.to.test |
a matrix with all possible linkage phase configurations for both parents, P and Q |
rec.frac |
a matrix with all recombination fractions |
ploidy |
the ploidy level |
seq.num |
the sequence of markers |
thres |
the LOD threshold |
data.name |
the dataset name |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
seq.all.mrk <- make_seq_mappoly(hexafake, 'all')
id <- get_genomic_order(seq.all.mrk)
s <- make_seq_mappoly(id)
seq10 <- make_seq_mappoly(hexafake, s$seq.mrk.names[1:10])
twopt <- est_pairwise_rf(seq10)
## Using the first 10 markers
l10.seq.3.0 <- ls_linkage_phases(input.seq = seq10, thres = 3, twopt = twopt)
l10.seq.3.0
plot(l10.seq.3.0)
l10.seq.2.0 <- ls_linkage_phases(input.seq = seq10, thres = 2.0, twopt = twopt)
l10.seq.2.0
plot(l10.seq.2.0)
l10.seq.1.0 <- ls_linkage_phases(input.seq = seq10, thres = 1.0, twopt = twopt)
l10.seq.1.0
plot(l10.seq.1.0)
## Using the first 5 markers
seq5 <- make_seq_mappoly(hexafake, s$seq.mrk.names[1:5])
l5.seq.5.0 <- ls_linkage_phases(input.seq = seq5, thres = 5, twopt = twopt)
l5.seq.5.0
plot(l5.seq.5.0)
l5.seq.3.0 <- ls_linkage_phases(input.seq = seq5, thres = 3, twopt = twopt)
l5.seq.3.0
plot(l5.seq.3.0)
l5.seq.1.0 <- ls_linkage_phases(input.seq = seq5, thres = 1, twopt = twopt)
l5.seq.1.0
plot(l5.seq.1.0)
Subset recombination fraction matrices
Description
Get a subset of an object of class mappoly.rf.matrix
, i.e.
recombination fraction and LOD score matrices based in a
sequence of markers.
Usage
make_mat_mappoly(input.mat, input.seq)
Arguments
input.mat |
an object of class |
input.seq |
an object of class |
Value
an object of class mappoly.rf.matrix
,
which is a subset of 'input.mat'
.
See rf_list_to_matrix
for details
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
# sequence with 20 markers
mrk.seq <- make_seq_mappoly(hexafake, 1:20)
mrk.pairs <- est_pairwise_rf(input.seq = mrk.seq,
verbose = TRUE)
## Full recombination fraction matrix
mat <- rf_list_to_matrix(input.twopt = mrk.pairs)
plot(mat)
## Matrix subset
id <- make_seq_mappoly(hexafake, 1:10)
mat.sub <- make_mat_mappoly(mat, id)
plot(mat.sub)
Subset pairwise recombination fractions
Description
Get a subset of an object of class mappoly.twopt
or mappoly.twopt2
(i.e.
recombination fraction) and LOD score statistics for all possible linkage
phase combinations based on a sequence of markers.
Usage
make_pairs_mappoly(input.twopt, input.seq)
Arguments
input.twopt |
an object of class |
input.seq |
an object of class |
Value
an object of class mappoly.twopt
which is a
subset of input.twopt
.
See est_pairwise_rf
for details
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## selecting some markers along the genome
some.mrk <- make_seq_mappoly(hexafake, seq(1, 1500, 30))
all.pairs <- est_pairwise_rf(input.seq = some.mrk)
mat.full <- rf_list_to_matrix(input.twopt = all.pairs)
plot(mat.full)
## selecting two-point information for chromosome 1
mrks.1 <- make_seq_mappoly(hexafake, names(which(some.mrk$chrom == 1)))
p1 <- make_pairs_mappoly(input.seq = mrks.1, input.twopt = all.pairs)
m1 <- rf_list_to_matrix(input.twopt = p1)
plot(m1, main.text = "LG1")
Create a Sequence of Markers
Description
Constructs a sequence of markers based on an object belonging to various specified classes. This function is versatile, supporting multiple input types and configurations for generating marker sequences.
Usage
make_seq_mappoly(
input.obj,
arg = NULL,
data.name = NULL,
info.parent = c("all", "p1", "p2"),
genomic.info = NULL
)
## S3 method for class 'mappoly.sequence'
print(x, ...)
## S3 method for class 'mappoly.sequence'
plot(x, ...)
Arguments
input.obj |
An object belonging to one of the specified classes: |
arg |
Specifies the markers to include in the sequence, accepting several formats: a string 'all' for all
markers; a string or vector of strings 'seqx' where x is the sequence number (0 for unassigned markers); a
vector of integers indicating specific markers; or a vector of integers representing linkage group numbers if
|
data.name |
Name of the |
info.parent |
Selection criteria based on parental information: |
genomic.info |
Optional and applicable only to |
x |
An object of class |
... |
Currently ignored. |
Value
Returns an object of class 'mappoly.sequence', comprising:
"seq.num" |
Ordered vector of marker indices according to the input. |
"seq.phases" |
List of linkage phases between markers; -1 for undefined phases. |
"seq.rf" |
Vector of recombination frequencies; -1 for not estimated frequencies. |
"loglike" |
Log-likelihood of the linkage map. |
"data.name" |
Name of the 'mappoly.data' object with raw data. |
"twopt" |
Name of the 'mappoly.twopt' object with 2-point analyses; -1 if not computed. |
Author(s)
Marcelo Mollinari mmollin@ncsu.edu, with modifications by Gabriel Gesteira gdesiqu@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019). Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models. _G3: Genes|Genomes|Genetics_, doi:10.1534/g3.119.400378.
Examples
all.mrk <- make_seq_mappoly(hexafake, 'all')
seq1.mrk <- make_seq_mappoly(hexafake, 'seq1')
plot(seq1.mrk)
some.mrk.pos <- c(1,4,28,32,45)
some.mrk.1 <- make_seq_mappoly(hexafake, some.mrk.pos)
plot(some.mrk.1)
MAPpoly Color Palettes
Description
Provides a set of color palettes designed for use with MAPpoly, a package for genetic mapping in polyploids. These palettes are intended to enhance the visual representation of genetic data.
Usage
mp_pallet1(n)
Details
The available palettes are:
mp_pallet1
A palette with warm colors ranging from yellow to dark red and brown.
mp_pallet2
A palette with cool colors, including purples, blues, and green.
mp_pallet3
A comprehensive palette that combines colors from both
mp_pallet1
andmp_pallet2
, offering a broad range of colors.
Each palette function returns a function that can generate color vectors of variable length, suitable for mapping or plotting functions in R.
Examples
# Generate a palette of 5 colors from mp_pallet1
pal1 <- mp_pallet1(5)
plot(1:5, pch=19, col=pal1)
# Generate a palette of 10 colors from mp_pallet2
pal2 <- mp_pallet2(10)
plot(1:10, pch=19, col=pal2)
# Generate a palette of 15 colors from mp_pallet3
pal3 <- mp_pallet3(15)
plot(1:15, pch=19, col=pal3)
Resulting maps from hexafake
Description
A list containing three linkage groups estimated using the procedure available in [MAPpoly's tutorial](https://mmollina.github.io/MAPpoly/#estimating_the_map_for_a_given_order)
Usage
maps.hexafake
Format
A list containing three objects of class mappoly.map
, each one
representing one linkage group in the simulated data.
Estimates loci position using Multidimensional Scaling
Description
Estimates loci position using Multidimensional Scaling proposed by
Preedy and Hackett (2016). The code is an adaptation from
the package MDSmap
, available under GNU GENERAL PUBLIC LICENSE,
Version 3, at https://CRAN.R-project.org/package=MDSMap
Usage
mds_mappoly(
input.mat,
p = NULL,
n = NULL,
ndim = 2,
weight.exponent = 2,
verbose = TRUE
)
## S3 method for class 'mappoly.pcmap'
print(x, ...)
## S3 method for class 'mappoly.pcmap3d'
print(x, ...)
Arguments
input.mat |
an object of class |
p |
integer. The smoothing parameter for the principal curve.
If |
n |
vector of integers or strings containing loci to be omitted from the analysis |
ndim |
number of dimensions to be considered in the multidimensional scaling procedure (default = 2) |
weight.exponent |
the exponent that should be used in the LOD score values to weight the MDS procedure (default = 2) |
verbose |
if |
x |
an object of class |
... |
currently ignored |
Value
A list containing:
M |
the input distance map |
sm |
the unconstrained MDS results |
pc |
the principal curve results |
distmap |
a matrix of pairwise distances between loci where the columns are in the estimated order |
locimap |
a data frame of the loci containing the name and position of each locus in order of increasing distance |
length |
integer giving the total length of the segment |
removed |
a vector of the names of loci removed from the analysis |
scale |
the scaling factor from the MDS |
locikey |
a data frame showing the number associated with each locus name for interpreting the MDS configuration plot |
confplotno |
a data frame showing locus name associated with each number on the MDS configuration plots |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu mostly adapted from MDSmap codes, written by Katharine F. Preedy, katharine.preedy@bioss.ac.uk
References
Preedy, K. F., & Hackett, C. A. (2016). A rapid marker ordering approach for high-density genetic linkage maps in experimental autotetraploid populations using multidimensional scaling. _Theoretical and Applied Genetics_, 129(11), 2117-2132. doi:10.1007/s00122-016-2761-8
Examples
s1 <- make_seq_mappoly(hexafake, 1:20)
t1 <- est_pairwise_rf(s1, ncpus = 1)
m1 <- rf_list_to_matrix(t1)
o1 <- get_genomic_order(s1)
s.go <- make_seq_mappoly(o1)
plot(m1, ord = s.go$seq.mrk.names)
mds.ord <- mds_mappoly(m1)
plot(mds.ord)
so <- make_seq_mappoly(mds.ord)
plot(m1, ord = so$seq.mrk.names)
plot(so$seq.num ~ I(so$genome.pos/1e6),
xlab = "Genome Position",
ylab = "MDS position")
Merge datasets
Description
This function merges two datasets of class mappoly.data
. This can be useful
when individuals of a population were genotyped using two or more techniques
and have datasets in different files or formats. Please notice that the datasets
should contain the same number of individuals and they must be represented identically
in both datasets (e.g. Ind_1
in both datasets, not Ind_1
in one dataset and ind_1
or Ind.1
in the other).
Usage
merge_datasets(dat.1 = NULL, dat.2 = NULL)
Arguments
dat.1 |
the first dataset of class |
dat.2 |
the second dataset of class |
Value
An object of class mappoly.data
which contains all markers
from both datasets. It will be a list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers into the sequence |
seq.ref |
if one or both datasets originated from read_vcf, it keeps reference alleles from sequencing platform, otherwise is NULL |
seq.alt |
if one or both datasets originated from read_vcf, it keeps alternative alleles from sequencing platform, otherwise is NULL |
all.mrk.depth |
if one or both datasets originated from read_vcf, it keeps marker read depths from sequencing, otherwise is NULL |
prob.thres |
(unused field) |
geno.dose |
a matrix containing the dosage for each markers (rows)
for each individual (columns). Missing data are represented by
|
geno |
if both datasets contain genotype distribution information, the final object will contain 'geno'. This is set to NULL otherwise |
nphen |
(0) |
phen |
(NULL) |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers in both datasets |
kept |
if elim.redundant = TRUE when reading any dataset, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE when reading any dataset, holds all non-redundant markers and its equivalence to the redundant ones |
Author(s)
Gabriel Gesteira, gdesiqu@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## Loading a subset of SNPs from chromosomes 3 and 12 of sweetpotato dataset
## (SNPs anchored to Ipomoea trifida genome)
dat <- NULL
for(i in c(3, 12)){
cat("Loading chromosome", i, "...\n")
tempfl <- tempfile(pattern = paste0("ch", i), fileext = ".vcf.gz")
x <- "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/sweet_sample_ch"
address <- paste0(x, i, ".vcf.gz")
download.file(url = address, destfile = tempfl)
dattemp <- read_vcf(file = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2",
ploidy = 6, verbose = FALSE)
dat <- merge_datasets(dat, dattemp)
cat("\n")
}
dat
plot(dat)
Merge two maps
Description
Estimates the linkage phase and recombination fraction between pre-built maps and creates a new map by merging them.
Usage
merge_maps(
map.list,
twopt,
thres.twopt = 10,
genoprob.list = NULL,
thres.hmm = "best",
tol = 1e-04
)
Arguments
map.list |
a list of objects of class |
twopt |
an object of class |
thres.twopt |
the threshold used to determine if the linkage phases compared via two-point analysis should be considered for the search space reduction (default = 3) |
genoprob.list |
a list of objects of class |
thres.hmm |
the threshold used to determine which linkage phase configurations should be returned when merging two maps. If "best" (default), returns only the best linkage phase configuration. NOTE: if merging multiple maps, it always uses the "best" linkage phase configuration at each block insertion. |
tol |
the desired accuracy (default = 10e-04) |
Details
merge_maps
uses two-point information, under a given LOD threshold, to reduce the
linkage phase search space. The remaining linkage phases are tested using the genotype
probabilities.
Value
A list of class mappoly.map
with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs
as informed in the input file. If not available,
|
genome.pos |
physical position (usually in megabase) of the markers into the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Examples
#### Tetraploid example #####
map1 <- get_submap(solcap.dose.map[[1]], 1:5)
map2 <- get_submap(solcap.dose.map[[1]], 6:15)
map3 <- get_submap(solcap.dose.map[[1]], 16:30)
full.map <- get_submap(solcap.dose.map[[1]], 1:30)
s <- make_seq_mappoly(tetra.solcap, full.map$maps[[1]]$seq.num)
twopt <- est_pairwise_rf(input.seq = s)
merged.maps <- merge_maps(map.list = list(map1, map2, map3),
twopt = twopt,
thres.twopt = 3)
plot(merged.maps, mrk.names = TRUE)
plot(full.map, mrk.names = TRUE)
best.phase <- merged.maps$maps[[1]]$seq.ph
names.id <- names(best.phase$P)
compare_haplotypes(ploidy = 4, best.phase$P[names.id],
full.map$maps[[1]]$seq.ph$P[names.id])
compare_haplotypes(ploidy = 4, best.phase$Q[names.id],
full.map$maps[[1]]$seq.ph$Q[names.id])
Build merged parental maps
Description
Build merged parental maps
Usage
merge_parental_maps(
map.p1,
map.p2,
full.seq,
full.mat,
method = c("ols", "hmm"),
verbose = TRUE
)
Arguments
map.p1 |
object of class |
map.p2 |
object of class |
full.seq |
object of class |
full.mat |
object of class |
method |
indicates whether to use 'hmm' (Hidden Markov Models), 'ols' (Ordinary Least Squares) to re-estimate the recombination fractions |
Value
object of class mappoly.map
with both parents information
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu with documentation and minor modifications by Cristiane Taniguti chtaniguti@tamu.edu
Chi-square test
Description
Chi-square test
Usage
mrk_chisq_test(x, ploidy)
Msg function
Description
Msg function
Usage
msg(text, line = 1)
Parallel Pairwise Discrete Estimation
Description
This function performs parallel pairwise estimation of recombination fractions using discrete dosage scoring via a C++ backend.
Usage
paralell_pairwise_discrete(
mrk.pairs,
input.seq,
geno,
dP,
dQ,
count.cache,
tol = .Machine$double.eps^0.25,
ll = ll
)
Arguments
mrk.pairs |
A matrix of dimensions 2*N, containing N pairs of markers to be analyzed. |
input.seq |
An object of class |
geno |
Genotype matrix. |
dP |
Vector of probabilities for the first allele. |
dQ |
Vector of probabilities for the second allele. |
count.cache |
An object of class |
tol |
The tolerance level for the estimation accuracy (default is |
ll |
Logical; if TRUE, the function returns log-likelihood values instead of LOD scores. For internal use. |
Value
Depending on the ll
parameter, returns either log-likelihood values or formatted LOD scores from pairwise recombination fraction estimation.
Wrapper function to discrete-based pairwise two-point estimation in C++
Description
Wrapper function to discrete-based pairwise two-point estimation in C++
Usage
paralell_pairwise_discrete_rcpp(
mrk.pairs,
m,
geno,
dP,
dQ,
count.vector,
count.phases,
count.matrix.rownames,
count.matrix.number,
count.matrix.pos,
count.matrix.length,
tol = .Machine$double.eps^0.25
)
Parallel Pairwise Probability Estimation
Description
This function performs parallel pairwise estimation of recombination fractions using probability-based dosage scoring via a C++ backend.
Usage
paralell_pairwise_probability(
mrk.pairs,
input.seq,
geno,
dP,
dQ,
count.cache,
tol = .Machine$double.eps^0.25,
ll = ll
)
Arguments
mrk.pairs |
A matrix of dimensions 2*N, containing N pairs of markers to be analyzed. |
input.seq |
An object of class |
geno |
Genotype matrix. |
dP |
Vector of probabilities for the first allele. |
dQ |
Vector of probabilities for the second allele. |
count.cache |
An object of class |
tol |
The tolerance level for the estimation accuracy (default is |
ll |
Logical; if TRUE, the function returns log-likelihood values instead of LOD scores. For internal use. |
Value
Depending on the ll
parameter, returns either log-likelihood values or formatted LOD scores from pairwise recombination fraction estimation.
Auxiliary function to estimate a map in a block of markers using parallel processing
Description
Auxiliary function to estimate a map in a block of markers using parallel processing
Usage
parallel_block(
mrk.vec,
dat.name,
ph.thres = 3,
tol = 0.01,
phase.number.limit = 20,
error = 0.05,
tol.err = 0.001,
verbose = FALSE
)
N!/2 combination
Description
N!/2 combination
Usage
perm_pars(v)
N! combination
Description
N! combination
Usage
perm_tot(v)
Linkage phase format conversion: list to matrix
Description
This function converts linkage phase configurations from list to matrix form
Usage
ph_list_to_matrix(L, ploidy)
Arguments
L |
a list of configuration phases |
ploidy |
ploidy level |
Value
a matrix whose columns represent homologous chromosomes and the rows represent markers
Linkage phase format conversion: matrix to list
Description
This function converts linkage phase configurations from matrix form to list
Usage
ph_matrix_to_list(M)
Arguments
M |
matrix whose columns represent homologous chromosomes and the rows represent markers |
Value
a list of linkage phase configurations
Plots mappoly.homoprob
Description
Plots mappoly.homoprob
Usage
## S3 method for class 'mappoly.homoprob'
plot(
x,
stack = FALSE,
lg = NULL,
ind = NULL,
use.plotly = TRUE,
verbose = TRUE,
...
)
Arguments
x |
an object of class |
stack |
logical. If |
lg |
indicates which linkage group should be plotted. If |
ind |
indicates which individuals should be plotted. It can be the
position of the individuals in the dataset or it's name.
If |
use.plotly |
if |
verbose |
if |
... |
unused arguments |
Plots mappoly.prefpair.profiles
Description
Plots mappoly.prefpair.profiles
Usage
## S3 method for class 'mappoly.prefpair.profiles'
plot(
x,
type = c("pair.configs", "hom.pairs"),
min.y.prof = 0,
max.y.prof = 1,
thresh = 0.01,
P1 = "P1",
P2 = "P2",
...
)
Arguments
x |
an object of class |
type |
a character string indicating which type of graphic is plotted:
|
min.y.prof |
lower bound for y axis on the probability profile graphic (default = 0) |
max.y.prof |
upper bound for y axis on the probability profile graphic (default = 1) |
thresh |
threshold for chi-square test (default = 0.01) |
P1 |
a string containing the name of parent P1 |
P2 |
a string containing the name of parent P2 |
... |
unused arguments |
Genotypic information content
Description
This function plots the genotypic information content given
an object of class mappoly.homoprob
.
Usage
plot_GIC(hprobs, P = "P1", Q = "P2")
Arguments
hprobs |
an object of class |
P |
a string containing the name of parent P |
Q |
a string containing the name of parent Q |
Examples
w <- lapply(solcap.err.map[1:3], calc_genoprob)
h.prob <- calc_homologprob(w)
plot_GIC(h.prob)
Plot Two Overlapped Haplotypes
Description
This function plots two sets of haplotypes for comparison, allowing for visual inspection of homologous allele patterns across two groups or conditions. It is designed to handle and display genetic data for organisms with varying ploidy levels.
Usage
plot_compare_haplotypes(
ploidy,
hom.allele.p1,
hom.allele.q1,
hom.allele.p2 = NULL,
hom.allele.q2 = NULL
)
Arguments
ploidy |
Integer, specifying the ploidy level of the organism being represented. |
hom.allele.p1 |
A list where each element represents the alleles for a marker in the first haplotype group, for 'p' parent. |
hom.allele.q1 |
A list where each element represents the alleles for a marker in the first haplotype group, for 'q' parent. |
hom.allele.p2 |
Optionally, a list where each element represents the alleles for a marker in the second haplotype group, for 'p' parent. |
hom.allele.q2 |
Optionally, a list where each element represents the alleles for a marker in the second haplotype group, for 'q' parent. |
Details
The function creates a graphical representation of haplotypes, where each marker's alleles are plotted along a line for each parent ('p' and 'q'). If provided, the second set of haplotypes (for comparison) are overlaid on the same plot. This allows for direct visual comparison of allele presence or absence across the two sets. Different colors are used to distinguish between the first and second sets of haplotypes.
The function uses several internal helper functions ('ph_list_to_matrix' and 'ph_matrix_to_list') to manipulate haplotype data. These functions should correctly handle the conversion between list and matrix representations of haplotype data.
Value
Invisible. The function primarily generates a plot for visual analysis and does not return any data.
Physical versus genetic distance
Description
This function plots scatterplot(s) of physical distance (in Mbp) versus the genetic
distance (in cM). Map(s) should be passed as a single object or a list of objects
of class mappoly.map
.
Usage
plot_genome_vs_map(
map.list,
phase.config = "best",
same.ch.lg = FALSE,
alpha = 1/5,
size = 3
)
Arguments
map.list |
A list or a single object of class |
phase.config |
A vector containing which phase configuration should be
plotted. If |
same.ch.lg |
Logical. If |
alpha |
transparency factor for SNPs points |
size |
size of the SNP points |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
plot_genome_vs_map(solcap.mds.map, same.ch.lg = TRUE)
plot_genome_vs_map(solcap.mds.map, same.ch.lg = FALSE,
alpha = 1, size = 1/2)
Plot a genetic map
Description
This function plots a genetic linkage map(s) generated by MAPpoly
.
The map(s) should be passed as a single object or a list of objects of class mappoly.map
.
Usage
plot_map_list(
map.list,
horiz = TRUE,
col = "lightgray",
title = "Linkage group"
)
Arguments
map.list |
A list of objects or a single object of class |
horiz |
logical. If FALSE, the maps are plotted vertically with the first map to the left. If TRUE (default), the maps are plotted horizontally with the first at the bottom |
col |
a vector of colors for each linkage group. (default = 'lightgray')
|
title |
a title (string) for the maps (default = 'Linkage group') |
Value
A data.frame
object containing the name of the markers and their genetic position
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## hexafake map
plot_map_list(maps.hexafake, horiz = FALSE)
plot_map_list(maps.hexafake, col = c("#999999", "#E69F00", "#56B4E9"))
## solcap map
plot_map_list(solcap.dose.map, col = "ggstyle")
plot_map_list(solcap.dose.map, col = "mp_pallet3", horiz = FALSE)
Plot object mappoly.map2
Description
Plot object mappoly.map2
Usage
plot_mappoly.map2(x)
Arguments
x |
object of class |
Plot marker information
Description
Plots summary statistics for a given marker
Usage
plot_mrk_info(input.data, mrk)
Arguments
input.data |
an object of class |
mrk |
marker name or position in the dataset |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
plot_mrk_info(tetra.solcap.geno.dist, 2680)
plot_mrk_info(tetra.solcap.geno.dist, "solcap_snp_c2_23828")
plot a single linkage group with no phase
Description
plot a single linkage group with no phase
Usage
plot_one_map(x, i = 0, horiz = FALSE, col = "lightgray")
Display genotypes imputed or changed by the HMM chain given a global genotypic error
Description
Outputs a graphical representation ggplot with the percent of data changed.
Usage
plot_progeny_dosage_change(
map_list,
error,
verbose = TRUE,
output_corrected = FALSE
)
Arguments
map_list |
a list of multiple |
error |
error rate used in global error in the 'calc_genoprob_error()' |
verbose |
if TRUE (default), current progress is shown; if FALSE, no output is produced |
output_corrected |
logical. if FALSE only the ggplot of the changed dosage is printed, if TRUE then a new corrected dosage matrix is output. |
Value
A ggplot of the changed and imputed genotypic dosages
Author(s)
Jeekin Lau, jzl0026@tamu.edu, with optimization by Cristiane Taniguti, chtaniguti@tamu.edu
Examples
x <- get_submap(solcap.err.map[[1]], 1:30, reestimate.rf = FALSE)
plot_progeny_dosage_change(list(x), error=0.05, output_corrected=FALSE)
corrected_matrix <- plot_progeny_dosage_change(list(x), error=0.05,
output_corrected=FALSE) #output corrected
Estimate genetic map using as input the probability distribution of genotypes (wrapper function to C++)
Description
Estimate genetic map using as input the probability distribution of genotypes (wrapper function to C++)
Usage
poly_hmm_est(
ploidy,
n.mrk,
n.ind,
p,
dp,
q,
dq,
g,
rf,
verbose = TRUE,
tol = 0.001
)
prepare maps for plot
Description
prepare maps for plot
Usage
prepare_map(input.map, config = "best")
Summary of a set of markers
Description
Returns information related to a given set of markers
Usage
print_mrk(input.data, mrks)
Arguments
input.data |
an object |
mrks |
marker sequence index (integer vector) |
Examples
print_mrk(tetra.solcap.geno.dist, 1:5)
print_mrk(hexafake, 256)
cat for graphical representation of the phases
Description
cat for graphical representation of the phases
Usage
print_ph(input.ph)
Data Input in fitPoly format
Description
Reads an external data file generated as output of saveMarkerModels
.
This function creates an object of class mappoly.data
.
Usage
read_fitpoly(
file.in,
ploidy,
parent1,
parent2,
offspring = NULL,
filter.non.conforming = TRUE,
elim.redundant = TRUE,
parent.geno = c("joint", "max"),
thresh.parent.geno = 0.95,
prob.thres = 0.95,
file.type = c("table", "csv"),
verbose = TRUE
)
Arguments
file.in |
a character string with the name of (or full path to) the input file |
ploidy |
the ploidy level |
parent1 |
a character string containing the name (or pattern of genotype IDs) of parent 1 |
parent2 |
a character string containing the name (or pattern of genotype IDs) of parent 2 |
offspring |
a character string containing the name (or pattern of genotype IDs) of the offspring
individuals. If |
filter.non.conforming |
if |
elim.redundant |
logical. If |
parent.geno |
indicates whether to use the joint probability |
thresh.parent.geno |
threshold probability to assign a dosage to parents. If the probability
is smaller than |
prob.thres |
threshold probability to assign a dosage to offspring. If the probability
is smaller than |
file.type |
indicates whether the characters in the input file are separated by 'white spaces' ("table") or by commas ("csv"). |
verbose |
if |
Value
An object of class mappoly.data
which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers into the sequence |
seq.ref |
NULL (unused in this type of data) |
seq.alt |
NULL (unused in this type of data) |
all.mrk.depth |
NULL (unused in this type of data) |
geno.dose |
a matrix containing the dosage for each markers (rows)
for each individual (columns). Missing data are represented by
|
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and its equivalence to the redundant ones |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Voorrips, R.E., Gort, G. & Vosman, B. (2011) Genotype calling in tetraploid species from bi-allelic marker data using mixture models. _BMC Bioinformatics_. doi:10.1186/1471-2105-12-172
Examples
#### Tetraploid Example
ft <- "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/fitpoly.dat"
tempfl <- tempfile()
download.file(ft, destfile = tempfl)
fitpoly.dat <- read_fitpoly(file.in = tempfl, ploidy = 4,
parent1 = "P1", parent2 = "P2",
verbose = TRUE)
print(fitpoly.dat, detailed = TRUE)
plot(fitpoly.dat)
plot_mrk_info(fitpoly.dat, 37)
Data Input
Description
Reads an external data file. The format of the file is described in the Details
section. This function creates an object of class mappoly.data
Usage
read_geno(
file.in,
filter.non.conforming = TRUE,
elim.redundant = TRUE,
verbose = TRUE
)
## S3 method for class 'mappoly.data'
print(x, detailed = FALSE, ...)
## S3 method for class 'mappoly.data'
plot(x, thresh.line = 1e-05, ...)
Arguments
file.in |
a character string with the name of (or full path to) the input file which contains the data to be read |
filter.non.conforming |
if |
elim.redundant |
logical. If |
verbose |
if |
x |
an object of class |
detailed |
if available, print the number of markers per sequence (default = FALSE) |
... |
currently ignored |
thresh.line |
position of a threshold line for p values of the segregation test (default = 10e-06) |
Details
The first line of the input file contains the string ploidy
followed by the ploidy level of the parents.
The second and third lines contain the strings n.ind
and n.mrk
followed by the number of individuals in
the dataset and the total number of markers, respectively. Lines number 4 and 5 contain the strings
mrk.names
and ind.names
followed by a sequence of the names of the markers and the name of the individuals,
respectively. Lines 6 and 7 contain the strings dosageP
and dosageQ
followed by a sequence of numbers
containing the dosage of all markers in parent P
and Q
. Line 8, contains the string seq followed by
a sequence of integer numbers indicating the chromosome each marker belongs. It can be any 'a priori'
information regarding the physical distance between markers. For example, these numbers could refer
to chromosomes, scaffolds or even contigs, in which the markers are positioned. If this information
is not available for a particular marker, NA should be used. If this information is not available for
any of the markers, the string seq
should be followed by a single NA
. Line number 9 contains the string
seqpos
followed by the physical position of the markers into the sequence. The physical position can be
given in any unity of physical genomic distance (base pairs, for instance). However, the user should be
able to make decisions based on these values, such as the occurrence of crossing overs, etc. Line number 10
should contain the string nphen
followed by the number of phenotypic traits. Line number 11 is skipped
(Usually used as a spacer). The next elements are strings containing the name of the phenotypic trait with no space characters
followed by the phenotypic values. The number of lines should be the same number of phenotypic traits.
NA
represents missing values. The line number 12 + nphen
is skipped. Finally, the last element is a table
containing the dosage for each marker (rows) for each individual (columns). NA
represents missing values.
Value
An object of class mappoly.data
which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers into the sequence |
seq.ref |
NULL (unused in this type of data) |
seq.alt |
NULL (unused in this type of data) |
all.mrk.depth |
NULL (unused in this type of data) |
geno.dose |
a matrix containing the dosage for each markers (rows)
for each individual (columns). Missing data are represented by
|
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and its equivalence to the redundant ones |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
#### Tetraploid Example
fl1 = "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/SolCAP_dosage"
tempfl <- tempfile()
download.file(fl1, destfile = tempfl)
SolCAP.dose <- read_geno(file.in = tempfl)
print(SolCAP.dose, detailed = TRUE)
plot(SolCAP.dose)
Data Input in CSV format
Description
Reads an external comma-separated values (CSV) data file. The format of the file is described in the Details
section. This function creates an object of class mappoly.data
.
Usage
read_geno_csv(
file.in,
ploidy,
filter.non.conforming = TRUE,
elim.redundant = TRUE,
verbose = TRUE
)
Arguments
file.in |
a character string with the name of (or full path to) the input file containing the data to be read |
ploidy |
the ploidy level |
filter.non.conforming |
if |
elim.redundant |
logical. If |
verbose |
if |
Details
This is an alternative and a somewhat more straightforward version of the function
read_geno
. The input is a standard CSV file where the rows
represent the markers, except for the first row which is used as a header.
The first five columns contain the marker names, the dosage in parents 1 and 2,
the chromosome information (i.e. chromosome, scaffold, contig, etc) and the
position of the marker within the sequence. The remaining columns contain
the dosage of the full-sib population. A tetraploid example of such file
can be found in the Examples
section.
Value
An object of class mappoly.data
which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers into the sequence |
seq.ref |
NULL (unused in this type of data) |
seq.alt |
NULL (unused in this type of data) |
all.mrk.depth |
NULL (unused in this type of data) |
geno.dose |
a matrix containing the dosage for each markers (rows)
for each individual (columns). Missing data are represented by
|
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and its equivalence to the redundant ones |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu, with minor changes by Gabriel Gesteira, gdesiqu@ncsu.edu
References
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
#### Tetraploid Example
ft = "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/tetra_solcap.csv"
tempfl <- tempfile()
download.file(ft, destfile = tempfl)
SolCAP.dose <- read_geno_csv(file.in = tempfl, ploidy = 4)
print(SolCAP.dose, detailed = TRUE)
plot(SolCAP.dose)
Data Input
Description
Reads an external data file. The format of the file is described in the Details
section. This function creates an object of class mappoly.data
Usage
read_geno_prob(
file.in,
prob.thres = 0.95,
filter.non.conforming = TRUE,
elim.redundant = TRUE,
verbose = TRUE
)
Arguments
file.in |
a character string with the name of (or full path to) the input file which contains the data to be read |
prob.thres |
probability threshold to associate a marker call to a
dosage. Markers with maximum genotype probability smaller than |
filter.non.conforming |
if |
elim.redundant |
logical. If |
verbose |
if |
Details
The first line of the input file contains the string ploidy
followed by the ploidy level of the parents.
The second and third lines contains the strings n.ind
and n.mrk
followed by the number of individuals in
the dataset and the total number of markers, respectively. Lines number 4 and 5 contain the string
mrk.names
and ind.names
followed by a sequence of the names of the markers and the name of the individuals,
respectively. Lines 6 and 7 contain the strings dosageP
and dosageQ
followed by a sequence of numbers
containing the dosage of all markers in parent P
and Q
. Line 8, contains the string seq followed by
a sequence of integer numbers indicating the chromosome each marker belongs. It can be any 'a priori'
information regarding the physical distance between markers. For example, these numbers could refer
to chromosomes, scaffolds or even contigs, in which the markers are positioned. If this information
is not available for a particular marker, NA should be used. If this information is not available for
any of the markers, the string seq
should be followed by a single NA
. Line number 9 contains the string
seqpos
followed by the physical position of the markers into the sequence. The physical position can be
given in any unity of physical genomic distance (base pairs, for instance). However, the user should be
able to make decisions based on these values, such as the occurrence of crossing overs, etc. Line number 10
should contain the string nphen
followed by the number of phenotypic traits. Line number 11 is skipped
(Usually used as a spacer). The next elements are strings containing the name of the phenotypic trait with no space characters
followed by the phenotypic values. The number of lines should be the same number of phenotypic traits.
NA
represents missing values. The line number 12 + nphen
is skipped. Finally, the last element is a table
containing the probability distribution for each combination of marker and offspring. The first two columns
represent the marker and the offspring, respectively. The remaining elements represent the probability
associated with each one of the possible dosages. NA
represents missing data.
Value
an object of class mappoly.data
which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
physical position of the markers into the sequence |
seq.ref |
NULL (unused in this type of data) |
seq.alt |
NULL (unused in this type of data) |
all.mrk.depth |
NULL (unused in this type of data) |
prob.thres |
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' were considered as missing data in the 'geno.dose' matrix |
geno.dose |
a matrix containing the dosage for each markers (rows)
for each individual (columns). Missing data are represented by
|
geno |
a data.frame
containing the probability distribution for each combination of
marker and offspring. The first two columns represent the marker
and the offspring, respectively. The remaining elements represent
the probability associated to each one of the possible
dosages. Missing data are converted from NA to the expected
segregation ratio using function |
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and its equivalence to the redundant ones |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
#### Tetraploid Example
ft = "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/hexa_sample"
tempfl <- tempfile()
download.file(ft, destfile = tempfl)
SolCAP.dose.prob <- read_geno_prob(file.in = tempfl)
print(SolCAP.dose.prob, detailed = TRUE)
plot(SolCAP.dose.prob)
Data Input VCF
Description
Reads an external VCF file and creates an object of class mappoly.data
Usage
read_vcf(
file.in,
parent.1,
parent.2,
ploidy = NA,
filter.non.conforming = TRUE,
thresh.line = 0.05,
min.gt.depth = 0,
min.av.depth = 0,
max.missing = 1,
elim.redundant = TRUE,
verbose = TRUE,
read.geno.prob = FALSE,
prob.thres = 0.95
)
Arguments
file.in |
a character string with the name of (or full path to) the input file which contains the data (VCF format) |
parent.1 |
a character string containing the name of parent 1 |
parent.2 |
a character string containing the name of parent 2 |
ploidy |
the species ploidy (optional, it will be automatically detected) |
filter.non.conforming |
if |
thresh.line |
threshold used for p-values on segregation test (default = 0.05) |
min.gt.depth |
minimum genotype depth to keep information.
If the genotype depth is below |
min.av.depth |
minimum average depth to keep markers (default = 0) |
max.missing |
maximum proportion of missing data to keep markers (range = 0-1; default = 1) |
elim.redundant |
logical. If |
verbose |
if |
read.geno.prob |
If genotypic probabilities are available (PL field),
generates a probability-based dataframe (default = |
prob.thres |
probability threshold to associate a marker call to a
dosage. Markers with maximum genotype probability smaller than |
Details
This function can handle .vcf files versions 4.0 or higher. The ploidy can be automatically detected, but it is highly recommended that you inform it to check for mismatches. All individual and marker names will be kept as they are in the .vcf file.
Value
An object of class mappoly.data
which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers into the sequence |
seq.ref |
Reference base used for each marker (i.e. A, T, C, G) |
seq.alt |
Alternative base used for each marker (i.e. A, T, C, G) |
prob.thres |
(unused field) |
geno.dose |
a matrix containing the dosage for each markers (rows)
for each individual (columns). Missing data are represented by
|
geno |
a dataframe containing all genotypic probabilities columns for each
marker and individual combination (rows). Missing data are represented by
|
nphen |
(unused field) |
phen |
(unused field) |
all.mrk.depth |
DP information for all markers on VCF file |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and its equivalence to the redundant ones |
Author(s)
Gabriel Gesteira, gdesiqu@ncsu.edu
References
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
## Hexaploid sweetpotato: Subset of chromosome 3
fl = "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/sweet_sample_ch3.vcf.gz"
tempfl <- tempfile(pattern = 'chr3_', fileext = '.vcf.gz')
download.file(fl, destfile = tempfl)
dat.dose.vcf = read_vcf(file = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2")
print(dat.dose.vcf)
plot(dat.dose.vcf)
Re-estimate the recombination fractions in a genetic map
Description
This function re-estimates the recombination fractions between all markers in a given map.
Usage
reest_rf(
input.map,
input.mat = NULL,
tol = 0.01,
phase.config = "all",
method = c("hmm", "ols", "wMDS_to_1D_pc"),
weight = TRUE,
verbose = TRUE,
high.prec = FALSE,
max.rf.to.break.EM = 0.5,
input.mds = NULL
)
Arguments
input.map |
An object of class |
input.mat |
An object of class |
tol |
tolerance for determining convergence (default = 10e-03) |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood configuration |
method |
indicates whether to use |
weight |
if |
verbose |
if |
high.prec |
logical. If |
max.rf.to.break.EM |
for internal use only. |
input.mds |
An object of class |
Value
An updated object of class mappoly.pcmap
whose
order was used in the input.map
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Stam P (1993) Construction of integrated genetic-linkage maps by means of a new computer package: Joinmap. _Plant J_ 3:739–744 doi:10.1111/j.1365-313X.1993.00739.x
Reverse map
Description
Provides the reverse of a given map.
Usage
rev_map(input.map)
Arguments
input.map |
an object of class |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
Examples
plot_genome_vs_map(solcap.mds.map[[1]])
plot_genome_vs_map(rev_map(solcap.mds.map[[1]]))
Recombination fraction list to matrix
Description
Transforms the recombination fraction list contained in an object
of class mappoly.twopt
or mappoly.twopt2
into a recombination
fraction matrix
Usage
rf_list_to_matrix(
input.twopt,
thresh.LOD.ph = 0,
thresh.LOD.rf = 0,
thresh.rf = 0.5,
ncpus = 1L,
shared.alleles = FALSE,
verbose = TRUE
)
## S3 method for class 'mappoly.rf.matrix'
print(x, ...)
## S3 method for class 'mappoly.rf.matrix'
plot(
x,
type = c("rf", "lod"),
ord = NULL,
rem = NULL,
main.text = NULL,
index = FALSE,
fact = 1,
...
)
Arguments
input.twopt |
an object of class |
thresh.LOD.ph |
LOD score threshold for linkage phase configurations (default = 0) |
thresh.LOD.rf |
LOD score threshold for recombination fractions (default = 0) |
thresh.rf |
the threshold used for recombination fraction filtering (default = 0.5) |
ncpus |
number of parallel processes (i.e. cores) to spawn (default = 1) |
shared.alleles |
if |
verbose |
if |
x |
an object of class |
... |
currently ignored |
type |
type of matrix that should be printed. Can be one of the
following: |
ord |
the order in which the markers should be plotted (default = NULL) |
rem |
which markers should be removed from the heatmap (default = NULL) |
main.text |
a character string as the title of the heatmap (default = NULL) |
index |
|
fact |
positive integer. factor expressed as number of cells to be aggregated (default = 1, no aggregation) |
Details
thresh_LOD_ph
should be set in order to only select
recombination fractions that have LOD scores associated to the
linkage phase configuration higher than thresh_LOD_ph
when compared to the second most likely linkage phase configuration.
Value
A list containing two matrices. The first one contains the filtered recombination fraction and the second one contains the information matrix
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
all.mrk <- make_seq_mappoly(hexafake, 1:20)
red.mrk <- elim_redundant(all.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
all.pairs <- est_pairwise_rf(input.seq = unique.mrks,
ncpus = 1,
verbose = TRUE)
## Full recombination fraction matrix
mat.full <- rf_list_to_matrix(input.twopt = all.pairs)
plot(mat.full)
plot(mat.full, type = "lod")
Remove markers that do not meet a LOD criteria
Description
Remove markers that do not meet a LOD and recombination fraction criteria for at least a percentage of the pairwise marker combinations. It also removes markers with strong evidence of linkage across the whole linkage group (false positive).
Usage
rf_snp_filter(
input.twopt,
thresh.LOD.ph = 5,
thresh.LOD.rf = 5,
thresh.rf = 0.15,
probs = c(0.05, 1),
diag.markers = NULL,
mrk.order = NULL,
ncpus = 1L,
diagnostic.plot = TRUE,
breaks = 100
)
Arguments
input.twopt |
an object of class |
thresh.LOD.ph |
LOD score threshold for linkage phase configuration (default = 5) |
thresh.LOD.rf |
LOD score threshold for recombination fraction (default = 5) |
thresh.rf |
threshold for recombination fractions (default = 0.15) |
probs |
indicates the probability corresponding to the filtering quantiles. (default = c(0.05, 1)) |
diag.markers |
A window where marker pairs should be considered. If NULL (default), all markers are considered. |
mrk.order |
marker order. Only has effect if 'diag.markers' is not NULL |
ncpus |
number of parallel processes (i.e. cores) to spawn (default = 1) |
diagnostic.plot |
if |
breaks |
number of cells for the histogram |
Details
thresh.LOD.ph
should be set in order to only select
recombination fractions that have LOD scores associated to the
linkage phase configuration higher than thresh_LOD_ph
when compared to the second most likely linkage phase configuration.
That action usually eliminates markers that are unlinked to the
set of analyzed markers.
Value
A filtered object of class mappoly.sequence
.
See make_seq_mappoly
for details
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu with updates by Gabriel Gesteira, gdesiqu@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
all.mrk <- make_seq_mappoly(hexafake, 1:20)
red.mrk <- elim_redundant(all.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
all.pairs <- est_pairwise_rf(input.seq = unique.mrks,
ncpus = 1,
verbose = TRUE)
## Full recombination fraction matrix
mat.full <- rf_list_to_matrix(input.twopt = all.pairs)
plot(mat.full)
## Removing disruptive SNPs
tpt.filt <- rf_snp_filter(all.pairs, 2, 2, 0.07, probs = c(0.15, 1))
p1.filt <- make_pairs_mappoly(input.seq = tpt.filt, input.twopt = all.pairs)
m1.filt <- rf_list_to_matrix(input.twopt = p1.filt)
plot(mat.full, main.text = "LG1")
plot(m1.filt, main.text = "LG1.filt")
Random sampling of dataset
Description
Random sampling of dataset
Usage
sample_data(
input.data,
n = NULL,
percentage = NULL,
type = c("individual", "marker"),
selected.ind = NULL,
selected.mrk = NULL
)
Arguments
input.data |
an object of class |
n |
number of individuals or markers to be sampled |
percentage |
if |
type |
should sample individuals or markers? |
selected.ind |
a vector containing the name of the individuals to select. Only has effect
if |
selected.mrk |
a vector containing the name of the markers to select. Only has effect
if |
Value
an object of class mappoly.data
Polysomic segregation frequency
Description
Computes the polysomic segregation frequency given a ploidy level and the dosage of the locus in both parents. It does not consider double reduction.
Usage
segreg_poly(ploidy, dP, dQ)
Arguments
ploidy |
the ploidy level |
dP |
the dosage in parent P |
dQ |
the dosage in parent Q |
Value
a vector containing the expected segregation frequency for all possible genotypic classes.
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Serang O, Mollinari M, Garcia AAF (2012) Efficient Exact Maximum a Posteriori Computation for Bayesian SNP Genotyping in Polyploids. _PLoS ONE_ 7(2): e30906.
Examples
# autohexaploid with two and three doses in parents P and Q,
# respectively
seg <- segreg_poly(ploidy = 6, dP = 2, dQ = 3)
barplot(seg, las = 2)
Select rf and lod based on thresholds
Description
Select rf and lod based on thresholds
Usage
select_rf(x, thresh.LOD.ph, thresh.LOD.rf, thresh.rf, shared.alleles = FALSE)
Simulate mapping population (one parent)
Description
This function simulates a polyploid mapping population under random chromosome segregation with one informative parent. This function is not to be directly called by the user
Usage
sim_cross_one_informative_parent(
ploidy,
n.mrk,
rf.vec,
hom.allele,
n.ind,
seed = NULL,
prob = NULL
)
Simulate mapping population (tow parents)
Description
Simulate mapping population (tow parents)
Usage
sim_cross_two_informative_parents(
ploidy,
n.mrk,
rf.vec,
n.ind,
hom.allele.p,
hom.allele.q,
prob.P = NULL,
prob.Q = NULL,
seed = NULL
)
Simulate homology groups
Description
Simulate two homology groups (one for each parent) and their linkage phase configuration.
Usage
sim_homologous(ploidy, n.mrk, prob.dose = NULL, seed = NULL)
Arguments
ploidy |
ploidy level. Must be an even number |
n.mrk |
number of markers |
prob.dose |
a vector indicating the proportion of markers for different dosage to be simulated (default = NULL) |
seed |
random number generator seed |
Details
This function prevents the simulation of linkage phase configurations which are impossible to estimate via two point methods
Value
a list containing the following components:
hom.allele.p |
a list of vectors
containing linkage phase configurations. Each vector contains the
numbers of the homologous chromosomes in which the alleles are
located. For instance, a vector containing |
p |
contains the indices of the starting positions of the
dosages, considering that the vectors contained in |
hom.allele.q |
Analogously to |
q |
Analogously to |
ploidy |
ploidy level |
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
h.temp <- sim_homologous(ploidy = 6, n.mrk = 20)
Resulting maps from tetra.solcap
Description
A list containing 12 linkage groups estimated using genomic order and dosage call
Usage
solcap.dose.map
Format
A list containing 12 objects of class mappoly.map
, each one
representing one linkage group in the tetra.solcap
dataset.
Resulting maps from tetra.solcap
Description
A list containing 12 linkage groups estimated using genomic order, dosage call and global calling error
Usage
solcap.err.map
Format
A list containing 12 objects of class mappoly.map
, each one
representing one linkage group in the tetra.solcap
dataset.
Resulting maps from tetra.solcap
Description
A list containing 12 linkage groups estimated using mds_mappoly
order and dosage call
Usage
solcap.mds.map
Format
A list containing 12 objects of class mappoly.map
, each one
representing one linkage group in the tetra.solcap
dataset.
Resulting maps from tetra.solcap.geno.dist
Description
A list containing 12 linkage groups estimated using genomic order and prior probability distribution
Usage
solcap.prior.map
Format
A list containing 12 objects of class mappoly.map
, each one
representing one linkage group in the tetra.solcap.geno.dist
dataset.
Divides map in sub-maps and re-phase them
Description
The function splits the input map in sub-maps given a distance threshold of neighboring markers and evaluates alternative phases between the sub-maps.
Usage
split_and_rephase(
input.map,
twopt,
gap.threshold = 5,
size.rem.cluster = 1,
phase.config = "best",
thres.twopt = 3,
thres.hmm = "best",
tol.merge = 0.001,
tol.final = 0.001,
verbose = TRUE
)
Arguments
input.map |
an object of class |
twopt |
an object of class |
gap.threshold |
distance threshold of neighboring markers where the map should be spitted. The default value is 5 cM |
size.rem.cluster |
the size of the marker cluster (in number of markers) from which the cluster should be removed. The default value is 1 |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood phase configuration |
thres.twopt |
the threshold used to determine if the linkage phases compared via two-point analysis should be considered for the search space reduction (default = 3) |
thres.hmm |
the threshold used to determine which linkage phase configurations should be returned when merging two maps. If "best" (default), returns only the best linkage phase configuration. NOTE: if merging multiple maps, it always uses the "best" linkage phase configuration at each block insertion. |
tol.merge |
the desired accuracy for merging maps (default = 10e-04) |
tol.final |
the desired accuracy for the final map (default = 10e-04) |
verbose |
if |
Value
An object of class mappoly.map
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu
References
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Examples
map <- get_submap(solcap.dose.map[[1]], 1:20, verbose = FALSE)
tpt <- est_pairwise_rf(make_seq_mappoly(map))
new.map <- split_and_rephase(map, tpt, 1, 1)
map
new.map
plot_map_list(list(old.map = map, new.map = new.map), col = "ggstyle")
Split map into sub maps given a gap threshold
Description
Split map into sub maps given a gap threshold
Usage
split_mappoly(
input.map,
gap.threshold = 5,
size.rem.cluster = 1,
phase.config = "best",
tol.final = 0.001,
verbose = TRUE
)
Summary maps
Description
This function generates a brief summary table of a list of mappoly.map
objects
Usage
summary_maps(map.list, verbose = TRUE)
Arguments
map.list |
a list of objects of class |
verbose |
if |
Value
a data frame containing a brief summary of all maps contained in map.list
Author(s)
Gabriel Gesteira, gdesiqu@ncsu.edu
Examples
tetra.sum <- summary_maps(solcap.err.map)
tetra.sum
Conversion of data.frame to mappoly.data
Description
Conversion of data.frame to mappoly.data
Usage
table_to_mappoly(
dat,
ploidy,
filter.non.conforming = TRUE,
elim.redundant = TRUE,
verbose = TRUE
)
Autotetraploid potato dataset.
Description
A dataset of the B2721 population which derived from a cross between two tetraploid potato varieties: Atlantic × B1829-5. The population comprises 160 offsprings genotyped with the SolCAP Infinium 8303 potato array. The original data set can be found in [The Solanaceae Coordinated Agricultural Project (SolCAP) webpage](http://solcap.msu.edu/potato_infinium.shtml) The dataset also contains the genomic order of the SNPs from the Solanum tuberosum genome version 4.03. The genotype calling was performed using the fitPoly R package.
Usage
tetra.solcap
Format
An object of class mappoly.data
which contains a
list with the following components:
- ploidy
ploidy level = 4
- n.ind
number individuals = 160
- n.mrk
total number of markers = 4017
- ind.names
the names of the individuals
- mrk.names
the names of the markers
- dosage.p1
a vector containing the dosage in parent P for all
n.mrk
markers- dosage.p2
a vector containing the dosage in parent Q for all
n.mrk
markers- chrom
a vector indicating the chromosome each marker belongs. Zero indicates that the marker was not assigned to any sequence
- genome.pos
Physical position of the markers into the sequence
- geno.dose
a matrix containing the dosage for each markers (rows) for each individual (columns). Missing data are represented by
ploidy_level + 1 = 5
- n.phen
There are no phenotypes in this simulation
- phen
There are no phenotypes in this simulation
- chisq.pval
vector containing p-values for all markers associated to the chi-square test for the expected segregation patterns under Mendelian segregation
Autotetraploid potato dataset with genotype probabilities.
Description
A dataset of the B2721 population which derived from a cross between
two tetraploid potato varieties: Atlantic × B1829-5. The population comprises 160
offsprings genotyped with the SolCAP Infinium 8303 potato array. The original data
set can be found in [The Solanaceae Coordinated Agricultural Project (SolCAP) webpage](http://solcap.msu.edu/potato_infinium.shtml)
The dataset also contains the genomic order of the SNPs from the Solanum
tuberosum genome version 4.03. The genotype calling was performed using the
fitPoly R package. Although this dataset contains the
probability distribution of the genotypes,
it is essentially the same dataset found in tetra.solcap
Usage
tetra.solcap.geno.dist
Format
An object of class mappoly.data
which contains a
list with the following components:
- ploidy
ploidy level = 4
- n.ind
number individuals = 160
- n.mrk
total number of markers = 4017
- ind.names
the names of the individuals
- mrk.names
the names of the markers
- dosage.p1
a vector containing the dosage in parent P for all
n.mrk
markers- dosage.p2
a vector containing the dosage in parent Q for all
n.mrk
markers- chrom
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence
- genome.pos
Physical position of the markers into the sequence
- prob.thres = 0.95
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' are considered as missing data for the dosage calling purposes
- geno
a data.frame containing the probability distribution for each combination of marker and offspring. The first two columns represent the marker and the offspring, respectively. The remaining elements represent the probability associated to each one of the possible dosages
- geno.dose
a matrix containing the dosage for each markers (rows) for each individual (columns). Missing data are represented by
ploidy_level + 1 = 5
- n.phen
There are no phenotypes in this simulation
- phen
There are no phenotypes in this simulation
Add markers that are informative in both parents using HMM approach and evaluating difference in LOD and gap size
Description
Add markers that are informative in both parents using HMM approach and evaluating difference in LOD and gap size
Usage
update_framework_map(
input.map.list,
input.seq,
twopt,
thres.twopt = 10,
init.LOD = 30,
verbose = TRUE,
method = "hmm",
input.mds = NULL,
max.rounds = 50,
size.rem.cluster = 2,
gap.threshold = 4
)
Arguments
input.map.list |
list containing three |
input.seq |
object of class |
twopt |
object of class |
thres.twopt |
the LOD threshold used to determine if the linkage phases compared via two-point analysis should be considered for the search space reduction (default = 5) |
init.LOD |
the LOD threshold used to determine if the marker will be included or not after hmm analysis (default = 30) |
verbose |
If TRUE (default), current progress is shown; if FALSE, no output is produced |
method |
indicates whether to use 'hmm' (Hidden Markov Models), 'ols' (Ordinary Least Squares) or 'wMDS_to_1D_pc' (weighted MDS followed by fitting a one dimensional principal curve) to re-estimate the recombination fractions after adding markers |
input.mds |
An object of class |
max.rounds |
integer defining number of times to try to fit the remaining markers in the sequence |
size.rem.cluster |
threshold for number of markers that must contain in a segment after a gap is removed to keep this segment in the sequence |
gap.threshold |
threshold for gap size |
Value
object of class mappoly.map2
Author(s)
Marcelo Mollinari, mmollin@ncsu.edu with documentation and minor modifications by Cristiane Taniguti chtaniguti@tamu.edu
Update map
Description
This function takes an object of class mappoly.map
and checks for
removed redundant markers in the original dataset. Once redundant markers
are found, they are re-added to the map in their respective equivalent positions
and another HMM round is performed.
Usage
update_map(input.maps, verbose = TRUE)
Arguments
input.maps |
a single map or a list of maps of class |
verbose |
if TRUE (default), shows information about each update process |
Value
an updated map (or list of maps) of class mappoly.map
, containing the original map(s) plus redundant markers
Author(s)
Gabriel Gesteira, gdesiqu@ncsu.edu
Examples
orig.map <- solcap.err.map
up.map <- lapply(solcap.err.map, update_map)
summary_maps(orig.map)
summary_maps(up.map)
Update missing information
Description
Update missing information
Usage
update_missing(input.data, prob.thres = 0.95)
makes a phase list from map, selecting only configurations under a certain threshold
Description
makes a phase list from map, selecting only configurations under a certain threshold
Usage
update_ph_list_at_hmm_thres(map, thres.hmm)
Conversion: vector to matrix
Description
Conversion: vector to matrix
Usage
v_2_m(x, n)