EstimateEM
performs one EM for one given number of shifts. It is called
from function PhyloEM
. Its use is mostly internal, and most user
should not need it.
Usage
estimateEM(
phylo,
Y_data,
Y_data_imp = Y_data,
process = c("BM", "OU", "scOU", "rBM"),
independent = FALSE,
tol_EM = list(variance = 10^(-2), value.root = 10^(-2), exp.root = 10^(-2), var.root =
10^(-2), selection.strength = 10^(-2), normalized_half_life = 10^(-2), log_likelihood
= 10^(-2)),
Nbr_It_Max = 500,
method.variance = c("simple", "upward_downward"),
method.init = c("default", "lasso"),
method.init.alpha = c("default", "estimation"),
method.init.alpha.estimation = c("regression", "regression.MM", "median"),
nbr_of_shifts = 0,
random.root = TRUE,
stationary.root = TRUE,
alpha_known = FALSE,
eps = 10^(-3),
known.selection.strength = 1,
init.selection.strength = 1,
max_selection.strength = 100,
use_sigma_for_lasso = TRUE,
max_triplet_number = 10000,
min_params = list(variance = 0, value.root = -10^(5), exp.root = -10^(5), var.root = 0,
selection.strength = 0),
max_params = list(variance = 10^(5), value.root = 10^(5), exp.root = 10^(5), var.root =
10^(5), selection.strength = 10^(5)),
var.init.root = diag(1, nrow(Y_data)),
variance.init = diag(1, nrow(Y_data), nrow(Y_data)),
methods.segmentation = c("lasso", "same_shifts", "best_single_move"),
check.tips.names = FALSE,
times_shared = NULL,
distances_phylo = NULL,
subtree.list = NULL,
T_tree = NULL,
U_tree = NULL,
h_tree = NULL,
F_moments = NULL,
tol_half_life = TRUE,
warning_several_solutions = TRUE,
convergence_mode = c("relative", "absolute"),
check_convergence_likelihood = TRUE,
sBM_variance = FALSE,
method.OUsun = c("rescale", "raw"),
K_lag_init = 0,
allow_negative = FALSE,
trait_correlation_threshold = 0.9,
...
)
Arguments
- phylo
A phylogenetic tree of class
phylo
(from packageape
).- Y_data
Matrix of data at the tips, size p x ntaxa. Each line is a trait, and each column is a tip. The column names are checked against the tip names of the tree.
- Y_data_imp
(optional) imputed data if previously computed, same format as
Y_data
. Mostly here for internal calls.- process
The model used for the fit. One of "BM" (for a full BM model, univariate or multivariate); "OU" (for an OU with independent traits, univariate or multivariate); or "scOU" (for a "scalar OU" model, see details).
- independent
Are the trait assumed to be independent from one another? Default to FALSE. OU in a multivariate setting only works if TRUE.
- tol_EM
the tolerance for the convergence of the parameters. A named list, with items:
- variance
default to 10^(-2)
- value.root
default to 10^(-2)
- exp.root
default to 10^(-2)
- var.root
default to 10^(-2)
- selection.strength
default to 10^(-2)
- normalized_half_life
default to 10^(-2)
- log_likelihood
default to 10^(-2)
- Nbr_It_Max
the maximal number of iterations of the EM allowed. Default to 500 iterations.
- method.variance
Algorithm to be used for the moments computations at the E step. One of "simple" for the naive method; of "upward_downward" for the Upward Downward method (usually faster). Default to "upward_downward".
- method.init
The initialization method. One of "lasso" for the LASSO base initialization method; or "default" for user-specified initialization values. Default to "lasso".
- method.init.alpha
For OU model, initialization method for the selection strength alpha. One of "estimation" for a cherry-based initialization, using
nlrob
; or "default" for user-specified initialization values. Default to "estimation".- method.init.alpha.estimation
If method.init.alpha="estimation", choice of the estimation(s) methods to be used. Choices among "regression", (method="M" is passed to
nlrob
); "regression.MM" (method="MM" is passed tonlrob
) or "median" (nlrob
is not used, a simple median is taken). Default to all of them.- nbr_of_shifts
the number of shifts allowed.
- random.root
whether the root is assumed to be random (TRUE) of fixed (FALSE). Default to TRUE
- stationary.root
whether the root is assumed to be in the stationary state. Default to TRUE.
- alpha_known
is the selection strength assumed to be known ? Default to FALSE.
- eps
tolerance on the selection strength value before switching to a BM. Default to 10^(-3).
- known.selection.strength
if
alpha_known=TRUE
, the value of the known selection strength.- init.selection.strength
(optional) a starting point for the selection strength value.
- max_selection.strength
the maximal value allowed of the selection strength. Default to 100.
- use_sigma_for_lasso
whether to use the first estimation of the variance matrix in the lasso regression. Default to TRUE.
- max_triplet_number
for the initialization of the selection strength value (when estimated), the maximal number of triplets of tips to be considered.
- min_params
a named list containing the minimum allowed values for the parameters. If the estimation is smaller, then the EM stops, and is considered to be divergent. Default values:
- variance
default to 0
- value.root
default to -10^(5)
- exp.root
default to -10^(5)
- var.root
default to 0
- selection.strength
default to 0
- max_params
a named list containing the maximum allowed values for the parameters. If the estimation is larger, then the EM stops, and is considered to be divergent. Default values:
- variance
default to 10^(5)
- value.root
default to 10^(5)
- exp.root
default to 10^(5)
- var.root
default to 10^(5)
- selection.strength
default to 10^(5)
- var.init.root
optional initialization value for the variance of the root.
- variance.init
optional initialization value for the variance.
- methods.segmentation
For OU, method(s) used at the M step to find new candidate shifts positions. Choices among "lasso" for a LASSO-based algorithm; and "best_single_move" for a one-move at a time based heuristic. Default to both of them. Using only "lasso" might speed up the function a lot.
- check.tips.names
whether to check the tips names of the tree against the column names of the data. Default to TRUE.
(optional) times of shared ancestry of all nodes and tips, result of function
compute_times_ca
- distances_phylo
(optional) phylogenetic distances, result of function
compute_dist_phy
.- subtree.list
(optional) tips descendants of all the edges, result of function
enumerate_tips_under_edges
.- T_tree
(optional) matrix of incidence of the tree, result of function
incidence.matrix
.- U_tree
(optional) full matrix of incidence of the tree, result of function
incidence.matrix.full
.- h_tree
(optional) total height of the tree.
- F_moments
(optional, internal)
- tol_half_life
should the tolerance criterion be applied to the phylogenetic half life (TRUE, default) or to the raw selection strength ?
- warning_several_solutions
whether to issue a warning if several equivalent solutions are found (default to TRUE).
- convergence_mode
one of "relative" (the default) or "absolute". Should the tolerance be applied to the raw parameters, or to the renormalized ones ?
- check_convergence_likelihood
should the likelihood be taken into consideration for convergence assessment ? (default to TRUE).
- sBM_variance
Is the root of the BM supposed to be random and "stationary"? Used for BM equivalent computations. Default to FALSE.
- method.OUsun
Method to be used in univariate OU. One of "rescale" (rescale the tree to fit a BM) or "raw" (directly use an OU, only available for univariate processes).
- K_lag_init
Number of extra shifts to be considered at the initialization step. Increases the accuracy, but can make computations quite slow of taken too high. Default to 5.
- allow_negative
whether to allow negative values for alpha (Early Burst). See documentation of
PhyloEM
for more details. Default to FALSE.- trait_correlation_threshold
the trait correlation threshold to stop the analysis. Default to 0.9.
- ...
Further arguments to be passed to
estimateEM
, including tolerance parameters for stopping criteria, maximal number of iterations, etc.
Details
See documentation of PhyloEM
for further details.
All the parameters monitoring the EM (like tol_EM
, Nbr_It_Max
, etc.)
can be called from PhyloEM
.