estimateEM performs one run of the EM algorithm for a given number of shifts. It is called from function PhyloEM. Its use is mostly internal, and most users should not need it.
estimateEM(
phylo,
Y_data,
Y_data_imp = Y_data,
process = c("BM", "OU", "scOU", "rBM"),
independent = FALSE,
tol_EM = list(variance = 10^(-2), value.root = 10^(-2), exp.root = 10^(-2), var.root
= 10^(-2), selection.strength = 10^(-2), normalized_half_life = 10^(-2),
log_likelihood = 10^(-2)),
Nbr_It_Max = 500,
method.variance = c("simple", "upward_downward"),
method.init = c("default", "lasso"),
method.init.alpha = c("default", "estimation"),
method.init.alpha.estimation = c("regression", "regression.MM", "median"),
nbr_of_shifts = 0,
random.root = TRUE,
stationary.root = TRUE,
alpha_known = FALSE,
eps = 10^(-3),
known.selection.strength = 1,
init.selection.strength = 1,
max_selection.strength = 100,
use_sigma_for_lasso = TRUE,
max_triplet_number = 10000,
min_params = list(variance = 0, value.root = -10^(5), exp.root = -10^(5), var.root =
0, selection.strength = 0),
max_params = list(variance = 10^(5), value.root = 10^(5), exp.root = 10^(5), var.root
= 10^(5), selection.strength = 10^(5)),
var.init.root = diag(1, nrow(Y_data)),
variance.init = diag(1, nrow(Y_data), nrow(Y_data)),
methods.segmentation = c("lasso", "same_shifts", "best_single_move"),
check.tips.names = FALSE,
times_shared = NULL,
distances_phylo = NULL,
subtree.list = NULL,
T_tree = NULL,
U_tree = NULL,
h_tree = NULL,
F_moments = NULL,
tol_half_life = TRUE,
warning_several_solutions = TRUE,
convergence_mode = c("relative", "absolute"),
check_convergence_likelihood = TRUE,
sBM_variance = FALSE,
method.OUsun = c("rescale", "raw"),
K_lag_init = 0,
allow_negative = FALSE,
trait_correlation_threshold = 0.9,
...
)
A phylogenetic tree of class phylo (from package ape).
Matrix of data at the tips, size p x ntaxa. Each row is a trait, and each column is a tip. The column names are checked against the tip names of the tree.
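As an illustration of this layout, the following base R sketch builds a p x ntaxa matrix whose column names match a set of tip labels. The labels and trait names are made up for the example; with a real tree they would be taken from the tip labels of the phylo object.

```r
# Hypothetical tip labels; with a real tree these would be phylo$tip.label.
tip_labels <- c("t1", "t2", "t3", "t4")

# Two traits (rows) measured on four tips (columns), filled with random values.
set.seed(42)
Y_data <- matrix(rnorm(2 * length(tip_labels)),
                 nrow = 2,
                 dimnames = list(c("trait_1", "trait_2"), tip_labels))

# Rows are traits, columns are tips.
dim(Y_data)
colnames(Y_data)
```

Naming the columns after the tips is what allows the data to be matched to the tree by name rather than by position.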
(optional) imputed data if previously computed, same format as Y_data. Mostly here for internal calls.
The model used for the fit. One of "BM" (for a full BM model, univariate or multivariate); "OU" (for an OU with independent traits, univariate or multivariate); or "scOU" (for a "scalar OU" model, see details).
Are the traits assumed to be independent from one another? Default to FALSE. OU in a multivariate setting only works if TRUE.
the tolerance for the convergence of the parameters. A named list, with items:
variance: default to 10^(-2)
value.root: default to 10^(-2)
exp.root: default to 10^(-2)
var.root: default to 10^(-2)
selection.strength: default to 10^(-2)
normalized_half_life: default to 10^(-2)
log_likelihood: default to 10^(-2)
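A minimal sketch of how such a list could be prepared when only one tolerance needs to change. It rebuilds the full list of defaults shown in the usage and overrides a single entry with modifyList; passing a complete list is a cautious choice made for this example, since this page does not say whether a partial list is accepted.

```r
# Defaults as listed in the usage.
tol_default <- list(variance = 10^(-2), value.root = 10^(-2),
                    exp.root = 10^(-2), var.root = 10^(-2),
                    selection.strength = 10^(-2),
                    normalized_half_life = 10^(-2),
                    log_likelihood = 10^(-2))

# Tighten only the log-likelihood tolerance; every other item is kept.
tol_EM <- modifyList(tol_default, list(log_likelihood = 10^(-4)))

str(tol_EM)
```

The resulting tol_EM keeps the seven named items, so it can be supplied wherever the full list is expected.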
the maximal number of iterations of the EM allowed. Default to 500 iterations.
Algorithm to be used for the moments computations at the E step. One of "simple" for the naive method; or "upward_downward" for the Upward Downward method (usually faster). Default to "simple", the first choice in the usage above.
The initialization method. One of "lasso" for the LASSO-based initialization method; or "default" for user-specified initialization values. Default to "default", the first choice in the usage above.
For the OU model, initialization method for the selection strength alpha. One of "estimation" for a cherry-based initialization, using nlrob; or "default" for user-specified initialization values. Default to "default", the first choice in the usage above.
If method.init.alpha="estimation", choice of the estimation method(s) to be used. Choices among "regression" (method="M" is passed to nlrob); "regression.MM" (method="MM" is passed to nlrob); or "median" (nlrob is not used, a simple median is taken). Default to all of them.
the number of shifts allowed.
whether the root is assumed to be random (TRUE) or fixed (FALSE). Default to TRUE.
whether the root is assumed to be in the stationary state. Default to TRUE.
is the selection strength assumed to be known? Default to FALSE.
tolerance on the selection strength value before switching to a BM. Default to 10^(-3).
if alpha_known=TRUE, the value of the known selection strength.
(optional) a starting point for the selection strength value.
the maximal value allowed of the selection strength. Default to 100.
whether to use the first estimation of the variance matrix in the lasso regression. Default to TRUE.
for the initialization of the selection strength value (when estimated), the maximal number of triplets of tips to be considered.
a named list containing the minimum allowed values for the parameters. If the estimation is smaller, then the EM stops, and is considered to be divergent. Default values:
variance: default to 0
value.root: default to -10^(5)
exp.root: default to -10^(5)
var.root: default to 0
selection.strength: default to 0
a named list containing the maximum allowed values for the parameters. If the estimation is larger, then the EM stops, and is considered to be divergent. Default values:
variance: default to 10^(5)
value.root: default to 10^(5)
exp.root: default to 10^(5)
var.root: default to 10^(5)
selection.strength: default to 10^(5)
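The divergence rule described for the minimum and maximum values can be pictured with a small base R sketch. The helper is_out_of_bounds is hypothetical and only handles scalar estimates; it is not the package's internal check, just a reading of the rule stated above.

```r
# Bounds as listed in the usage.
min_params <- list(variance = 0, value.root = -10^(5), exp.root = -10^(5),
                   var.root = 0, selection.strength = 0)
max_params <- list(variance = 10^(5), value.root = 10^(5), exp.root = 10^(5),
                   var.root = 10^(5), selection.strength = 10^(5))

# Hypothetical helper: TRUE if any named scalar estimate is smaller than
# its minimum or larger than its maximum.
is_out_of_bounds <- function(estimates, min_params, max_params) {
  any(vapply(names(estimates), function(nm) {
    estimates[[nm]] < min_params[[nm]] || estimates[[nm]] > max_params[[nm]]
  }, logical(1)))
}

# Both estimates lie inside their bounds.
inside <- is_out_of_bounds(list(variance = 2.5, selection.strength = 0.3),
                           min_params, max_params)
# A variance of 2e5 exceeds the maximum of 10^(5).
outside <- is_out_of_bounds(list(variance = 2e5), min_params, max_params)
c(inside = inside, outside = outside)
```

Under this reading, the second case is the kind of estimate that would stop the EM and flag the run as divergent.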
optional initialization value for the variance of the root.
optional initialization value for the variance.
For OU, method(s) used at the M step to find new candidate shift positions. Choices among "lasso" for a LASSO-based algorithm; and "best_single_move" for a one-move-at-a-time heuristic. Default to both of them. Using only "lasso" might speed up the function a lot.
whether to check the tip names of the tree against the column names of the data. Default to FALSE.
(optional) times of shared ancestry of all nodes and tips, result of function compute_times_ca.
(optional) phylogenetic distances, result of function compute_dist_phy.
(optional) tip descendants of all the edges, result of function enumerate_tips_under_edges.
(optional) matrix of incidence of the tree, result of function incidence.matrix.
(optional) full matrix of incidence of the tree, result of function incidence.matrix.full.
(optional) total height of the tree.
(optional, internal)
should the tolerance criterion be applied to the phylogenetic half life (TRUE, default) or to the raw selection strength?
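To make the two scales concrete, the sketch below uses the usual definition of the phylogenetic half life of an OU process, log(2) / alpha. Dividing it by the total tree height to obtain a "normalized" half life is an assumption made for this example, suggested by the normalized_half_life entry of tol_EM; the function names are hypothetical.

```r
# Phylogenetic half life for a selection strength alpha.
half_life <- function(alpha) log(2) / alpha

# Assumed normalization: express the half life in units of tree height.
normalized_half_life <- function(alpha, h_tree) half_life(alpha) / h_tree

# Two successive (made-up) estimates of alpha on a tree of height 2.
alpha_old <- 1.00
alpha_new <- 1.05
h_tree <- 2

# Change measured on the raw selection strength.
change_alpha <- abs(alpha_new - alpha_old)

# Change measured on the normalized half life.
change_half_life <- abs(normalized_half_life(alpha_new, h_tree) -
                        normalized_half_life(alpha_old, h_tree))

c(alpha = change_alpha, half_life = change_half_life)
```

The same step in alpha gives a different change on each scale, which is why the choice of scale matters for the stopping criterion.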
whether to issue a warning if several equivalent solutions are found (default to TRUE).
one of "relative" (the default) or "absolute". Should the tolerance be applied to the raw parameters, or to the renormalized ones?
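One possible reading of the two modes, sketched in base R for a single scalar parameter. The helper has_converged is hypothetical, and the interpretation of "relative" as the change divided by the previous value is an assumption for this example, not something stated on this page.

```r
# Hypothetical helper: has a scalar parameter moved by less than tol
# between two iterations?
has_converged <- function(old, new, tol, mode = c("relative", "absolute")) {
  mode <- match.arg(mode)
  change <- abs(new - old)
  # Assumed meaning of "relative": scale the change by the previous value
  # (old must be non-zero for this to be defined).
  if (mode == "relative") change <- change / abs(old)
  change < tol
}

# A move from 100 to 100.5 is a 0.5% relative change but a 0.5 absolute one.
rel <- has_converged(100, 100.5, tol = 10^(-2), mode = "relative")
abs_mode <- has_converged(100, 100.5, tol = 10^(-2), mode = "absolute")
c(relative = rel, absolute = abs_mode)
```

With the default tolerance of 10^(-2), this step counts as converged in the relative sense but not in the absolute one, so large parameters stop earlier under the relative mode.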
should the likelihood be taken into consideration for convergence assessment? Default to TRUE.
Is the root of the BM supposed to be random and "stationary"? Used for BM equivalent computations. Default to FALSE.
Method to be used in univariate OU. One of "rescale" (rescale the tree to fit a BM) or "raw" (directly use an OU, only available for univariate processes).
Number of extra shifts to be considered at the initialization step. Increases the accuracy, but can make computations quite slow if taken too high. Default to 0.
whether to allow negative values for alpha (Early Burst). See documentation of PhyloEM for more details. Default to FALSE.
the trait correlation threshold to stop the analysis. Default to 0.9.
Further arguments to be passed to estimateEM, including tolerance parameters for stopping criteria, maximal number of iterations, etc.
An object of class EstimateEM.
See documentation of PhyloEM for further details.
All the parameters controlling the EM (like tol_EM, Nbr_It_Max, etc.) can be set from PhyloEM.