EstimateEM performs one EM for one given number of shifts. It is called from function PhyloEM. Its use is mostly internal, and most user should not need it.

estimateEM(
  phylo,
  Y_data,
  Y_data_imp = Y_data,
  process = c("BM", "OU", "scOU", "rBM"),
  independent = FALSE,
  tol_EM = list(variance = 10^(-2), value.root = 10^(-2), exp.root = 10^(-2), var.root
    = 10^(-2), selection.strength = 10^(-2), normalized_half_life = 10^(-2),
    log_likelihood = 10^(-2)),
  Nbr_It_Max = 500,
  method.variance = c("simple", "upward_downward"),
  method.init = c("default", "lasso"),
  method.init.alpha = c("default", "estimation"),
  method.init.alpha.estimation = c("regression", "regression.MM", "median"),
  nbr_of_shifts = 0,
  random.root = TRUE,
  stationary.root = TRUE,
  alpha_known = FALSE,
  eps = 10^(-3),
  known.selection.strength = 1,
  init.selection.strength = 1,
  max_selection.strength = 100,
  use_sigma_for_lasso = TRUE,
  max_triplet_number = 10000,
  min_params = list(variance = 0, value.root = -10^(5), exp.root = -10^(5), var.root =
    0, selection.strength = 0),
  max_params = list(variance = 10^(5), value.root = 10^(5), exp.root = 10^(5), var.root
    = 10^(5), selection.strength = 10^(5)),
  var.init.root = diag(1, nrow(Y_data)),
  variance.init = diag(1, nrow(Y_data), nrow(Y_data)),
  methods.segmentation = c("lasso", "same_shifts", "best_single_move"),
  check.tips.names = FALSE,
  times_shared = NULL,
  distances_phylo = NULL,
  subtree.list = NULL,
  T_tree = NULL,
  U_tree = NULL,
  h_tree = NULL,
  F_moments = NULL,
  tol_half_life = TRUE,
  warning_several_solutions = TRUE,
  convergence_mode = c("relative", "absolute"),
  check_convergence_likelihood = TRUE,
  sBM_variance = FALSE,
  method.OUsun = c("rescale", "raw"),
  K_lag_init = 0,
  allow_negative = FALSE,
  trait_correlation_threshold = 0.9,
  ...
)

Arguments

phylo

A phylogenetic tree of class phylo (from package ape).

Y_data

Matrix of data at the tips, size p x ntaxa. Each line is a trait, and each column is a tip. The column names are checked against the tip names of the tree.

Y_data_imp

(optional) imputed data if previously computed, same format as Y_data. Mostly here for internal calls.

process

The model used for the fit. One of "BM" (for a full BM model, univariate or multivariate); "OU" (for an OU with independent traits, univariate or multivariate); or "scOU" (for a "scalar OU" model, see details).

independent

Are the trait assumed to be independent from one another? Default to FALSE. OU in a multivariate setting only works if TRUE.

tol_EM

the tolerance for the convergence of the parameters. A named list, with items:

variance

default to 10^(-2)

value.root

default to 10^(-2)

exp.root

default to 10^(-2)

var.root

default to 10^(-2)

selection.strength

default to 10^(-2)

normalized_half_life

default to 10^(-2)

log_likelihood

default to 10^(-2)

Nbr_It_Max

the maximal number of iterations of the EM allowed. Default to 500 iterations.

method.variance

Algorithm to be used for the moments computations at the E step. One of "simple" for the naive method; of "upward_downward" for the Upward Downward method (usually faster). Default to "upward_downward".

method.init

The initialization method. One of "lasso" for the LASSO base initialization method; or "default" for user-specified initialization values. Default to "lasso".

method.init.alpha

For OU model, initialization method for the selection strength alpha. One of "estimation" for a cherry-based initialization, using nlrob; or "default" for user-specified initialization values. Default to "estimation".

method.init.alpha.estimation

If method.init.alpha="estimation", choice of the estimation(s) methods to be used. Choices among "regression", (method="M" is passed to nlrob); "regression.MM" (method="MM" is passed to nlrob) or "median" (nlrob is not used, a simple median is taken). Default to all of them.

nbr_of_shifts

the number of shifts allowed.

random.root

whether the root is assumed to be random (TRUE) of fixed (FALSE). Default to TRUE

stationary.root

whether the root is assumed to be in the stationary state. Default to TRUE.

alpha_known

is the selection strength assumed to be known ? Default to FALSE.

eps

tolerance on the selection strength value before switching to a BM. Default to 10^(-3).

known.selection.strength

if alpha_known=TRUE, the value of the known selection strength.

init.selection.strength

(optional) a starting point for the selection strength value.

max_selection.strength

the maximal value allowed of the selection strength. Default to 100.

use_sigma_for_lasso

whether to use the first estimation of the variance matrix in the lasso regression. Default to TRUE.

max_triplet_number

for the initialization of the selection strength value (when estimated), the maximal number of triplets of tips to be considered.

min_params

a named list containing the minimum allowed values for the parameters. If the estimation is smaller, then the EM stops, and is considered to be divergent. Default values:

variance

default to 0

value.root

default to -10^(5)

exp.root

default to -10^(5)

var.root

default to 0

selection.strength

default to 0

max_params

a named list containing the maximum allowed values for the parameters. If the estimation is larger, then the EM stops, and is considered to be divergent. Default values:

variance

default to 10^(5)

value.root

default to 10^(5)

exp.root

default to 10^(5)

var.root

default to 10^(5)

selection.strength

default to 10^(5)

var.init.root

optional initialization value for the variance of the root.

variance.init

optional initialization value for the variance.

methods.segmentation

For OU, method(s) used at the M step to find new candidate shifts positions. Choices among "lasso" for a LASSO-based algorithm; and "best_single_move" for a one-move at a time based heuristic. Default to both of them. Using only "lasso" might speed up the function a lot.

check.tips.names

whether to check the tips names of the tree against the column names of the data. Default to TRUE.

times_shared

(optional) times of shared ancestry of all nodes and tips, result of function compute_times_ca

distances_phylo

(optional) phylogenetic distances, result of function compute_dist_phy.

subtree.list

(optional) tips descendants of all the edges, result of function enumerate_tips_under_edges.

T_tree

(optional) matrix of incidence of the tree, result of function incidence.matrix.

U_tree

(optional) full matrix of incidence of the tree, result of function incidence.matrix.full.

h_tree

(optional) total height of the tree.

F_moments

(optional, internal)

tol_half_life

should the tolerance criterion be applied to the phylogenetic half life (TRUE, default) or to the raw selection strength ?

warning_several_solutions

whether to issue a warning if several equivalent solutions are found (default to TRUE).

convergence_mode

one of "relative" (the default) or "absolute". Should the tolerance be applied to the raw parameters, or to the renormalized ones ?

check_convergence_likelihood

should the likelihood be taken into consideration for convergence assessment ? (default to TRUE).

sBM_variance

Is the root of the BM supposed to be random and "stationary"? Used for BM equivalent computations. Default to FALSE.

method.OUsun

Method to be used in univariate OU. One of "rescale" (rescale the tree to fit a BM) or "raw" (directly use an OU, only available for univariate processes).

K_lag_init

Number of extra shifts to be considered at the initialization step. Increases the accuracy, but can make computations quite slow of taken too high. Default to 5.

allow_negative

whether to allow negative values for alpha (Early Burst). See documentation of PhyloEM for more details. Default to FALSE.

trait_correlation_threshold

the trait correlation threshold to stop the analysis. Default to 0.9.

...

Further arguments to be passed to estimateEM, including tolerance parameters for stopping criteria, maximal number of iterations, etc.

Value

An object of class EstimateEM.

Details

See documentation of PhyloEM for further details. All the parameters monitoring the EM (like tol_EM, Nbr_It_Max, etc.) can be called from PhyloEM.

See also