| Reaction | Days | Subject |
|---|---|---|
| 249.5600 | 0 | 308 |
| 258.7047 | 1 | 308 |
| 250.8006 | 2 | 308 |
| 321.4398 | 3 | 308 |
| 356.8519 | 4 | 308 |
| 414.6901 | 5 | 308 |
2025-04-30
For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + \alpha_k + (\beta + \gamma_k) \times x_{i} + \epsilon_{i, k} \]
Problem: What if there are many groups, with few observations?
\(\to\) Estimation problem.
Call:
lm(formula = Reaction ~ Days * Subject, data = sleepstudy)
Residuals:
Min 1Q Median 3Q Max
-106.397 -10.692 -0.177 11.417 132.510
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 244.193 15.042 16.234 < 2e-16 ***
Days 21.765 2.818 7.725 1.74e-12 ***
Subject309 -39.138 21.272 -1.840 0.067848 .
Subject310 -40.708 21.272 -1.914 0.057643 .
Subject330 45.492 21.272 2.139 0.034156 *
Subject331 41.546 21.272 1.953 0.052749 .
Subject332 20.059 21.272 0.943 0.347277
Subject333 30.826 21.272 1.449 0.149471
Subject334 -4.030 21.272 -0.189 0.850016
Subject335 18.842 21.272 0.886 0.377224
Subject337 45.911 21.272 2.158 0.032563 *
Subject349 -29.081 21.272 -1.367 0.173728
Subject350 -18.358 21.272 -0.863 0.389568
Subject351 16.954 21.272 0.797 0.426751
Subject352 32.179 21.272 1.513 0.132535
Subject369 10.775 21.272 0.507 0.613243
Subject370 -33.744 21.272 -1.586 0.114870
Subject371 9.443 21.272 0.444 0.657759
Subject372 22.852 21.272 1.074 0.284497
Days:Subject309 -19.503 3.985 -4.895 2.61e-06 ***
Days:Subject310 -15.650 3.985 -3.928 0.000133 ***
Days:Subject330 -18.757 3.985 -4.707 5.84e-06 ***
Days:Subject331 -16.499 3.985 -4.141 5.88e-05 ***
Days:Subject332 -12.198 3.985 -3.061 0.002630 **
Days:Subject333 -12.623 3.985 -3.168 0.001876 **
Days:Subject334 -9.512 3.985 -2.387 0.018282 *
Days:Subject335 -24.646 3.985 -6.185 6.07e-09 ***
Days:Subject337 -2.739 3.985 -0.687 0.492986
Days:Subject349 -8.271 3.985 -2.076 0.039704 *
Days:Subject350 -2.261 3.985 -0.567 0.571360
Days:Subject351 -15.331 3.985 -3.848 0.000179 ***
Days:Subject352 -8.198 3.985 -2.057 0.041448 *
Days:Subject369 -10.417 3.985 -2.614 0.009895 **
Days:Subject370 -3.709 3.985 -0.931 0.353560
Days:Subject371 -12.576 3.985 -3.156 0.001947 **
Days:Subject372 -10.467 3.985 -2.627 0.009554 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.59 on 144 degrees of freedom
Multiple R-squared: 0.8339, Adjusted R-squared: 0.7936
F-statistic: 20.66 on 35 and 144 DF, p-value: < 2.2e-16
For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]
\(\to\) Replace fixed effects with random effects.
\(\to\) Idea: works when there are many groups in a factor.
For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \]
\[ \mathbb{E}[y_{i,k}] = \mu \]
\[ \mathbb{V}[y_{i,k}] = s_A^2 + \sigma^2 \]
\[ \mathbb{C}[y_{i,k}, y_{i',k'}] = \begin{cases} s_A^2 & \text{if } k = k' \\ 0 & \text{otherwise.} \end{cases} \]
ANOVA \[ y_{i,k} = \mu + \alpha_{k} + \epsilon_{i,k} \quad \epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2) ~ iid \]
\[ \mathcal{H}_0: \alpha_1 = \cdots = \alpha_K = 0 \]
Mixed Model \[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \quad A_k \sim \mathcal{N}(0, s_A^2) ~ iid \quad \epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2) ~ iid \]
\[ \mathcal{H}_0: s_A^2 = 0 \] Different models, but the same question: does the grouping of the population affect the distribution of \(y_{i,k}\)?
\[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \mathbf{Z}\mathbf{U} + \boldsymbol{\epsilon} \]
\[ \mathbf{y} = \text{vec}((y_{i,k})_{i, k}) = \begin{pmatrix} y_{1,1}\\ y_{2,1}\\ \vdots\\ y_{n, 1}\\ y_{1, 2}\\ \vdots\\ y_{n, 2}\\ \vdots\\ y_{n, K} \end{pmatrix} \qquad \mathbf{Z} = \begin{pmatrix} 1 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 1 & \cdots & 0\\ 0 & 0 & \cdots & 1\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & \cdots & 1\\ \end{pmatrix} \qquad \begin{aligned} \mathbf{U} &\sim \mathcal{N}(\mathbf{0}, s_A^2\mathbf{I}_{K}) \\ \boldsymbol{\epsilon} &\sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}_{nK}) \end{aligned} \]
\[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \]
\[~\]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \mathbf{E} \qquad \mathbf{E} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}) \]
\[~\]
\[ \mathbf{\Sigma} = \begin{pmatrix} \mathbf{R} & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & \mathbf{0}\\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}\\ \end{pmatrix} \qquad \mathbf{R} = \begin{pmatrix} \sigma^2 + s_A^2 & s_A^2 & \cdots & s_A^2\\ s_A^2 & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & s_A^2\\ s_A^2 & \cdots & s_A^2 & \sigma^2 + s_A^2\\ \end{pmatrix} \]
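The block-diagonal structure above can be checked numerically: building \(\mathbf{\Sigma} = s_A^2 \mathbf{Z}\mathbf{Z}^T + \sigma^2 \mathbf{I}\) from the incidence matrix \(\mathbf{Z}\) recovers \(K\) copies of \(\mathbf{R}\). A minimal numpy sketch (the values of \(n\), \(K\), \(s_A^2\), \(\sigma^2\) are arbitrary, chosen for illustration):

```python
import numpy as np

n, K = 3, 2              # observations per group, number of groups
s_A2, sigma2 = 2.0, 0.5  # illustrative variance components

# Incidence matrix Z (nK x K): column k indicates membership in group k
Z = np.kron(np.eye(K), np.ones((n, 1)))

# Sigma = Var(Z U + eps) = s_A^2 Z Z^T + sigma^2 I
Sigma = s_A2 * Z @ Z.T + sigma2 * np.eye(n * K)

# Within-group block R: sigma^2 + s_A^2 on the diagonal, s_A^2 off-diagonal
R = sigma2 * np.eye(n) + s_A2 * np.ones((n, n))

# Sigma is block-diagonal with K copies of R
expected = np.kron(np.eye(K), R)
print(np.allclose(Sigma, expected))  # True
```

Observations in the same group share covariance \(s_A^2\); observations in different groups are uncorrelated.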
Linear Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}_n) \]
\[~\]
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}) \]
\[~\] with \(\mathbf{\Sigma}\) structured covariance matrix.
For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]
\[ \mathbb{E}[y_{i,k}] = \mu + \beta x_i \]
\[ \mathbb{V}[y_{i,k}] = s_A^2 + s_G^2x_i^2 + \sigma^2 \]
\[ \mathbb{C}[y_{i,k}, y_{i',k'}] = \begin{cases} s_A^2 + s_G^2 x_i x_{i'}& \text{if } k = k' \\ 0 & \text{otherwise.} \end{cases} \]
\[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \beta \mathbf{x} + \mathbf{Z}_1\mathbf{U}_1 + \mathbf{Z}_2\mathbf{U}_2 + \boldsymbol{\epsilon} \]
\[ \mathbf{y} = \begin{pmatrix} y_{1,1}\\ y_{2,1}\\ \vdots\\ y_{n, 1}\\ y_{1, 2}\\ \vdots\\ y_{n, 2}\\ \vdots\\ y_{n, K} \end{pmatrix} \qquad \mathbf{Z}_2 = \begin{pmatrix} x_1 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ x_n & 0 & \cdots & 0\\ 0 & x_1 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & x_n & \cdots & 0\\ 0 & 0 & \cdots & x_1\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & \cdots & x_n\\ \end{pmatrix} \qquad \begin{aligned} \mathbf{U}_1 &\sim \mathcal{N}(\mathbf{0}, s_A^2\mathbf{I}_{K}) \\ \mathbf{U}_2 &\sim \mathcal{N}(\mathbf{0}, s_G^2\mathbf{I}_{K}) \\ \boldsymbol{\epsilon} &\sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}_{nK}) \end{aligned} \]
\[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \beta \mathbf{x} + \mathbf{Z}_1\mathbf{U}_1 + \mathbf{Z}_2\mathbf{U}_2 + \boldsymbol{\epsilon} \]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \beta\mathbf{x} + \mathbf{E} \qquad \mathbf{E} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}) \]
\[ \mathbf{\Sigma} = \sigma^2 \mathbf{I} + s_A^2 \mathbf{Z}_1 \mathbf{Z}_1^T + s_G^2 \mathbf{Z}_2 \mathbf{Z}_2^T = \begin{pmatrix} \mathbf{R} & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & \mathbf{0}\\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}\\ \end{pmatrix} \]
\[ \mathbf{R} = \begin{pmatrix} \sigma^2 + s_A^2 + s_G^2 x_1^2 & s_A^2 + s_G^2 x_2x_1 & \cdots & s_A^2 + s_G^2 x_nx_1\\ s_A^2 + s_G^2 x_1x_2 & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & s_A^2 + s_G^2 x_{n-1}x_n\\ s_A^2 + s_G^2 x_1x_n & \cdots & s_A^2 + s_G^2 x_nx_{n-1} & \sigma^2 + s_A^2 + s_G^2 x_n^2\\ \end{pmatrix} \]
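The entrywise formula for \(\mathbf{R}\) can again be verified against the matrix form \(\mathbf{\Sigma} = \sigma^2 \mathbf{I} + s_A^2 \mathbf{Z}_1 \mathbf{Z}_1^T + s_G^2 \mathbf{Z}_2 \mathbf{Z}_2^T\). A numpy sketch with arbitrary illustrative values (each group shares the covariate values \(x_1, \dots, x_n\), as in the display above):

```python
import numpy as np

n, K = 4, 3
s_A2, s_G2, sigma2 = 1.5, 0.8, 0.3   # illustrative variance components
x = np.arange(n, dtype=float)        # shared covariate values x_1..x_n

Z1 = np.kron(np.eye(K), np.ones((n, 1)))  # random-intercept design
Z2 = np.kron(np.eye(K), x[:, None])       # random-slope design

# Sigma = sigma^2 I + s_A^2 Z1 Z1^T + s_G^2 Z2 Z2^T
Sigma = sigma2 * np.eye(n * K) + s_A2 * Z1 @ Z1.T + s_G2 * Z2 @ Z2.T

# Within-group block: R[i, i'] = s_A^2 + s_G^2 x_i x_i' (+ sigma^2 on the diagonal)
R = s_A2 + s_G2 * np.outer(x, x) + sigma2 * np.eye(n)
print(np.allclose(Sigma, np.kron(np.eye(K), R)))  # True
```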
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (Days | Subject)
Data: sleepstudy
REML criterion at convergence: 1743.628
Random effects:
Groups Name Std.Dev. Corr
Subject (Intercept) 24.741
Days 5.922 0.07
Residual 25.592
Number of obs: 180, groups: Subject, 18
Fixed Effects:
(Intercept) Days
251.41 10.47
Linear Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}_n) \]
\[~\]
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}) \]
\[~\] with \(\mathbf{\Sigma}\) structured covariance matrix.
\[~\] \(\to\) Estimation and tests are more difficult in general.
Repeated Measures \[ y_{ikt} = \mu + \alpha_k + \beta_t + \gamma_{kt} + \epsilon_{ikt} \]
\[ \mathbb{C}[y_{i,k,t}, y_{i',k',t'}] = \begin{cases} \sigma^2 & \text{if } (i,k,t) = (i',k',t') \\ \sigma^2 \rho & \text{if } (i,k) = (i',k'),\ t \neq t' \\ 0 & \text{otherwise.} \end{cases} \] \[~\] \(\to\) Constant covariance between time measures (compound symmetry).
Repeated Measures \[ y_{ikt} = \mu + \alpha_k + \beta_t + \gamma_{kt} + \epsilon_{ikt} \]
\[ \mathbb{C}[y_{i,k,t}, y_{i',k',t'}] = \begin{cases} \sigma^2 \rho^{|t-t'|} & \text{if } (i,k) = (i',k') \\ 0 & \text{otherwise.} \end{cases} \] \[~\] \(\to\) AR(1) (with \(|\rho| < 1\))
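The AR(1) within-subject covariance matrix can be built directly from \(\sigma^2 \rho^{|t-t'|}\); a numpy sketch with arbitrary illustrative values:

```python
import numpy as np

T, sigma2, rho = 5, 1.0, 0.6   # number of time points, illustrative parameters
t = np.arange(T)

# AR(1) covariance between time points: sigma^2 * rho^|t - t'|
R = sigma2 * rho ** np.abs(t[:, None] - t[None, :])

print(R[0, :3])   # covariance decays geometrically with the lag
# R is a valid covariance matrix (positive definite for |rho| < 1)
print(np.all(np.linalg.eigvalsh(R) > 0))  # True
```

Unlike the compound-symmetry structure, the correlation decreases as measurements are further apart in time.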
Spatial Measures \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]
\[ \mathbb{C}[y_{i}, y_{i'}] = \begin{cases} \gamma^2 e^{-d(i,i')/\rho} & \text{if } i \neq i' \\ \sigma^2 + \gamma^2 & \text{if } i = i' \end{cases} \]
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}(\boldsymbol{\psi})) \] with \(\mathbf{\Sigma}(\boldsymbol{\psi})\) depending on parameters \(\boldsymbol{\psi}\).
\[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}) \]
Special case: \[ \mathbf{\Sigma} = \sigma^2 \mathbf{C} \] with \(\mathbf{C}\) known.
Then: \[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{y} \]
\[ \hat{\sigma}^2 = \frac{\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \|^2_{\mathbf{C}^{-1}}}{n-p} = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T \mathbf{C}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n-p} \]
Proof: Use Cholesky decomposition.
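The Cholesky argument can be made concrete: writing \(\mathbf{C} = \mathbf{L}\mathbf{L}^T\) and premultiplying the model by \(\mathbf{L}^{-1}\) whitens the errors, so the GLS estimator equals OLS on the transformed data. A numpy sketch on simulated data (the design, coefficients, and the AR(1) choice of \(\mathbf{C}\) are arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# A known correlation structure C (here AR(1), purely for illustration)
C = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(C)
y = X @ np.array([1.0, 2.0]) + L @ rng.normal(size=n)   # correlated errors

# GLS: beta_hat = (X^T C^-1 X)^-1 X^T C^-1 y
Ci = np.linalg.inv(C)
beta_gls = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)

# Whitening: solve L z = y and L W = X, then run plain OLS on (z, W)
z = np.linalg.solve(L, y)
W = np.linalg.solve(L, X)
beta_ols = np.linalg.lstsq(W, z, rcond=None)[0]

print(np.allclose(beta_gls, beta_ols))  # True
```

The same whitening gives \(\hat{\sigma}^2\): the transformed model satisfies the ordinary linear-model assumptions, so the usual residual variance estimator applies.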
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}(\boldsymbol{\psi})) \]
Maximum-Likelihood Estimators: \[ \hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\psi}} = \underset{\boldsymbol{\beta}, \boldsymbol{\psi}}{\operatorname{argmax}} \log L(\boldsymbol{\beta}, \boldsymbol{\psi} | \mathbf{y}) \]
\[ \hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\psi}} = \underset{\boldsymbol{\beta}, \boldsymbol{\psi}}{\operatorname{argmin}} \left\{ \log |\mathbf{\Sigma}(\boldsymbol{\psi})| + (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right\} \]
\[ S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \]
\[ \nabla S(\hat{\boldsymbol{\beta}}) = 2 \mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{X}\hat{\boldsymbol{\beta}} - 2 \mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{y} = 0 \]
\[ \hat{\boldsymbol{\beta}} = h(\boldsymbol{\psi}) = (\mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{X})^{-1} \mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{y} \]
Maximisation:
\[ \hat{\boldsymbol{\psi}} = \underset{\boldsymbol{\psi}}{\operatorname{argmin}} \left\{ \log |\mathbf{\Sigma}(\boldsymbol{\psi})| + (\mathbf{y} - \mathbf{X}h(\boldsymbol{\psi}))^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}h(\boldsymbol{\psi})) \right\} \]
\[ \hat{\boldsymbol{\beta}} = h(\hat{\boldsymbol{\psi}}) \]
\(\to\) Optimization in \(\boldsymbol{\psi}\) can be complex.
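The profiling scheme above can be sketched numerically for the random-intercept model, where \(\boldsymbol{\psi} = (s_A^2, \sigma^2)\): for each candidate \(\boldsymbol{\psi}\), compute \(\hat{\boldsymbol{\beta}} = h(\boldsymbol{\psi})\) in closed form, then search over \(\boldsymbol{\psi}\) (a crude grid search here; real software uses proper optimizers). Data are simulated with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 10, 8                            # observations per group, groups
true_mu, true_sA2, true_s2 = 2.0, 1.0, 0.5

Z = np.kron(np.eye(K), np.ones((n, 1)))  # random-intercept design
X = np.ones((n * K, 1))                  # fixed effect: intercept only
y = X[:, 0] * true_mu + Z @ rng.normal(0, np.sqrt(true_sA2), K) \
    + rng.normal(0, np.sqrt(true_s2), n * K)

def criterion(sA2, s2):
    """log|Sigma(psi)| + (y - X h(psi))^T Sigma(psi)^{-1} (y - X h(psi))."""
    Sigma = sA2 * Z @ Z.T + s2 * np.eye(n * K)
    Si = np.linalg.inv(Sigma)
    beta = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)   # h(psi), closed form
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(Sigma)
    return logdet + r @ Si @ r, beta

# Crude grid search over psi = (s_A^2, sigma^2)
grid = np.linspace(0.05, 3.0, 30)
best = min(((sA2, s2) for sA2 in grid for s2 in grid),
           key=lambda ps: criterion(*ps)[0])
print(best)   # grid estimate of (s_A^2, sigma^2)
```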
Linear Model \[ \hat{\sigma}^2_{ML} = \frac{\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \|^2}{n} \qquad \text{vs} \qquad \hat{\sigma}^2_{REML} = \frac{\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \|^2}{n-p} \] \[ \hat{\sigma}^2_{REML} = \underset{\sigma^2}{\operatorname{argmax}} \log L(0, \sigma^2 | \mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}) \]
\[ \mathbf{P}^{\mathbf{X}^\bot}\mathbf{y} = \hat{\boldsymbol{\epsilon}} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{P}^{\mathbf{X}^\bot}) \]
\[ \hat{\sigma}^2_{REML} = \underset{\sigma^2}{\operatorname{argmin}} \left\{ \log |\sigma^2 \mathbf{P}^{\mathbf{X}^\bot}| + (\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y})^T (\sigma^2 \mathbf{P}^{\mathbf{X}^\bot})^{-1}(\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}) \right\} \]
\[ \hat{\sigma}^2_{REML} = \underset{\sigma^2}{\operatorname{argmin}} \left\{ (n-p) \log(\sigma^2) + \frac{1}{\sigma^2} \|\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}\|^2 \right\} \]
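Setting the derivative of \((n-p)\log\sigma^2 + \|\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}\|^2/\sigma^2\) to zero gives \(\hat{\sigma}^2_{REML} = \|\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}\|^2/(n-p)\), the usual unbiased estimator. A numpy sketch on simulated data (design and coefficients arbitrary, for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

# Projector onto the orthogonal complement of col(X)
H = X @ np.linalg.solve(X.T @ X, X.T)
P = np.eye(n) - H
rss = y @ P @ y           # = ||P y||^2 since P is symmetric idempotent

# REML estimate: stationary point of (n - p) log(s2) + rss / s2
sigma2_reml = rss / (n - p)

# Sanity check: the criterion is indeed smallest at sigma2_reml
crit = lambda s2: (n - p) * np.log(s2) + rss / s2
print(crit(sigma2_reml) <= min(crit(0.7 * sigma2_reml), crit(1.4 * sigma2_reml)))
# True
```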
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}(\boldsymbol{\psi})) \]
Let \(\mathbf{T}\) be an \((n-p)\times n\) matrix such that \(\mathbf{T}\mathbf{X}=\mathbf{0}\). \[ \tilde{\mathbf{y}} = \mathbf{T}\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{T}\mathbf{\Sigma}(\boldsymbol{\psi})\mathbf{T}^T) \]
\[ \hat{\boldsymbol{\psi}}_{REML} = \underset{\boldsymbol{\psi}}{\operatorname{argmax}} \log L(\boldsymbol{\psi} | \tilde{\mathbf{y}}) \]
\(\to\) Independent of \(\boldsymbol{\beta}\).
\(\to\) Independent of the choice of \(\mathbf{T}\).
\(\to\) Often, \(\mathbf{T} = \mathbf{P}^{\mathbf{X}^\bot}\).
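The independence from the choice of \(\mathbf{T}\) can be illustrated numerically: if \(\mathbf{T}_2 = \mathbf{A}\mathbf{T}_1\) for invertible \(\mathbf{A}\), the REML criterion shifts only by a constant not depending on \(\boldsymbol{\psi}\), so differences of the criterion across values of \(\boldsymbol{\psi}\) coincide. A numpy sketch with the toy choice \(\mathbf{\Sigma}(\psi) = \psi\mathbf{I}\) (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Two different (n-p) x n matrices with T X = 0: an orthonormal basis of the
# null space of X^T, and an invertible linear transformation of it
Q, _ = np.linalg.qr(X, mode="complete")
T1 = Q[:, p:].T
M = rng.normal(size=(n - p, n - p))   # invertible with probability 1
T2 = M @ T1

def criterion(T, psi):
    """log|T Sigma(psi) T^T| + (T y)^T (T Sigma(psi) T^T)^{-1} (T y)."""
    S = T @ (psi * np.eye(n)) @ T.T
    _, logdet = np.linalg.slogdet(S)
    r = T @ y
    return logdet + r @ np.linalg.solve(S, r)

# Differences across psi values agree for the two choices of T
d1 = criterion(T1, 2.0) - criterion(T1, 0.5)
d2 = criterion(T2, 2.0) - criterion(T2, 0.5)
print(np.isclose(d1, d2))  # True
```

Hence \(\hat{\boldsymbol{\psi}}_{REML}\), the minimizer over \(\boldsymbol{\psi}\), does not depend on which \(\mathbf{T}\) was used.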
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]