When \(\sigma^2\) is unknown: \[
\frac{1}{p\hat{\sigma}^2}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^T (\mathbf{X}^T\mathbf{X})\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)
\sim
\mathcal{F}^p_{n-p}
\]
with \(\mathcal{F}^p_{n-p}\) the Fisher distribution with \(p\) and \(n-p\) degrees of freedom.
Reminder: Fisher Distribution
Let \(U_1 \sim \chi^2_{p_1}\) and \(U_2 \sim \chi^2_{p_2}\), \(U_1\) and \(U_2\) independent. Then \[
F = \frac{U_1/p_1}{U_2/p_2} \sim \mathcal{F}^{p_1}_{p_2}
\]
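The reminder above can be checked numerically. The following is only a simulation sketch: the seed, the number of draws and the degrees of freedom `p_1 = 3`, `p_2 = 20` are arbitrary choices made for illustration, not values from the course.

```r
# Simulation sketch: ratio of two independent, rescaled chi-square variables
set.seed(42)              # arbitrary seed, for reproducibility only
n_sim <- 1e5              # number of simulated draws (illustrative)
p_1 <- 3                  # hypothetical numerator degrees of freedom
p_2 <- 20                 # hypothetical denominator degrees of freedom

U_1 <- rchisq(n_sim, df = p_1)
U_2 <- rchisq(n_sim, df = p_2)   # drawn independently of U_1
F_sim <- (U_1 / p_1) / (U_2 / p_2)

# Empirical quantiles of the ratio against the theoretical Fisher quantiles
probs <- c(0.5, 0.9, 0.95)
q_emp <- quantile(F_sim, probs = probs, names = FALSE)
q_theo <- qf(probs, df1 = p_1, df2 = p_2)
print(rbind(empirical = q_emp, theoretical = q_theo))
```

The two rows should agree up to Monte Carlo error.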
Joint distribution of \(\hat{\boldsymbol{\beta}}\) - Proof - Hints
Test Statistic: \[
F
= \frac{1}{p\hat{\sigma}^2}\hat{\boldsymbol{\beta}}^T (\mathbf{X}^T\mathbf{X})\hat{\boldsymbol{\beta}}
\underset{\mathcal{H}_0}{\sim}
\mathcal{F}^p_{n-p}
\]
\(\mathcal{H}_0: \beta_1 = \cdots = \beta_p = 0\) i.e. the model is useless.
If \(\|\hat{\boldsymbol{\epsilon}}\|^2\) is small, then the error is small,
and \(F\) tends to be large, i.e. we tend to reject \(\mathcal{H}_0\):
the model is useful.
If \(\|\hat{\boldsymbol{\epsilon}}\|^2\) is large, then the error is large,
and \(F\) tends to be small, i.e. we tend to not reject \(\mathcal{H}_0\):
the model is rather useless.
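As an illustration, the test statistic can be computed by hand and compared with what `lm` reports. This is a minimal sketch on simulated data: the data-generating process and the names `y`, `X`, `F_manual` are assumptions made here, and the model is fitted without an intercept so that \(\mathcal{H}_0\) covers all \(p\) columns of \(\mathbf{X}\). It also assumes the usual estimator \(\hat{\sigma}^2 = \|\hat{\boldsymbol{\epsilon}}\|^2/(n-p)\).

```r
# Sketch: global F statistic by hand, on hypothetical simulated data
set.seed(1)                          # arbitrary seed
n <- 100
x_1 <- rnorm(n)
x_2 <- rnorm(n)
y <- 2 * x_1 - x_2 + rnorm(n)        # assumed data-generating process
X <- cbind(x_1, x_2)                 # design matrix without intercept
p <- ncol(X)

fit <- lm(y ~ 0 + x_1 + x_2)         # no intercept: H0 is "all p coefficients are 0"
beta_hat <- coef(fit)
sigma2_hat <- sum(residuals(fit)^2) / (n - p)

# F = beta_hat^T (X^T X) beta_hat / (p * sigma2_hat)
F_manual <- as.numeric(t(beta_hat) %*% crossprod(X) %*% beta_hat) / (p * sigma2_hat)
p_value <- pf(F_manual, df1 = p, df2 = n - p, lower.tail = FALSE)

# For a model without intercept, summary() reports the same statistic
F_lm <- summary(fit)$fstatistic
print(c(manual = F_manual, lm = F_lm[["value"]], p_value = p_value))
```

When the model includes an intercept, the F statistic printed by `summary()` tests only the non-intercept coefficients, so it would not match the formula above term for term.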
Geometrical interpretation
Good model
\(\mathbf{y}\) not too far from \(\mathcal{M}(\mathbf{X})\).
\(\|\hat{\boldsymbol{\epsilon}}\|^2\) not too large compared to \(\|\hat{\mathbf{y}}\|^2\).
\(F\) tends to be large.
Bad model
\(\mathbf{y}\) almost orthogonal to \(\mathcal{M}(\mathbf{X})\).
\(\|\hat{\boldsymbol{\epsilon}}\|^2\) rather large compared to \(\|\hat{\mathbf{y}}\|^2\).
\(F\) tends to be small.
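The geometric reading can be made explicit by rewriting \(F\) with norms. This assumes the usual estimator \(\hat{\sigma}^2 = \|\hat{\boldsymbol{\epsilon}}\|^2/(n-p)\), which is not restated in this section, and uses \(\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}\): \[
F
= \frac{1}{p\hat{\sigma}^2}\hat{\boldsymbol{\beta}}^T (\mathbf{X}^T\mathbf{X})\hat{\boldsymbol{\beta}}
= \frac{\|\mathbf{X}\hat{\boldsymbol{\beta}}\|^2 / p}{\|\hat{\boldsymbol{\epsilon}}\|^2 / (n-p)}
= \frac{\|\hat{\mathbf{y}}\|^2 / p}{\|\hat{\boldsymbol{\epsilon}}\|^2 / (n-p)}
\]
Since \(\|\mathbf{y}\|^2 = \|\hat{\mathbf{y}}\|^2 + \|\hat{\boldsymbol{\epsilon}}\|^2\) (Pythagoras), \(F\) compares the two legs of the right triangle formed by \(\mathbf{y}\), \(\hat{\mathbf{y}}\) and \(\hat{\boldsymbol{\epsilon}}\), each scaled by its number of degrees of freedom.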
Simulated Dataset - True and Junk
## Data frame
data_all <- data.frame(y_sim = y_sim, x_1 = x_1, x_2 = x_2, x_junk = x_junk)
## Fit
fit <- lm(y_sim ~ ., data = data_all)
## multiple t tests
p_values_t <- summary(fit)$coefficients[, 4]
names(p_values_t[p_values_t <= 0.05])
If \(\|\hat{\mathbf{y}} - \hat{\mathbf{y}}_0\|^2\) is small compared to \(\|\mathbf{y} - \hat{\mathbf{y}}\|^2\), then “\(\hat{\mathbf{y}} \approx \hat{\mathbf{y}}_0\)” and the null model is enough.
If \(\|\hat{\mathbf{y}} - \hat{\mathbf{y}}_0\|^2\) is large compared to \(\|\mathbf{y} - \hat{\mathbf{y}}\|^2\), then the full model is useful.
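The comparison of the two norms can be sketched in R. This example assumes the standard nested-model statistic, with each squared norm divided by its degrees of freedom (\(p - p_0\) and \(n - p\)). The data are simulated here under an assumed generating process in which `x_junk` has no effect; only the variable names are borrowed from the simulated dataset, not its actual construction.

```r
# Sketch: F test between a null model and a full model (hypothetical data)
set.seed(123)                               # arbitrary seed
n <- 200
x_1 <- rnorm(n)
x_2 <- rnorm(n)
x_junk <- rnorm(n)
y_sim <- 1 + 2 * x_1 - x_2 + rnorm(n)       # assumed: x_junk plays no role
data_all <- data.frame(y_sim, x_1, x_2, x_junk)

fit_0 <- lm(y_sim ~ x_1 + x_2, data = data_all)   # null model
fit <- lm(y_sim ~ ., data = data_all)             # full model
p_0 <- length(coef(fit_0))
p <- length(coef(fit))

# ||y_hat - y_hat_0||^2 / (p - p_0)  against  ||y - y_hat||^2 / (n - p)
num <- sum((fitted(fit) - fitted(fit_0))^2) / (p - p_0)
den <- sum(residuals(fit)^2) / (n - p)
F_nested <- num / den
p_value <- pf(F_nested, df1 = p - p_0, df2 = n - p, lower.tail = FALSE)

# anova() on the two nested fits reports the same statistic
tab <- anova(fit_0, fit)
print(c(F_manual = F_nested, F_anova = tab$F[2], p_value = p_value))
```

A large p-value here means the data give no reason to prefer the full model over the null model.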
Geometrical interpretation
Good model
\(\hat{\mathbf{y}}\) quite different from \(\hat{\mathbf{y}}_0\).
Full model does add information.
Bad model
\(\hat{\mathbf{y}}\) similar to \(\hat{\mathbf{y}}_0\).
\(\hat{\mathbf{y}} - \hat{\mathbf{y}}_0\) and \(\mathbf{y} - \hat{\mathbf{y}}\) are orthogonal
i.e. their covariance is zero
i.e. (under the Gaussian assumption) they are independent.
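The orthogonality can be verified numerically on any pair of nested fits. A small sketch follows, with hypothetical simulated data and hypothetical names (`d_fit`, `d_res`):

```r
# Sketch: orthogonality of (y_hat - y_hat_0) and (y - y_hat) on hypothetical data
set.seed(7)                          # arbitrary seed
n <- 50
x_1 <- rnorm(n)
x_2 <- rnorm(n)
y <- 1 + x_1 + rnorm(n)              # assumed data-generating process

fit_0 <- lm(y ~ x_1)                 # null model
fit <- lm(y ~ x_1 + x_2)             # full model, contains the null model

d_fit <- fitted(fit) - fitted(fit_0) # y_hat - y_hat_0
d_res <- y - fitted(fit)             # y - y_hat

inner <- sum(d_fit * d_res)          # inner product, zero up to rounding
rss_0 <- sum(residuals(fit_0)^2)     # ||y - y_hat_0||^2
print(c(inner = inner, rss_0 = rss_0, pythagoras = sum(d_fit^2) + sum(d_res^2)))
```

The last two printed values coincide: orthogonality gives \(\|\mathbf{y} - \hat{\mathbf{y}}_0\|^2 = \|\hat{\mathbf{y}} - \hat{\mathbf{y}}_0\|^2 + \|\mathbf{y} - \hat{\mathbf{y}}\|^2\).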
\[
\frac{1}{\sigma^2}\|\hat{\mathbf{y}} - \hat{\mathbf{y}}_0 - \mathbf{P}_0^\bot \mathbf{P}\mathbf{X}\boldsymbol{\beta}\|^2
= \frac{1}{\sigma^2}\|\mathbf{P}_0^\bot \mathbf{P}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\|^2
\sim \chi^2_{p-p_0}
\] (\(\mathbf{P}_0^\bot \mathbf{P}\) is a projection on a space of dimension \(p - p_0\))
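The claim about \(\mathbf{P}_0^\bot \mathbf{P}\) can be checked on explicit matrices. The sketch below builds hypothetical nested design matrices `X_0` (with \(p_0 = 2\) columns) and `X` (with \(p = 4\) columns); the helper `proj` is introduced here for illustration only.

```r
# Sketch: P_0^perp P is an orthogonal projection of rank p - p_0 (hypothetical designs)
set.seed(11)                               # arbitrary seed
n <- 30
X_0 <- cbind(1, rnorm(n))                  # null design, p_0 = 2
X <- cbind(X_0, rnorm(n), rnorm(n))        # full design, p = 4, contains X_0
p_0 <- ncol(X_0)
p <- ncol(X)

# Orthogonal projection matrix onto the column space of A (A of full column rank)
proj <- function(A) A %*% solve(crossprod(A), t(A))

P <- proj(X)
P_0 <- proj(X_0)
M <- (diag(n) - P_0) %*% P                 # P_0^perp P

# A symmetric idempotent matrix is an orthogonal projection; its trace is its rank
err_idempotent <- max(abs(M %*% M - M))
err_symmetric <- max(abs(M - t(M)))
rank_M <- sum(diag(M))
print(c(idempotent = err_idempotent, symmetric = err_symmetric, trace = rank_M))
```

The trace equals \(p - p_0\) because the column space of `X_0` is contained in that of `X`, so that \(\mathbf{P}_0^\bot \mathbf{P} = \mathbf{P} - \mathbf{P}_0\).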
If \(\mathcal{H}_0\) is true, then \(\mathbf{P}_0^\bot \mathbf{P}\mathbf{X}\boldsymbol{\beta} = 0\), hence: