Gaussian Model
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \forall 1 \leq i \leq n\]
- \(y_i\): quantitative response for observation \(i\) (sales)
- \(x_i\): quantitative predictor for observation \(i\) (TV)
- \(\epsilon_i\): “error” for \(i\)
- random variable (r.v.)
- independent, identically distributed (iid), Gaussian
- centered, with variance \(\sigma^2\):
\[
\epsilon_i
\underset{\text{iid}}{\operatorname{\sim}}
\mathcal{N}(0, \sigma^2)
\quad
\text{for}
\quad
1 \leq i \leq n
\]
- [Note: assumptions (H1), (H2) and (H3) are still satisfied.]
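To make the model concrete, here is a minimal Python sketch that draws one sample from it; the parameter values, the sample size and the design \(\mathbf{x}\) are arbitrary illustrative choices, not taken from the advertising data.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Illustrative (hypothetical) parameter values and fixed design x
n = 100
beta_0, beta_1, sigma2 = -2.0, 3.0, 1.0
x = np.linspace(0.0, 10.0, n)                    # x_i treated as non-random

# epsilon_i iid ~ N(0, sigma^2): centered, common variance
eps = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)

# y_i = beta_0 + beta_1 * x_i + epsilon_i
y = beta_0 + beta_1 * x + eps
```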
Reminder: Gaussian Distribution 1/2
Let \(X\) be a Gaussian r.v. with expectation \(\mu\) and variance \(\sigma^2\): \[
X \sim \mathcal{N}(\mu, \sigma^2).
\]
It admits a probability density function (pdf): \[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
\quad
\forall x \in \mathbb{R},
\]
and a characteristic function: \[
\phi_{X}(t) = \mathbb{E}[e^{itX}] = e^{i\mu t - \frac{\sigma^2 t^2}{2}}
\quad
\forall t \in \mathbb{R}.
\]
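As a quick numerical sanity check of the density formula (illustrative values of \(\mu\) and \(\sigma^2\); note that scipy.stats.norm is parametrized by the standard deviation, not the variance):

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.5, 4.0                      # illustrative values
x = np.linspace(-5.0, 8.0, 7)

# Density written exactly as in the formula above
pdf_formula = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Same density via scipy (scale = standard deviation)
pdf_scipy = norm.pdf(x, loc=mu, scale=np.sqrt(sigma2))

assert np.allclose(pdf_formula, pdf_scipy)
```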
Reminder: Gaussian Distribution 2/2
Distribution of \(\mathbf{y}\)
- Model: \[
y_i = \beta_0 + \beta_1 x_i + \epsilon_i
\quad
\text{with}
\quad
\epsilon_i
\underset{\text{iid}}{\operatorname{\sim}}
\mathcal{N}(0, \sigma^2)
\quad
\text{for}
\quad
1 \leq i \leq n
\]
- Distribution of \(y_i\)?
Reminder: Linear function of a Gaussian
If \(X \sim \mathcal{N}(\mu, \sigma^2)\), then for any \((a, b) \in \mathbb{R}^2\),
\[
Y = aX + b \sim \mathcal{N}(a\mu + b, a^2\sigma^2).
\]
- Proof: Exercise.
[Hint: use the characteristic function.]
- \(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\)
Apply the property with \(a = 1\) and \(b = \beta_0 + \beta_1 x_i\).
[Reminder: \(x_i\) are assumed to be non random here.]
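The property itself is easy to check by simulation; a small sketch with arbitrary values of \(a\), \(b\), \(\mu\) and \(\sigma^2\):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 2.0, 0.5                      # illustrative values
a, b = 3.0, -1.0

X = rng.normal(mu, np.sqrt(sigma2), size=100_000)
Y = a * X + b                              # linear function of a Gaussian

# Empirical moments should approach a*mu + b and a^2 * sigma^2
print(Y.mean(), a * mu + b)                # both close to 5.0
print(Y.var(), a ** 2 * sigma2)            # both close to 4.5
```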
Distribution of \(\mathbf{y}\)
Model: \[
y_i = \beta_0 + \beta_1 x_i + \epsilon_i
\quad
\text{with}
\quad
\epsilon_i
\underset{\text{iid}}{\operatorname{\sim}}
\mathcal{N}(0, \sigma^2)
\quad
\text{for}
\quad
1 \leq i \leq n
\]
Distribution of \(y_i\): \[
y_i
\underset{\text{ind.}}{\operatorname{\sim}}
\mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2)
\quad
\text{for}
\quad
1 \leq i \leq n
\]
- Question: What is the likelihood of \(\mathbf{y} = (y_1, \dotsc, y_n)^T\)?
Maximum Likelihood Estimators
Reminder: Likelihood
If \(X\) is an r.v. that admits a density \(f_{\theta}\) depending on a parameter \(\theta\), then the likelihood is the density seen as a function of \(\theta\), given an outcome \(X = x\): \[
\theta \mapsto L(\theta | x) = f_{\theta}(x).
\]
Example:
Assume that we observe one realization \(x_{obs}\) of a Gaussian random variable \(X\), with unknown expectation \(\mu\) but known variance \(\sigma^2 = 1\).
Then the likelihood of the observation \(x_{obs}\) is the function: \[
\mu \mapsto \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_{obs} - \mu)^2}{2\sigma^2}\right).
\]
Reminder: Maximum Likelihood 1/2
The Maximum Likelihood (ML) estimator \(\hat{\theta}\) of the parameter \(\theta\) is the one that maximizes the likelihood function over \(\theta\), given an observation \(x\):
\[
\hat{\theta} = \underset{\theta}{\operatorname{argmax}} L(\theta | x).
\] Example:
Let \(x_{obs}\) be one realization of a Gaussian r.v. \(X\), with unknown expectation \(\mu\), but known variance \(\sigma^2 = 1\).
Then the ML estimator \(\hat{\mu}\) of \(\mu\) is: \[
\hat{\mu} = \underset{\mu \in \mathbb{R}}{\operatorname{argmax}} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_{obs} - \mu)^2}{2\sigma^2}\right).
\]
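Numerically, maximizing this likelihood indeed returns the observation itself, \(\hat{\mu} = x_{obs}\). A sketch using scipy.optimize.minimize_scalar on the negative log-likelihood (the value of \(x_{obs}\) is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x_obs, sigma2 = 0.7, 1.0                   # one observation, known variance

def neg_log_likelihood(mu):
    # minus the log of the Gaussian density evaluated at x_obs
    return 0.5 * np.log(2 * np.pi * sigma2) + (x_obs - mu) ** 2 / (2 * sigma2)

res = minimize_scalar(neg_log_likelihood)
print(res.x)                               # ~ 0.7, i.e. mu_hat = x_obs
```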
Reminder: Maximum Likelihood 2/2
Likelihood of \(\mathbf{y}\)
Distribution of \(y_i\): \[
y_i
\underset{\text{ind.}}{\operatorname{\sim}}
\mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2)
\quad
\text{for}
\quad
1 \leq i \leq n
\]
Likelihood of \(\mathbf{y} = (y_1, \dotsc, y_n)^T\):
\[
\begin{aligned}
&L(\beta_0, \beta_1, \sigma^2 | y_1, \dotsc, y_n)
= \prod_{i = 1}^n L(\beta_0, \beta_1, \sigma^2 | y_i) & [ind.]\\
\end{aligned}
\]
and \[
\begin{aligned}
L(\beta_0, \beta_1, \sigma^2 | y_i)
= \cdots
\end{aligned}
\]
Likelihood of \(\mathbf{y}\)
Distribution of \(y_i\): \[
y_i
\underset{\text{ind.}}{\operatorname{\sim}}
\mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2)
\quad
\text{for}
\quad
1 \leq i \leq n
\]
Likelihood of \(\mathbf{y} = (y_1, \dotsc, y_n)^T\):
\[
\begin{aligned}
&L(\beta_0, \beta_1, \sigma^2 | y_1, \dotsc, y_n)
= \prod_{i = 1}^n L(\beta_0, \beta_1, \sigma^2 | y_i) & [ind.]\\
\end{aligned}
\]
and \[
\begin{aligned}
L(\beta_0, \beta_1, \sigma^2 | y_i)
= \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)
\end{aligned}
\]
Likelihood of \(\mathbf{y}\)
- Likelihood of \(\mathbf{y} = (y_1, \dotsc, y_n)^T\): \[
\begin{aligned}
L(\beta_0, \beta_1, \sigma^2 | y_1, \dotsc, y_n)
&= \prod_{i = 1}^n L(\beta_0, \beta_1, \sigma^2 | y_i)\\
&= \prod_{i = 1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)\\
&= \frac{1}{\left(\sqrt{2\pi\sigma^2}\right)^n}\exp\left(-\frac{1}{2\sigma^2}\sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right)\\
\log L(\beta_0, \beta_1, \sigma^2 | y_1, \dotsc, y_n) &= \cdots
\end{aligned}
\]
Log-Likelihood of \(\mathbf{y}\)
\[
\log L(\beta_0, \beta_1, \sigma^2 | y_1, \dotsc, y_n)
= - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2
\]
- Question: What are the Maximum Likelihood estimators of \(\beta_0\), \(\beta_1\) and \(\sigma^2\)?
- We need to find: \[
(\hat{\beta}_0^{ML}, \hat{\beta}_1^{ML}, \hat{\sigma}^2_{ML})
= \underset{(\beta_0, \beta_1, \sigma^2) \in \mathbb{R}^2\times\mathbb{R}_+^*}{\operatorname{argmax}}
\log L(\beta_0, \beta_1, \sigma^2 | \mathbf{y})
\]
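Before maximizing, the closed-form log-likelihood can be checked against a direct sum of Gaussian log-densities; a sketch on simulated data (all numerical values are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, beta_0, beta_1, sigma2 = 50, -2.0, 3.0, 1.0
x = rng.uniform(0.0, 10.0, n)
y = beta_0 + beta_1 * x + rng.normal(0.0, np.sqrt(sigma2), n)

def log_lik(b0, b1, s2):
    # Closed form: -n/2 * log(2 pi s2) - SS(b0, b1) / (2 s2)
    ss = np.sum((y - b0 - b1 * x) ** 2)
    return -0.5 * n * np.log(2 * np.pi * s2) - ss / (2 * s2)

direct = np.sum(norm.logpdf(y, loc=beta_0 + beta_1 * x, scale=np.sqrt(sigma2)))
assert np.isclose(log_lik(beta_0, beta_1, sigma2), direct)
```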
ML Estimators - \(\beta_0\) and \(\beta_1\)
\[
\log L(\beta_0, \beta_1, \sigma^2 | \mathbf{y})
= - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2
\]
ML Estimators - \(\beta_0\) and \(\beta_1\)
\[
\log L(\beta_0, \beta_1, \sigma^2 | \mathbf{y})
= - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\underbrace{\sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2}_{\text{Sum of Squares } SS(\beta_0, \beta_1)}
\]
- For any \(\sigma^2 > 0\), \[
\begin{aligned}
(\hat{\beta}_0^{ML}, \hat{\beta}_1^{ML})
&= \underset{(\beta_0, \beta_1) \in \mathbb{R}^2}{\operatorname{argmax}}
\log L(\beta_0, \beta_1, \sigma^2 | \mathbf{y})\\
&= \underset{(\beta_0, \beta_1) \in \mathbb{R}^2}{\operatorname{argmin}} SS(\beta_0, \beta_1)\\
&= (\hat{\beta}_0^{OLS}, \hat{\beta}_1^{OLS})
\end{aligned}
\]
ML Estimators - \(\beta_0\) and \(\beta_1\)
\[
\begin{aligned}
&\log L(\beta_0, \beta_1, \sigma^2 | \mathbf{y}) = \\
&\qquad - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\underbrace{\sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2}_{\text{Sum of Squares } SS(\beta_0, \beta_1)}\\
\end{aligned}
\]
\[
(\hat{\beta}_0^{ML}, \hat{\beta}_1^{ML}) = (\hat{\beta}_0^{OLS}, \hat{\beta}_1^{OLS})
\]
\(\hookrightarrow\) The ML estimators of \(\beta_0\) and \(\beta_1\) are the same as the OLS estimators.
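This equality can be verified numerically: maximizing the log-likelihood over \((\beta_0, \beta_1)\) for a fixed \(\sigma^2\) returns the OLS closed-form estimates. A sketch on simulated data (illustrative values):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, sigma2 = 200, 1.0
x = rng.uniform(0.0, 10.0, n)
y = -2.0 + 3.0 * x + rng.normal(0.0, np.sqrt(sigma2), n)   # simulated data

# OLS closed form
b1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_ols = y.mean() - b1_ols * x.mean()

# ML: minimize the negative log-likelihood over (beta_0, beta_1), sigma^2 fixed
def neg_log_lik(beta):
    resid = y - beta[0] - beta[1] * x
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum(resid ** 2) / (2 * sigma2)

b0_ml, b1_ml = minimize(neg_log_lik, x0=np.zeros(2)).x
print(b0_ols, b0_ml)                       # equal up to numerical tolerance
print(b1_ols, b1_ml)
```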
ML Estimators - \(\sigma^2\)
\[
\begin{aligned}
\hat{\sigma}^2_{ML}
&= \underset{\sigma^2 \in \mathbb{R}_+^*}{\operatorname{argmax}}
\log L(\hat{\beta}_0, \hat{\beta}_1, \sigma^2 | \mathbf{y})\\
&= \underset{\sigma^2 \in \mathbb{R}_+^*}{\operatorname{argmax}}
- \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\\
&= \underset{\sigma^2 \in \mathbb{R}_+^*}{\operatorname{argmin}}
\frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i = 1}^n \hat{\epsilon}_i^2\\
\end{aligned}
\]
where \(\hat{\epsilon}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\) is the \(i\)-th fitted residual.
ML Estimators - \(\sigma^2\)
\[
\frac{\partial}{\partial \sigma^2} \left[\frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i = 1}^n \hat{\epsilon}_i^2\right]
= \cdots
= 0
\]
ML Estimators - \(\sigma^2\)
\[
\begin{aligned}
\frac{\partial}{\partial \sigma^2} \left[\frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i = 1}^n \hat{\epsilon}_i^2\right]
&= \frac{n}{2} \frac{1}{\sigma^2} - \frac{1}{2\sigma^4}\sum_{i = 1}^n \hat{\epsilon}_i^2 = 0
\end{aligned}
\] And we get:
\[
\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i = 1}^n \hat{\epsilon}_i^2
\]
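In code, once \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are available (by OLS, equivalently by ML), \(\hat{\sigma}^2_{ML}\) is just the mean of the squared residuals; a short sketch on simulated data (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 10.0, n)
y = -2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)     # true sigma^2 = 1

# OLS = ML estimates of the coefficients
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - b0 - b1 * x                          # fitted residuals epsilon_hat_i
sigma2_ml = np.mean(resid ** 2)                  # (1/n) * sum of squared residuals
print(sigma2_ml)                                 # close to 1.0
```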
Distribution of the Coefficients - \(\sigma^2\) known
Moments of the estimators
- We already know the moments of the estimators:
\[
\begin{aligned}
\mathbb{E}\left[
\begin{pmatrix}
\hat{\beta}_0 \\
\hat{\beta}_1
\end{pmatrix}
\right]
&=
\begin{pmatrix}
\beta_0\\
\beta_1
\end{pmatrix}
\\
\mathbb{Var}\left[
\begin{pmatrix}
\hat{\beta}_0 \\
\hat{\beta}_1
\end{pmatrix}
\right]
&=
\frac{\sigma^2}{n}
\begin{pmatrix}
1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2} & -\frac{\bar{x}}{s_{\mathbf{x}}^2}\\
-\frac{\bar{x}}{s_{\mathbf{x}}^2} & \frac{1}{s_{\mathbf{x}}^2}
\end{pmatrix}
= \sigma^2 \mathbf{V}_n
\end{aligned}
\]
- What can we say about their distribution?
Distribution of \((\hat{\beta}_0, \hat{\beta}_1)\)
When the variance \(\sigma^2\) is known, we find that the vector of estimators \((\hat{\beta}_0, \hat{\beta}_1)\) is a Gaussian vector, with expectation \((\beta_0, \beta_1)\) and covariance matrix \(\sigma^2 \mathbf{V}_n\):
\[
\begin{pmatrix}
\hat{\beta}_0 \\
\hat{\beta}_1
\end{pmatrix}
\sim
\mathcal{N}\left(
\begin{pmatrix}
\beta_0\\
\beta_1
\end{pmatrix};
\sigma^2 \mathbf{V}_n
\right)
\]
- Proof deferred to the next chapter (multivariate regression).
Simulated dataset
Simulate according to the model: \[\mathbf{y} = -2 \cdot \mathbb{1} + 3 \cdot \mathbf{x} + \boldsymbol{\epsilon}\]
Simulated dataset: replicates
Simulate according to the model: \[\mathbf{y} = -2 \cdot \mathbb{1} + 3 \cdot \mathbf{x} + \boldsymbol{\epsilon}\]
[Figure: empirical distributions of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) across the simulation replicates.]
\[
\hat{\beta}_0 \sim \mathcal{N}\left(\beta_0, \frac{\sigma^2}{n}\left(1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2}\right)\right)
\quad
\hat{\beta}_1 \sim \mathcal{N}\left(\beta_1, \frac{\sigma^2}{n}\frac{1}{s_{\mathbf{x}}^2}\right)
\]
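A replicate simulation along these lines can be written as follows; a sketch (illustrative parameter values) that compares the Monte Carlo variances of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) with the theoretical formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta_0, beta_1, sigma2 = 50, -2.0, 3.0, 1.0
x = rng.uniform(0.0, 10.0, n)                    # fixed design, identical across replicates
s2x = np.mean((x - x.mean()) ** 2)               # s_x^2 = (1/n) sum (x_i - x_bar)^2

b0_hats, b1_hats = [], []
for _ in range(10_000):
    y = beta_0 + beta_1 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / (n * s2x)
    b0 = y.mean() - b1 * x.mean()
    b0_hats.append(b0)
    b1_hats.append(b1)

# Empirical vs theoretical variances
print(np.var(b1_hats), sigma2 / (n * s2x))
print(np.var(b0_hats), sigma2 / n * (1 + x.mean() ** 2 / s2x))
```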
Distribution of the Coefficients - \(\sigma^2\) unknown
Distribution of \((\hat{\beta}_0, \hat{\beta}_1)\)
When \(\sigma^2\) is known:
\[
\begin{pmatrix}
\hat{\beta}_0 \\
\hat{\beta}_1
\end{pmatrix}
\sim
\mathcal{N}\left(
\begin{pmatrix}
\beta_0\\
\beta_1
\end{pmatrix};
\sigma^2 \mathbf{V}_n
\right)
\qquad
\mathbf{V}_n= \frac{1}{n}
\begin{pmatrix}
1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2} & -\frac{\bar{x}}{s_{\mathbf{x}}^2}\\
-\frac{\bar{x}}{s_{\mathbf{x}}^2} & \frac{1}{s_{\mathbf{x}}^2}
\end{pmatrix}
\]
i.e.
\[
\frac{\hat{\beta}_0 - \beta_0}{\sqrt{\sigma^2(1 + \bar{x}^2/s_{\mathbf{x}}^2)/n}}
\sim
\mathcal{N}(0, 1)
\qquad
\frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2(1/s_{\mathbf{x}}^2)/n}}
\sim
\mathcal{N}(0, 1)
\]
- Problem: \(\sigma^2\) is generally unknown.
- Solution: replace \(\sigma^2\) by an estimator \(\hat{\sigma}^2\); for the results below, \(\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i = 1}^n \hat{\epsilon}_i^2\), the unbiased residual variance.
- How does this change the distribution?
Distribution of \((\hat{\beta}_0, \hat{\beta}_1)\)
When \(\sigma^2\) is known:
\[
\frac{\hat{\beta}_0 - \beta_0}{\sqrt{\sigma^2(1 + \bar{x}^2/s_{\mathbf{x}}^2)/n}}
\sim
\mathcal{N}(0, 1)
\qquad
\frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2(1/s_{\mathbf{x}}^2)/n}}
\sim
\mathcal{N}(0, 1)
\]
When \(\sigma^2\) is unknown: \[
\frac{\hat{\beta}_0 - \beta_0}{\sqrt{\hat{\sigma}^2(1 + \bar{x}^2/s_{\mathbf{x}}^2)/n}}
\sim
\mathcal{T}_{n-2}
\qquad
\frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2(1/s_{\mathbf{x}}^2)/n}}
\sim
\mathcal{T}_{n-2}
\]
- The standard Gaussian is replaced by a Student distribution with \(n - 2\) degrees of freedom.
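A Monte Carlo sketch of this result for \(\hat{\beta}_1\) (here \(\hat{\sigma}^2\) is the residual-based estimator with \(n-2\) in the denominator, as above; all numerical values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, beta_0, beta_1, sigma2 = 20, -2.0, 3.0, 1.0
x = rng.uniform(0.0, 10.0, n)
s2x = np.mean((x - x.mean()) ** 2)

t_stats = []
for _ in range(20_000):
    y = beta_0 + beta_1 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / (n * s2x)
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    sigma2_hat = np.sum(resid ** 2) / (n - 2)            # residual variance estimator
    t_stats.append((b1 - beta_1) / np.sqrt(sigma2_hat / (n * s2x)))

# Empirical quantiles vs quantiles of the Student T_{n-2} distribution
print(np.quantile(t_stats, [0.05, 0.5, 0.95]))
print(stats.t(df=n - 2).ppf([0.05, 0.5, 0.95]))
```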
Reminder: Student Distribution 1/2
Let \(Z\) be a standard normal r.v.: \(Z \sim \mathcal{N}(0, 1)\),
and \(X\) a chi-squared r.v. with \(p\) degrees of freedom: \(X \sim \chi^2_p\), with \(Z\) and \(X\) independent. Then \[
T = \frac{Z}{\sqrt{X/p}} \sim \mathcal{T}_p
\]
is a Student r.v. with \(p\) degrees of freedom.
We have (for \(p > 2\)): \[
\mathbb{E}[T] = 0
\quad
\mathbb{Var}[T] = \frac{p}{p-2}.
\]
As \(p \to \infty\), \(\mathcal{T}_p\) converges in distribution to the standard Gaussian.
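Building \(T\) from its definition and checking the stated moments by simulation (the value of \(p\) is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 5, 500_000                          # degrees of freedom, number of draws

Z = rng.standard_normal(m)                 # Z ~ N(0, 1)
X = rng.chisquare(p, m)                    # X ~ chi^2_p, independent of Z
T = Z / np.sqrt(X / p)

print(T.mean(), 0.0)                       # E[T] = 0
print(T.var(), p / (p - 2))                # Var[T] = p / (p - 2)
```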
Reminder: Student Distribution 2/2
Joint Distribution of the Coefficients - \(\sigma^2\) unknown
Distribution of \((\hat{\beta}_0, \hat{\beta}_1)\)
When \(\sigma^2\) is known:
\[
\hat{\boldsymbol{\beta}} =
\begin{pmatrix}
\hat{\beta}_0 \\
\hat{\beta}_1
\end{pmatrix}
\sim
\mathcal{N}\left(
\boldsymbol{\beta} =
\begin{pmatrix}
\beta_0\\
\beta_1
\end{pmatrix};
\sigma^2 \mathbf{V}_n
\right)
\]
When \(\sigma^2\) is unknown: \[
\frac{1}{2\hat{\sigma}^2}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^T \mathbf{V}_n^{-1}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)
\sim
\mathcal{F}^2_{n-2}
\]
- with \(\mathcal{F}^2_{n-2}\) the Fisher distribution with \((2, n-2)\) degrees of freedom.
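A sketch of the corresponding Monte Carlo check (again with \(\hat{\sigma}^2 = \sum_i \hat{\epsilon}_i^2 / (n-2)\); the quadratic form is compared with the quantiles of \(\mathcal{F}^2_{n-2}\) via scipy.stats.f; all values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, beta = 30, np.array([-2.0, 3.0])              # true (beta_0, beta_1)
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])             # design matrix with intercept
Vn = np.linalg.inv(X.T @ X)                      # equals (1/n) [[1 + xbar^2/s_x^2, ...], ...]

f_stats = []
for _ in range(20_000):
    y = X @ beta + rng.normal(0.0, 1.0, n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - 2)
    d = beta_hat - beta
    f_stats.append(d @ np.linalg.solve(Vn, d) / (2 * sigma2_hat))

# Empirical quantiles vs quantiles of the Fisher F(2, n-2) distribution
print(np.quantile(f_stats, [0.5, 0.95]))
print(stats.f(dfn=2, dfd=n - 2).ppf([0.5, 0.95]))
```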
Reminder: Fisher Distribution 1/2
Let \(U_1 \sim \chi^2_{p_1}\) and \(U_2 \sim \chi^2_{p_2}\) be independent. Then \[
F = \frac{U_1/p_1}{U_2/p_2} \sim \mathcal{F}^{p_1}_{p_2}
\]
is a Fisher r.v. with \((p_1, p_2)\) degrees of freedom.
We have (for \(p_2 > 2\)): \[
\mathbb{E}[F] = \frac{p_2}{p_2 - 2}.
\]
When \(p_2\) goes to infinity, \(\mathcal{F}^{p_1}_{p_2}\) converges in distribution to \(\frac{1}{p_1}\chi^2_{p_1}\).
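The same kind of simulation check for the Fisher distribution (arbitrary degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, m = 3, 10, 500_000

U1 = rng.chisquare(p1, m)                  # U1 ~ chi^2_{p1}
U2 = rng.chisquare(p2, m)                  # U2 ~ chi^2_{p2}, independent of U1
F = (U1 / p1) / (U2 / p2)

print(F.mean(), p2 / (p2 - 2))             # E[F] = p2 / (p2 - 2)
```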
Reminder: Fisher Distribution 2/2