Module 1: Hypothesis Testing

Determining a Model

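A statistical model describes the process assumed to generate the data.
- Bernoulli Distribution: Models binary outcomes (success/failure).
  - PMF: $P(X = x) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$, where $p$ is the probability of success.
- Poisson Distribution: Models count data (number of events in a fixed interval).
  - PMF: $P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$ for $k = 0, 1, 2, \dots$, where $\lambda > 0$ is the expected number of events.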
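
The two PMFs above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function names `bernoulli_pmf` and `poisson_pmf` are illustrative choices, not part of these notes.

```python
import math


def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}; zero elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = exp(-lam) * lam^k / k! for k = 0, 1, 2, ...; zero elsewhere."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)


# Example values (p = 0.3 and lam = 2.0 are arbitrary illustration choices).
print(bernoulli_pmf(1, 0.3), bernoulli_pmf(0, 0.3))
print([round(poisson_pmf(k, 2.0), 4) for k in range(5)])
```

Each PMF sums to 1 over its support, which is a quick sanity check when implementing one by hand.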

Null and Alternative Hypotheses

Null Hypothesis $H_0$: Default assumption (e.g., no effect, no difference).
Alternative Hypothesis $H_a$: Contradicts $H_0$ (e.g., there is an effect).
- Two-tailed Test: Tests for a difference in either direction (e.g., $H_a: \mu \neq \mu_0$).
- One-tailed Test: Tests for a difference in one specified direction (e.g., $H_a: \mu > \mu_0$ or $H_a: \mu < \mu_0$).

Test Statistic

A function of sample data used to decide whether to reject $H_0$.
Common test statistics:
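- Z-statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ (population standard deviation $\sigma$ known).
- T-statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ (population standard deviation unknown; estimated by the sample standard deviation $s$).
- Here $\bar{x}$ is the sample mean, $\mu_0$ is the mean specified by $H_0$, and $n$ is the sample size.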
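
As a sketch of how the two statistics above are computed from a sample, the following uses the standard library's `statistics` module. The function names and the example sample are hypothetical; `statistics.stdev` uses the $n - 1$ denominator, which matches the usual definition of $s$.

```python
import math
import statistics


def z_statistic(sample, mu0, sigma):
    """z = (x_bar - mu0) / (sigma / sqrt(n)), with sigma assumed known."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)  # n - 1 in the denominator
    return (statistics.fmean(sample) - mu0) / (s / math.sqrt(n))


# Hypothetical sample; mu0 = 2 and sigma = 1 are illustration values only.
sample = [1, 2, 3, 4, 5]
print(z_statistic(sample, mu0=2, sigma=1))
print(t_statistic(sample, mu0=2))
```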

P-Value and Significance Level

P-Value: Probability, assuming $H_0$ is true, of observing data at least as extreme as the data actually observed.
Significance Level $\alpha$: Threshold for rejecting $H_0$ (commonly 0.05).
- Decision Rule:
  - If $\text{P-value} \leq \alpha$: Reject $H_0$.
  - If $\text{P-value} > \alpha$: Fail to reject $H_0$.
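
The decision rule above can be sketched for a z-test as follows. This assumes the test statistic is standard normal under $H_0$; the function names and the `alternative` labels (`"two-sided"`, `"greater"`, `"less"`) are illustrative conventions, not taken from these notes.

```python
from statistics import NormalDist


def z_test_p_value(z, alternative="two-sided"):
    """P-value for a z-statistic, assuming z ~ N(0, 1) under H0."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))  # both tails
    if alternative == "greater":
        return 1 - cdf(z)  # upper tail only
    if alternative == "less":
        return cdf(z)  # lower tail only
    raise ValueError(f"unknown alternative: {alternative!r}")


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# z = 2.1 is an arbitrary illustration value.
p = z_test_p_value(2.1)
print(p, decide(p))
```

For the same $z$, the one-tailed p-value in the direction of the observed effect is half the two-tailed one, so the choice of alternative should be made before looking at the data.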

Type I and Type II Errors

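Type I Error (probability $\alpha$): Rejecting $H_0$ when it is true (false positive). The significance level is the Type I error probability the test is designed to allow.
Type II Error (probability $\beta$): Failing to reject $H_0$ when $H_a$ is true (false negative).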
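
One way to see that the Type I error probability matches the significance level is to simulate many samples with $H_0$ true and count how often a test rejects. The sketch below assumes a two-tailed z-test on standard normal data with $\mu_0 = 0$ and $\sigma = 1$; the sample size, trial count, and seed are arbitrary illustration values.

```python
import math
import random
from statistics import NormalDist, fmean


def type_i_error_rate(n=20, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated samples, drawn with H0 true, where H0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Data generated under H0: mean 0, known sigma 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (fmean(sample) - 0.0) / (1.0 / math.sqrt(n))
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials


# The result should be close to alpha, up to simulation noise.
print(type_i_error_rate())
```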

Power of a Test

Power ($1 - \beta$): Probability of correctly rejecting $H_0$ when $H_a$ is true.
- Higher power means a lower chance of Type II error.
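
For a concrete case, the power of an upper-tailed z-test has a closed form: with true mean $\mu_1$, power is $1 - \Phi\left(z_{1-\alpha} - \frac{(\mu_1 - \mu_0)\sqrt{n}}{\sigma}\right)$, where $\Phi$ is the standard normal CDF. The sketch below assumes that setting (known $\sigma$, alternative $\mu > \mu_0$); the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject when z >= z_crit
    shift = (mu1 - mu0) * math.sqrt(n) / sigma  # mean of z under the true mu1
    return 1 - std_normal.cdf(z_crit - shift)


# Power grows with sample size for a fixed effect (values are illustrative).
for n in (10, 30, 100):
    print(n, z_test_power(mu0=0.0, mu1=0.5, sigma=1.0, n=n))
```

When $\mu_1 = \mu_0$ the formula returns $\alpha$, since rejecting a true $H_0$ is exactly a Type I error.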

Likelihood Ratio Test

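Compares the likelihood under $H_0$ with the largest likelihood attainable over the full parameter space.
Likelihood Ratio $\Lambda$: $\Lambda = \frac{L(\theta_0)}{L(\hat{\theta})}$, so that $0 \leq \Lambda \leq 1$.
- $L(\theta_0)$: Likelihood under $H_0$ (maximized over the null parameter values if $H_0$ does not fix a single value).
- $L(\hat{\theta})$: Likelihood at the maximum likelihood estimate $\hat{\theta}$, taken over the full parameter space (the values allowed by $H_0$ and $H_a$ together).
- Small values of $\Lambda$ are evidence against $H_0$.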
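
As an illustration, the sketch below computes $\Lambda$ for Poisson counts when testing $H_0: \lambda = \lambda_0$ against an unrestricted $\lambda$, using the fact that the Poisson maximum likelihood estimate is the sample mean. The function names and the example counts are hypothetical, and $\lambda_0 > 0$ is assumed. The code works on the log scale to avoid numerical underflow.

```python
import math


def poisson_log_likelihood(counts, lam):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    total = sum(counts)
    # Skip the log term when every count is zero, so lam = 0 is handled.
    log_term = total * math.log(lam) if total > 0 else 0.0
    return log_term - len(counts) * lam - sum(math.lgamma(k + 1) for k in counts)


def poisson_likelihood_ratio(counts, lam0):
    """Return (Lambda, -2 * log(Lambda)) for H0: lam = lam0 (lam0 > 0 assumed)."""
    lam_hat = sum(counts) / len(counts)  # MLE over the full parameter space
    log_ratio = poisson_log_likelihood(counts, lam0) - poisson_log_likelihood(
        counts, lam_hat
    )
    return math.exp(log_ratio), -2 * log_ratio


# Hypothetical counts; lam0 = 1.0 is an illustration value.
ratio, stat = poisson_likelihood_ratio([2, 3, 1, 4, 2], lam0=1.0)
print(ratio, stat)
```

Under standard regularity conditions and with $H_0$ true, $-2 \log \Lambda$ is approximately chi-square distributed for large samples (here with one degree of freedom), which is how a p-value is usually obtained from $\Lambda$.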