Module 1: Hypothesis Testing
Determining a Model
- Statistical Models represent the data generation process.
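- Bernoulli Distribution: Models binary outcomes (success/failure).
  - PMF: $P(X = x) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$, where $p$ is the probability of success.
- Poisson Distribution: Models count data (number of events in a fixed interval).
  - PMF: $P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$ for $k = 0, 1, 2, \dots$, where $\lambda > 0$ is the expected number of events.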
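
The two PMFs above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function names `bernoulli_pmf` and `poisson_pmf` are illustrative and do not come from these notes.

```python
import math


def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}; zero elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^(-lam) lam^k / k! for k = 0, 1, 2, ...; zero elsewhere."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)


# Example values: success probability 0.3, and an average of 4 events per interval.
print(bernoulli_pmf(1, 0.3))
print(poisson_pmf(2, 4.0))
```

A quick sanity check is that each PMF sums to 1 over its support (for the Poisson case, summing far enough into the tail gets numerically close to 1).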
Null and Alternative Hypotheses
- Null Hypothesis $H_0$: Default assumption (e.g., no effect, no difference).
- Alternative Hypothesis $H_a$: Contradicts $H_0$ (e.g., there is an effect).
- Two-tailed Test: Tests for any significant difference.
- One-tailed Test: Tests for a difference in a specific direction.
Test Statistic
- A function of sample data used to decide whether to reject $H_0$.
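- Common test statistics (here $\bar{x}$ is the sample mean, $\mu_0$ the mean hypothesized under $H_0$, and $n$ the sample size):
  - Z-statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ (population standard deviation $\sigma$ known).
  - T-statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ (population variance unknown; $s$ is the sample standard deviation).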
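
As a rough illustration of the two formulas, the sketch below computes both statistics for a small made-up sample. The data, the hypothesized mean `mu0 = 4`, and the "known" `sigma = 2` are assumptions chosen only for the example.

```python
import math
import statistics


def z_statistic(sample, mu0, sigma):
    """z = (x_bar - mu0) / (sigma / sqrt(n)), with sigma assumed known."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (statistics.fmean(sample) - mu0) / (s / math.sqrt(n))


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(z_statistic(sample, mu0=4, sigma=2))
print(t_statistic(sample, mu0=4))
```

The only difference between the two functions is the denominator: the z-statistic plugs in a known $\sigma$, while the t-statistic replaces it with the sample estimate $s$.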
P-Value and Significance Level
- P-Value: Probability of observing data at least as extreme as the data actually observed, assuming $H_0$ is true.
- Significance Level $\alpha$: Threshold for rejecting $H_0$ (commonly 0.05).
- Decision Rule:
  - If $\text{P-value} \leq \alpha$: Reject $H_0$.
  - If $\text{P-value} > \alpha$: Fail to reject $H_0$.
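
The decision rule can be sketched for a z-test as follows. This assumes the test statistic is standard normal under $H_0$; the function names and the `alternative` labels ("two-sided", "greater", "less") are illustrative choices, not part of these notes.

```python
from statistics import NormalDist


def z_test_p_value(z, alternative="two-sided"):
    """P-value for a z-statistic, assuming a standard normal null distribution."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":  # any difference
        return 2 * (1 - cdf(abs(z)))
    if alternative == "greater":    # one-tailed, right
        return 1 - cdf(z)
    if alternative == "less":       # one-tailed, left
        return cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


for z in (1.0, 2.5):
    p = z_test_p_value(z)
    print(z, p, decide(p))
```

For the same $z$, the one-tailed p-value in the direction of the observed effect is half the two-tailed one, which is why the choice of alternative has to be made before looking at the data.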
Type I and Type II Errors
- Type I Error: Rejecting $H_0$ when it is true (false positive). Its probability is $\alpha$.
- Type II Error: Failing to reject $H_0$ when $H_a$ is true (false negative). Its probability is $\beta$.
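
One way to see the Type I error rate is by simulation: generate many samples for which $H_0$ is true and count how often a test at level $\alpha$ rejects anyway. The sketch below assumes a two-sided z-test of $H_0: \mu = 0$ on normal data with known $\sigma = 1$; the sample size, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_type_i_rate(n=20, trials=4000, alpha=0.05, seed=0):
    """Fraction of two-sided z-tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Data are drawn with mean 0 and sigma 1, so H0: mu = 0 holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a false positive
    return rejections / trials


print(simulate_type_i_rate())
```

The printed fraction should land near $\alpha = 0.05$, up to simulation noise.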
Power of a Test
- Power $1 - \beta$: Probability of correctly rejecting $H_0$ when $H_a$ is true.
- Higher power means a lower chance of Type II error.
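
For a concrete case, power has a closed form for a right-tailed z-test of $H_0: \mu = \mu_0$ against $H_a: \mu > \mu_0$ with known $\sigma$: if the true mean is $\mu_1$, then power is $1 - \Phi\left(z_{1-\alpha} - \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}}\right)$. The sketch below implements that formula; the test setup and the name `z_test_power` are assumptions made for the example.

```python
import math
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test when the true mean is mu1 (sigma known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)         # rejection threshold for z
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))   # mean of z under the alternative
    return 1 - std_normal.cdf(z_crit - shift)


# Power grows with the sample size for a fixed true effect of 0.5.
for n in (10, 50):
    power = z_test_power(mu0=0.0, mu1=0.5, sigma=1.0, n=n)
    print(n, power, 1 - power)  # 1 - power is the Type II error probability beta
```

When $\mu_1 = \mu_0$ the formula returns $\alpha$, as expected: with no true effect, every rejection is a Type I error.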
Likelihood Ratio Test
- Compares likelihoods under $H_0$ and $H_a$.
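- Likelihood Ratio $\Lambda$: $\Lambda = \frac{L(\theta_0)}{L(\hat{\theta})}$
  - $L(\theta_0)$: Likelihood under $H_0$, evaluated at the hypothesized value $\theta_0$.
  - $L(\hat{\theta})$: Likelihood under $H_a$, evaluated at the maximum likelihood estimate $\hat{\theta}$.
  - Small values of $\Lambda$ mean the data are much less likely under $H_0$ than under the best-fitting alternative, which is evidence against $H_0$.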
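
As a worked sketch, the ratio can be computed for Poisson data when testing $H_0: \lambda = \lambda_0$, where the maximum likelihood estimate is the sample mean. The p-value step relies on an assumption not stated in these notes: the standard large-sample result (Wilks' theorem) that $-2 \ln \Lambda$ is approximately chi-square with 1 degree of freedom under $H_0$. The data and function names are hypothetical.

```python
import math


def poisson_log_likelihood(data, lam):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in data)


def poisson_lrt(data, lam0):
    """Return (Lambda, -2 ln Lambda, approximate p-value) for H0: lambda = lam0."""
    if sum(data) == 0:
        raise ValueError("sketch requires at least one nonzero count")
    lam_hat = sum(data) / len(data)  # MLE of the Poisson rate
    log_ratio = poisson_log_likelihood(data, lam0) - poisson_log_likelihood(data, lam_hat)
    stat = max(-2.0 * log_ratio, 0.0)  # guard against tiny negative rounding error
    # Chi-square(1) survival function: P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return math.exp(log_ratio), stat, p_value


data = [3, 5, 4, 6, 2]  # hypothetical counts with sample mean 4
print(poisson_lrt(data, lam0=2.0))
```

Working on the log scale avoids underflow when the likelihoods themselves are very small. With only five observations the chi-square approximation is rough, so the p-value here is illustrative rather than reliable.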