1. Lecture introduction

This lecture sets up the basics of randomized experiments, hypothesis testing, and data modeling. The next two lectures will then cover further methods for hypothesis testing and approaches to multiple hypothesis testing. They will build on the basic methods in this lecture to achieve better methodological accuracy and more generalizable models.

The discussion will be guided by a study conducted by the Health Insurance Plan (HIP) of Greater New York, which offered mammography screening for the early detection of breast cancer. The objective is to determine whether offering mammographies leads to fewer deaths from breast cancer. From this example, we will discuss two main considerations:

As you follow the discussion videos, your main takeaway should not be the particular details of this study. Rather, focus on developing an intuition for the important considerations in designing and analyzing a study, since you may conduct similar experiments or analyze similar datasets of your own in the future.

2. Introducing the mammography study

Breast cancer is one of the most common fatal diseases among women. The earlier breast cancer is detected, the more likely the patient is to recover and survive the disease. Screening therefore plays a large role in preventing breast cancer deaths, and women over the age of 50 are nowadays advised to undergo a mammography once every two years.

While it may seem intuitive that mammographies prevent breast cancer deaths through early detection, there are good reasons to confirm this with data. For example, a local government may be considering including mammographies as a standard part of a healthcare plan. An estimate of the actual effect would allow it to weigh the benefits of doing so against the costs.

3. Introduction to experimental design and hypothesis testing

3.1. Experimental design: variables

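Consider the overarching task of determining whether mammographies are effective in preventing breast cancer deaths. An experiment frames this task as assessing the effect of one variable on another. The most basic setting has two variables: a treatment variable, which is the intervention whose effect we want to assess, and an outcome variable, which is the quantity we measure to see whether the treatment affected it.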

The goal of the experimental and statistical procedures is to establish the link between the treatment and the outcome variables. We call the overall procedure the experimental design.
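To make the two variables concrete, the following Python sketch encodes each participant as a pair of binary variables and compares the outcome rate between the two treatment groups. The encodings (treatment = 1 if the participant was offered a mammography, outcome = 1 if the participant died of breast cancer during follow-up), the `Participant` class, and the helper functions are hypothetical choices made for illustration; the toy records are invented and are not data from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    # Hypothetical encoding: 1 if offered a mammography, 0 otherwise.
    treatment: int
    # Hypothetical encoding: 1 if died of breast cancer during follow-up, 0 otherwise.
    outcome: int


def outcome_rate(participants: List[Participant], treatment: int) -> float:
    """Fraction of participants in one treatment group whose outcome is 1."""
    group = [p.outcome for p in participants if p.treatment == treatment]
    if not group:
        raise ValueError(f"no participants with treatment={treatment}")
    return sum(group) / len(group)


def rate_difference(participants: List[Participant]) -> float:
    """Outcome rate in the treated group minus the rate in the control group."""
    return outcome_rate(participants, 1) - outcome_rate(participants, 0)


# Invented toy records for illustration only (not study data).
toy = [
    Participant(1, 0), Participant(1, 0), Participant(1, 1), Participant(1, 0),
    Participant(0, 1), Participant(0, 0), Participant(0, 1), Participant(0, 0),
]
print(rate_difference(toy))  # -0.25
```

A negative difference would mean fewer deaths among those offered screening in the toy records. A raw difference like this is only a starting point: whether it reflects a real effect of the treatment rather than chance is what the experimental design and hypothesis testing are meant to establish.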

3.2. Treatment and outcome variables in the mammography study

For the mammography study, we now identify the treatment variable and the outcome variable. We start with the outcome variable, since it is more straightforward.