Chapter 4 Estimation preliminaries: review of doubly robust estimators for the average treatment effect

Recall our motivation for doing mediation analysis. We would like to decompose the total effect of a treatment \(A\) on an outcome \(Y\) into effects that operate through a mediator \(M\) vs effects that operate independently of \(M\).

Recall that we define the average treatment effect as \(E(Y_1-Y_0)\), and decompose it as follows

\[\begin{equation*} \E[Y_{1,M_1} - Y_{0,M_0}] = \underbrace{\E[Y_{\color{red}{1},\color{blue}{M_1}} - Y_{\color{red}{1},\color{blue}{M_0}}]}_{\text{natural indirect effect}} + \underbrace{\E[Y_{\color{blue}{1},\color{red}{M_0}} - Y_{\color{blue}{0},\color{red}{M_0}}]}_{\text{natural direct effect}} \end{equation*}\]

To introduce some of the ideas that we will use for estimation of the NDE, let us first briefly discuss estimation of \(\E(Y_1)\) (estimation of \(\E(Y_0)\) can be performed analogously).

First, notice that under the assumption of no unmeasured confounders (\(Y_1\indep A\mid W\)), we have

\[ \E(Y_1) = \E[ \E(Y \mid A=1, W) ]\]

4.1 Option 1: G-computation estimator

The first estimator of \(\E[ \E(Y \mid A=1, W)]\) can be obtained in a three step procedure:

  • Fit a regression for \(Y\) on \(A\) and \(W\)
  • Use the above regression to predict the outcome mean if everyone’s \(A\) is set to \(A=1\)
  • Average these predictions

In formulas, this estimator can be written as \[\frac{1}{n} \sum_{i=1}^n \hat{\E}(Y \mid A_i=1, W_i)\]

  • Note that this is just a plug-in estimator for the above formula (called the g-formula): \(\E[\E(Y \mid A=1, W)]\)
  • This estimator requires that the regression model for \(\hat{\E}(Y \mid A_i=1, W_i)\) is correctly specified.
  • Downside: If we use machine learning for this model, we do not have general theory for computing standard errors and confidence intervals

4.2 Option 2: Inverse probability weighted estimator

An alternative method of estimation can be constructed after noticing that \[\E[\E(Y \mid A=1, W)] = \E \left[\frac{A}{\P(A=1\mid W)} Y \right],\] using the following procedure:

  • Fit a regression for \(A\) and \(W\)
  • Use the above regression to predict the probability of treatment \(A=1\)
  • Compute the inverse probability weights \(A_i / \hat{\P}(A_i =1 \mid W_i)\).
  • This weight will be zero for untreated units, and the inverse of the probability of treatment for treated units.
  • Compute the weighted average of the outcome:

\[\frac{1}{n} \sum_{i=1}^n \frac{A_i}{\hat{\P}(A_i=1 \mid W_i)} Y_i\]

  • This estimator requires that the regression model for \(\hat{\P}(A=1 \mid W_i)\) is correctly specified.
  • Downside: If we use machine learning for this model, we do not have general theory for computing standard errors and confidence intervals

4.3 Option 3: Augmented inverse probability weighted estimator

Fortunately, we can combine these two estimators to get an estimator with improved properties.

The improved estimator can be seen both as a corrected (or augmented) IPW estimator:

\[\begin{equation*} \underbrace{\frac{1}{n} \sum_{i=1}^n \frac{A_i}{\hat{\P}(A_i=1 \mid W_i)} Y_i}_{\text{IPW estimator}} - \underbrace{\frac{1}{n} \sum_{i=1}^n \frac{\hat{\E}(Y \mid A_i=1, W_i)} {\hat{\P}(A_i=1 \mid W_i)}[A_i - \hat{\P}(A_i=1 \mid W_i)]}_{\text{Correction term}} \end{equation*}\]

or

\[\begin{equation*} \underbrace{\frac{1}{n} \sum_{i=1}^n \hat{\E}(Y \mid A_i=1, W_i)}_{\text{G-comp estimator}} + \underbrace{\frac{1}{n} \sum_{i=1}^n \frac{A_i}{\hat{\P}(A_i=1\mid W_i)} [Y_i - \hat{\E}(Y \mid A_i=1, W_i)]}_{\text{Correction term}} \end{equation*}\]

This estimator has some desirable properties:

  • It is robust to misspecification of at most one of the models (outcome or treatment) (Q: can you see why?)
  • It is distributed as a normal random variable as sample size grows. This allows us to easily compute confidence intervals and do hypothesis tests
  • It allows us to use machine learning to estimate the treatment and outcome regressions to alleviate model misspecification bias

Next, we will work towards constructing estimators with these same properties for the mediation parameters that we have introduced.