Simulating DAGs
While working on some simulations, I came across a cool example of a DAG (thanks to a comment by Paul Bürkner) that became, to me, a somewhat minimal example of how DAGs work, how they can help inform modeling decisions, and what can go wrong if one ignores causality. This post is basically about the paper by Cinelli, Forney, and Pearl (2020), but as it turns out, Richard McElreath spent an entire lecture on the same paper as well. While you should read the paper itself, Richard's lectures are a treat and I cannot recommend them enough.
Before we start, here are the libraries I will use throughout the rest of the code:
library(ggdag) # uses dagitty for DAG logic and creates nice plots from them
library(brms) # used to fit Bayesian models instead of raw Stan
library(GGally) # used for the nice pairs plot
set.seed(1235813)
The DAG
The paper shows examples of good, neutral and bad controls. Good controls are those where controlling for the respective variable reduces asymptotic bias, while bad controls are those where it increases bias. There are two kinds of neutral controls in the paper: those that increase and those that decrease precision when controlled for. By combining models 1, 8, 9 and 17 from the paper (in case you want to look them up), we get a very compact DAG that includes all four kinds of controls presented in the paper at the same time. It is also a nice DAG for parameter recovery simulation studies, as there is a single effect we are interested in.
The DAG below shows the outcome y, the treatment/exposure x we'd like to estimate the average causal effect of (see the paper for the difference between average and direct effect), and the four additional control variables z1, z2, z3, z4.
tidy_ggdag <- tidy_dagitty(
  dagify(
    y ~ x + z1 + z2,
    x ~ z1 + z3,
    z4 ~ x + y,
    outcome = "y",
    exposure = "x"
  )
)
ggdag(tidy_ggdag, layout = "circle") + theme_dag()
Something that dagitty and ggdag allow us to do automatically is to calculate the so-called adjustment set. Given an exposure and an outcome, the adjustment set tells us which additional variables to include in a model to obtain an unbiased estimator of the treatment effect.
ggdag_adjustment_set(tidy_ggdag, node_size = 14) +
theme(legend.position = "bottom")
In our case, the only necessary control for an unbiased estimate is z1. Note, however, that there is more to an estimate than being unbiased, as we will see in a second.
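If you prefer the textual version over the plot, the adjustment sets can also be listed directly via the dagitty package that ggdag builds on. A minimal sketch, assuming the same dagify() call as above (dagify() returns a dagitty DAG, so adjustmentSets() can query it directly):
dag <- dagify(
  y ~ x + z1 + z2,
  x ~ z1 + z3,
  z4 ~ x + y,
  outcome = "y",
  exposure = "x"
)
dagitty::adjustmentSets(dag)  # should list { z1 }, matching the plot above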
Data Generation
Alright, let's generate some data for this DAG. To do this, I simply used the formulas from the DAG's definition as the basis for the data generation. To keep things simple, I used normal distributions and simple additive linear combinations of the variables. DAGs, however, are agnostic to the type of model, so you could do the same with more complex model types. The chosen coefficients are just for convenience; while they influence the results, they do not change the overall behavior. The sample size was set rather low, as the observed differences in precision decrease with sample size and I wanted to make them more visible.
N = 100
z1 = rnorm(N)
z2 = rnorm(N)
z3 = rnorm(N)
x = rnorm(N, mean = (z1 + z3), sd = 1)
y = rnorm(N, mean = (x + z1 + z2), sd = 1)
z4 = rnorm(N, mean = (y + x), sd = 1)
data = data.frame(x = x,
                  z1 = z1,
                  z2 = z2,
                  z3 = z3,
                  z4 = z4,
                  y = y)
ggpairs(data)
You can see some quite high correlations in the pairs plot, as we would expect from the way the data were generated. However, the correlations alone do not tell us which variables to include in or exclude from our models; only the causal model encoded in the DAG can help us with that.
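If you prefer numbers over the scatterplot matrix, a quick base R check of the pairwise correlations (just a convenience on top of the plot, not part of the original analysis) tells the same story:
# pairwise correlations, rounded for readability
round(cor(data), 2)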
Modeling
After the data generation, let's inspect how including or excluding the different variables changes our model estimates. One thing to remember is that we are interested in estimating the effect of x on y, not the effects of the four control variables. Also, these are the results of single models fit to a single dataset; for proper parameter recovery we would have to repeat this many times over, something I might work on later but which felt like a bit much for this post.
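As an aside, here is a minimal sketch of what such a parameter recovery loop could look like. To keep it fast, it uses plain lm() fits instead of brms, and the helpers gen_data() and recover_x() are made up for this illustration:
# hypothetical helper: generate one dataset from the DAG above
gen_data <- function(N = 100) {
  z1 <- rnorm(N); z2 <- rnorm(N); z3 <- rnorm(N)
  x  <- rnorm(N, mean = z1 + z3, sd = 1)
  y  <- rnorm(N, mean = x + z1 + z2, sd = 1)
  z4 <- rnorm(N, mean = y + x, sd = 1)
  data.frame(x, z1, z2, z3, z4, y)
}

# hypothetical helper: fit the true adjustment set and return the estimate for x
recover_x <- function(d) coef(lm(y ~ x + z1 + z2, data = d))["x"]

# repeat many times and inspect the distribution of the estimates
estimates <- replicate(500, recover_x(gen_data()))
mean(estimates)  # should be close to the true effect of 1
sd(estimates)    # sampling variability of the estimator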
True Model
As a baseline, we will start with the true model, true in the sense that it recreates the original data generating process (of y). I refrained from adding models for x and z4 to keep the modeling simpler.
As we would expect, the true model recovers the effect of x quite well.
true_model <- brm(
  y ~ x + z1 + z2,
  family = gaussian(),
  data = data,
  cores = 4
)
fixef(true_model, pars = c("x"))
## Estimate Est.Error Q2.5 Q97.5
## x 1.017021 0.07421161 0.8687428 1.160975
Z1 Model
For the first bad model, we leave out z1 from the controls. This leaves the backdoor path x <- z1 -> y open and should bias the estimate of the effect x has on y.
And lo and behold, that is exactly what we observe in the summary and posterior intervals. The model is quite certain about the effect of x; however, it is rather off target. There is nothing in the summary that would hint at a sampling problem or some kind of misspecification. So without our DAG, we might just roll with it and be confident in an effect that is overestimated by roughly one third.
bad_model1 <- brm(
  y ~ x + z2,
  family = gaussian(),
  data = data,
  cores = 4
)
fixef(bad_model1, pars = c("x"))
## Estimate Est.Error Q2.5 Q97.5
## x 1.270522 0.07936353 1.116095 1.432161
Z2 Model
Next, we leave out z2 from the controls. While this does not open any backdoor paths, it increases the unexplained variation in y and thus should reduce the precision of the estimated effect of x. This is one of the differences that shrinks with growing sample size.
And again, we observe just that. While the true effect of 1 is close to the mean of the posterior interval, the interval itself is wider than for the true model.
bad_model2 <- brm(
  y ~ x + z1,
  family = gaussian(),
  data = data,
  cores = 4
)
fixef(bad_model2, pars = c("x"))
## Estimate Est.Error Q2.5 Q97.5
## x 1.008441 0.1010231 0.8123737 1.209088
Z3 Model
Onward we go to include z3, which again does not open any backdoor paths. This time, controlling for it explains away part of the variation in x, leaving less variation with which to estimate the effect of x and thus reducing its precision. Again, this is one of the differences that shrinks with growing sample size; a quick numerical check of this follows after the model output below.
As with the model for z2, the interval mean is close to the true effect of 1, but the interval itself is again wider.
bad_model3 <- brm(
  y ~ x + z1 + z2 + z3,
  family = gaussian(),
  data = data,
  cores = 4
)
fixef(bad_model3, pars = c("x"))
## Estimate Est.Error Q2.5 Q97.5
## x 0.9835386 0.1048379 0.7787038 1.190063
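To make the sample size point from the z2 and z3 models concrete, here is a minimal sketch comparing the standard error of the x coefficient across adjustment sets at two sample sizes. It uses quick lm() fits as a stand-in for the brms models, purely for speed; that substitution is my simplification, not part of the original analysis:
# standard errors of the x coefficient for different adjustment sets at sample size N
se_for_n <- function(N) {
  z1 <- rnorm(N); z2 <- rnorm(N); z3 <- rnorm(N)
  x  <- rnorm(N, mean = z1 + z3, sd = 1)
  y  <- rnorm(N, mean = x + z1 + z2, sd = 1)
  d  <- data.frame(x, z1, z2, z3, y)
  se <- function(f) summary(lm(f, data = d))$coefficients["x", "Std. Error"]
  c(true    = se(y ~ x + z1 + z2),       # the true adjustment set
    no_z2   = se(y ~ x + z1),            # z2 left out: more unexplained variation in y
    with_z3 = se(y ~ x + z1 + z2 + z3))  # z3 added: less usable variation in x
}

se_for_n(100)    # the differences between the three standard errors are clearly visible
se_for_n(10000)  # all standard errors shrink and the absolute differences all but vanish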
Z4 Model
Finally we arrive at the final boss of the DAG, the collider z4. By controlling for it we open the non-causal path x -> z4 <- y, which would otherwise be blocked at the collider. While there is slightly more to this, you'll have to read the paper for that; noticing the opened path will be enough for our case here.
As with the open backdoor path earlier, we would expect the resulting estimate to be biased, and again we get what we were hoping for. This time, a lot of the posterior interval even lies on the wrong side of 0. And as before, there is nothing in the summary and diagnostics to warn us of our grave error.
bad_model4 <- brm(
  y ~ x + z1 + z2 + z4,
  family = gaussian(),
  data = data,
  cores = 4
)
fixef(bad_model4, pars = c("x"))
## Estimate Est.Error Q2.5 Q97.5
## x -0.03079065 0.1093182 -0.2445922 0.1817436
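Unlike the precision issues above, this bias does not go away with more data. Here is a minimal sketch, again using a quick lm() fit as a stand-in for the brms model (my simplification for illustration):
# collider bias is a bias, not a precision problem: it persists as N grows
collider_estimate <- function(N) {
  z1 <- rnorm(N); z2 <- rnorm(N); z3 <- rnorm(N)
  x  <- rnorm(N, mean = z1 + z3, sd = 1)
  y  <- rnorm(N, mean = x + z1 + z2, sd = 1)
  z4 <- rnorm(N, mean = y + x, sd = 1)
  d  <- data.frame(x, z1, z2, z4, y)
  c(adjusted_for_z4 = unname(coef(lm(y ~ x + z1 + z2 + z4, data = d))["x"]),
    true_adjustment = unname(coef(lm(y ~ x + z1 + z2, data = d))["x"]))
}

collider_estimate(1e5)  # the z4-adjusted estimate stays far from the true effect of 1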
Model Comparison
One last thing we can do with the five models we have on hand is to use some kind of model comparison to figure out which of them is the best. For this, we will use the ELPD-based loo function to compare the models on their out-of-sample predictive performance.
loo_compare(
  loo(true_model),
  loo(bad_model1),
  loo(bad_model2),
  loo(bad_model3),
  loo(bad_model4)
)
## elpd_diff se_diff
## bad_model4 0.0 0.0
## true_model -40.9 7.7
## bad_model3 -42.2 7.8
## bad_model1 -61.1 7.7
## bad_model2 -72.0 8.1
And we have a clear winner: bad_model4 has a significant advantage over all the other models. I guess that settles it. Turns out we have all been wrong: x has barely any effect on y and, if any, it is probably negative.
And with that we will go and publish our findings for fame and fortune. But for some reason, when it gets really quiet, there is this wailing from beyond the veil. It almost sounds like the tormented souls of a million ignored DAGs.
Don’t ignore your DAGs, it makes them sad.
References
Cinelli, Carlos, Andrew Forney, and Judea Pearl. 2020. “A Crash Course in Good and Bad Controls.” SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3689437.