  • Determining direction of causality from FE/FD

    Hi everyone,

    I am trying to identify the causal effect that X has on y.
    I am running pooled OLS on an unbalanced panel covering the years 2011, 2012, 2013, 2016, 2017 and 2018.

    y_{it} = a + BX_{it} + Controls + d_{t} + u_{it}

    I am aware that there are many reasons pooled OLS might be biased, but I want to focus on one in particular: reverse causality. I hypothesise that y_{it} does not contemporaneously cause X_{it} (i.e. current y has no causal effect on current X), but I think lagged values of y do have a causal effect on current X. Specifically, I think that distant lags such as y_{i,t-20} affect X_{it}, and because lagged y is highly correlated with current y, this will bias the estimate of B.

    In summary: I am trying to test if X has a causal effect on y. Current y will not have a causal effect on current X, but lagged y (by 15-20 years) will, and this is highly correlated with current y.

    My first question: is this reverse causality? Or is it in fact omitted variable bias (since I have omitted lagged y)?

    My proposed solution is to use a FE/FD model. This might sort out other problems with the Pooled OLS but will it remove the problem described above?

    My logic is as follows: FE/FD measure changes in variables within a person. If I find a positive correlation between a change in X and a change in y (using a fixed effects or first-differenced estimator), I can assume that the change in X has caused the change in y, because y can only affect X in the long term (say 20 years), whereas X can contemporaneously affect y.

    In theory if I could run the following regression:

    y_{it} = a + BX_{it} + \gamma y_{i,t-20} + Controls + u_{it} for t=2018.

    I think this would remove the problem, but in practice I do not have any lagged values beyond 7 years, which is unlikely to be long enough.
    Also, the assumption that y does not contemporaneously affect X, but that lagged y does, is based on intuitive/theoretical arguments rather than on anything I can test.

    Any thoughts on this question would be much appreciated.
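The FE/FD logic in the question can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the poster's data or model: the panel is simulated, the true slope is set to 0.5, and the long-delayed influence of past y on X is assumed to behave like a time-constant unit effect `c` over the short sample window. All function and parameter names (`simulate`, `drop_prob`, and so on) are hypothetical. It uses only the Python standard library.

```python
import random

YEARS = (2011, 2012, 2013, 2016, 2017, 2018)  # survey years from the question


def slope(x, y):
    """OLS slope of y on a single regressor x, with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_units=500, beta=0.5, drop_prob=0.2, seed=0):
    """Hypothetical unbalanced panel in which X is correlated with a unit effect."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        c = rng.gauss(0, 1)               # time-constant unit effect (assumption)
        rows = []
        for _year in YEARS:
            x = c + rng.gauss(0, 1)       # X depends on the unit effect
            y = beta * x + c + rng.gauss(0, 1)
            if rng.random() > drop_prob:  # some unit-years are missing at random
                rows.append((x, y))
        if len(rows) >= 2:                # FE and FD need two or more observations
            panel.append(rows)
    return panel


def pooled_ols(panel):
    xs = [x for rows in panel for x, _ in rows]
    ys = [y for rows in panel for _, y in rows]
    return slope(xs, ys)


def fixed_effects(panel):
    """Within estimator: demean x and y by unit, then run OLS."""
    xs, ys = [], []
    for rows in panel:
        mx = sum(x for x, _ in rows) / len(rows)
        my = sum(y for _, y in rows) / len(rows)
        xs += [x - mx for x, _ in rows]
        ys += [y - my for _, y in rows]
    return slope(xs, ys)


def first_differences(panel):
    """Difference consecutive observed rows within each unit, then run OLS.

    Across gaps (e.g. 2013 to 2016) this is a longer difference, not a
    one-year difference.
    """
    xs, ys = [], []
    for rows in panel:
        for (x0, y0), (x1, y1) in zip(rows, rows[1:]):
            xs.append(x1 - x0)
            ys.append(y1 - y0)
    return slope(xs, ys)


panel = simulate()
print("pooled OLS:", round(pooled_ols(panel), 3))
print("FE:        ", round(fixed_effects(panel), 3))
print("FD:        ", round(first_differences(panel), 3))
```

In this setup pooled OLS is pulled well above 0.5 because it absorbs the unit effect, while FE and FD both land close to the true slope. That agreement depends entirely on the assumption that nothing feeds back from the shocks to X inside the sample window.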

  • #2
    If the feedback from y to X happens with a long delay, then you can just apply FE and FD. I would do both, and I'd describe them as "estimators", not models. Both require versions of strict exogeneity, but if it takes a long time for a shock u_{it} to feed back into X_{it}, then you should be fine. As a third check, you can first difference and then apply instrumental variables, as in Arellano-Bond, but without a lagged dependent variable.

    JW
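The third check mentioned above (first difference, then instrument) can be sketched as follows. This is an illustration under assumptions that differ from the poster's setting: the feedback from the shock u to X is deliberately set to arrive after one period (`delta`), so that strict exogeneity fails inside the sample and the contrast between FD-OLS and FD-IV is visible. The instrument is a single lagged level X_{i,t-1} in a just-identified IV estimator, which is simpler than the full Arellano-Bond GMM. All names and parameter values are hypothetical, and the simulated panel is balanced with consecutive periods.

```python
import random


def iv_slope(z, x, y):
    """Just-identified IV slope of y on x using instrument z, with an intercept."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return szy / szx


def simulate(n_units=2000, n_periods=6, beta=0.5, rho=0.5, delta=0.5,
             burn_in=20, seed=1):
    """Hypothetical panel in which last period's shock u feeds back into X."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        c = rng.gauss(0, 1)                # unit effect
        x_prev, u_prev = 0.0, 0.0
        rows = []
        for t in range(burn_in + n_periods):
            # X reacts to the previous shock, so X is only sequentially exogenous
            x = rho * x_prev + c + delta * u_prev + rng.gauss(0, 1)
            u = rng.gauss(0, 1)
            y = beta * x + c + u
            if t >= burn_in:               # keep only the post burn-in periods
                rows.append((x, y))
            x_prev, u_prev = x, u
        panel.append(rows)
    return panel


def fd_estimates(panel):
    """Return (FD-OLS, FD-IV) slopes; the instrument is the lagged level of X."""
    dx, dy, z = [], [], []
    for rows in panel:
        for (x0, y0), (x1, y1) in zip(rows, rows[1:]):
            dx.append(x1 - x0)
            dy.append(y1 - y0)
            z.append(x0)                   # X_{i,t-1} instruments the differenced X
    fd_ols = iv_slope(dx, dx, dy)          # OLS is IV with x as its own instrument
    fd_iv = iv_slope(z, dx, dy)
    return fd_ols, fd_iv


fd_ols, fd_iv = fd_estimates(simulate())
print("FD-OLS:", round(fd_ols, 3))
print("FD-IV: ", round(fd_iv, 3))
```

With the true slope at 0.5, FD-OLS is biased downward here because the differenced regressor is correlated with the differenced error, while FD-IV stays close to 0.5. If feedback really takes 15 to 20 years, FD-OLS and FD-IV estimates on a short panel should be similar, which is what makes the comparison a useful check.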

    • #3
      Dear Professor Wooldridge,

      Thank you very much for your response!
