  • Reverse Causality - control variable

    Hi all,

    I am carrying out research for my bachelor thesis looking at the effect of gin consumption on health outcomes, using a regional-level panel data set. I am using two measures of health for robustness: ARD (alcohol-related deaths) and BADSAH (bad self-assessed health). I am aware it is an extremely small dataset but that is due to its novelty.


    I am running a fixed-effects regression with ARD as the dependent variable, controlling for:
    TEH - total expenditure on healthcare
    GDHI - income
    dbbinge - binge drinking behaviour

    xtreg ARD consumpgin GDHI TEH dbbinge, fe cluster(region2)

    Fixed-effects (within) regression               Number of obs     =         85
    Group variable: region2                         Number of groups  =         10

    R-sq:                                           Obs per group:
         within  = 0.7920                                         min =          6
         between = 0.9934                                         avg =        8.5
         overall = 0.9066                                         max =          9

                                                    F(4,9)            =     799.43
    corr(u_i, Xb)  = 0.9085                         Prob > F          =     0.0000

                                   (Std. Err. adjusted for 10 clusters in region2)
    ------------------------------------------------------------------------------
                 |               Robust
             ARD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      consumpgin |  -2.359009   1.911085    -1.23   0.248    -6.682183    1.964165
            GDHI |   .8595775   1.656665     0.52   0.616    -2.888059    4.607214
             TEH |   23.39898   1.606058    14.57   0.000     19.76582    27.03213
         dbbinge |   3.038557   1.332999     2.28   0.049     .0231037    6.054011
           _cons |   1099.962   109.1143    10.08   0.000     853.1284    1346.796
    -------------+----------------------------------------------------------------
         sigma_u |  1671.7861
         sigma_e |  155.96611
             rho |   .9913715   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------

    TEH is strongly and significantly associated with alcohol-related deaths, which could reflect reverse causality: regions may be spending more on healthcare because they have more alcohol-related deaths.

    If possible, could I please have some advice on how to test for this?

    Many thanks,
    Carys Wright



  • #2
    I doubt you can do much to resolve this issue with this data set. Your intuition that total healthcare expenditure is reverse caused, not by alcohol-related deaths per se but by alcohol-related illnesses, is almost certainly correct. Many alcohol-related deaths are preceded by lengthy and expensive alcohol-related illnesses. But I doubt your data set has the information needed to tease this out.

    I do notice that you have multiple observations per region. Are these at different time periods? Is there a time variable in your data set? If so, you might want to do something like regress TEH as an outcome variable on one or two lagged values of consumpgin and ARD as predictors. A strong association in this analysis would suggest that your assumption about reverse causation is correct. But to really do this well you would need a larger data set, so that you could throw in several lags and still have enough data left to talk about.

    By the way, you should not be using clustered standard errors here. You have only 10 regions. Clustered standard errors are only valid with a large number of clusters. While there is no universally accepted number that defines "large" enough for this purpose, I think nearly everyone agrees that 10 is too few.
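
    As a minimal sketch, assuming the variables from post #1 are in memory, the same model can be refitted without the cluster option. The coefficient estimates do not change; only the standard errors and the statistics derived from them do. Whether the default (conventional) standard errors are adequate for this data is a separate judgment.

    ```stata
    * Same fixed-effects model as in post #1, without cluster(region2).
    * With no vce() option, xtreg reports conventional standard errors.
    xtreg ARD consumpgin GDHI TEH dbbinge, fe
    ```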



    • #3
      Hi Clyde,

      Thank you so much for your response.

      Yes, I do have a time variable (year). Would I lag ARD as well as gin consumption?

      'xtreg TEH consumpgin l.consumpgin ARD, fe'

      When using l2. (two lags) there are insufficient observations, so you are correct that I would need a larger data set to use two lags.

      Regarding the clustered standard errors, thank you for letting me know. I was advised to use them by a professor at my university.

      Many thanks,

      Carys Wright



      • #4
        Oh, yes, sorry. I didn't realize my sentence was syntactically ambiguous about the scope of "lagged values of." Yes, you would lag both ARD and consumpgin. The idea is to see whether current total health expenditures are predicted by earlier values of the alcohol-related indices. Since effects follow, and don't precede, causes, this could shed light on the existence of such a "reverse causality" effect.
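
        The lagged regression described above could be sketched roughly as follows. This assumes region2 is the panel identifier (as in the output in post #1) and year is the time variable (as stated in post #3); the joint test at the end is an added illustration, not something requested in the thread. With only one lag and 10 regions, any result is suggestive at best.

        ```stata
        * Declare the panel structure so that the lag operator L. works
        * (assumes region2 is the panel id and year is the time variable)
        xtset region2 year

        * Regress current healthcare spending on the previous year's
        * gin consumption and alcohol-related deaths, with region fixed effects
        xtreg TEH L.consumpgin L.ARD, fe

        * Joint test that both lagged predictors have zero coefficients
        test L.consumpgin L.ARD
        ```

        A strong association between the lagged variables and TEH would be consistent with reverse causation, though it would not prove it.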
