  • Testing for self-selection bias after differences-in-differences

    I have run a differences-in-differences (DID) model to measure the impact of a policy on breastfeeding rates at the state level, with multiple time periods (staggered policy introduction), using panel data. I control for state fixed effects and include time dummies as well as my time-varying and time-invariant variables. I would like to test the main assumption behind DID, the parallel trends assumption, and I have seen some papers test it by introducing leads and lags. I have done this and it shows no anticipatory effects, but I am still concerned about self-selection: for example, whether states that need the policy are more likely to adopt one at the state level, or are already more breastfeeding-friendly.
    I want to run a probit model with the policy variable as the outcome (= 0 for control states without the policy and for treatment states before the policy change, = 1 in the year of adoption, and set to missing after adoption), with the same explanatory variables as in my DID model, including breastfeeding. I don't know, however, whether I should include year dummies and state fixed effects (i.state).
    I have run pooled, clustered-by-state, and panel probit models, and I can't seem to get any clear results (I get "convergence not achieved", or some variable perfectly predicts the outcome) unless I leave out the state fixed effects and year dummies. In that case, breastfeeding is significant in my clustered probit model, meaning that there might be some self-selection, I presume?

    Does anyone have any insight or recommendations on this?
    Should I include state fixed effects and year dummies in my probit models, or is my leads/lags model enough to demonstrate the absence of self-selection?

    I apologize if this doesn't make sense, but I'm happy to explain more!

    Thank you for your help and time.
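    P.S. For concreteness, this is roughly how I construct the lead/lag dummies, sketched in Python/pandas rather than Stata; the column names state, year, and adopt_year are placeholders for my actual variables:

```python
import pandas as pd

def add_event_time_dummies(df, leads=3, lags=3):
    """Add lead/lag dummies relative to each state's adoption year.

    Assumes columns: 'state', 'year', and 'adopt_year' (NaN for
    never-treated control states). The period t-1 is omitted as the
    reference category, as is standard in event-study specifications.
    """
    df = df.copy()
    # Event time: years since adoption (negative values are leads)
    df["event_time"] = df["year"] - df["adopt_year"]
    for k in range(-leads, lags + 1):
        if k == -1:
            continue  # omit t-1 as the reference period
        col = f"lead{-k}" if k < 0 else f"lag{k}"
        df[col] = (df["event_time"] == k).astype(int)
    # Never-treated states get 0 on every dummy (NaN == k is False)
    return df
```

    The leads entering jointly insignificant is what I read as "no anticipatory effects".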

  • #2
    Dear Surya,

    Let me see if I understand this. You are going to run a probit of whether a state selects into the program or not. That would be the appropriate technique: if there is no self-selection, most of the variables should not be statistically significant. I think you should run several such regressions. I usually run a logit, but a probit is fine; there is extensive discussion of that choice in many places (here, Stack Exchange, textbooks, etc.). I would run many logits/probits, progressively increasing the number of variables you include. Fixed effects are interesting to look at, but in the end what you are trying to explain is whether a state decides, or is more prone, to join the program. The year can also be important, because maybe something happened in that region in a particular year and that is why the results look the way they do.

    I would run several regressions and report them all. Regressions 1 through n (maybe 4 or 5) should differ in the number of variables, and the last should include all the variables you think drive the decision to join the program. If the pseudo-R-squared is very low and the variables are not statistically significant, you are good to go: you have reasonable evidence that there is no self-selection.
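    To illustrate the idea with a toy example (simulated data and hypothetical variable names, not your panel): fit a probit of adoption on the covariates and check whether the pre-policy outcome predicts adoption. A self-contained Python sketch, with the probit likelihood written out by hand so it does not depend on any particular package's estimator:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_probit(y, X):
    """Maximum-likelihood probit: returns the coefficient vector for X."""
    def neg_loglik(beta):
        xb = X @ beta
        # log Phi(xb) for y=1, log Phi(-xb) for y=0 (numerically stable)
        return -(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb)).sum()
    return minimize(neg_loglik, np.zeros(X.shape[1]), method="BFGS").x

# Simulated example: adoption depends on a covariate x1 but NOT on the
# pre-policy outcome bf -- i.e., no selection on the outcome.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.standard_normal(n)
bf = rng.standard_normal(n)  # pre-policy outcome, unrelated to adoption
y = (0.8 * x1 + rng.standard_normal(n) > 0).astype(float)
X = np.column_stack([np.ones(n), x1, bf])
beta = fit_probit(y, X)
# Under no selection, the estimated coefficient on bf is near zero,
# while the coefficient on x1 recovers something close to 0.8.
```

    In your real data you would of course use the clustered standard errors rather than point estimates alone, but the logic is the same: a significant coefficient on pre-policy breastfeeding is evidence of selection.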

    For example, I wrote my master's paper on the "Impact of Capital Expenditure on the Probability of Reelection of Mayors at the District Level in Peru". I ran a logit to estimate the probability of their being reelected to office, based on how well they managed their budget. My dependent variable was 1 for reelected and 0 otherwise.

    To control for self-selection, that is, to model the decision to run again for mayor, I ran another logit with the same explanatory variables, but with a dependent variable equal to 1 if the incumbent decided to RUN for reelection and 0 otherwise. As you can see, the only difference between the two regressions was the dependent variable: one indicated whether they were reelected, the other whether they decided to run for reelection.

    Here is a link to my paper if you want to take a look at it.

    https://www.dropbox.com/s/gy6x0qlydi...hesis.pdf?dl=0

    Kind regards,

    Jorge


    Last edited by Jorge L. Guzman; 21 Jan 2016, 08:25.



    • #3
      Hi Jorge,

      Thanks for the help! I see now that state fixed effects and year dummies are needed in my model. I have run many logits/probits, but the problem I am now getting is that when I include my year dummies, the panel logit model does not converge. Any ideas why, or suggestions?
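      In case it is useful for diagnosing this: one thing I have been checking is which states have no variation at all in the binary outcome, since with fixed effects those states are perfectly predicted and can cause exactly this kind of non-convergence. A quick Python/pandas sketch (column names hypothetical):

```python
import pandas as pd

def degenerate_groups(df, group, outcome):
    """List groups where the binary outcome never varies.

    With group fixed effects, such groups are perfectly predicted:
    their fixed-effect dummy separates the data, which is one common
    cause of 'convergence not achieved' in probit/logit models.
    (Stata's xtlogit/clogit drop these groups automatically.)
    """
    counts = df.groupby(group)[outcome].nunique()
    return counts[counts == 1].index.tolist()
```

      The same check applied to year instead of state flags years in which no state (or every state) adopts, which may be why adding year dummies breaks convergence.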

      Thanks,
      Surya
