  • year fixed effects in a difference in differences model

    Hello everyone, this is my first post here, and I hope I am doing everything correctly so that you can help me with my issue.
    I am writing my MSc dissertation using a cross-sectional dataset (the ISTAT labour force survey from 2014 to 2020), and I want to assess the impact of a government regulation on the income of Italian pharmacists.
    I am using a difference-in-differences design because I know when each region starts to be treated, so I created a variable that is 1 if the region is treated in a given year and 0 otherwise. This is my variable of interest, because it tells me whether a pharmacist in a specific year and region is treated.
    Moreover, the regulation was not implemented in the same year in every region: in 2016 (the first year of implementation) 5 regions started opening new pharmacies, in 2017 some of the other regions followed, and so on. By 2020 every region is treated.
    In my model the dependent variable is the logarithm of the income of pharmacist i, T is the treatment variable explained above, and I add year fixed effects, region fixed effects, and some covariates (X).
    When I first estimated this model, I noticed that the coefficient on the treatment variable was not significant. I therefore tried some alternative regressions (adding one fixed effect at a time) and found that if I omit the first two years and the last one (2014, 2015 and 2020), the coefficient of interest becomes significant.
    I have read many times that I have to include k-1 dummies for a categorical variable in my regression, so I cannot find an explanation for these results. What do you suggest I do?
    Could it be due to the fact that in the first two years the treatment dummy is 0 for every pharmacist and in 2020 it is 1 for every pharmacist?
    Thanks in advance for your help.
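
For reference, the model described in the post can be written roughly as follows. This is only a sketch of the specification: the post names pharmacist i, the treatment variable T, and the covariates X, while the subscripts r (region) and t (year) and the Greek symbols are notation assumed here for illustration.

```latex
\[
  \ln(y_{irt}) = \beta\, T_{rt} + \gamma_r + \delta_t + X_{irt}'\theta + \varepsilon_{irt}
\]
```

Here \gamma_r stands for the region fixed effects, \delta_t for the year fixed effects, and \beta is the coefficient of interest on the treatment indicator.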

  • #2
    When I first estimated this model, I noticed that the coefficient on the treatment variable was not significant. I therefore tried some alternative regressions (adding one fixed effect at a time) and found that if I omit the first two years and the last one (2014, 2015 and 2020), the coefficient of interest becomes significant.
    First, that isn't science. It's p-hacking. It does not produce valid results. Some people even consider it scientific misconduct.

    Next, you cannot validly interpret your treatment variable as a generalized DID estimator of the law's effect unless you have the full (i.e. all but one) representation of the time periods and regions in your model.

    You did your analysis. The results were not statistically significant. So that is the result. End of story.

    Now, if you want to explore whether your model was badly specified, it is reasonable to do that. But picking particular years to remove is not a good way to do it. It is reasonable to consider, for example, the possibility that the effects of the regulation are delayed--most regulations take a bit of time to produce results. So you could do additional analysis where the treatment variable does not become 1 until the year after the regulation is adopted. Less likely, but also realistic, is the possibility that the pharmacists anticipated the adoption of the regulation--there is usually advance notice that such regulations are being considered. And so it may be that steps were taken ahead of time to modify the impact of the regulation. This may be reflected in change in the pre-regulation outcomes so that nothing much appears to happen when the regulation kicks in. Looking at a modified treatment variable that becomes 1 the year before the regulation is adopted might capture that. Another possibility is that log-transforming your outcome variable has defeated you. Why do you expect the relationship between pharmacist income and your various explanatory variables to be logarithmic--is there some empirical support for that assumption? Maybe the real relationship is simply linear. Did you explore the data graphically before choosing your model?

    The point is that any exploratory analyses you do at this point should be motivated by a credible theory of the real-world data generating process--not just tampering with the model until you finally squeeze out a p < 0.05. And when you do finally present your results, scientific integrity requires you to report your original analysis with the "not significant" results along with any subsequent explorations.
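
The delayed-effect and anticipation variants of the treatment variable suggested in this reply could be constructed along these lines. This is a minimal sketch in plain Python, not the poster's actual data or code: the region labels, the adoption years in `ADOPTION_YEAR`, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical adoption years by region (illustrative values only,
# not taken from the ISTAT data discussed in the thread).
ADOPTION_YEAR: Dict[str, int] = {"A": 2016, "B": 2017, "C": 2018}


def treatment_indicator(region: str, year: int, shift: int = 0) -> int:
    """Return 1 if `region` counts as treated in `year`, else 0.

    shift = 0  -> treated from the adoption year (the original T)
    shift = +1 -> treated from the year after adoption (delayed effect)
    shift = -1 -> treated from the year before adoption (anticipation)
    """
    return int(year >= ADOPTION_YEAR[region] + shift)


def build_panel(years: List[int]) -> List[Tuple[str, int, int, int, int]]:
    """Build one row per region-year: (region, year, T, T_delayed, T_anticipated)."""
    rows = []
    for region in sorted(ADOPTION_YEAR):
        for year in years:
            rows.append((
                region,
                year,
                treatment_indicator(region, year, 0),
                treatment_indicator(region, year, +1),
                treatment_indicator(region, year, -1),
            ))
    return rows


if __name__ == "__main__":
    # Print the three versions of the indicator side by side for 2014-2020.
    for row in build_panel(list(range(2014, 2021))):
        print(row)
```

Each alternative indicator would then replace T in the same regression, keeping the full set of year and region fixed effects, and the original result would still be reported alongside these explorations.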


    • #3
      Hi Clyde and thank you very much for your help and your suggestions.
      I understand perfectly what you have said, and I really appreciate your comment, because this is my first real economic analysis and I don't have enough experience to avoid these kinds of mistakes.
      I will continue to read economics papers to better understand the theory behind the model I adopted in my analysis, and I will try to implement your suggestions.
      Thank you again.
