
  • Is clustering with only 2 clusters legitimate in Difference in Difference regression?

    Dear Statalist,

    I have 2 questions.

    (1) I am trying to run a difference-in-differences regression, where the dependent variable is wealth and the treatment is the death of both parents during my treatment period. My sample is small (735 people x 2 years, of whom 53 people are in the treatment group); it consists of people from 40 to 64 years of age, and my "before" (2006) and "after" (2014) time periods are 8 years apart.

    I have 1 more time period (2002) to check for parallel trends between the control and treatment groups before the treatment, and the group means of wealth do indeed look parallel before and divergent after the treatment (sorry, I cannot insert my very convincing graph from Word here). The group means of variables such as age, marital status, gender, and immigrant status look similar for the control and treatment groups. So the setup for difference-in-differences estimation seems quite promising.

    In my difference-in-differences estimation I use factor-variable notation and try different specifications: OLS with just the 3 standard difference-in-differences regressors (i.after##i.treatment) and a fixed-effects panel regression; I also try including additional controls (age, marital status, number of siblings, race, an immigrant dummy, etc.) in both the OLS and the fixed-effects models.

    In all cases my coefficient of interest, the interaction term i.after#i.treatment, is totally insignificant (the P-value is around 0.5, so no hope really) UNLESS I cluster my standard errors by the treatment dummy (which is 1 for the treatment group and 0 for the control group). With clustering I get small standard errors and, hence, a significant coefficient of interest. I thought this was legitimate after reading an opinion that clustering should reflect the way we sample. Since this is exactly how I sampled my observations, based on whether they had been treated (both parents died) or not (the last parent is still alive), I thought clustering in my case was valid.
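
    In case it helps, my specifications look roughly like this (the variable names are placeholders for my actual data):

    Code:
    * OLS with just the 3 standard difference-in-differences regressors
    regress wealth i.after##i.treatment
    * OLS with additional controls
    regress wealth i.after##i.treatment age i.married nsiblings i.race i.immigrant
    * fixed-effects panel version
    xtset id year
    xtreg wealth i.after##i.treatment age i.married, fe
    * the variant that makes the interaction significant
    regress wealth i.after##i.treatment, vce(cluster treatment)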

    Recently, however, I started reading Mostly Harmless Econometrics, where the authors say that clustering is only valid when the number of clusters is large enough, and in my case I have only 2 clusters.
    On the other hand, I have read an opinion here on Statalist that interaction terms have less "power" than other regressors in terms of statistical significance.


    I really appreciate your thoughts on this problem.


    (2) I also have another, smaller question: Given my small sample size (a panel of 2 years and 735 people, with 53 people in the treatment group and 682 in the control group), can I legitimately do quantile-regression difference-in-differences estimation, in other words, run the same regression as in question (1) for the median and for deciles? Or does my small sample size argue against it?
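
    Concretely, I have in mind something like this (again with placeholder variable names):

    Code:
    * median difference-in-differences
    qreg wealth i.after##i.treatment, quantile(0.5)
    * and analogously for the deciles, e.g. the 1st decile
    qreg wealth i.after##i.treatment, quantile(0.1)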


    Many thanks


  • #2
    I think you have misunderstood what clustering is used for. Clustering is used to account for non-independence of observations in the sampling process. But there is no reason to think that the error terms for the controls and for the treatment group are not independent of each other: they are completely different people. Perhaps if the treatment group were sampled from certain communities and the controls all came from different communities, you could argue for dependence based on the community sampling units. But then you would cluster on the community, not on the treatment group.

    In any case, even if there were reason to consider treatment vs. control to represent non-independent sampling, it is also true that the cluster-robust VCE is only valid for large numbers of clusters. While you can find different opinions about just how large the number of clusters has to be, I don't know of anybody who would say that it is valid with only 2 clusters.

    So for both reasons, you cannot use vce(cluster treatment).

    It is true that the statistical power to detect interactions is lower than the statistical power to detect "main" effects. The implication of that is that when planning your study, assuming you have control over the sample sizes, you want to sample enough people that you have adequate power to detect the minimum substantively important interaction coefficient. It doesn't alter the way you choose to analyze the data, however.

    Concerning your second question, a sample size that works for ordinary (mean) regression will usually be sufficient for median regression as well. Decile regression is a different matter: the location of the 1st or 9th decile is determined by only about 10% of the observations, so your effective sample sizes are roughly 73 overall and only about 5 in the treatment group! So if you try decile regression I think the results will show very little precision.

    All of that said, the overall impression your post gives is that you are groping after different analytic approaches in order to stumble onto a statistically significant finding. That's not science. That's p-hacking. In some circles it is even considered scientific misconduct if done knowing that it is statistically invalid, and negligence if you don't know that. The analysis, in principle, should be decided upon before you even look at the data, based on the sampling design, the meaning of the variables, and how they work in the underlying scientific theory. Sometimes it is necessary to change the analytic plan once the data are in hand, because the original analysis fails to converge or the data are found to severely violate important assumptions underlying the planned analysis. But it is not OK to look for a different analysis because you don't like the result you got from the originally planned one. FWIW, based on your description of your study, it sounds to me like the most appropriate analysis would be the panel fixed-effects regression, perhaps including the time-varying covariates.
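
    For concreteness, a minimal sketch of what I mean, with placeholder panel and variable names (include whatever time-varying covariates you actually have):

    Code:
    * declare the panel structure
    xtset id year
    * fixed-effects difference-in-differences: the time-invariant treatment dummy
    * is absorbed by the person fixed effects, and the coefficient on the
    * interaction is the difference-in-differences estimate; vce(robust) here
    * clusters at the person level, which is fine with 735 people
    xtreg wealth i.after##i.treatment age i.married, fe vce(robust)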

    You need to accept that your hypothesis was not supported by the data. Instead of trying to torture the data into confessing what you want to hear, you should start working on understanding why you got the results you got. Was your hypothesis perhaps not well grounded scientifically in the first place? Is your sample size too small to detect a small but nevertheless meaningful effect? Is your outcome measure too noisy? Is there error in your ascertainment of the predictor variables? Did you measure the outcomes at the wrong times pre and post? Or not frequently enough? Perhaps, since your variable is economic in nature, the effect you were looking for was overwhelmed by bigger effects from normal economic cycles or the Great Recession and its aftermath. Focus your efforts on answering these questions.
