Hello,
I'm having trouble deciding at which level to cluster the standard errors in my regressions. I have a repeated cross section of individuals: the dataset includes only individuals observed in the year they have a newborn child, so an individual can appear more than once if they have more than one child during the study period (1990-2011). Approximately 17% of individuals have more than one child in the dataset, and one individual has seven children in that period.
The method I am using is difference-in-differences with staggered policy introduction (25 states introduced the policy and 26 states did not). I include state fixed effects as i.state and year fixed effects as i.year, along with my policy variable. My outcome is a non-negative, overdispersed count variable, so I am using a negative binomial model.
Should I cluster at the state level or at the individual level to account for serial correlation? And if I cluster at the state level, can I still include state fixed effects?
The problem is that I have few observations and therefore not many clusters; for example, there is only one individual in Alabama in 1993. As a result, the Wald statistic is missing both when I run nbreg y policy x i.state i.year, vce(cluster ID_person) and when I run nbreg y policy x i.state i.year, vce(cluster state).
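For reference, here is a minimal sketch of the two specifications described above, written with Stata's comma-and-option syntax for the cluster-robust variance estimator. The variable names (y, policy, x, state, year, ID_person) are the ones used in this post; nothing else is assumed about the data.

```stata
* Negative binomial DiD with state and year fixed effects,
* standard errors clustered on the individual
nbreg y policy x i.state i.year, vce(cluster ID_person)

* Same model, standard errors clustered on the state
nbreg y policy x i.state i.year, vce(cluster state)
```

The point estimates are identical across the two commands; only the standard errors and the reported model test differ with the choice of cluster variable.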
Could anyone offer advice on what I should do in this case? At which level should I cluster, and how does a missing Wald statistic affect my interpretation?
Is there another post-estimation test I could use to assess the joint significance of the variables after nbreg if the Wald statistic is missing?
Or are 17% repeated observations not enough to warrant clustering?
Thank you in advance for any help! I'm completely lost as to what is appropriate in this case.
Surya