Dear Statalists,
Hi, I've just joined the forum so apologies in advance if I accidentally break any of the rules.
I have a general question about including time fixed effects in a logistic regression. The problem with my data set is that the sample is small while the time range is long. Specifically, I have around 600 observations for each model, of which 100-200 have y=1, and the observation period runs from 1985 to 2016.
At first I intended to include annual time dummies, but I found that including 30+ time dummies in my model would result in overfitting.
The code I used was:
Code:
logit y x1 x2 x3 . . . x9 i.year
The next thing I tried was creating 5-year interval dummies, that is,
Code:
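* 5-year period indicator: 1 = 1985-1989, 2 = 1990-1994, ..., 7 = 2015-2019
* (left missing when year is missing)
gen yr5_dummy = 1 if !missing(year)
forval i = 2/7 {
    replace yr5_dummy = `i' if year >= 1985 + 5*(`i'-1) & year <= 1989 + 5*(`i'-1) & !missing(year)
}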
and then running the regression with clustered standard errors:
Code:
logit y x1 x2 x3 . . . x9 i.yr5_dummy, vce(cluster yr5_dummy)
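As a sanity check on the grouping variable, the same 5-year periods can be sketched in a single expression and compared against yr5_dummy. This is only an illustration: yr5_alt is a made-up name, and the sketch assumes year is numeric and lies within 1985-2019, which holds for my 1985-2016 data.
Code:
* alternative construction: 1 = 1985-1989, 2 = 1990-1994, ..., 7 = 2015-2019
gen yr5_alt = floor((year - 1985)/5) + 1 if !missing(year)
* stops with an error if the two versions disagree for any non-missing year
assert yr5_alt == yr5_dummy if !missing(year)
* cross-tabulate the periods against the outcome
tabulate yr5_alt y
The cross-tabulation shows how many of the y=1 observations fall into each 5-year period.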
In sum, because of the overfitting problem, I created 5-year interval time dummies instead of annual time dummies, and then specified standard errors clustered on those 5-year intervals.
Would there be any issues if I handle the problem this way?
Any hints or references to the literature would be appreciated. Thank you.