Dear Statalist users, I have a question about model overfitting.
I ran nbreg for a set of 30 countries, for which I have observations for 14 years, for two different types of events (30x14x2=840 observations). I use 5 independent variables, plus country dummies and year dummies for fixed effects.
I am thinking of using two more dummies with triple interactions. My command looks like this:
nbreg outcome ftb##event##(c.mkt_size_country c.mkt_satur_country c.labor c.skills c.invest) i.country i.year, vce(cluster countryid)
My first question is whether, at first glance, overfitting is a problem given the number of parameters involved (especially after so many interactions).
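To make the parameter count concrete, here is a rough sketch in Python. It assumes ftb and event are each binary (a hypothetical reading of the command; the actual number of levels may differ) and ignores nbreg's dispersion parameter lnalpha:

```python
# Rough parameter count for the nbreg specification above,
# assuming ftb and event are each binary (hypothetical).
n_countries, n_years, n_event_types = 30, 14, 2
n_obs = n_countries * n_years * n_event_types  # 840 observations

n_continuous = 5  # mkt_size_country, mkt_satur_country, labor, skills, invest

# ftb##event##(c.x1 ... c.x5) expands to: ftb, event, ftb#event,
# and for each continuous x: x, ftb#x, event#x, ftb#event#x.
factorial_terms = 3 + 4 * n_continuous             # 23
fixed_effects = (n_countries - 1) + (n_years - 1)  # 29 + 13 = 42
n_params = factorial_terms + fixed_effects + 1     # +1 for the constant

print(n_obs, n_params, round(n_obs / n_params, 1))  # → 840 66 12.7
```

So under these assumptions the model has roughly 66 coefficients for 840 observations, i.e. about 13 observations per parameter, before accounting for clustering on only 30 countries.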
My second question about overfitting is more fundamental. I have read that the real problem is that too many predictors pick up correlations that exist in the sample but not in the general population. In my case, my sample covers more than 90% of the population for the given time period. Would it be safe to ignore overfitting?
Thank you in advance.