I would appreciate some advice on modeling fixed effects in a panel with two cross-sectional dimensions.
Data structure dimensions:
Regions (1...3000)
Companies (1...300)
Time (1...1000)
Obs: ~1,000,000
Regions are not nested within companies, i.e., companies are observed in more than one region, but never in all regions. The panels are unbalanced.
I need FEs for regions and for companies. FEs for each region-company combination are not necessary, but including them would also be consistent with the underlying economic theory.
I tried the following models:
1a)
Code:
egen idreg_comp = group(regions companies)
xtset idreg_comp time
xtreg y x1 x2 x3 x4, fe
I also get the same point estimates by
1b)
Code:
reg y x1 x2 x3 x4 i.regions#i.companies
2)
Code:
reg y x1 x2 x3 x4 i.regions i.companies
2) reports different point estimates than 1a) and 1b).
My questions
a) Is 2) to be preferred because regions are not nested within companies, which results in correlation between the fixed effects in 1a) and 1b)? Or does such correlation not bias my coefficient point estimates?
b) Model 2) results in repeated time values within panels (which xt... commands do not allow). Is this to be avoided in general?
I would like to use the -reg- command only if necessary, as it takes a few hours to run on my dataset.
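As a hedged sketch of what a faster equivalent of 2) might look like: the lines below use only the variable names from the models above and assume nothing else about the data. The idea is to absorb the region effects instead of estimating them as dummies, and to declare the panel without a time variable so that repeated time values within a region do not matter. This is an illustration under those assumptions, not a tested solution for this dataset.
Code:
* Sketch only; variable names are taken from the models above.
* Declaring only the panel variable (no time variable) means repeated
* time values within a region are not checked.
xtset regions
* Region effects are absorbed by -fe-; company effects enter as dummies.
xtreg y x1 x2 x3 x4 i.companies, fe
* The same idea with -areg-, absorbing regions rather than estimating one dummy per region.
areg y x1 x2 x3 x4 i.companies, absorb(regions)
Both commands fit the same additive region and company effects as 2), so the slope estimates should agree with it; standard errors may differ slightly because of degrees-of-freedom conventions.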
Any help is highly appreciated.
Thanks.
Christian