reghdfe multiple fixed effects help

chloe reis

Join Date: Aug 2016
Posts: 14

13 Aug 2016, 07:16

Hi, I'm new to Stata!

I am trying to find out, after controlling for industry, country, and year, the effect that internet usage rates have had on exports, and I want to understand how this effect differs according to how technology-intensive the industry is. In particular, I want to control for country-year, year-industry, and industry-country fixed effects.

I have export data for every country over 5 years, broken down by industry (99 industries), and for each industry I also have a corresponding R&D intensity variable (1-4). I also have data on the percentage of internet users by country for each year.

Sample (country_code 4 = Afghanistan, country_code 8 = Albania):

year	country_code	industry_code	intensity	exports_usd	internet_users
1998	4	19	2	209823	.15
1998	4	20	4	23423	.15
1998	4	21	3	988474	.15
1998	4	22	2	3344	.15
1998	4	23	1	134523	.15
1998	8	19	2	46578435	.22
1998	8	20	4	555675	.22
1998	8	21	3	3837	.22
1998	8	22	2	863522	.22
1998	8	23	1	43355	.22
2002	4	19	2	435246	.18
2002	4	20	4	445554	.18

Again, trying to control for fixed country, time, and industry effects, and see w

Y_{TCI} = \alpha_{CT} + \beta_{IT} + \gamma_{CI} + (D_{intensity} \times \delta_{users_{CT}})

where:
\alpha_{CT} is the term for country-year fixed effects, which I generated using egen c_y = group(country_code year), label
\beta_{IT} is the term for industry-year fixed effects, which I generated using egen i_y = group(industry_code year), label
\gamma_{CI} is the term for country-industry fixed effects, which I generated using egen c_i = group(country_code industry_code), label
D_{intensity} is a dummy for industry intensity
\delta_{users_{CT}} is country-time investment in IT

I am trying to run it using reghdfe, absorbing country-industry, year-industry, etc. Could the problem be that the fixed effects use up too many degrees of freedom? I generated the c_i2, y_i2 and y_c2 variables using

egen c_i2 = group(country_code industry_code)

then

reghdfe log_exports c.internet_users#1.intensity, absorb(c_i2 y_i2 y_c2)

The results of the regression are shown below. They don't make sense, as the coefficient values should all be positive. The issue, if I am not mistaken, is that absorbed c_y fixed effects runs

[Screenshot of the reghdfe output: Screen Shot 2016-08-13 at 1.37.31 PM.png]

However, if I run the regression using

reghdfe log_exports c.internet_users#1.intensity, absorb(year country_code industry_code)

I get results that do make sense.

[Screenshot of the reghdfe output: Screen Shot 2016-08-13 at 1.43.38 PM.png]

Question: can someone either show me a better way of including c-i, i-y and y-c fixed effects, or comment on how much of a difference running it with just year, country and industry fixed effects would make to the validity of the results?

Would it even be acceptable for me to run it with year-industry fixed effects and then just country fixed effects [absorb(y_i2 country_code)]?

Thanks so much in advance!

Tags: dummy, fixed effects, panel data, reghdfe, regression

Sergio Correia

Join Date: Apr 2014

Posts: 420
#2

15 Aug 2016, 11:49

Maybe you are asking too much of your data; if you run the first model with only the FEs (no regressors), you'll probably see that the R2 is close to 0.94. This means that you only have 0.06 of the variation in exports left to be explained by internet usage (plus the error term).
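
A minimal sketch of that check, assuming the variable names from post #1 (log_exports, c_i2, y_i2, y_c2) and that reghdfe is installed from SSC. It is only an illustration, not output from the poster's data:

```stata
* Regress the outcome on the fixed effects alone (no regressors)
reghdfe log_exports, absorb(c_i2 y_i2 y_c2)

* Share of the variation in log_exports explained by the FEs
display "R2 from FEs only: " e(r2)

* Share left for internet usage plus the error term
display "Share left over:  " 1 - e(r2)
```

If the first number is close to 1, there is little left for the interaction term to explain.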

Also important (and related) is to ask what variation you are using to identify the coefficients. For instance, if you only have country FEs you would exploit within-country variation to identify the coefficients. In your case, you are absorbing the time variation for each country and for each industry, and then the country-industry pairs, which is a lot.
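
To make the remaining variation concrete, here is a sketch under the simplifying assumption of a balanced panel (every country-industry-year cell observed). The notation is mine, not from the posts above: D_i is the intensity dummy, u_{ct} is internet usage, x_{cit} = D_i u_{ct} is the regressor, and bars denote averages over the omitted indices. Partialling out the three sets of pairwise FEs leaves

```latex
\tilde{x}_{cit}
  = x_{cit} - \bar{x}_{ct} - \bar{x}_{it} - \bar{x}_{ci}
    + \bar{x}_{c} + \bar{x}_{i} + \bar{x}_{t} - \bar{x}
  = (D_i - \bar{D})\,(u_{ct} - \bar{u}_{c} - \bar{u}_{t} + \bar{u})
```

So the coefficient is identified only from how usage in a country-year deviates from that country's average and that year's average, scaled by how the industry dummy deviates from its mean. With only five years per country, the second factor may be small.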

The answer to your question is all in all a bit tricky. If you absorb too much you end up with lots of variance in your estimates, but if you absorb too little you might have biased/inconsistent estimators. So this feels a lot like the old bias-variance tradeoff (e.g. see here: http://www.cs.uu.nl/docs/vakken/lfd/biasvar.pdf )
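
One way to see the tradeoff is to put the two specifications from post #1 side by side. This is a rough sketch, assuming the variable names used there and that it is run from a do-file (the /// line continuations do not work at the interactive prompt). The stored-estimate names fe_single and fe_pairs are made up for illustration. The second call writes the pairwise FEs with reghdfe's # syntax in absorb(), which should define the same groups as the egen-generated c_i2, y_i2 and y_c2:

```stata
* Specification 1: separate year, country, and industry FEs
reghdfe log_exports c.internet_users#1.intensity, ///
    absorb(year country_code industry_code)
estimates store fe_single

* Specification 2: all pairwise FEs
reghdfe log_exports c.internet_users#1.intensity, ///
    absorb(country_code#industry_code year#industry_code year#country_code)
estimates store fe_pairs

* Coefficients and standard errors side by side
estimates table fe_single fe_pairs, se stats(N r2)
```

Comparing the standard errors across the two columns gives a feel for how much precision the extra fixed effects cost.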
chloe reis

Join Date: Aug 2016

Posts: 14
#3

17 Aug 2016, 05:26

Sergio Correia, thank you so much for your response. I was worried I might be asking too much from my data, and you are correct about the R-squared. Would it then be acceptable for me to run it with just country, industry and time fixed effects instead of the country-time, time-industry and country-industry pairings?
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#4

17 Aug 2016, 08:42

Originally posted by chloe reis

Sergio Correia, thank you so much for your response. I was worried I might be asking too much from my data, and you are correct about the R-squared. Would it then be acceptable for me to run it with just country, industry and time fixed effects instead of the country-time, time-industry and country-industry pairings?

The best thing to do, imo, is to have a look at the related literature. Usually there's at least one person who has done vaguely related research. Do they use the singular fixed effects, or do they use pairings (country-industry etc.)? I would personally guess they stick to the singular effects because, as Sergio Correia mentioned, you have very little identifying variation left if you include the pairings. This does leave you susceptible to critique (as every research project is), but it's simply the limit of what can be done with the available data. In my experience, few academics have issues with this. Indeed, many will think that you are controlling for industry, country and time fixed effects, without even considering that's not quite true if there's a large degree of heterogeneity.

If you are really really worried, consider looking into Pesaran's CCEP/CMG estimators.