
  • reghdfe multiple fixed effects help

    Hi, I'm new to Stata!

    I am trying to find out the effect that internet usage rates have had on exports after controlling for industry, country, and year, and I want to understand how this effect differs according to how technology-intensive the industry is. In particular, I want to control for country-year, year-industry, and industry-country fixed effects.


    I have export data for every country over 5 years, broken down by industry (99 industries), and for each industry I also have a corresponding R&D intensity variable (1-4). I also have data on the share of internet users by country for each year.

    Sample data (country_code 4 = Afghanistan, country_code 8 = Albania):

    year  country_code  industry_code  intensity  exports_usd  internet_users
    1998             4             19          2       209823             .15
    1998             4             20          4        23423             .15
    1998             4             21          3       988474             .15
    1998             4             22          2         3344             .15
    1998             4             23          1       134523             .15
    1998             8             19          2     46578435             .22
    1998             8             20          4       555675             .22
    1998             8             21          3         3837             .22
    1998             8             22          2       863522             .22
    1998             8             23          1        43355             .22
    2002             4             19          2       435246             .18
    2002             4             20          4       445554             .18
    Again, I am trying to control for country, time, and industry fixed effects, and see w

    $$Y_{cti} = \alpha_{ct} + \beta_{it} + \gamma_{ci} + \delta \, (D^{\text{intensity}}_{i} \times \text{users}_{ct})$$

    where (c = country, t = year, i = industry):

    $\alpha_{ct}$ is the country-year fixed effect, which I generated using egen c_y = group(country_code year), label
    $\beta_{it}$ is the industry-year fixed effect, which I generated using egen i_y = group(industry_code year), label
    $\gamma_{ci}$ is the country-industry fixed effect, which I generated using egen c_i = group(country_code industry_code), label
    $D^{\text{intensity}}_{i}$ is a dummy for industry intensity
    $\text{users}_{ct}$ is country-year investment in IT, measured by the share of internet users (internet_users), and $\delta$ is its coefficient
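    You can see which part of this model is identifiable by checking whether each regressor varies within the c_y cells. The sketch below does that in Python/NumPy, not Stata, using the sample rows above. `group()` and `varies_within()` are hypothetical helpers. `group()` only mimics what `egen ... = group(...)` does, which is to number the distinct combinations in sort order. The sketch also assumes the regressor of interest is internet_users x 1(intensity == 1), as in the reghdfe calls in this post.

```python
import numpy as np

# Sample rows from the post: year, country_code, industry_code, intensity, internet_users
rows = np.array([
    [1998, 4, 19, 2, 0.15], [1998, 4, 20, 4, 0.15], [1998, 4, 21, 3, 0.15],
    [1998, 4, 22, 2, 0.15], [1998, 4, 23, 1, 0.15], [1998, 8, 19, 2, 0.22],
    [1998, 8, 20, 4, 0.22], [1998, 8, 21, 3, 0.22], [1998, 8, 22, 2, 0.22],
    [1998, 8, 23, 1, 0.22], [2002, 4, 19, 2, 0.18], [2002, 4, 20, 4, 0.18],
])
year, country, industry, intensity, users = rows.T


def group(*cols):
    """Hypothetical helper: number distinct combinations 1..G in sort order,
    roughly what `egen newvar = group(varlist)` does in Stata."""
    _, ids = np.unique(np.column_stack(cols), axis=0, return_inverse=True)
    return ids.ravel() + 1


def varies_within(v, g):
    """Hypothetical helper: True if v takes more than one value inside at least one cell of g."""
    return any(np.unique(v[g == k]).size > 1 for k in np.unique(g))


c_y = group(country, year)
i_y = group(industry, year)
c_i = group(country, industry)

# Regressor in the posted reghdfe calls: c.internet_users#1.intensity
x = users * (intensity == 1)

print(varies_within(users, c_y))  # False: internet_users is constant within country-year
print(varies_within(x, c_y))      # True: the interaction differs across industries
```

    If this reasoning is right, internet_users on its own is constant within each country-year, so the country-year effects sweep it out entirely. Only its interaction with the intensity dummy is left to identify $\delta$, because that interaction differs across industries in the same country-year.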

    I am trying to run it using reghdfe, absorbing the country-industry, year-industry, and year-country effects. But could this be a problem, since so many fixed effects use up a lot of degrees of freedom? I generated the c_i2, y_i2, and y_c2 variables with commands like

    egen c_i2 = group(country_code industry_code)

    then

    reghdfe log_exports c.internet_users#1.intensity, absorb(c_i2 y_i2 y_c2)
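    reghdfe partials out several sets of fixed effects iteratively instead of building dummy matrices. The simplest version of that idea is the method of alternating projections, sketched below in Python/NumPy on a made-up balanced panel. The dimensions, seed, intensity pattern, and true coefficient of 0.5 are all assumptions. reghdfe itself uses accelerated variants, so this shows the principle, not its code. The demeaned estimate should match OLS with explicit country-year, industry-year, and country-industry dummies. The rank of that dummy matrix shows how many degrees of freedom the three pairings use up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced panel: C countries x T years x I industries
C, T, I = 6, 5, 8
c, t, i = (a.ravel() for a in np.meshgrid(np.arange(C), np.arange(T), np.arange(I), indexing="ij"))


def group(*cols):
    """Number the distinct combinations of cols 0..G-1 (egen group() numbering minus one)."""
    _, ids = np.unique(np.column_stack(cols), axis=0, return_inverse=True)
    return ids.ravel()


def demean(v, groups, tol=1e-12, max_iter=10_000):
    """Alternating projections: subtract group means for each FE set in turn
    until a full sweep no longer changes v."""
    v = np.asarray(v, dtype=float).copy()
    for _ in range(max_iter):
        old = v.copy()
        for g in groups:
            v -= (np.bincount(g, weights=v) / np.bincount(g))[g]
        if np.max(np.abs(v - old)) < tol:
            break
    return v


fe_ids = [group(c, t), group(i, t), group(c, i)]  # country-year, industry-year, country-industry
intensity = (np.arange(I) % 4 + 1)[i]             # assumed: intensity 1..4 cycles across industries
users = rng.uniform(0, 1, size=(C, T))[c, t]      # assumed internet-use rate per country-year
x = users * (intensity == 1)

# Synthetic outcome with all three FE sets plus noise; the true coefficient on x is 0.5
y = (0.5 * x
     + rng.normal(size=C * T)[fe_ids[0]]
     + rng.normal(size=I * T)[fe_ids[1]]
     + rng.normal(size=C * I)[fe_ids[2]]
     + 0.1 * rng.normal(size=x.size))

y_t, x_t = demean(y, fe_ids), demean(x, fe_ids)
beta_map = (x_t @ y_t) / (x_t @ x_t)

# Brute-force check: OLS on x plus explicit dummy columns for every FE cell
D = np.column_stack([np.eye(g.max() + 1)[g] for g in fe_ids])
beta_dummy = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

print(beta_map, beta_dummy)      # equal up to rounding (Frisch-Waugh-Lovell)
print(np.linalg.matrix_rank(D))  # degrees of freedom used by the three pairings
```

    In a balanced panel with C countries, T years, and I industries, the three pairings span C*T + I*T + C*I - C - T - I + 1 dimensions. Here that is 100 of the 240 observations, which is the degrees-of-freedom cost behind the question above.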

    The results of the regression are shown below. They don't make sense, as the coefficient values should all be positive. The issue, if I am not mistaken, is that absorbed c_y fixed effects runs

    [Screenshot of the reghdfe output with absorb(c_i2 y_i2 y_c2): Screen Shot 2016-08-13 at 1.37.31 PM.png]


    However, if I run the regression using

    reghdfe log_exports c.internet_users#1.intensity, absorb(year country_code industry_code)

    I get results that do make sense.

    [Screenshot of the reghdfe output with absorb(year country_code industry_code): Screen Shot 2016-08-13 at 1.43.38 PM.png]




    Question: can someone show me a better way of including the c-i, i-y, and y-c fixed effects? Or can someone comment on how much difference running it with just year, country, and industry fixed effects would make to the validity of the results?

    Would it even be acceptable for me to run it with year-industry fixed effects plus country fixed effects only [absorb(y_i2 country_code)]?

    Thanks so much in advance!

  • #2
    Maybe you are asking too much of your data. If you regress the outcome of the first model on only the FEs, you'll probably see that the R2 is close to 0.94. This means that only about 6% of the variation in exports is left to be explained by internet usage (plus the error term).
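    The diagnostic described here is to regress the outcome on the fixed effects alone and read off the R2. It can be sketched in Python/NumPy as follows. Everything is hypothetical: a synthetic balanced panel with made-up sizes and variances. The printed values will therefore not match the 0.94 mentioned. What matters is the calculation, 1 - SSR/SST after sweeping out the fixed effects. The pairwise set always explains at least as much as country, year, and industry effects on their own, because the one-way dummies are nested in the pairwise ones.

```python
import numpy as np


def group(*cols):
    """Number the distinct combinations of cols 0..G-1 (egen group() numbering minus one)."""
    _, ids = np.unique(np.column_stack(cols), axis=0, return_inverse=True)
    return ids.ravel()


def demean(v, groups, tol=1e-12, max_iter=10_000):
    """Sweep out each FE set in turn (alternating projections) until stable."""
    v = np.asarray(v, dtype=float).copy()
    for _ in range(max_iter):
        old = v.copy()
        for g in groups:
            v -= (np.bincount(g, weights=v) / np.bincount(g))[g]
        if np.max(np.abs(v - old)) < tol:
            break
    return v


def fe_only_r2(y, groups):
    """R-squared of a regression of y on the fixed effects alone (constant included)."""
    resid = demean(y, groups)
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Hypothetical balanced panel standing in for log exports
rng = np.random.default_rng(1)
C, T, I = 6, 5, 8
c, t, i = (a.ravel() for a in np.meshgrid(np.arange(C), np.arange(T), np.arange(I), indexing="ij"))
pairwise = [group(c, t), group(i, t), group(c, i)]
one_way = [c, t, i]
y = (rng.normal(size=C * T)[pairwise[0]]
     + rng.normal(size=I * T)[pairwise[1]]
     + rng.normal(size=C * I)[pairwise[2]]
     + 0.3 * rng.normal(size=c.size))

r2_pair, r2_one = fe_only_r2(y, pairwise), fe_only_r2(y, one_way)
print(f"pairwise FEs: R2 = {r2_pair:.2f}, left for regressors + error = {1 - r2_pair:.2f}")
print(f"one-way FEs:  R2 = {r2_one:.2f}")
```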

    Also important (and related) is to ask what variation you are using to identify the coefficients. For instance, if you only have country FEs, you would exploit within-country variation to identify the coefficients. In your case, you are absorbing the time variation within each country and within each industry, and then the country-industry pairs as well, which is a lot.
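    The same nesting applies to the regressor: the more fixed effects you absorb, the less of its variation is left to identify the coefficient. The hypothetical Python/NumPy sketch below measures how much of the variance of internet_users x 1(intensity == 1) survives after partialling out three nested FE sets:

    1. country FEs only;
    2. country, year, and industry FEs;
    3. the three pairings.

    The data are a synthetic panel. The sizes, seed, uniform draw for internet use, and intensity pattern are all assumptions.

```python
import numpy as np


def group(*cols):
    """Number the distinct combinations of cols 0..G-1 (egen group() numbering minus one)."""
    _, ids = np.unique(np.column_stack(cols), axis=0, return_inverse=True)
    return ids.ravel()


def demean(v, groups, tol=1e-12, max_iter=10_000):
    """Sweep out each FE set in turn (alternating projections) until stable."""
    v = np.asarray(v, dtype=float).copy()
    for _ in range(max_iter):
        old = v.copy()
        for g in groups:
            v -= (np.bincount(g, weights=v) / np.bincount(g))[g]
        if np.max(np.abs(v - old)) < tol:
            break
    return v


def residual_share(x, groups):
    """Share of x's variance left after partialling out the given FE sets."""
    r = demean(x, groups)
    return (r @ r) / np.sum((x - x.mean()) ** 2)


rng = np.random.default_rng(2)
C, T, I = 6, 5, 8
c, t, i = (a.ravel() for a in np.meshgrid(np.arange(C), np.arange(T), np.arange(I), indexing="ij"))
users = rng.uniform(0, 1, size=(C, T))[c, t]  # assumed internet-use rates per country-year
x = users * ((np.arange(I) % 4) == 0)[i]      # assumed: 2 of 8 industries have intensity 1

fe_sets = {
    "country only": [c],
    "country + year + industry": [c, t, i],
    "country-year + industry-year + country-industry": [group(c, t), group(i, t), group(c, i)],
}
shares = {name: residual_share(x, groups) for name, groups in fe_sets.items()}
for name, s in shares.items():
    print(f"{name:48s} {s:.2f}")
```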

    The answer to your question is, all in all, a bit tricky. If you absorb too much, you end up with a lot of variance in your estimates, but if you absorb too little, you might have biased/inconsistent estimators. So this feels a lot like the old bias-variance tradeoff (e.g. see here: http://www.cs.uu.nl/docs/vakken/lfd/biasvar.pdf )
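    To make the bias-variance point concrete, here is a small Monte Carlo sketch in Python/NumPy. It uses an assumed data-generating process, not the poster's data. Exports respond to internet use through the interaction, with a true coefficient of 0.5. They also respond through a country-year shock proportional to internet use, and only the country-year effects can absorb that shock. Under these assumptions, the one-way estimator should be clearly biased upward while the pairwise estimator should be centred on 0.5. The printed standard deviations show the precision side of the trade-off. With a process that has no country-year confounding, the one-way estimator would not suffer this bias.

```python
import numpy as np

C, T, I = 6, 5, 8  # hypothetical panel dimensions
c, t, i = (a.ravel() for a in np.meshgrid(np.arange(C), np.arange(T), np.arange(I), indexing="ij"))


def group(*cols):
    """Number the distinct combinations of cols 0..G-1 (egen group() numbering minus one)."""
    _, ids = np.unique(np.column_stack(cols), axis=0, return_inverse=True)
    return ids.ravel()


def demean(v, groups, tol=1e-12, max_iter=10_000):
    """Sweep out each FE set in turn (alternating projections) until stable."""
    v = np.asarray(v, dtype=float).copy()
    for _ in range(max_iter):
        old = v.copy()
        for g in groups:
            v -= (np.bincount(g, weights=v) / np.bincount(g))[g]
        if np.max(np.abs(v - old)) < tol:
            break
    return v


FE_SETS = {"one-way": [c, t, i], "pairwise": [group(c, t), group(i, t), group(c, i)]}
low_tech = ((np.arange(I) % 4) == 0)[i]  # assumed: industries 0 and 4 have intensity 1


def fe_estimate(y, x, groups):
    """Within estimator of the slope on x after absorbing the given FE sets."""
    y_t, x_t = demean(y, groups), demean(x, groups)
    return (x_t @ y_t) / (x_t @ x_t)


def simulate(n_reps=200, beta=0.5, seed=3):
    rng = np.random.default_rng(seed)
    est = {name: [] for name in FE_SETS}
    for _ in range(n_reps):
        users = rng.uniform(0, 1, size=(C, T))[c, t]
        x = users * low_tech
        # Country-year shock tied to internet use: only country-year FEs remove it
        y = beta * x + 2.0 * users + 0.3 * rng.normal(size=x.size)
        for name, groups in FE_SETS.items():
            est[name].append(fe_estimate(y, x, groups))
    return {name: np.array(v) for name, v in est.items()}


est = simulate()
for name, v in est.items():
    print(f"{name:9s} mean = {v.mean():.3f}  sd = {v.std():.3f}")
```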



    • #3
      Sergio Correia, thank you so much for your response. I was worried I might be asking too much of my data, and you are correct about the R squared. Is it not acceptable, then, for me to run it with just country, industry, and time fixed effects instead of the country-time, time-industry, and country-industry pairings?



      • #4
        Originally posted by chloe reis
        Sergio Correia, thank you so much for your response. I was worried I might be asking too much of my data, and you are correct about the R squared. Is it not acceptable, then, for me to run it with just country, industry, and time fixed effects instead of the country-time, time-industry, and country-industry pairings?
        The best thing to do, imo, is to have a look at the related literature. Usually there's at least one person who has done vaguely related research. Do they use the single fixed effects, or do they use pairings (country-industry etc.)? I would personally guess they stick to the single effects because, as Sergio Correia mentioned, you have very little identifying variation left if you include the pairings. This does leave you susceptible to critique (as every research project is), but it's simply the limit of what can be done with the available data. In my experience, few academics have issues with this. Indeed, many will think that you are controlling for industry, country, and time fixed effects, without even considering that this is not quite true if there's a large degree of heterogeneity.

        If you are really, really worried, consider looking into Pesaran's CCEP/CCEMG estimators.

