  • collinear dropped multiple fixed effects and dummy (reghdfe)

    I am trying to find out, after controlling for industry, country, and year, the effect that internet usage rates have had on exports, and I want to understand how this effect differs according to how technology-intensive the industry is. In particular, I want to control for country-year, year-industry, and industry-country fixed effects.


    I have export data for every country over 5 years, broken down by industry (99 industries), and for each industry I also have a corresponding industry R&D intensity variable (1-4). I also have data on % internet users by country for each year.
    year  country_code  industry_code  intensity  exports_usd  internet_users
    1998             4             19          2       209823             .15
    1998             4             20          4        23423             .15
    1998             4             21          3       988474             .15
    1998             4             22          2         3344             .15
    1998             4             23          1       134523             .15
    1998             8             19          2     46578435             .22
    1998             8             20          4       555675             .22
    1998             8             21          3         3837             .22
    1998             8             22          2       863522             .22
    1998             8             23          1        43355             .22
    2002             4             19          2       435246             .18
    2002             4             20          4       445554             .18
    Again, trying to control for fixed country, time, and industry effects, and see w

    Ycti= (Intensity_i * IT_ct) + FEct + FEit + FEci
    reghdfe log_exports i.intensity#c.internet_users, absorb(y_c c_i y_i)

    where intensity_i is a dummy of R&D intensity (from 1-4) for each industry I have
    where IT_ct is internet usage data for each country and time input I have
    and FEs are the fixed effects (fixed country-time, industry-time, country-industry)

    The above specification asked too much of the data, so my professor said I could control for just country, year, and industry fixed effects, so long as I included IT_ct again in the regression:
    Ycti= (Intensity_i * IT_ct) + IT_ct + FEc + FEt+ FEi
    reghdfe log_exports i.intensity#c.internet_users internet_users, absorb(year country_code industry_code)



    So my questions are
    a) Can anyone explain why it makes sense to add internet_users back into the regression? To me it doesn't make sense why I have to re-add it...


    b) I am using reghdfe, but with one of the samples I run (only using data on developed countries), I get a message that the dummy for intensity 1 was omitted because of collinearity, and for another regression (only using data on developed countries) the dummy for intensity 2 was omitted. Is it because these omitted variables are collinear with the internet_users variable? Or is this just Stata arbitrarily using one of them as the dropped dummy against which the other dummies are compared?

    Below is an image of my regression results.

    [Image: Screen Shot 2016-08-19 at 12.14.32 PM.png]

    Thank you in advance!

  • #2
    a) The general rule for models with interaction terms is that the "main effects" of the interaction must also be included in the model in order to have a proper model specification. The exceptions arise when one of those main effects is collinear with a fixed effect that is being absorbed. In that case, that main effect can be omitted. In your data, internet_users is constant within country-year combinations, and in the first model you were absorbing country-year, so internet_users could not be included. (And if you had tried to include it, Stata would have omitted it automatically anyway.) In your new model, you are no longer absorbing country-year, and internet_users does vary within country, within year, and within industry. So internet_users is no longer collinear with any of the fixed effects, and it must stay in the model.
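
    The claim that internet_users is constant within country-year can be checked directly. What follows is only a minimal sketch: it re-enters a few rows from the example data in post #1 (dropping the columns the check does not need) so that it runs on its own, and varies_cy is a name made up for illustration. Run it from a do-file, since it uses /// line continuation.

```stata
* A few rows from the example data in post #1 (two country-years)
clear
input year country_code industry_code internet_users
1998 4 19 .15
1998 4 20 .15
1998 4 21 .15
1998 8 19 .22
1998 8 20 .22
1998 8 21 .22
end

* Within each country-year cell, sort by internet_users and compare the
* smallest and largest values: varies_cy is 1 if they differ, 0 otherwise
bysort country_code year (internet_users): ///
    generate byte varies_cy = internet_users[1] != internet_users[_N]

* All zeros means internet_users is constant within country-year, and so
* collinear with an absorbed country-year fixed effect
tabulate varies_cy
```

    On the full data set the same two commands (without the clear/input part) would show whether any country-year cell has more than one value of internet_users.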

    In fact, while it is important to understand what is going on here, it is somewhat silly to make these decisions by hand. The simplest approach is to specify interactions with the ## operator of Stata's factor-variable notation. Stata then automatically includes both the interaction term and the "main effects," and omits a main effect exactly when appropriate, that is, exactly when it is collinear with one of the absorbed fixed effects. Stata never gets this wrong if you do it that way.
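
    As a sketch of what the ## version of the two models from post #1 might look like: this assumes the data set from post #1 is in memory, that reghdfe is installed, and that y_c, c_i, and y_i are the group identifiers the original poster already created. The comments describe the expected behavior, not output I have run.

```stata
* Assumes the data set from post #1 is in memory and reghdfe is installed
* (ssc install reghdfe). All variable names are taken from post #1.

* Three-way specification: the internet_users main effect is collinear with
* the absorbed country-year effect y_c, so it should be reported as omitted
reghdfe log_exports i.intensity##c.internet_users, absorb(y_c c_i y_i)

* Simpler specification: country-year is no longer absorbed, so the
* internet_users main effect should now be estimated
reghdfe log_exports i.intensity##c.internet_users, absorb(year country_code industry_code)
```

    In both versions the intensity main effects would be expected to drop out as well, because post #1 describes intensity as fixed for each industry and an industry effect is absorbed either way.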

    b) Stata will always omit one level of a categorical variable as the reference category. Because that is "normal" behavior, Stata does not provide any warning messages about doing so. If you are getting a message saying that something is omitted because of collinearity, then that is above and beyond the normal omission of reference categories. It is not uncommon for collinearity relationships to arise in a subset of the data when they don't exist in the data set as a whole. But you should verify that you understand what is collinear with what, and assure yourself that the occurrence of that particular collinearity in that subset makes sense. If not, it could indicate that there is an error in your data set.
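
    A few checks along these lines might help pin down what is collinear with what. This is a sketch under assumptions: the data set from post #1 is in memory, intensity takes only the values 1-4, and developed is a hypothetical 0/1 indicator for the developed-country subsample (post #1 does not say how that subsample is defined). tag_cy and sum_inter are made-up names.

```stata
* Assumes the data set from post #1 is in memory.
* developed is a HYPOTHETICAL 0/1 marker for the developed-country subsample.

* 1. Are all four intensity levels present in the subsample?
tabulate intensity if developed == 1

* 2. How much does internet_users vary across country-years in the subsample?
*    tag_cy flags one observation per country-year cell
egen byte tag_cy = tag(country_code year) if developed == 1
summarize internet_users if tag_cy == 1

* 3. One relationship that holds by construction: the four
*    i.intensity#c.internet_users terms add up to internet_users itself
generate double sum_inter = 0
forvalues k = 1/4 {
    replace sum_inter = sum_inter + (intensity == `k') * internet_users
}
assert sum_inter == internet_users if !missing(intensity, internet_users)
```

    If the assert in step 3 passes, then a model that lists all four i.intensity#c.internet_users terms together with internet_users contains one redundant term, and one of them has to be omitted. Which one is dropped need not be the same in every sample.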

    By the way, going forward please use -dataex- to post example data, and show Stata output in code blocks, as is requested of all users in FAQ #12. It makes things easier for those who want to help you to read and work with your information.
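
    For the example shown in post #1, a dataex call might look like the following. It assumes the data set is in memory; the variable names are those from post #1, and 1/12 simply matches the twelve rows shown there.

```stata
* If dataex is not already available in your installation: ssc install dataex
dataex year country_code industry_code intensity exports_usd internet_users in 1/12
```

    The output can be pasted into a post as is, and others can run it to recreate the same observations.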
