  • Collinearity in regression with interaction terms: Stata drops variables. How do I specify the base category?

    Dear Statalist,

    Happy Monday.

    I am interested in estimating the effect of German language proficiency (measured by a "very good German command" dummy) on the wages of immigrants in Germany. To this end, I run linear regressions of log wages on immigrant status, the "very good German command" dummy, and other demographic characteristics. To enrich the specification, I successively add interaction terms: immigrant status x education, immigrant status x very good German command, education x very good German command, and finally a triple interaction.

    My code is as follows:

    Code:
    local Y ln_wages_gro

    * Specification 1: immigrant x education
    global x1 immigrant##i.educ_level sex age age_sq married no_children i.educ_level years_work_exp i.occup_combined
    * Specification 2: adds immigrant x very good German command
    global x2 immigrant##i.educ_level immigrant##very_good_ger_command sex age age_sq married no_children i.educ_level years_work_exp i.occup_combined
    * Specification 3: adds education x very good German command
    global x3 immigrant##i.educ_level immigrant##very_good_ger_command educ_level#very_good_ger_command sex age age_sq married no_children i.educ_level years_work_exp i.occup_combined
    * Specification 4: triple interaction immigrant x education x very good German command
    global x4 immigrant##i.educ_level##very_good_ger_command sex age age_sq married no_children i.educ_level years_work_exp i.occup_combined

    local X x1 x2 x3 x4

    * Loop over outcomes and specifications with reghdfe:
    * absorb cluster_var and syear fixed effects, cluster standard errors by cluster_var
    foreach y of local Y {
        foreach x of local X {
            eststo reg_`y'_`x': reghdfe `y' ${`x'}, ///
                absorb(cluster_var syear) vce(cluster cluster_var)
        }
    }

    The problem is the following. My data contain both natives and immigrants. Natives do not report their language proficiency, so I impute very_good_ger_command == 1 for them; that is, I assume every native has a very good German command. When I run the loop above, Stata drops the interaction terms because of collinearity.

    For example, the fourth specification (log wages on x4) gives:
    Code:
    note: 0b.immigrant#0b.very_good_ger_command omitted because of collinearity
    note: 0b.immigrant#1b.educ_level#0b.very_good_ger_command omitted because of collinearity
    note: 1o.immigrant#3o.educ_level#0b.very_good_ger_command omitted because of collinearity
    note: 889.occup_combined omitted because of collinearity
    
    HDFE Linear regression                            Number of obs   =    367,217
    Absorbing 2 HDFE groups                           F(  30,     83) =    2930.43
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.4914
                                                      Adj R-squared   =     0.4912
                                                      Within R-sq.    =     0.4422
    Number of clusters (cluster_var) =         84     Root MSE        =     0.6110
    
                                                             (Std. err. adjusted for 84 clusters in cluster_var)
    ------------------------------------------------------------------------------------------------------------
                                               |               Robust
                                  ln_wages_gro | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------------------------------------+----------------------------------------------------------------
                                   1.immigrant |   .0204051    .025504     0.80   0.426    -.0303214    .0711315
                                               |
                                    educ_level |
                                            2  |   .1357691   .0266024     5.10   0.000      .082858    .1886801
                                            3  |   .3297254   .0292414    11.28   0.000     .2715655    .3878854
                                               |
                          immigrant#educ_level |
                                          1 2  |  -.0371757   .0263592    -1.41   0.162     -.089603    .0152516
                                          1 3  |  -.1328072   .0353139    -3.76   0.000    -.2030451   -.0625693
                                               |
                       1.very_good_ger_command |  -.1049976   .0209216    -5.02   0.000    -.1466099   -.0633854
                                               |
               immigrant#very_good_ger_command |
                                          0 0  |          0  (empty)
                                          1 1  |          0  (omitted)
                                               |
              educ_level#very_good_ger_command |
                                          2 1  |   .1636651   .0269149     6.08   0.000     .1101325    .2171977
                                          3 1  |   .2293729   .0270932     8.47   0.000     .1754856    .2832602
                                               |
    immigrant#educ_level#very_good_ger_command |
                                        0 1 0  |          0  (empty)
                                        0 2 0  |          0  (empty)
                                        0 3 0  |          0  (empty)
                                        1 2 1  |          0  (omitted)
                                        1 3 1  |          0  (omitted)
                                               |
                                           sex |   .3147661   .0136618    23.04   0.000     .2875934    .3419388
                                           age |   .0699323    .003348    20.89   0.000     .0632732    .0765913
                                        age_sq |  -.0974664   .0034906   -27.92   0.000     -.104409   -.0905238
                                       married |  -.0496023   .0130072    -3.81   0.000    -.0754732   -.0237315
                                   no_children |  -.0253526   .0028548    -8.88   0.000    -.0310306   -.0196746
                                years_work_exp |    .029932   .0006777    44.17   0.000     .0285841    .0312799
                                               |
                                occup_combined |
                                           82  |  -.1641463   .0176071    -9.32   0.000    -.1991662   -.1291265
                                           83  |  -.4081211   .0177827   -22.95   0.000    -.4434902    -.372752
                                           84  |  -.5554699   .0182121   -30.50   0.000     -.591693   -.5192467
                                           85  |  -.8531703   .0193025   -44.20   0.000    -.8915621   -.8147785
                                           86  |  -.7304313   .0402182   -18.16   0.000    -.8104237   -.6504389
                                           87  |   -.725796   .0225927   -32.13   0.000    -.7707321     -.68086
                                           88  |   -.745428   .0276917   -26.92   0.000    -.8005057   -.6903503
                                           89  |  -1.116623   .0284847   -39.20   0.000    -1.173278   -1.059968
                                          881  |   .7054542   .0220244    32.03   0.000     .6616486    .7492599
                                          882  |   .7558083   .0269996    27.99   0.000     .7021071    .8095094
                                          883  |   .5254447   .0224478    23.41   0.000      .480797    .5700924
                                          884  |   .4204903   .0270373    15.55   0.000     .3667142    .4742664
                                          885  |   .1750119   .0280122     6.25   0.000     .1192968     .230727
                                          886  |   .1494267   .0320236     4.67   0.000      .085733    .2131205
                                          887  |   .2900686   .0192637    15.06   0.000     .2517539    .3283832
                                          888  |   .3160587   .0176544    17.90   0.000     .2809449    .3511726
                                          889  |          0  (omitted)
                                               |
                                         _cons |   5.479861    .083596    65.55   0.000     5.313592     5.64613
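
    I believe the problem comes from my imputation: because every native has very_good_ger_command == 1, the cell "immigrant = 0, very_good_ger_command = 0" is empty. As a result, very_good_ger_command varies only among immigrants, and the data can identify its main effect or its interaction with immigrant, but not both.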
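    The empty cell and the resulting omission can be reproduced on simulated data. This is a hypothetical sketch: the seed, sample size, and the ln_wage outcome are made up, and only the names immigrant and very_good_ger_command mirror the post. On the real data, the tabulate and count lines alone should show whether the native/not-very-good cell is empty.

```stata
* Hypothetical simulated data (not the poster's data).
* Natives get very_good_ger_command = 1 imputed, mirroring the imputation in the post.
clear
set seed 12345
set obs 1000
generate byte immigrant = runiform() < 0.3
generate byte very_good_ger_command = cond(immigrant == 1, runiform() < 0.5, 1)
generate double ln_wage = 2 + 0.1*immigrant + 0.2*very_good_ger_command + rnormal()

* The native / not-very-good cell is empty by construction
tabulate immigrant very_good_ger_command
count if immigrant == 0 & very_good_ger_command == 0

* Constant, 1.immigrant, 1.very_good_ger_command and the 1#1 interaction are
* linearly dependent here, so Stata omits one of them (only 3 parameters remain)
regress ln_wage i.immigrant##i.very_good_ger_command
display "rank of e(V): " e(rank)
```

    With the cell empty, regress flags one term as omitted, which is the same pattern as the "(empty)" and "(omitted)" rows in the reghdfe output above.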

    I would like the base category to be natives with a very good German command, that is, immigrant = 0 and very_good_ger_command = 1. However, specifying this with the ib#. prefix does not work for me.

    Is this a syntax issue (how I specify base levels)?
    Or is my specification fundamentally flawed, so that I need to drop very_good_ger_command as a stand-alone term for the regression to be identified?
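    One possible reparametrization, sketched here on simulated data and not tested on the actual data, avoids giving natives a proficiency value at all. Option (a) collapses immigrant status and proficiency into a hypothetical three-level variable lang_group and uses ib0. to make natives the base; option (b) keeps immigrant but enters proficiency only through 1.immigrant#1.very_good_ger_command, i.e. without a stand-alone main effect. Both fit the same three group means, so their residual sums of squares should agree.

```stata
* Hypothetical simulated data (not the poster's data); natives have
* very_good_ger_command imputed to 1, mirroring the imputation in the post.
clear
set seed 12345
set obs 1000
generate byte immigrant = runiform() < 0.3
generate byte very_good_ger_command = cond(immigrant == 1, runiform() < 0.5, 1)
generate double ln_wage = 2 + 0.1*immigrant + 0.2*very_good_ger_command + rnormal()

* (a) Hypothetical three-level variable; ib0. makes natives the explicit base
generate byte lang_group = cond(immigrant == 0, 0, cond(very_good_ger_command == 1, 1, 2))
label define lang_group_lbl 0 "native" 1 "immig., very good German" 2 "immig., other"
label values lang_group lang_group_lbl
regress ln_wage ib0.lang_group
scalar rss_a = e(rss)

* (b) Proficiency enters only within immigrants (no stand-alone main effect)
regress ln_wage i.immigrant 1.immigrant#1.very_good_ger_command
scalar rss_b = e(rss)

* Same column space, so the fits coincide up to rounding
display "RSS (a) = " rss_a "   RSS (b) = " rss_b
```

    Under these assumptions, the coefficients in (a) compare each immigrant group with natives directly, while in (b) 1.immigrant is the gap for immigrants without a very good German command and the interaction is the additional difference for those with one. For the triple interaction, a term such as ib0.lang_group##i.educ_level would play the same role, though this has not been checked against the full specification.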

    I am grateful for any input or tips. Thank you!
