  • Factor variable interactions providing a base category that is not specified with the ib notation

    Dear Statalist,
    I would appreciate your help on the following.
    I use factor-variable notation as shown below and specify a base category; however, Stata omits the wrong year in this example.
    Code:
    regress temp ib2013.year#i.ukb ib1000.country ib2013.year if year<=2020
    For whatever reason, I do not obtain the right base category.
    When I run it with the full dataset, I get all interactions of year with ukb, but 2020.year#1.ukb is omitted rather than 2013.year#1.ukb.
    I am not sure why this is the case and would greatly appreciate your help.
    Best,
    Nico
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(temp year ukb country)
    0 2015 0    5
    0 2017 0   10
    1 2015 0   14
    0 2016 0   18
    0 2012 0   19
    0 2012 0   22
    0 2012 0   26
    0 2010 0   26
    1 2013 0   27
    0 2020 0  265
    0 2011 0  309
    0 2012 0  386
    0 2014 1 1000
    0 2014 1 1000
    0 2009 1 1000
    0 2019 1 1000
    0 2011 1 1000
    0 2011 1 1000
    0 2010 1 1000
    0 2013 1 1000
    1 2011 1 1000
    0 2015 1 1000
    0 2013 1 1000
    0 2018 1 1000
    0 2019 1 1000
    0 2011 1 1000
    0 2015 1 1000
    0 2018 1 1000
    0 2015 1 1000
    0 2016 1 1000
    0 2020 1 1000
    0 2013 1 1000
    0 2017 1 1000
    0 2011 1 1000
    1 2015 1 1000
    0 2012 1 1000
    0 2014 1 1000
    1 2012 1 1000
    0 2009 1 1000
    0 2019 1 1000
    0 2012 1 1000
    0 2013 1 1000
    0 2013 1 1000
    0 2010 1 1000
    0 2016 1 1000
    0 2020 1 1000
    0 2019 1 1000
    0 2011 1 1000
    0 2010 1 1000
    0 2016 1 1000
    0 2014 1 1000
    0 2013 1 1000
    1 2009 1 1000
    0 2020 1 1000
    0 2017 1 1000
    0 2020 1 1000
    0 2015 1 1000
    0 2015 1 1000
    0 2014 1 1000
    0 2012 1 1000
    0 2017 1 1000
    0 2018 1 1000
    0 2012 1 1000
    0 2011 1 1000
    0 2010 1 1000
    0 2015 1 1000
    0 2010 1 1000
    0 2018 1 1000
    0 2012 1 1000
    0 2010 1 1000
    0 2019 1 1000
    0 2014 1 1000
    0 2017 1 1000
    0 2019 1 1000
    0 2019 1 1000
    0 2015 1 1000
    0 2010 1 1000
    0 2012 1 1000
    0 2010 1 1000
    0 2014 1 1000
    0 2019 1 1000
    0 2011 1 1000
    0 2012 1 1000
    0 2010 1 1000
    0 2016 1 1000
    0 2012 1 1000
    0 2017 1 1000
    0 2011 1 1000
    0 2010 1 1000
    0 2021 1 1000
    0 2012 1 1000
    0 2012 1 1000
    0 2011 1 1000
    0 2017 1 1000
    0 2018 1 1000
    0 2010 1 1000
    1 2010 1 1000
    0 2009 1 1000
    0 2015 .    .
    0 2012 .    .
    end

  • #2
    When I estimate your model on your example data, I get numerous notes about terms being omitted because of collinearity. Do you get any such notes when you estimate on the full data? Here is what I mean:
    Code:
    . regress temp ib2013.year#i.ukb ib1000.country ib2013.year if year<=2020
    note: 2009.year#0b.ukb identifies no observations in the sample.
    note: 2009.year#1.ukb omitted because of collinearity.
    note: 2011.year#1.ukb omitted because of collinearity.
    note: 2012.year#1.ukb omitted because of collinearity.
    note: 2013b.year#1.ukb omitted because of collinearity.
    note: 2014.year#0b.ukb identifies no observations in the sample.
    note: 2014.year#1.ukb omitted because of collinearity.
    note: 2015.year#1.ukb omitted because of collinearity.
    note: 2016.year#1.ukb omitted because of collinearity.
    note: 2017.year#1.ukb omitted because of collinearity.
    note: 2018.year#0b.ukb identifies no observations in the sample.
    note: 2018.year#1.ukb omitted because of collinearity.
    note: 2019.year#0b.ukb identifies no observations in the sample.
    note: 2019.year#1.ukb omitted because of collinearity.
    note: 2020.year#1.ukb omitted because of collinearity.
    Note that I have omitted the rest of the output, as it is lengthy and appears to be uninformative.



    • #3

      Here is what I get from my Stata/SE 18. I added option
      allbaselevels to get Stata to show the factor and interaction
      levels it identifies as base levels.
      Code:
      . regress temp ib2013.year#i.ukb ib1000.country ib2013.year if year<=2020, allbaselevels
      note: 2009.year#0b.ukb identifies no observations in the sample.
      note: 2009.year#1.ukb omitted because of collinearity.
      note: 2011.year#1.ukb omitted because of collinearity.
      note: 2012.year#1.ukb omitted because of collinearity.
      note: 2013b.year#1.ukb omitted because of collinearity.
      note: 2014.year#0b.ukb identifies no observations in the sample.
      note: 2014.year#1.ukb omitted because of collinearity.
      note: 2015.year#1.ukb omitted because of collinearity.
      note: 2016.year#1.ukb omitted because of collinearity.
      note: 2017.year#1.ukb omitted because of collinearity.
      note: 2018.year#0b.ukb identifies no observations in the sample.
      note: 2018.year#1.ukb omitted because of collinearity.
      note: 2019.year#0b.ukb identifies no observations in the sample.
      note: 2019.year#1.ukb omitted because of collinearity.
      note: 2020.year#1.ukb omitted because of collinearity.
      
            Source |       SS           df       MS      Number of obs   =        97
      -------------+----------------------------------   F(23, 73)       =      1.56
             Model |  2.14408779        23  .093221208   Prob > F        =    0.0775
          Residual |  4.35075758        73  .059599419   R-squared       =    0.3301
      -------------+----------------------------------   Adj R-squared   =    0.1191
             Total |  6.49484536        96  .067654639   Root MSE        =    .24413
      
      ------------------------------------------------------------------------------
              temp | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
          year#ukb |
           2009 0  |          0  (empty)
           2009 1  |          0  (omitted)
           2010 0  |          0  (base)
           2010 1  |  -.0075758   .3599772    -0.02   0.983    -.7250093    .7098578
           2011 0  |          0  (base)
           2011 1  |          0  (omitted)
           2012 0  |          0  (base)
           2012 1  |          0  (omitted)
           2013 0  |          0  (base)
           2013 1  |          0  (omitted)
           2014 0  |          0  (empty)
           2014 1  |          0  (omitted)
           2015 0  |          0  (base)
           2015 1  |          0  (omitted)
           2016 0  |          0  (base)
           2016 1  |          0  (omitted)
           2017 0  |          0  (base)
           2017 1  |          0  (omitted)
           2018 0  |          0  (empty)
           2018 1  |          0  (omitted)
           2019 0  |          0  (empty)
           2019 1  |          0  (omitted)
           2020 0  |          0  (base)
           2020 1  |          0  (omitted)
                   |
           country |
                5  |      -.125   .2589389    -0.48   0.631    -.6410645    .3910645
               10  |  -1.30e-17   .2636905    -0.00   1.000    -.5255343    .5255343
               14  |       .875   .2589389     3.38   0.001     .3589355    1.391064
               18  |  -2.46e-17   .2729456    -0.00   1.000    -.5439797    .5439797
               19  |  -.0909091   .2549854    -0.36   0.722    -.5990942    .4172761
               22  |  -.0909091   .2549854    -0.36   0.722    -.5990942    .4172761
               26  |  -.0909091   .2549854    -0.36   0.722    -.5990942    .4172761
               27  |          1   .2636905     3.79   0.000     .4744657    1.525534
              265  |  -2.74e-17   .2729456    -0.00   1.000    -.5439797    .5439797
              309  |        -.1   .2560456    -0.39   0.697    -.6102982    .4102982
              386  |  -.0909091   .2549854    -0.36   0.722    -.5990942    .4172761
             1000  |          0  (base)
                   |
              year |
             2009  |        .25   .1575852     1.59   0.117    -.0640668    .5640668
             2010  |   .0909091   .3668109     0.25   0.805    -.6401439    .8219621
             2011  |         .1   .1260681     0.79   0.430    -.1512535    .3512535
             2012  |   .0909091   .1239006     0.73   0.465    -.1560245    .3378427
             2013  |          0  (base)
             2014  |   2.62e-16   .1358214     0.00   1.000    -.2706916    .2706916
             2015  |       .125   .1318452     0.95   0.346    -.1377672    .3877672
             2016  |   2.82e-16   .1575852     0.00   1.000    -.3140668    .3140668
             2017  |   2.70e-16   .1409485     0.00   1.000    -.2809099    .2809099
             2018  |   2.74e-16    .147828     0.00   1.000    -.2946208    .2946208
             2019  |   2.82e-16   .1318452     0.00   1.000    -.2627672    .2627672
             2020  |   2.82e-16   .1575852     0.00   1.000    -.3140668    .3140668
                   |
             _cons |  -2.64e-16   .0996656    -0.00   1.000    -.1986333    .1986333
      ------------------------------------------------------------------------------
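      There are 4 empty cells in the interaction between year and ukb in the estimation sample (2009, 2014, 2018, and 2019). The tabulation below covers all years, so it also shows the empty 2021 cell, which the condition year<=2020 excludes.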
      Code:
      . tab year ukb
      
                 |          ukb
            year |         0          1 |     Total
      -----------+----------------------+----------
            2009 |         0          4 |         4 
            2010 |         1         12 |        13 
            2011 |         1         10 |        11 
            2012 |         4         11 |        15 
            2013 |         1          6 |         7 
            2014 |         0          7 |         7 
            2015 |         2          8 |        10 
            2016 |         1          4 |         5 
            2017 |         1          6 |         7 
            2018 |         0          5 |         5 
            2019 |         0          8 |         8 
            2020 |         1          4 |         5 
            2021 |         0          1 |         1 
      -----------+----------------------+----------
           Total |        12         86 |        98
      Given that the main effects of ukb (i.e. i.ukb) are not
      present in this model, it is not surprising that 2013.year#1.ukb
      is not flagged as a base.

      Yet, given the pattern of levels in this dataset, this interaction level
      is still omitted because of collinearity (at least in my Stata/SE
      session).

      When a predictor is omitted because of collinearity, it is also not
      surprising that the same data run on a different computer or with
      Stata/MP on the same computer might omit a different predictor.
      Collinear variables are, by their nature, indistinguishable for
      identifying the model; thus, which variable gets omitted is
      determined by the order of operations and the accumulation of tiny
      errors in finite-precision arithmetic.
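
      A quick check of where the collinearity may come from, sketched under the assumption that ukb is constant within country (as it is in the example data, where ukb is 1 only for country 1000). If so, the year#1.ukb indicators add up to 1.ukb, which the country dummies and the constant already determine, so at least one interaction level must be omitted. The variable name ukb_varies is made up for this illustration.
      Code:
      * Cross-tabulate country and ukb in the estimation sample
      tabulate country ukb if year <= 2020, missing

      * Flag countries in which ukb takes more than one value
      * (ukb_varies is a hypothetical helper variable; note that bysort re-sorts the data)
      bysort country (ukb): generate byte ukb_varies = ukb[1] != ukb[_N]

      * All zeros would mean ukb is a function of country
      tabulate ukb_varies
      If ukb_varies is 0 everywhere, the omission of exactly one year#1.ukb level in the full data would be consistent with this explanation.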



      • #4
        Dear Rich and Jeff,
        Thanks for your replies; I very much appreciate your help. The sample I gave was not very informative, my apologies. Below I have pasted what I get with the full dataset.
        Jeff writes that I need to have i.ukb in the regression for 2013.year#1.ukb to be flagged as the base.
        I appreciate that point now, but I wonder whether there is a way around this, since I need to use the country dummies instead of i.ukb in the regression.
        Or should I just abandon the factor-variable notation in this case and define the dummy variables from scratch?
        I would appreciate your input.
        Thanks so much,
        Nico
        Code:
        year#ukb
        2009 1     -.0234836  .0096554  -2.43  0.015   -.042408  -.0045593
        2010 1     -.0124545  .0084261  -1.48  0.139  -.0289695   .0040604
        2011 1     -.0077198  .0085642  -0.90  0.367  -.0245053   .0090657
        2012 1     -.0021349   .008822  -0.24  0.809  -.0194257   .0151559
        2013 1      .0018051   .008885   0.20  0.839  -.0156093   .0192195
        2014 1      .0064946  .0090016   0.72  0.471  -.0111484   .0241376
        2015 1     -.0056152  .0086047  -0.65  0.514  -.0224801   .0112498
        2016 1      .0036982   .008692   0.43  0.670   -.013338   .0207344
        2017 1      .0009898  .0089316   0.11  0.912   -.016516   .0184955
        2018 1     -.0056581  .0092139  -0.61  0.539  -.0237171   .0124009
        2019 1      .0031394  .0093968   0.33  0.738  -.0152781   .0215568
        2020 1             0  (omitted)

        year
        2009        .0278519  .0080459   3.46  0.001   .0120823   .0436216
        2010        .0181434  .0068035   2.67  0.008   .0048086   .0314781
        2011        .0115418  .0069579   1.66  0.097  -.0020954    .025179
        2012        .0027301  .0072467   0.38  0.706  -.0114732   .0169335
        2013               0  (base)
        2014       -.0009319   .007442  -0.13  0.900   -.015518   .0136542
        2015        .0092916  .0069567   1.34  0.182  -.0043434   .0229265
        2016        .0114922  .0070546   1.63  0.103  -.0023346   .0253191
        2017        .0179823   .007314   2.46  0.014   .0036471   .0323175
        2018        .0183142  .0076382   2.40  0.016   .0033436   .0332848
        2019        .0159713  .0078403   2.04  0.042   .0006045    .031338
        2020        .0193868   .008296   2.34  0.019   .0031269   .0356466
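
        For the option of defining the dummy variables from scratch, here is a minimal sketch, not a tested solution for the full dataset. It assumes the years of interest are 2009 to 2020 and that 2013 should be the reference year for the ukb interaction. The names ukb_2009 to ukb_2020 are made up for this illustration.
        Code:
        * Build one year-by-ukb indicator per year, skipping 2013 so that it serves as the reference
        forvalues y = 2009/2020 {
            if `y' != 2013 {
                * 1 if the observation is in year `y' and ukb == 1, 0 otherwise, missing if either is missing
                generate byte ukb_`y' = (year == `y') & (ukb == 1) if !missing(year, ukb)
            }
        }

        * Hand-made interactions replace ib2013.year#i.ukb; the other terms are unchanged
        regress temp ukb_2* ib1000.country ib2013.year if year <= 2020

        * Possible alternative without new variables: after the original factor-variable fit,
        * re-express an estimated interaction relative to 2013, for example
        * lincom 2014.year#1.ukb - 2013.year#1.ukb
        Because the two specifications should span the same set of regressors when only one interaction level is redundant, the fitted values would be expected to agree; only the reference year for the reported interaction coefficients changes.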
