  • Getting different variables in the VIF analysis than those used in the regression analysis

    Hi Everyone,

    After running my regression I want to test for multicollinearity with the vif command. However, vif does not show the same variables as those used in my regression. The levels listed under sic2fixed in the VIF output do not even exist in my data. Does someone know what is going on?

    I start with the following regression:
    Code:
    regress abs_dca earlyea size loss growth inventory sd_sales sd_cfo bm fees i.fyear i.sic2fixed, robust
    The output looks like this:
    Code:
    Linear regression                               Number of obs     =      1,127
                                                    F(36, 1090)       =      42.90
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.4906
                                                    Root MSE          =     .16089
    
    ------------------------------------------------------------------------------
                 |               Robust
         abs_dca |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         earlyea |   .0139611   .0126574     1.10   0.270    -.0108746    .0387968
            size |    .003257   .0098638     0.33   0.741    -.0160972    .0226112
            loss |   -.009179   .0191993    -0.48   0.633    -.0468508    .0284927
          growth |  -.0008725   .0004332    -2.01   0.044    -.0017224   -.0000226
       inventory |   .1039879   .1085149     0.96   0.338    -.1089338    .3169097
        sd_sales |   4.60e-06   1.77e-06     2.59   0.010     1.12e-06    8.07e-06
          sd_cfo |  -.0000155   8.05e-06    -1.92   0.055    -.0000313    3.39e-07
              bm |   .0306091   .0246821     1.24   0.215    -.0178208     .079039
            fees |  -.0080096   .0120266    -0.67   0.506    -.0316076    .0155884
                 |
           fyear |
           2016  |  -.0323195   .0138803    -2.33   0.020    -.0595546   -.0050844
           2017  |   .0107309   .0148073     0.72   0.469    -.0183233     .039785
           2018  |  -.0263443   .0151248    -1.74   0.082    -.0560214    .0033327
           2019  |  -.0986281   .0472746    -2.09   0.037    -.1913876   -.0058687
                 |
       sic2fixed |
             20  |    .007519   .0289802     0.26   0.795    -.0493444    .0643823
             23  |  -.1068286   .0290454    -3.68   0.000    -.1638199   -.0498374
             26  |  -.1332961   .0282909    -4.71   0.000    -.1888069   -.0777853
             28  |   .4013773   .0316068    12.70   0.000     .3393602    .4633944
             29  |  -.2151065   .0419823    -5.12   0.000    -.2974818   -.1327311
             30  |  -.1281859   .0324364    -3.95   0.000    -.1918307   -.0645411
             34  |  -.1044339   .0339218    -3.08   0.002    -.1709933   -.0378745
             35  |   .0051912   .0283465     0.18   0.855    -.0504288    .0608112
             36  |   .0531267   .0342207     1.55   0.121    -.0140192    .1202726
             37  |  -.1002941   .0274196    -3.66   0.000    -.1540953   -.0464929
             38  |   .0828333   .0294362     2.81   0.005     .0250752    .1405914
             39  |  -.0913435   .0315329    -2.90   0.004    -.1532156   -.0294713
             45  |   -.125567   .0239606    -5.24   0.000    -.1725812   -.0785529
             48  |   .0575326   .0362255     1.59   0.113    -.0135469    .1286122
             49  |  -.1038844   .0238229    -4.36   0.000    -.1506284   -.0571404
             50  |   .1548413   .0853878     1.81   0.070    -.0127017    .3223843
             51  |   .1588181   .0921888     1.72   0.085    -.0220694    .3397056
             55  |  -.1566967   .0429625    -3.65   0.000    -.2409953   -.0723982
             56  |  -.1292806   .0329217    -3.93   0.000    -.1938778   -.0646834
             58  |  -.0861951   .0426981    -2.02   0.044    -.1699748   -.0024153
             59  |  -.0584408   .0446967    -1.31   0.191    -.1461421    .0292605
             73  |   .1492332   .0274652     5.43   0.000     .0953425     .203124
             80  |   .1703843   .0710612     2.40   0.017      .030952    .3098166
                 |
           _cons |   .2450955   .1252664     1.96   0.051    -.0006952    .4908861
    ------------------------------------------------------------------------------
    Now I want to test for multicollinearity, but I get different sic2fixed levels than those shown in my regression output.

    Code:
    . vif
    
        Variable |       VIF       1/VIF
    -------------+----------------------
         earlyea |      1.40    0.714037
            size |      5.83    0.171586
            loss |      1.53    0.652766
          growth |      1.29    0.776751
       inventory |      3.46    0.288736
        sd_sales |      3.56    0.280609
          sd_cfo |      2.71    0.368631
              bm |      1.64    0.610074
            fees |      3.99    0.250430
           fyear |
           2016  |      1.50    0.667212
           2017  |      1.58    0.633030
           2018  |      1.61    0.619921
           2019  |      1.10    0.908733
       sic2fixed |
              5  |      2.65    0.377903
              6  |      1.67    0.598078
              8  |      1.56    0.639988
             10  |      3.50    0.285662
             11  |      2.04    0.491027
             12  |      1.37    0.728192
             14  |      1.44    0.693666
             15  |      2.68    0.373003
             16  |      2.46    0.406148
             17  |      2.14    0.466498
             18  |      3.04    0.328443
             19  |      1.28    0.780565
             21  |      1.44    0.693522
             22  |      1.71    0.583440
             23  |      3.46    0.289082
             24  |      1.92    0.522031
             25  |      2.00    0.501147
             26  |      1.99    0.503408
             27  |      2.20    0.454065
             28  |      1.37    0.731688
             29  |      2.00    0.499786
             30  |      3.80    0.263003
             33  |      1.44    0.692617
    -------------+----------------------
        Mean VIF |      2.23
    Why are different sic2fixed numbers displayed? Thanks in advance.
    Last edited by Laura Witlox; 26 May 2020, 13:29.

  • #2
    Welcome to Statalist. Thank you for the clear presentation of your problem and well-formatted sample output, sorrowfully infrequent occurrences in first posts.

    I had some guesses as to what is going wrong, but the guesses didn't hold up when I tested them. Your question has been up for a while and nobody else has had any success, either, apparently. So maybe we need a little more information from you.

    First, did Stata produce any output between the regress command and the beginning of the results table? If so it would be useful to see that.

    Secondly, can you post the results of
    Code:
    describe sic2fixed
    codebook sic2fixed
    to give us a little more insight into your variable?

    Finally, what version of Stata are you using?

    Let me post one of my dead ends to save others some time. The output of help vif tells us that the vif command, though it continues to work, has been out-of-date since Stata 9, and tells us that estat vif is the replacement, and to see the output of help regress postestimation for more details. With that said, on my toy example I found that estat vif gave results identical to those from vif in all my tests, which led me to look behind the scenes and find that the estat command for regress postestimation just calls the ("out-of-date") vif command.


    • #3
      Hi William,

      Thanks for your response!

      To your first question: Stata did not produce any output between the regress command and the beginning of the results table. Everything works fine.
      To your second question:
      Code:
      . describe sic2fixed
      
                    storage   display    value
      variable name   type    format     label      variable label
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      sic2fixed       long    %8.0g      sic2fixed
      Code:
      . codebook sic2fixed
      
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      sic2fixed                                                                                                                                                                              (unlabeled)
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      
                        type:  numeric (long)
                       label:  sic2fixed
      
                       range:  [2,33]                       units:  1
               unique values:  24                       missing .:  0/1,127
      
                    examples:  10    28
                               16    36
                               22    48
                               27    56
      Furthermore, I tried the following command, which gives me levels of sic2fixed that do not exist when I browse my sample.
      Code:
      . levelsof sic2fixed
      2 5 6 8 10 11 12 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 33


      • #4
        Laura:
        what happens if you type:
        Code:
        regress abs_dca earlyea size loss growth inventory sd_sales sd_cfo bm fees i.fyear i.sic2fixed, robust
        estat vif
        My guess is that the old-fashioned -vif- command (which, admittedly, I was not aware of) internally renumbers the levels of some categorical variables and gives back those renumbered levels in its output.
        I did not experience such an issue with -estat vif-.
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5

          Hi William,

          I tried something more and more or less solved the problem. sic2 is my original variable, which was a string variable. To run my regression I destringed sic2 and created a new variable, sic2fixed. However, I got problems with the labeling in my VIF analysis with sic2fixed (as you saw in my previous post).

          Now I started again and destringed sic2 itself (instead of creating a new variable).
          Code:
          destring sic2, replace
          followed by my regression, which still gives the same results as before:
          Code:
          . regress abs_dca earlyea size loss growth inventory sd_sales sd_cfo bm fees i.fyear i.sic2, robust
          
          Linear regression                               Number of obs     =      1,127
                                                          F(36, 1090)       =      42.90
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.4906
                                                          Root MSE          =     .16089
          
          ------------------------------------------------------------------------------
                       |               Robust
               abs_dca |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
               earlyea |   .0139611   .0126574     1.10   0.270    -.0108746    .0387968
                  size |    .003257   .0098638     0.33   0.741    -.0160972    .0226112
                  loss |   -.009179   .0191993    -0.48   0.633    -.0468508    .0284927
                growth |  -.0008725   .0004332    -2.01   0.044    -.0017224   -.0000226
             inventory |   .1039879   .1085149     0.96   0.338    -.1089338    .3169097
              sd_sales |   4.60e-06   1.77e-06     2.59   0.010     1.12e-06    8.07e-06
                sd_cfo |  -.0000155   8.05e-06    -1.92   0.055    -.0000313    3.39e-07
                    bm |   .0306091   .0246821     1.24   0.215    -.0178208     .079039
                  fees |  -.0080096   .0120266    -0.67   0.506    -.0316076    .0155884
                       |
                 fyear |
                 2016  |  -.0323195   .0138803    -2.33   0.020    -.0595546   -.0050844
                 2017  |   .0107309   .0148073     0.72   0.469    -.0183233     .039785
                 2018  |  -.0263443   .0151248    -1.74   0.082    -.0560214    .0033327
                 2019  |  -.0986281   .0472746    -2.09   0.037    -.1913876   -.0058687
                       |
                  sic2 |
                   20  |    .007519   .0289802     0.26   0.795    -.0493444    .0643823
                   23  |  -.1068286   .0290454    -3.68   0.000    -.1638199   -.0498374
                   26  |  -.1332961   .0282909    -4.71   0.000    -.1888069   -.0777853
                   28  |   .4013773   .0316068    12.70   0.000     .3393602    .4633944
                   29  |  -.2151065   .0419823    -5.12   0.000    -.2974818   -.1327311
                   30  |  -.1281859   .0324364    -3.95   0.000    -.1918307   -.0645411
                   34  |  -.1044339   .0339218    -3.08   0.002    -.1709933   -.0378745
                   35  |   .0051912   .0283465     0.18   0.855    -.0504288    .0608112
                   36  |   .0531267   .0342207     1.55   0.121    -.0140192    .1202726
                   37  |  -.1002941   .0274196    -3.66   0.000    -.1540953   -.0464929
                   38  |   .0828333   .0294362     2.81   0.005     .0250752    .1405914
                   39  |  -.0913435   .0315329    -2.90   0.004    -.1532156   -.0294713
                   45  |   -.125567   .0239606    -5.24   0.000    -.1725812   -.0785529
                   48  |   .0575326   .0362255     1.59   0.113    -.0135469    .1286122
                   49  |  -.1038844   .0238229    -4.36   0.000    -.1506284   -.0571404
                   50  |   .1548413   .0853878     1.81   0.070    -.0127017    .3223843
                   51  |   .1588181   .0921888     1.72   0.085    -.0220694    .3397056
                   55  |  -.1566967   .0429625    -3.65   0.000    -.2409953   -.0723982
                   56  |  -.1292806   .0329217    -3.93   0.000    -.1938778   -.0646834
                   58  |  -.0861951   .0426981    -2.02   0.044    -.1699748   -.0024153
                   59  |  -.0584408   .0446967    -1.31   0.191    -.1461421    .0292605
                   73  |   .1492332   .0274652     5.43   0.000     .0953425     .203124
                   80  |   .1703843   .0710612     2.40   0.017      .030952    .3098166
                       |
                 _cons |   .2450955   .1252664     1.96   0.051    -.0006952    .4908861
          ------------------------------------------------------------------------------
          If I run vif now, I no longer get the wrong labels.

          Code:
          . vif
          
              Variable |       VIF       1/VIF 
          -------------+----------------------
               earlyea |      1.40    0.714037
                  size |      5.83    0.171586
                  loss |      1.53    0.652766
                growth |      1.29    0.776751
             inventory |      3.46    0.288736
              sd_sales |      3.56    0.280609
                sd_cfo |      2.71    0.368631
                    bm |      1.64    0.610074
                  fees |      3.99    0.250430
                 fyear |
                 2016  |      1.50    0.667212
                 2017  |      1.58    0.633030
                 2018  |      1.61    0.619921
                 2019  |      1.10    0.908733
                  sic2 |
                   20  |      2.65    0.377903
                   23  |      1.67    0.598078
                   26  |      1.56    0.639988
                   28  |      3.50    0.285662
                   29  |      2.04    0.491027
                   30  |      1.37    0.728192
                   34  |      1.44    0.693666
                   35  |      2.68    0.373003
                   36  |      2.46    0.406148
                   37  |      2.14    0.466498
                   38  |      3.04    0.328443
                   39  |      1.28    0.780565
                   45  |      1.44    0.693522
                   48  |      1.71    0.583440
                   49  |      3.46    0.289082
                   50  |      1.92    0.522031
                   51  |      2.00    0.501147
                   55  |      1.99    0.503408
                   56  |      2.20    0.454065
                   58  |      1.37    0.731688
                   59  |      2.00    0.499786
                   73  |      3.80    0.263003
                   80  |      1.44    0.692617
          -------------+----------------------
              Mean VIF |      2.23
          I still do not know what went wrong with sic2fixed. However, the VIF values are the same for sic2 and sic2fixed.

          I forgot to answer your last question: I am using Stata 16.0. Thanks for helping me anyway!


          • #6
            Hi Carlo,

            When I use the code that you suggested, I still get the same labels as with vif.

            However, I am happy that I now have the VIF results I wanted (see the last post). Unfortunately, I still do not know what went wrong with sic2fixed.

            Kind regards,
            Laura

            Last edited by Laura Witlox; 27 May 2020, 03:33.


            • #7
              I am glad that the exercise I suggested led you to discover that you had a problem with the creation of sic2fixed that you were able to solve.

              My hypothesis is that to create sic2fixed you did
              Code:
              encode sic2, generate(sic2fixed)
              As the output of help encode tells us,
              Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
              The encode command takes the string variable, assigns a number to each distinct value, and then labels the values so that the original string values are displayed rather than the numbers they are assigned. So the string "28" was assigned the value 10, but when you browse your data you see "28" rather than 10. But note that sic2fixed is shown in a different color than either a string variable or a numerical variable without value labels; that is a clue that what you see is not what you get.
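              A minimal sketch of that behavior on a made-up three-observation dataset may help; the names sic2enc and sic2num are only for illustration and do not come from Laura's data:
              Code:
              clear
              input str2 sic2
              "20"
              "28"
              "36"
              end
              * encode numbers the distinct strings 1, 2, 3 in sorted order and attaches value labels
              encode sic2, generate(sic2enc)
              * destring converts the digits themselves, giving 20, 28, 36
              destring sic2, generate(sic2num)
              * with value labels shown, sic2enc looks like the original codes
              list
              * without labels, the stored values 1, 2, 3 appear
              list, nolabel
              * the mapping between stored values and their labels
              label list sic2enc
              On real data, label list followed by the name of the value label, or tabulate with the nolabel option, should show the same kind of mismatch between what is displayed and what is stored.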

              Your original results can be explained by regress displaying the value labels but vif displaying the encoded values themselves. I see now that that is exactly what happens, which seems a bit cavalier of the vif command.


              • #8
                Hi William,

                You are right. I have used encode to create sic2fixed:
                Code:
                encode sic2, generate(sic2fixed)
                In the future I will use destring instead of encode.
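                For a variable that has already been created with encode, one possible way to get the true codes back is sketched below. It assumes the value labels still hold the original digits, and the names sic2str and sic2check are hypothetical:
                Code:
                * turn the value labels back into a string variable
                decode sic2fixed, generate(sic2str)
                * convert the digit strings to real numbers
                destring sic2str, generate(sic2check)
                * the levels should now be the actual two-digit SIC codes
                tabulate sic2check
                This is only needed when the original string variable is no longer available; otherwise destring on the original variable is the simpler route.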

                Thanks for helping me out!

                Kind regards,
                Laura
