  • Variance inflation factor problem

    Hello everyone,
    I am facing a problem with -estat vif- after an OLS regression that includes continuous, categorical and factor variables.
    I have never had any problems computing the VIFs of a model before. However, this time something odd happened.
    I obtained the following output:
    Code:
    . estat vif
    
        Variable |       VIF       1/VIF  
    -------------+----------------------
        villages |
              1  |      2.07    0.483266
              3  |         .           .
              4  |         .           .
              5  |      2.30    0.434749
              6  |         .           .
              7  |      1.91    0.522719
              8  |         .           .
              9  |      2.28    0.439065
             10  |         .           .
             age |      1.89    0.527762
             sex |         .           .
          income |      2.19    0.456496
    accesscredit |         .           .
            educ |
              1  |      2.09    0.479603
              3  |         .           .
       healthHoH |      2.71    0.368845
          HHsize |         .           .
      hhaffected |      2.69    0.371174
      fjaddsteps |      1.41    0.710444
      rolemodels |      1.18    0.850576
    optimistic~s |      1.31    0.762461
    -------------+----------------------
        Mean VIF |         .
    Why are some cells empty? Could anyone please point me in the right direction here?

    Thank you,
    Andreas

  • #2
    I would suggest showing the regression command and output that came before this.

    Also sometimes if you click on a dot you get a message that may clarify things.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam


    • #3
      Thanks, Richard. I just ran the same lines again in order to copy the command and output into this thread. I feel a bit silly, because for whatever reason the issue mentioned earlier no longer appears.

      Hence, this thread can be closed.


      • #4
        The problem occurred again. I have no idea what causes it.
        As Richard recommended, I copy the command and output of the regression model here.
        Code:
        .   fvset base 3 educ
        
        .   fvset base 2 villages
        
        .   reg adaptappraisal i.villages age sex income  accesscredit  i.educ healthHoH  HHsize fjaddsteps  rolemodels   
        
              Source |       SS       df       MS              Number of obs =     213
        -------------+------------------------------           F( 19,   193) =    3.01
               Model |   143.61416    19     7.55864           Prob > F      =  0.0001
            Residual |  484.937483   193  2.51262945           R-squared     =  0.2285
        -------------+------------------------------           Adj R-squared =  0.1525
               Total |  628.551643   212  2.96486624           Root MSE      =  1.5851
        
        ------------------------------------------------------------------------------
        adaptappra~l |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
            villages |
                  1  |    .140429   .5234871     0.27   0.789    -.8920613    1.172919
                  3  |  -1.334822   .5133613    -2.60   0.010     -2.34734   -.3223028
                  4  |  -.6341599   .5502348    -1.15   0.251    -1.719405    .4510857
                  5  |  -.3851816   .5143835    -0.75   0.455    -1.399717    .6293533
                  6  |  -.5212758   .5997871    -0.87   0.386    -1.704255    .6617032
                  7  |  -.4006783   .5610591    -0.71   0.476    -1.507273    .7059162
                  8  |   -.079398   .5258514    -0.15   0.880    -1.116551    .9577555
                  9  |  -.8271979   .5531408    -1.50   0.136    -1.918175    .2637793
                 10  |   .0762667   .4926919     0.15   0.877    -.8954851    1.048019
                     |
                 age |   .0031305   .0089679     0.35   0.727    -.0145573    .0208182
                 sex |   .2275542   .2349708     0.97   0.334    -.2358861    .6909944
              income |   .1332306   .0919382     1.45   0.149    -.0481021    .3145632
        accesscredit |  -.3557261   .3297639    -1.08   0.282     -1.00613    .2946777
                     |
                educ |
                  1  |  -1.042849   .4142495    -2.52   0.013    -1.859887    -.225812
                  2  |  -.6375354   .3734279    -1.71   0.089    -1.374059    .0989883
                     |
           healthHoH |  -.0556691   .0988497    -0.56   0.574    -.2506334    .1392952
              HHsize |   .1121931   .0467182     2.40   0.017     .0200493    .2043369
          fjaddsteps |   .4352882   .1629628     2.67   0.008     .1138716    .7567048
          rolemodels |   .0109248   .0059042     1.85   0.066    -.0007202    .0225699
               _cons |    .967726   1.018594     0.95   0.343     -1.04128    2.976732
        ------------------------------------------------------------------------------
        
        .
        end of do-file
        
        . estat vif
        
            Variable |       VIF       1/VIF  
        -------------+----------------------
            villages |
                  1  |      1.98    0.505951
                  3  |         .           .
                  4  |         .           .
                  5  |      2.23    0.447702
                  6  |         .           .
                  7  |      1.88    0.530526
                  8  |         .           .
                  9  |      2.24    0.445924
                 10  |         .           .
                 age |      1.87    0.533990
                 sex |         .           .
              income |      2.17    0.461249
        accesscredit |         .           .
                educ |
                  1  |      2.08    0.480022
                  2  |         .           .
           healthHoH |      2.59    0.385624
              HHsize |         .           .
          fjaddsteps |      2.63    0.380652
          rolemodels |      1.39    0.721139
        -------------+----------------------
            Mean VIF |         .
        Any advice or hint as to what could cause this vif output?

        Thank you,
        Andreas


        • #5
          I was able to narrow down the source of the strange VIF output.
          I ran the same model without changing the reference categories of the two factor variables (fvset), and suddenly
          the vif output appeared without dots. Still, I am wondering why changing the reference category
          leads to this "failure" in the vif output.
          Does anyone have an explanation for this?
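
          For reference, a single VIF can be cross-checked by hand: the VIF of a regressor is 1/(1 - R-squared) from an auxiliary regression of that regressor on all the other regressors. A minimal sketch for age, assuming the variable names from the model above, run from a do-file (because of the /// line continuations), and using insample as a hypothetical name for a marker of the estimation sample:

          ```stata
          * Fit the main model with explicit base levels (no fvset needed)
          regress adaptappraisal ib2.villages age sex income accesscredit ///
              ib3.educ healthHoH HHsize fjaddsteps rolemodels

          * Mark the estimation sample (insample is a hypothetical helper name)
          generate byte insample = e(sample)

          * Auxiliary regression: age on all other regressors, same sample
          quietly regress age ib2.villages sex income accesscredit ///
              ib3.educ healthHoH HHsize fjaddsteps rolemodels if insample

          * VIF = 1 / (1 - R-squared of the auxiliary regression)
          display "VIF for age: " %5.2f 1/(1 - e(r2))
          ```

          If the result is close to the value that -estat vif- reports for age, the values that are displayed are probably fine and only the dotted rows need explaining.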


          • #6
            I'm guessing that the Ns for some of the categories are very small. I suggest running frequencies for your categorical variables, preferably limited to the 213 cases in your analysis.

            Also, given the small N, you may want to reduce the number of categories. For example, if it makes substantive sense, you might want to create a dichotomy that is village 3 versus not village 3.
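
            A minimal sketch of both suggestions, assuming it is run directly after the regression so that e(sample) is still available; village3 is a hypothetical name for the new indicator:

            ```stata
            * Frequencies restricted to the cases used in the last regression
            tabulate villages if e(sample)
            tabulate educ if e(sample)

            * Indicator for village 3 versus all other villages (missing stays missing)
            generate byte village3 = (villages == 3) if !missing(villages)
            tabulate village3 if e(sample)
            ```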

            As a sidelight, you don't have to do fvset. You could instead say something like

            reg adaptappraisal ib2.villages age sex income accesscredit ib3.educ healthHoH HHsize fjaddsteps rolemodels

            Normally fvset would save you a little trouble, but given that you are having problems you might want to try different reference categories.
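
            To see or undo base levels stored earlier with fvset, something like the following may help. This is only a sketch, assuming the two factor variables from this thread:

            ```stata
            * Show the factor-variable settings currently stored for these variables
            fvset report villages educ

            * Remove the stored settings so the default base (lowest level) applies again
            fvset clear villages educ
            ```

            After clearing, the ib2. and ib3. prefixes in the regress command are the only place where the base levels are set.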
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology


            • #7
              Thank you, Richard, for your helpful suggestions. The education variable is indeed unevenly distributed (primary N=70; secondary N=117; tertiary N=26). The village variable is relatively evenly distributed, with only small differences in N, but of course all these categories are small (as you guessed). Should the reference category preferably be one of the larger categories of a variable?

              Your code works fine and seems to be an excellent alternative, thanks for that.
