
  • Gravity panel model: How to deal with high collinearity among important variables without removing them?

    Good morning,

    I use Stata 13, I believe.

    I have a gravity model, and I am estimating the effects of free trade agreements (FTAs) on trade/exports between the US states and the rest of the world (ROW). My independent variables include GDP in the exporting and importing countries, population of both partners (exporter and importer), production of both partners (exporter and importer), farm income by state, distance... and the FTAs (NAFTA, ASEAN...). There are two dummies for each FTA: the suffix mm marks trade between members, and mn marks trade from a state to the ROW.

    I am mainly interested in the signs of the NAFTA coefficients: the expected signs are NAFTAmm > 0 and NAFTAmn < 0.

    I am using both static estimators (OLS, fixed effects, Two State gravity) and GMM. For the issue I am facing, I will only show the commands and output of the OLS regressions.
    I first ran a model with only the NAFTA dummies and obtained the expected signs. When I add all the other independent variables, in every estimation from OLS to GMM, the NAFTA coefficients come out negative (see output below). Why?
    Why does NAFTAmm go from positive to negative when the other variables are added? Is it a collinearity problem? (continued below)

    1- OLS with NAFTA only

    Code:
    reg lnStateExports NAFTAmm NAFTAmn, robust

    Linear regression                                      Number of obs =  232755
                                                           F(  2,232752) =  171.17
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.0019
                                                           Root MSE      =   2.264

    ------------------------------------------------------------------------------
                 |               Robust
    lnStateExp~s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         NAFTAmm |   .0694875   .0399313     1.74   0.082    -.0087768    .1477517
         NAFTAmn |  -.4145053    .022455   -18.46   0.000    -.4585166   -.3704941
           _cons |   13.52012   .0219329   616.43   0.000     13.47714    13.56311
    ------------------------------------------------------------------------------


    2- OLS with NAFTA and the other independent variables
    Code:
    reg lnStateExports NAFTAmm NAFTAmn lnExpGDP lnImpGDP lnExpPop lnImpPop lnExpProd lnImpProd lnFarmInc lnDistance BorderDummy, robust

    Linear regression                                      Number of obs =   46477
                                                           F( 11, 46465) = 1497.94
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.2393
                                                           Root MSE      =  1.9851

    ------------------------------------------------------------------------------
                 |               Robust
    lnStateExp~s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         NAFTAmm |   -1.46886   .1077491   -13.63   0.000     -1.68005    -1.25767
         NAFTAmn |  -.0150688   .0482547    -0.31   0.755    -.1096486    .0795111
        lnExpGDP |  -.1744397   .0581043    -3.00   0.003     -.288325   -.0605544
        lnImpGDP |   .4816379   .0085027    56.65   0.000     .4649725    .4983034
        lnExpPop |    .877131   .0642196    13.66   0.000     .7512596    1.003002
        lnImpPop |   .1129621   .0124843     9.05   0.000     .0884926    .1374315
       lnExpProd |  -.3711653   .0177531   -20.91   0.000    -.4059615    -.336369
       lnImpProd |  -.0925821   .0098372    -9.41   0.000    -.1118632   -.0733009
       lnFarmInc |   .6124517   .0176502    34.70   0.000      .577857    .6470464
      lnDistance |  -.3987482   .0200808   -19.86   0.000    -.4381069   -.3593895
     BorderDummy |   2.299835   .2741551     8.39   0.000     1.762487    2.837183
           _cons |  -10.49608   .3624592   -28.96   0.000     -11.2065   -9.785652
    ------------------------------------------------------------------------------



    I decided to run some multicollinearity diagnostics; the results are below.
    Code:
    vif

        Variable |       VIF       1/VIF
    -------------+----------------------
        lnExpPop |     37.17    0.026906   <- high multicollinearity
        lnExpGDP |     35.37    0.028275   <- high multicollinearity
       lnImpProd |      4.53    0.220672
        lnImpPop |      3.98    0.251494
       lnFarmInc |      3.72    0.268734
       lnExpProd |      3.69    0.270983
        lnImpGDP |      2.37    0.421065
      lnDistance |      1.96    0.511224
          ASEANm |      1.76    0.567868
         NAFTAmn |      1.56    0.640402
         NAFTAmm |      1.25    0.802299
      MERCOSURmn |      1.13    0.881311
    -------------+----------------------
        Mean VIF |      8.21


    I then removed the two variables above just to see what would happen (I should say that all these variables are really important here, so I do not want to remove them). The regression output seems OK. I also computed the correlation matrix; the output is below.
    Is there a way to solve this issue without removing any of my variables?
    Also, is there another regression diagnostic I should run instead of focusing only on collinearity? Thanks for your help.

    Code:
    * Running the regression without lnExpPop and lnExpGDP
    
    . reg lnStateExports NAFTAmm NAFTAmn lnImpGDP lnImpPop lnExpProd lnImpProd lnFarmInc lnDistance ASEANm MERCOSURmn, robust
    
    Linear regression                                      Number of obs =   46477
                                                           F( 10, 46466) = 1005.83
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.1950
                                                           Root MSE      =   2.042
    
    ------------------------------------------------------------------------------
                 |               Robust
    lnStateExp~s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         NAFTAmm |    -1.5044   .1081805   -13.91   0.000    -1.716436   -1.292365
         NAFTAmn |  -.3608108    .048148    -7.49   0.000    -.4551817     -.26644
        lnImpGDP |   .4777262   .0088465    54.00   0.000     .4603869    .4950655
        lnImpPop |   .0328538    .014152     2.32   0.020     .0051156     .060592
    ......
    
    
     vif
    
        Variable |       VIF       1/VIF
    -------------+----------------------
       lnImpProd |      4.53    0.220731
        lnImpPop |      3.98    0.251540
       lnExpProd |      3.33    0.300253
       lnFarmInc |      3.01    0.331741
        lnImpGDP |      2.37    0.422257
      lnDistance |      1.95    0.513197
          ASEANm |      1.76    0.568594
         NAFTAmn |      1.45    0.688263
         NAFTAmm |      1.25    0.802492
      MERCOSURmn |      1.13    0.881512
    -------------+----------------------
        Mean VIF |      2.48

    Code:
    
    . pwcorr lnStateExports NAFTAmm NAFTAmn lnExpGDP lnImpGDP lnExpPop lnImpPop lnExpProd lnImpProd lnFarmInc lnDistance ASEANm MERCOSURmn
    
                 | lnStat..  NAFTAmm  NAFTAmn lnExpGDP lnImpGDP lnExpPop lnImpPop
    -------------+---------------------------------------------------------------
    lnStateExp~s |   1.0000
         NAFTAmm |   0.0029   1.0000
         NAFTAmn |  -0.0431   0.0497   1.0000
        lnExpGDP |   0.2308   0.0097   0.1563   1.0000
        lnImpGDP |  -0.0581  -0.0556  -0.3479  -0.0943   1.0000
        lnExpPop |   0.1712  -0.0143  -0.1683   0.4509  -0.0825   1.0000
        lnImpPop |   0.1491   0.1121  -0.0085  -0.0397   0.5758  -0.0197   1.0000
       lnExpProd |   0.1366  -0.0053   0.1189   0.5315   0.2612   0.1680  -0.0250
       lnImpProd |  -0.1147  -0.0573  -0.2563  -0.0496   0.9835  -0.0531   0.7958
       lnFarmInc |   0.2375  -0.0009  -0.0101   0.4262  -0.0502   0.2373  -0.0137
      lnDistance |   0.0508  -0.2313   0.0126  -0.0374   0.0361  -0.0254   0.3061
          ASEANm |   0.0590  -0.0601  -0.0097  -0.0306   0.1801  -0.0218   0.1661
      MERCOSURmn |  -0.0604  -0.0638   0.0002   0.0234   0.0070   0.0165  -0.0144
    
                 | lnExpP~d lnImpP~d lnFarm~c lnDist~e   ASEANm MERCOS~n
    -------------+------------------------------------------------------
       lnExpProd |   1.0000
       lnImpProd |   0.3098   1.0000
       lnFarmInc |   0.7343  -0.0186   1.0000
      lnDistance |   0.0124  -0.0258   0.0450   1.0000
          ASEANm |   0.0059   0.0586  -0.0119   0.3608   1.0000
      MERCOSURmn |   0.0094   0.0217   0.0126  -0.1854  -0.1162   1.0000
    Thanks for your help.


    Last edited by Cynthia Hourizene; 20 Feb 2018, 09:54.

  • #2
    The way to solve this problem is to ignore the VIF results. In fact, you never should have done them in the first place. See Arthur Goldberger's textbook A Course in Econometrics. He has an entire chapter on why multicollinearity is, for the most part (and in your situation in particular), a bogus problem and why you should not look for it, and should ignore it if you find it.
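A minimal sketch of what -vif- (-estat vif-) actually computes, using Stata's bundled auto data as a stand-in rather than the gravity data in this thread. The VIF for a regressor is 1/(1 - R-squared) from regressing that regressor on the other regressors. It measures how much that coefficient's sampling variance is inflated relative to uncorrelated regressors; it says nothing about bias, and the standard errors reported by -regress- already reflect that inflation.

```stata
* Sketch: reproduce the VIF that -estat vif- reports for one regressor.
* Uses Stata's bundled auto data, not the data from this thread.
sysuse auto, clear
regress price weight length mpg
estat vif

* VIF(weight) = 1 / (1 - R-squared from regressing weight on the other regressors)
quietly regress weight length mpg
display "VIF for weight, by hand: " 1/(1 - e(r2))
```

The by-hand value should match the weight row of the -estat vif- table. A large value means the data carry little independent information about that coefficient, which shows up as a wider confidence interval for it.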



    • #3
      Thank you M. Clyde Schechter.
      I have checked the textbook online, and unfortunately I cannot afford it right now. In the meantime, do you have any suggestions for solving the problem I am facing? What explains the change of signs in my regression as I add more variables? What can I do to solve it?



      • #4
        What explains the change of signs in my regression as I add more variables? What can I do to solve it?
        There are two separate issues. If you look at the N's for the two models, you will see that the sample sizes are very different. Because of missing values on some variables, a lot of observations are dropped when you go from the model without the extra variables to the model with them. In fact, barely 1 in 5 observations (46,477 of 232,755) survives the transition! It is possible that the resulting sample is very biased as a result. So the first step is to compare apples to apples: both models have to be run on the same sample. The easiest way to do this is to run the model with the additional variables first. Any observation retained in that analysis will also be retained when analyzing the smaller model. Next, run the model omitting those variables, adding -if e(sample)- to the regression command (before the comma). You may well find that, when you run them on the same sample, the two models are in substantial agreement. The question then becomes whether this biased, reduced sample is suitable for answering your research questions.
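A minimal sketch of that procedure, using Stata's bundled auto data as a stand-in: rep78 is missing for 5 of the 74 cars, so adding it to a model silently drops those cars. As a small variation on typing -if e(sample)- directly, the sketch saves e(sample) in a variable so the sample marker survives later estimation commands; the names insample, small and big are arbitrary. Run it from a do-file.

```stata
* Sketch: compare a small and a large model on the same estimation sample.
* Uses Stata's bundled auto data; rep78 has missing values.
sysuse auto, clear

* 1. Fit the larger model first; e(sample) marks the observations it used.
regress price mpg rep78, robust
estimates store big
generate byte insample = e(sample)

* 2. The smaller model on all available data uses more observations ...
regress price mpg, robust

* 3. ... so refit it on exactly the larger model's sample.
regress price mpg if insample, robust
estimates store small

* 4. Side-by-side coefficients, standard errors and sample sizes.
estimates table small big, b se stats(N)

* 5. Which variables are responsible for the dropped observations?
misstable summarize price mpg rep78
```

With the thread's data, the same steps would be the full -reg lnStateExports ... BorderDummy, robust- call, then -generate byte insample = e(sample)-, then -reg lnStateExports NAFTAmm NAFTAmn if insample, robust-.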

        Or perhaps the discrepancy between the models will persist. There is nothing unusual or surprising about estimates changing radically, including changing sign, when new variables are added to the model. This is known as Simpson's paradox, or the Yule-Simpson paradox; as applied to continuous variables, it is sometimes called Lord's paradox. Wikipedia has a really good page on Simpson's paradox, which I recommend you read.
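A minimal simulation with made-up data shows how a coefficient can change sign when a correlated variable is added (the variables y, x, z and the seed are arbitrary, chosen only for illustration). The true partial effect of x is -1, but x is positively correlated with z, which raises y. With Var(x) = 2 and Cov(x,z) = 1, omitting z makes the slope on x converge to -1 + 3 * Cov(x,z) / Var(x) = -1 + 3 * (1/2) = 0.5, while adding z recovers a slope near -1. The two regressions answer different questions, which is why the choice between them is a substantive one.

```stata
* Hypothetical simulation: the coefficient on x changes sign when z is added.
clear
set seed 12345
set obs 10000
generate z = rnormal()                    // confounder
generate x = z + rnormal()                // Cov(x,z) = 1, Var(x) = 2
generate y = -1*x + 3*z + rnormal()       // partial effect of x is -1

regress y x        // z omitted: slope on x is about +0.5
regress y x z      // z included: slope on x is about -1
```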

        The change in signs, or other large difference, if it persists when running both models on the same sample, is not the problem you have to solve. It is what it is, and it is not a problem, nor does it require, nor admit, a solution. Your problem is that you need to decide whether it is the model with or without the additional variables that is appropriate to answering your research questions. This will depend on an in-depth understanding of your research questions and the roles the different variables play. It's not a statistical issue: it's a substantive one and you will benefit most from advice from other people in your discipline. In any case, one thing you should not do is choose a model because it gives you the results you like. Another thing you should not do is let a bogus problem like multicollinearity affect your choice of analysis.

        Regarding Goldberger's textbook, it might be available in your institution's library. If you can borrow a copy (from there, or from a colleague), it's well worth a read. It's not only eye-opening; it's also written in a very entertaining style.



        • #5
          Thank you, sir. I will follow your instructions now.
