  • PPML with many warnings...

    Dear all,

    I've been struggling with a ppml-application for a long while. Now it seems to give quite good results - but I am a little confused by all the warnings on "large-values" and "overfitting".
    Also, my RESET test fails, so can I trust these results at all? I copy my output below.

    What about dummies, should I try to avoid them?

    As a robustness check I ran a simple OLS (with a logged dependent variable), and some of the results change a lot. So I assume something is not quite correct, right?

    Thanks so much for your help!
    Klaus

    -----------------------------------------------------------------------------------------------------------------------
    ppml cum_sizes_concludedb $explan_var1, keep

    note: checking the existence of the estimates
    WARNING: log_GDP_h has very large values, consider rescaling or recentering
    WARNING: log_GDP_i has very large values, consider rescaling or recentering
    (many more warnings)

    Number of regressors excluded to ensure that the estimates exist: 0
    Number of observations excluded: 0

    note: starting ppml estimation
    note: cum_sizes_concludedb has noninteger values

    Iteration 1: deviance = 252057.3
    Iteration 2: deviance = 141818
    Iteration 3: deviance = 113417.2
    Iteration 4: deviance = 107601.2
    Iteration 5: deviance = 106889.8
    Iteration 6: deviance = 106856.1
    Iteration 7: deviance = 106855.3
    Iteration 8: deviance = 106855.3
    Iteration 9: deviance = 106855.3

    Number of parameters: 29
    Number of observations: 16940
    Pseudo log-likelihood: -54214.637
    R-squared: .19015636
    Option strict is: off
    WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
    -------------------------------------------------------------------------------------
    | Robust
    cum_sizes_conclu~db | Coef. Std. Err. z P>|z| [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
    log_distance | -.9528056 .1971872 -4.83 0.000 -1.339285 -.5663257
    common_language | 1.455636 .3174794 4.58 0.000 .8333883 2.077884
    log_GDP_h | .6108688 .0999441 6.11 0.000 .414982 .8067556
    log_GDP_i | .5874121 .0686404 8.56 0.000 .4528793 .7219449
    log_GDP_pc_h | -.6919959 .2293051 -3.02 0.003 -1.141426 -.2425662
    log_GDP_pc_i | .2883472 .1847046 1.56 0.118 -.0736673 .6503616
    net_agric_imports~h | -3.315186 2.274837 -1.46 0.145 -7.773785 1.143414
    net_agric_imports~i | -.3133903 .723114 -0.43 0.665 -1.730668 1.103887
    log_agric_area_pc_h | 1.388383 .275652 5.04 0.000 .8481147 1.928651
    log_agric_area_pc_i | .277707 .1605227 1.73 0.084 -.0369118 .5923258
    share_available_l~h | -.0401704 .0126402 -3.18 0.001 -.0649447 -.0153961
    share_available_l~i | -.0149193 .0077438 -1.93 0.054 -.0300969 .0002583
    log_water_resourc~h | .9159727 .1532531 5.98 0.000 .6156022 1.216343
    log_water_resourc~i | .198225 .1343857 1.48 0.140 -.0651662 .4616162
    log_agric_product~h | .0325299 .1453248 0.22 0.823 -.2523014 .3173612
    log_agric_product~i | .011202 .1072409 0.10 0.917 -.1989862 .2213903
    corruption_h | -.0028488 .0084842 -0.34 0.737 -.0194776 .01378
    corruption_i | .0162642 .0096734 1.68 0.093 -.0026954 .0352237
    n_deals_h | .0446845 .0057439 7.78 0.000 .0334265 .0559424
    rta | 1.01743 .4402138 2.31 0.021 .1546272 1.880233
    Africa | -.1126869 .3363213 -0.34 0.738 -.7718645 .5464907
    Asia | -3.619532 .7332476 -4.94 0.000 -5.056671 -2.182394
    America | -1.455312 .5510493 -2.64 0.008 -2.535348 -.3752749
    Europe | -.4128112 .4677604 -0.88 0.377 -1.329605 .5039823
    America_i | 2.13523 .6506217 3.28 0.001 .8600344 3.410425
    Asia_i | 4.363523 .9203239 4.74 0.000 2.559722 6.167325
    Africa_i | 2.683462 .8665773 3.10 0.002 .9850013 4.381922
    Europe_i | 2.383969 .7021059 3.40 0.001 1.007867 3.760071
    _cons | -36.71535 4.897823 -7.50 0.000 -46.3149 -27.11579
    -------------------------------------------------------------------------------------

    . outreg2 using "Results/ppml_compare", excel append
    Results/ppml_compare.xml
    dir : seeout

    .
    .
    . predict fit, xb //Get fitted values - RESET test according to http://privatewww.esse
    > x.ac.uk/~jmcss/reset.do
    (13866 missing values generated)

    . gen fit2=fit^2 //Square the fitted values
    (13866 missing values generated)

    . qui ppml cum_sizes_concludedb $explan_var1 fit2, keep // Estimate the model with th
    > e additional regressor
    WARNING: log_GDP_h has very large values, consider rescaling or recentering
    WARNING: log_GDP_i has very large values, consider rescaling or recentering
    WARNING: share_available_land_h has very large values, consider rescaling or recenter
    > ing
    WARNING: share_available_land_i has very large values, consider rescaling or recenter
    > ing
    WARNING: log_water_resources_h has very large values, consider rescaling or recenteri
    > ng
    WARNING: log_water_resources_i has very large values, consider rescaling or recenteri
    > ng
    WARNING: log_agric_productivity_i has very large values, consider rescaling or recent
    > ering
    WARNING: corruption_h has very large values, consider rescaling or recentering
    WARNING: corruption_i has very large values, consider rescaling or recentering
    WARNING: n_deals_h has very large values, consider rescaling or recentering
    WARNING: fit2 has very large values, consider rescaling or recentering
    Number of regressors excluded to ensure that the estimates exist: 0
    Number of observations excluded: 0
    WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0

    . test fit2=0 //Test the significance of the additional regressor (this is equivalent
    > to a t-test on fit2)

    ( 1) fit2 = 0

    chi2( 1) = 5.06
    Prob > chi2 = 0.0245
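
On the "very large values" warnings above: a minimal sketch of what recentering a regressor can look like, run on simulated data. The variable names (`log_gdp`, `log_gdp_c`, `y`) are made up for illustration, and the built-in `poisson, vce(robust)` is used as a stand-in for `ppml` on the assumption that both target the same Poisson pseudo-likelihood. In a model with a constant, recentering should only shift the constant and leave the slope unchanged.

```stata
clear
set seed 12345
set obs 2000

* hypothetical regressor with a large level (assumption, not the thread's data)
generate log_gdp = 25 + rnormal()
generate y = rpoisson(exp(-24 + log_gdp))

* recenter around the sample mean
summarize log_gdp, meanonly
generate log_gdp_c = log_gdp - r(mean)

* same model with the raw and the recentered regressor
poisson y log_gdp, vce(robust)
scalar b_raw = _b[log_gdp]
poisson y log_gdp_c, vce(robust)
scalar b_ctr = _b[log_gdp_c]

display "slope, raw: " b_raw "   slope, recentered: " b_ctr
```

If the two slopes agree, the warning concerns numerical conditioning rather than the substance of the estimates.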



  • #2
    Dear Klaus,

    For the warning messages, please refer to the help file of the ppml command:
    Code:
     help ppml
    Your results are not readable. Please follow the advice here so that we can assist you.
    This might help with the RESET test.
    Also, results from OLS are almost never similar to those from PPML; the Log of Gravity paper may help with this.

    Best.
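
For reference, the RESET steps used in the first post (fitted index, its square, then a test on the added regressor) can be sketched on simulated data as follows. This is only an illustration under stated assumptions: `x1`, `x2` and `y` are invented, the built-in `poisson, vce(robust)` stands in for `ppml`, and the mean is correctly specified by construction. The `capture drop` line is there so the block can be rerun for several specifications without `predict` failing because `fit` already exists.

```stata
clear
set seed 2016
set obs 5000

* simulated data with a correctly specified exponential mean (assumption)
generate x1 = rnormal()
generate x2 = rnormal()
generate y = rpoisson(exp(0.5 + 0.8*x1 - 0.4*x2))

* step 1: estimate the model and get the fitted linear index
poisson y x1 x2, vce(robust)
capture drop fit fit2
predict double fit, xb

* step 2: square the fitted index and add it as a regressor
generate double fit2 = fit^2
quietly poisson y x1 x2 fit2, vce(robust)

* step 3: test the significance of the additional regressor
test fit2 = 0
scalar p_reset = r(p)
display "RESET p-value: " p_reset
```

A small p-value is evidence against the specification of the conditional mean; a large one means only that this particular test does not reject it.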



    • #3
      Dear Klaus,

      Just to add to Dias's excellent advice, since you were able to get convergence, you can ignore the warnings about large values.
      The warning about overfitting, however, is important. You should estimate the model using a different base category for your dummies. Better, you should estimate the model including the dummies for all categories and let Stata choose which category to exclude.

      All the best,

      Joao
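
A minimal sketch of the "include dummies for all categories and let Stata choose" idea, on simulated data. The variable `region` and the `D_` prefix are hypothetical, and the built-in `poisson` again stands in for `ppml` (which, judging by the output in this thread, runs its own check and reports the "regressors excluded to ensure that the estimates exist").

```stata
clear
set seed 42
set obs 3000

* hypothetical categorical variable with 5 levels
generate region = ceil(5*runiform())
generate x = rnormal()
generate y = rpoisson(exp(0.2 + 0.5*x + 0.3*(region==2) - 0.5*(region==4)))

* one dummy per category: D_1 ... D_5
tabulate region, generate(D_)

* pass all five dummies; Stata omits one because of collinearity with the constant
poisson y x D_*, vce(robust)
```

Which category is dropped does not affect the fit, only the interpretation of the remaining dummy coefficients, which are measured relative to the omitted one.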



      • #4
        Dear Dias, dear Joao,

        Thanks so much for your excellent advice - I'll try to stick to the advice as best as I can.

        I followed your advice, used all dummies, and let Stata choose which ones to exclude (also nicely described here: http://personal.lse.ac.uk/tenreyro/Pisch.do)

        Concerning this:
        • Should I drop the constant then? I read here that this is recommended, but I don't understand why?
        • My results get much better by doing this, but only if I also include the "strict" option. The ppml help file, however, says one should be very careful when using it. When is it OK to use?
        • For one of my specifications I still get the overfitting warning; this is the specification with very few variables. What does it mean? Should I add more variables, or would clustering help? (I thought about clustering, but clustering on country pairs probably gives too many clusters (16,786), and clustering only on investors (155) or receivers (109) does not seem to make sense.) I also get the overfitting warning when I do the RESET test and include the additional regressor. Is this a problem?
        Thanks so much for your help, it is much appreciated!!

        Best wishes,
        Klaus
        Last edited by Klaus Schmidt; 03 Jun 2016, 02:52.
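
One way to look into the overfitting warning and the excluded dummies is to check whether some dummy category contains only zero outcomes. As far as I understand, in that case the Poisson pseudo-likelihood keeps improving as the dummy's coefficient goes to minus infinity, so the estimates do not exist unless the regressor is dropped. The sketch below uses simulated data in which one category is forced to have only zeros; `region`, `positive` and `max_y` are hypothetical names, not variables from this thread.

```stata
clear
set seed 7
set obs 3000

* hypothetical data: category 5 never has a positive outcome (by construction)
generate region = ceil(5*runiform())
generate x = rnormal()
generate y = rpoisson(exp(0.2 + 0.5*x))
replace y = 0 if region == 5

* cross-tabulate categories against an indicator for y > 0
generate byte positive = y > 0
tabulate region positive

* list the categories whose largest outcome is zero
bysort region: egen max_y = max(y)
tabulate region if max_y == 0
```

A category that shows up in the last table is a candidate for the kind of dummy that `ppml` excludes "to ensure that the estimates exist".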



        • #5
          Dear Klaus,

          Please post the results you get with and without the -strict- option so that we can see what is going on.

          All the best,

          Joao



          • #6
            Dear Joao,

            Thanks so much - I admit I am a bit lost as the ppml is still unfamiliar to me.

            The results with the strict option (and noconstant) are as follows:

            Code:
            .  ppml cum_sizes_concludedb $explan_var _D* _E*, keep noconst strict
            
            note: checking the existence of the estimates
            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
            WARNING: n_deals_h has very large values, consider rescaling  or recentering
            
            Number of regressors excluded to ensure that the estimates exist: 2
            Excluded regressors:  _EGEO_i_2 _EGEO_i_5
            Number of observations excluded: 0
            
            note: starting ppml estimation
            note: cum_sizes_concludedb has noninteger values
            
            Iteration 1:   deviance =  359096.2
            Iteration 2:   deviance =  194441.8
            Iteration 3:   deviance =  150712.6
            Iteration 4:   deviance =  142496.2
            Iteration 5:   deviance =  141789.4
            Iteration 6:   deviance =    141780
            Iteration 7:   deviance =    141780
            Iteration 8:   deviance =    141780
            
            Number of parameters: 16
            Number of observations: 21250
            Pseudo log-likelihood: -71726.238
            R-squared: .15738383
            Option strict is: on
            ---------------------------------------------------------------------------------
                            |               Robust
            cum_sizes_co~db |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ----------------+----------------------------------------------------------------
               log_distance |  -.7644989   .1797362    -4.25   0.000    -1.116775   -.4122224
            common_language |   1.294402   .3216139     4.02   0.000     .6640507    1.924754
                  log_GDP_h |   .3859054   .1050693     3.67   0.000     .1799734    .5918374
                  log_GDP_i |   .5502893   .0625846     8.79   0.000     .4276257    .6729529
               log_GDP_pc_h |   -.407014   .2978928    -1.37   0.172    -.9908731    .1768452
               log_GDP_pc_i |    .436156   .1170062     3.73   0.000     .2068281    .6654838
                  n_deals_h |   .0450022   .0041509    10.84   0.000     .0368667    .0531378
                        rta |   1.056946     .34619     3.05   0.002     .3784261    1.735466
                  _DGEO_h_1 |   -18.7301   2.174117    -8.62   0.000    -22.99129   -14.46891
                  _DGEO_h_2 |  -18.35923   2.333426    -7.87   0.000    -22.93266    -13.7858
                  _DGEO_h_3 |  -21.51532   2.341257    -9.19   0.000     -26.1041   -16.92654
                  _DGEO_h_4 |  -17.91337   2.290921    -7.82   0.000     -22.4035   -13.42325
                  _DGEO_h_5 |  -17.06806   2.515984    -6.78   0.000     -21.9993   -12.13683
                  _EGEO_i_1 |  -.0388593   .4431417    -0.09   0.930     -.907401    .8296824
                  _EGEO_i_3 |   1.505537   .3115474     4.83   0.000     .8949157    2.116159
                  _EGEO_i_4 |   .0420963   .3682248     0.11   0.909    -.6796111    .7638036
            ---------------------------------------------------------------------------------
            
            . outreg2 using "Results/ppml_compare", excel replace
            Results/ppml_compare.xml
            dir : seeout
            
            .
            .
            . predict fit, xb //Get fitted values - RESET test according to http://privatewww.essex.ac.uk/
            > ~jmcss/reset.do
            (9556 missing values generated)
            
            . gen fit2=fit^2 //Square the fitted values
            (9556 missing values generated)
            
            . qui ppml cum_sizes_concludedb $explan_var _D* _E* fit2, keep  noconst strict  // Estimate th
            > e model with the additional regressor
            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
            WARNING: n_deals_h has very large values, consider rescaling  or recentering
            WARNING: fit2 has very large values, consider rescaling  or recentering
            Number of regressors excluded to ensure that the estimates exist: 2
            Number of observations excluded: 0
            
            . test fit2=0 //Test the significance of the additional regressor (this is equivalent to a t-t
            > est on fit2)
            
             ( 1)  fit2 = 0
            
                       chi2(  1) =    9.53
                     Prob > chi2 =    0.0020
            
            
            . ppml cum_sizes_concludedb $explan_var1 _D* _E*, keep noconst strict
            
            note: checking the existence of the estimates
            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
            WARNING: share_available_land_h has very large values, consider rescaling  or recentering
            WARNING: share_available_land_i has very large values, consider rescaling  or recentering
            WARNING: log_water_resources_h has very large values, consider rescaling  or recentering
            WARNING: log_water_resources_i has very large values, consider rescaling  or recentering
            WARNING: log_agric_productivity_i has very large values, consider rescaling  or recentering
            WARNING: corruption_h has very large values, consider rescaling  or recentering
            WARNING: corruption_i has very large values, consider rescaling  or recentering
            WARNING: n_deals_h has very large values, consider rescaling  or recentering
            
            Number of regressors excluded to ensure that the estimates exist: 2
            Excluded regressors:  _DGEO_h_5 _EGEO_i_5
            Number of observations excluded: 0
            
            note: starting ppml estimation
            note: cum_sizes_concludedb has noninteger values
            
            Iteration 1:   deviance =  296038.8
            Iteration 2:   deviance =  165420.1
            Iteration 3:   deviance =  132075.3
            Iteration 4:   deviance =  125898.1
            Iteration 5:   deviance =  125295.4
            Iteration 6:   deviance =  125284.3
            Iteration 7:   deviance =  125284.3
            Iteration 8:   deviance =  125284.3
            
            Number of parameters: 28
            Number of observations: 16720
            Pseudo log-likelihood: -63420.144
            R-squared: .17288283
            Option strict is: on
            ------------------------------------------------------------------------------------------
                                     |               Robust
                cum_sizes_concludedb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------------------+----------------------------------------------------------------
                        log_distance |  -1.201793   .1801519    -6.67   0.000    -1.554884   -.8487016
                     common_language |   1.138324   .4179556     2.72   0.006     .3191461    1.957502
                           log_GDP_h |   .3913882   .1393914     2.81   0.005     .1181861    .6645903
                           log_GDP_i |   .4823699   .0643716     7.49   0.000     .3562039     .608536
                        log_GDP_pc_h |  -.6416376   .3619771    -1.77   0.076      -1.3511    .0678245
                        log_GDP_pc_i |  -.1496204   .2003057    -0.75   0.455    -.5422123    .2429716
              net_agric_imports_pc_h |  -.2959744    2.40482    -0.12   0.902    -5.009335    4.417386
              net_agric_imports_pc_i |  -.7711362   .8465933    -0.91   0.362    -2.430429    .8881562
                 log_agric_area_pc_h |   1.357242   .2203435     6.16   0.000     .9253767    1.789107
                 log_agric_area_pc_i |  -.0698345   .2100094    -0.33   0.739    -.4814454    .3417763
              share_available_land_h |  -.0265334   .0123975    -2.14   0.032    -.0508319   -.0022348
              share_available_land_i |  -.0011298   .0060209    -0.19   0.851    -.0129305    .0106709
               log_water_resources_h |   .4094063   .1347031     3.04   0.002     .1453931    .6734194
               log_water_resources_i |  -.1177472   .1148511    -1.03   0.305    -.3428512    .1073567
            log_agric_productivity_h |  -.4160468   .1383597    -3.01   0.003    -.6872268   -.1448669
            log_agric_productivity_i |  -.1575241   .1354979    -1.16   0.245    -.4230951    .1080469
                        corruption_h |  -.0111685   .0088038    -1.27   0.205    -.0284236    .0060866
                        corruption_i |   .0340047   .0099084     3.43   0.001     .0145845    .0534248
                           n_deals_h |   .0473115    .006525     7.25   0.000     .0345227    .0601003
                                 rta |   .5814468   .5075564     1.15   0.252    -.4133453    1.576239
                           _DGEO_h_1 |  -2.315515   .9760781    -2.37   0.018    -4.228593   -.4024373
                           _DGEO_h_2 |  -1.866438   .6921475    -2.70   0.007    -3.223022   -.5098537
                           _DGEO_h_3 |   -4.78573   .9438335    -5.07   0.000     -6.63561   -2.935851
                           _DGEO_h_4 |  -1.629991   .8536095    -1.91   0.056    -3.303035    .0430527
                           _EGEO_i_1 |  -4.033844   .8303639    -4.86   0.000    -5.661327    -2.40636
                           _EGEO_i_2 |  -2.812537    .660103    -4.26   0.000    -4.106315   -1.518759
                           _EGEO_i_3 |  -1.216558   .8424364    -1.44   0.149    -2.867703     .434587
                           _EGEO_i_4 |  -2.868658   .7179761    -4.00   0.000    -4.275865   -1.461451
            ------------------------------------------------------------------------------------------
            
            . outreg2 using "Results/ppml_compare", excel append
            Results/ppml_compare.xml
            dir : seeout
            
            .
            .
            . predict fit, xb //Get fitted values - RESET test according to http://privatewww.essex.ac.uk/
            > ~jmcss/reset.do
            (14086 missing values generated)
            
            . gen fit2=fit^2 //Square the fitted values
            (14086 missing values generated)
            
            . qui ppml cum_sizes_concludedb $explan_var1 _D* _E* fit2, keep noconst strict // Estimate the
            >  model with the additional regressor
            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
            WARNING: share_available_land_h has very large values, consider rescaling  or recentering
            WARNING: share_available_land_i has very large values, consider rescaling  or recentering
            WARNING: log_water_resources_h has very large values, consider rescaling  or recentering
            WARNING: log_water_resources_i has very large values, consider rescaling  or recentering
            WARNING: log_agric_productivity_i has very large values, consider rescaling  or recentering
            WARNING: corruption_h has very large values, consider rescaling  or recentering
            WARNING: corruption_i has very large values, consider rescaling  or recentering
            WARNING: n_deals_h has very large values, consider rescaling  or recentering
            WARNING: fit2 has very large values, consider rescaling  or recentering
            Number of regressors excluded to ensure that the estimates exist: 2
            Number of observations excluded: 0
            WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
            
            . test fit2=0 //Test the significance of the additional regressor (this is equivalent to a t-t
            > est on fit2)
            
             ( 1)  fit2 = 0
            
                       chi2(  1) =    3.06
                     Prob > chi2 =    0.0800
            And without the strict option:

            Code:
            
            .  ppml cum_sizes_concludedb $explan_var _D* _E*, keep noconst
            
            note: checking the existence of the estimates
            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
            WARNING: n_deals_h has very large values, consider rescaling  or recentering
            
            Number of regressors excluded to ensure that the estimates exist: 1
            Excluded regressors:  _EGEO_i_5
            Number of observations excluded: 0
            
            note: starting ppml estimation
            note: cum_sizes_concludedb has noninteger values
            
            Iteration 1:   deviance =    365271
            Iteration 2:   deviance =  196395.2
            Iteration 3:   deviance =  151197.4
            Iteration 4:   deviance =  142551.7
            Iteration 5:   deviance =  141759.5
            Iteration 6:   deviance =  141732.9
            Iteration 7:   deviance =    141727
            Iteration 8:   deviance =  141724.9
            Iteration 9:   deviance =  141724.1
            Iteration 10:  deviance =  141723.8
            Iteration 11:  deviance =  141723.7
            Iteration 12:  deviance =  141723.7
            Iteration 13:  deviance =  141723.6
            Iteration 14:  deviance =  141723.6
            Iteration 15:  deviance =  141723.6
            Iteration 16:  deviance =  141723.6
            Iteration 17:  deviance =  141723.6
            Iteration 18:  deviance =  141723.6
            Iteration 19:  deviance =  141723.6
            Iteration 20:  deviance =  141723.6
            Iteration 21:  deviance =  141723.6
            Iteration 22:  deviance =  141723.6
            Warning:  variance matrix is nonsymmetric or highly singular
            
            Number of parameters: 17
            Number of observations: 21250
            Pseudo log-likelihood: -71698.063
            R-squared: .15788584
            Option strict is: off
            WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
            ---------------------------------------------------------------------------------
                            |               Robust
            cum_sizes_co~db |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ----------------+----------------------------------------------------------------
               log_distance |   -.763162          .        .       .            .           .
            common_language |   1.293424          .        .       .            .           .
                  log_GDP_h |   .3859217          .        .       .            .           .
                  log_GDP_i |   .5476548          .        .       .            .           .
               log_GDP_pc_h |  -.4063192          .        .       .            .           .
               log_GDP_pc_i |   .4349815          .        .       .            .           .
                  n_deals_h |   .0450038          .        .       .            .           .
                        rta |   1.053489          .        .       .            .           .
                  _DGEO_h_1 |  -36.47161          .        .       .            .           .
                  _DGEO_h_2 |  -36.10625          .        .       .            .           .
                  _DGEO_h_3 |  -39.25592          .        .       .            .           .
                  _DGEO_h_4 |  -35.65588          .        .       .            .           .
                  _DGEO_h_5 |  -34.80601          .        .       .            .           .
                  _EGEO_i_1 |   17.76028          .        .       .            .           .
                  _EGEO_i_2 |   17.81901          .        .       .            .           .
                  _EGEO_i_3 |   19.31081          .        .       .            .           .
                  _EGEO_i_4 |   17.85156          .        .       .            .           .
            ---------------------------------------------------------------------------------
            
            . outreg2 using "Results/ppml_compare", excel replace
            Results/ppml_compare.xml
            dir : seeout
            
            .
            .
            . predict fit, xb //Get fitted values - RESET test according to http://privatewww.essex.ac.uk/
            > ~jmcss/reset.do
            (9556 missing values generated)
            
            . gen fit2=fit^2 //Square the fitted values
            (9556 missing values generated)
            
            . qui ppml cum_sizes_concludedb $explan_var _D* _E* fit2, keep  noconst   // Estimate the mode
            > l with the additional regressor
            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
            WARNING: n_deals_h has very large values, consider rescaling  or recentering
            WARNING: fit2 has very large values, consider rescaling  or recentering
            Number of regressors excluded to ensure that the estimates exist: 1
            Number of observations excluded: 0
            Warning:  variance matrix is nonsymmetric or highly singular
            WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
            
            . test fit2=0
            
             ( 1)  fit2 = 0
                   Constraint 1 dropped
            
                       chi2(  0) =       .
                     Prob > chi2 =         .
            
            
             ppml cum_sizes_concludedb $explan_var1 _D* _E*, keep noconst
            
            note: checking the existence of the estimates
            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
            WARNING: share_available_land_h has very large values, consider rescaling  or recentering
            WARNING: share_available_land_i has very large values, consider rescaling  or recentering
            WARNING: log_water_resources_h has very large values, consider rescaling  or recentering
            WARNING: log_water_resources_i has very large values, consider rescaling  or recentering
            WARNING: log_agric_productivity_i has very large values, consider rescaling  or recentering
            WARNING: corruption_h has very large values, consider rescaling  or recentering
            WARNING: corruption_i has very large values, consider rescaling  or recentering
            WARNING: n_deals_h has very large values, consider rescaling  or recentering
            
            Number of regressors excluded to ensure that the estimates exist: 1
            Excluded regressors:  _EGEO_i_5
            Number of observations excluded: 0
            
            note: starting ppml estimation
            note: cum_sizes_concludedb has noninteger values
            
            Iteration 1:   deviance =  273234.5
            Iteration 2:   deviance =  152042.2
            Iteration 3:   deviance =  120736.9
            Iteration 4:   deviance =  114467.6
            Iteration 5:   deviance =  113764.5
            Iteration 6:   deviance =  113741.3
            Iteration 7:   deviance =  113738.5
            Iteration 8:   deviance =  113737.5
            Iteration 9:   deviance =  113737.1
            Iteration 10:  deviance =  113736.9
            Iteration 11:  deviance =  113736.9
            Iteration 12:  deviance =  113736.9
            Iteration 13:  deviance =  113736.9
            Iteration 14:  deviance =  113736.9
            Iteration 15:  deviance =  113736.9
            Iteration 16:  deviance =  113736.9
            Iteration 17:  deviance =  113736.9
            Iteration 18:  deviance =  113736.9
            Iteration 19:  deviance =  113736.9
            Iteration 20:  deviance =  113736.9
            Iteration 21:  deviance =  113736.9
            Iteration 22:  deviance =  113736.9
            Warning:  variance matrix is nonsymmetric or highly singular
            
            Number of parameters: 29
            Number of observations: 16720
            Pseudo log-likelihood: -57646.44
            R-squared: .19437534
            Option strict is: off
            WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
            ------------------------------------------------------------------------------------------
                                     |               Robust
                cum_sizes_concludedb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------------------+----------------------------------------------------------------
                        log_distance |  -.9636159          .        .       .            .           .
                     common_language |   1.380642          .        .       .            .           .
                           log_GDP_h |   .6158404          .        .       .            .           .
                           log_GDP_i |   .5687462          .        .       .            .           .
                        log_GDP_pc_h |  -.6418756          .        .       .            .           .
                        log_GDP_pc_i |   .3423702          .        .       .            .           .
              net_agric_imports_pc_h |  -3.089397          .        .       .            .           .
              net_agric_imports_pc_i |  -.4997463          .        .       .            .           .
                 log_agric_area_pc_h |   1.494319          .        .       .            .           .
                 log_agric_area_pc_i |   .2836532          .        .       .            .           .
              share_available_land_h |  -.0432753          .        .       .            .           .
              share_available_land_i |  -.0165432          .        .       .            .           .
               log_water_resources_h |   .9540423          .        .       .            .           .
               log_water_resources_i |   .2379842          .        .       .            .           .
            log_agric_productivity_h |   .0275815          .        .       .            .           .
            log_agric_productivity_i |  -.0219905          .        .       .            .           .
                        corruption_h |   -.005127          .        .       .            .           .
                        corruption_i |   .0174166          .        .       .            .           .
                           n_deals_h |   .0450861          .        .       .            .           .
                                 rta |   .9991179          .        .       .            .           .
                           _DGEO_h_1 |  -53.00853          .        .       .            .           .
                           _DGEO_h_2 |  -54.51339          .        .       .            .           .
                           _DGEO_h_3 |  -56.62995          .        .       .            .           .
                           _DGEO_h_4 |  -53.55057          .        .       .            .           .
                           _DGEO_h_5 |  -52.69731          .        .       .            .           .
                           _EGEO_i_1 |   17.73117          .        .       .            .           .
                           _EGEO_i_2 |   17.43558          .        .       .            .           .
                           _EGEO_i_3 |   19.76491          .        .       .            .           .
                           _EGEO_i_4 |   17.65751          .        .       .            .           .
            ------------------------------------------------------------------------------------------
            
            . outreg2 using "Results/ppml_compare", excel append
            Results/ppml_compare.xml
            dir : seeout
            
            .
            .
            . predict fit, xb
            (14086 missing values generated)
            
            . gen fit2=fit^2 //Square the fitted values
            (14086 missing values generated)
            
            . qui ppml cum_sizes_concludedb $explan_var1 _D* _E* fit2, keep noconst // Estimate the model
            > with the additional regressor
            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
            WARNING: share_available_land_h has very large values, consider rescaling  or recentering
            WARNING: share_available_land_i has very large values, consider rescaling  or recentering
            WARNING: log_water_resources_h has very large values, consider rescaling  or recentering
            WARNING: log_water_resources_i has very large values, consider rescaling  or recentering
            WARNING: log_agric_productivity_i has very large values, consider rescaling  or recentering
            WARNING: corruption_h has very large values, consider rescaling  or recentering
            WARNING: corruption_i has very large values, consider rescaling  or recentering
            WARNING: n_deals_h has very large values, consider rescaling  or recentering
            WARNING: fit2 has very large values, consider rescaling  or recentering
            Number of regressors excluded to ensure that the estimates exist: 1
            Number of observations excluded: 0
            WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
            
            . test fit2=0
            
             ( 1)  fit2 = 0
            
                       chi2(  1) =    3.30
                     Prob > chi2 =    0.0692
            Hope this time it is sort of understandable. Thanks so much!

            All the best,
            Klaus



            • #7
              Thanks, Klaus. Are you using the option "keep" for any special reason? Please run the regressions without this option, OK? Also, are variables like _EGEO_i_2 dummies?

              All the best,

              Joao



              • #8
                Thanks for your patience, Joao! I am using the "keep" option because otherwise many observations are dropped (about 1500). Is there any way to find out which ones? I have very few positive values and am worried that dropping observations would mean losing too many of these non-zero observations.

                The EGEO variables are dummies (regional dummies), yes. Results don't change much without the keep option: the overfitting warning still appears without the strict option (with the strict option it is fine), and the p-value in the RESET test remains low (about 0.07).

                Best wishes,
                Klaus
                Last edited by Klaus Schmidt; 04 Jun 2016, 11:24.



                • #9
                  Klaus,

                  The observations that are dropped are all equal to zero and have no information about the parameters of the model because they are perfectly predicted. So, please do not use "keep".
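
                  A minimal sketch of one way to see which observations are dropped, assuming -ppml- sets e(sample) as most Stata estimation commands do (not confirmed in this thread). The variable names "used" and "nmiss" are hypothetical:

                  Code:
                  * Re-run the model without -keep- so that ppml is free to drop observations
                  ppml cum_sizes_concludedb $explan_var1 _D* _E*

                  * Flag the observations actually used in estimation (assumes ppml sets e(sample))
                  gen byte used = e(sample)

                  * Count missing values per observation, since these are excluded for a different reason
                  egen nmiss = rowmiss(cum_sizes_concludedb $explan_var1 _D* _E*)

                  * Dropped, complete observations with a positive outcome; expected to be 0 if only zeros are dropped
                  count if !used & nmiss == 0 & cum_sizes_concludedb > 0

                  * Distribution of the outcome among the dropped, complete observations
                  tabulate cum_sizes_concludedb if !used & nmiss == 0

                  If the count is 0, none of the positive observations is lost by leaving out "keep".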

                  All the best,

                  Joao



                  • #10
                    Ok, fine, then I do it without keep - sorry, I wasn't aware of this!
                    But can I use the strict option? And should I use noconst when including all the dummy variables?

                    Thanks so much!!



                    • #11
                      Klaus,

                      Do not use the -strict- option either; just include all the dummies with or without constant.

                      All the best,

                      Joao



                      • #12
                        Thanks Joao, I have run the regressions without keep and strict and my results seem fine - the only remaining problem is the overfitting warning for some of the specifications. I assume I have to work on the specifications, then.

                        Thanks again and all the best,
                        Klaus



                        • #13
                          Dear Klaus,

                          Please post the results for one of the cases with the warning.

                          Best regards,

                          Joao



                          • #14
                            Thanks Joao, here it is (including the RESET test):

                            Code:
                            ppml cum_sizes_concludedb $explan_var1 _D* _E*
                            
                            note: checking the existence of the estimates
                            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
                            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
                            WARNING: share_available_land_h has very large values, consider rescaling  or recentering
                            WARNING: share_available_land_i has very large values, consider rescaling  or recentering
                            WARNING: log_water_resources_h has very large values, consider rescaling  or recentering
                            WARNING: log_water_resources_i has very large values, consider rescaling  or recentering
                            WARNING: log_agric_productivity_i has very large values, consider rescaling  or recentering
                            WARNING: corruption_h has very large values, consider rescaling  or recentering
                            WARNING: corruption_i has very large values, consider rescaling  or recentering
                            WARNING: n_deals_h has very large values, consider rescaling  or recentering
                            
                            Number of regressors excluded to ensure that the estimates exist: 0
                            Number of observations excluded: 0
                            
                            note: _DGEO_h_5 omitted because of collinearity
                            note: _EGEO_i_1 omitted because of collinearity
                            
                            note: starting ppml estimation
                            note: cum_sizes_concludedb has noninteger values
                            
                            Iteration 1:   deviance =  276914.3
                            Iteration 2:   deviance =  153590.1
                            Iteration 3:   deviance =  121413.3
                            Iteration 4:   deviance =  114845.7
                            Iteration 5:   deviance =  114107.1
                            Iteration 6:   deviance =  114089.3
                            Iteration 7:   deviance =  114089.2
                            Iteration 8:   deviance =  114089.2
                            
                            Number of parameters: 29
                            Number of observations: 16940
                            Pseudo log-likelihood: -57832.251
                            R-squared: .19419177
                            Option strict is: off
                            WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
                            ------------------------------------------------------------------------------------------
                                                     |               Robust
                                cum_sizes_concludedb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                            -------------------------+----------------------------------------------------------------
                                        log_distance |  -.9621258   .2204523    -4.36   0.000    -1.394204   -.5300471
                                     common_language |   1.373658   .3556246     3.86   0.000     .6766465    2.070669
                                           log_GDP_h |   .6187184   .1024241     6.04   0.000     .4179707     .819466
                                           log_GDP_i |   .5694416   .0714125     7.97   0.000     .4294755    .7094076
                                        log_GDP_pc_h |  -.6477244    .260485    -2.49   0.013    -1.158266   -.1371832
                                        log_GDP_pc_i |   .3404469   .1890304     1.80   0.072    -.0300458    .7109396
                              net_agric_imports_pc_h |  -3.072139   2.371134    -1.30   0.195    -7.719476    1.575199
                              net_agric_imports_pc_i |  -.4902605   .7200062    -0.68   0.496    -1.901447    .9209258
                                 log_agric_area_pc_h |   1.499046   .2665528     5.62   0.000     .9766118     2.02148
                                 log_agric_area_pc_i |   .2834167   .1867459     1.52   0.129    -.0825985    .6494318
                              share_available_land_h |  -.0434304   .0123642    -3.51   0.000    -.0676637   -.0191971
                              share_available_land_i |  -.0165234   .0080678    -2.05   0.041    -.0323359   -.0007109
                               log_water_resources_h |   .9564134   .1871224     5.11   0.000     .5896602    1.323167
                               log_water_resources_i |   .2367187   .1422818     1.66   0.096    -.0421485    .5155858
                            log_agric_productivity_h |   .0262156   .1456809     0.18   0.857    -.2593137     .311745
                            log_agric_productivity_i |  -.0219723   .1122832    -0.20   0.845    -.2420433    .1980987
                                        corruption_h |  -.0049471   .0083976    -0.59   0.556    -.0214061    .0115119
                                        corruption_i |   .0174745   .0099878     1.75   0.080    -.0021012    .0370502
                                           n_deals_h |   .0450622   .0055958     8.05   0.000     .0340946    .0560298
                                                 rta |    1.00482   .4719735     2.13   0.033      .079769    1.929871
                                           _DGEO_h_1 |    -.30352   .9718875    -0.31   0.755    -2.208384    1.601344
                                           _DGEO_h_2 |  -1.804123   .7840516    -2.30   0.021    -3.340835   -.2674096
                                           _DGEO_h_3 |  -3.925264   .9891404    -3.97   0.000    -5.863944   -1.986585
                                           _DGEO_h_4 |  -.8457818    .903183    -0.94   0.349    -2.615988    .9244245
                                           _EGEO_i_2 |  -.2960225   .6884029    -0.43   0.667    -1.645267    1.053223
                                           _EGEO_i_3 |   2.031076    .571698     3.55   0.000     .9105687    3.151583
                                           _EGEO_i_4 |  -.0746455   .5813259    -0.13   0.898    -1.214023    1.064732
                                           _EGEO_i_5 |  -2.756632   1.241112    -2.22   0.026    -5.189166   -.3240981
                                               _cons |  -35.01173   5.870927    -5.96   0.000    -46.51854   -23.50493
                            ------------------------------------------------------------------------------------------
                            
                            . outreg2 using "Results/ppml_compare", excel append
                            Results/ppml_compare.xml
                            dir : seeout
                            
                            .
                            .
                            . predict fit, xb
                            (13866 missing values generated)
                            
                            . gen fit2=fit^2 //Square the fitted values
                            (13866 missing values generated)
                            
                            . qui ppml cum_sizes_concludedb $explan_var1 _D* _E* fit2 
                            WARNING: log_GDP_h has very large values, consider rescaling  or recentering
                            WARNING: log_GDP_i has very large values, consider rescaling  or recentering
                            WARNING: share_available_land_h has very large values, consider rescaling  or recentering
                            WARNING: share_available_land_i has very large values, consider rescaling  or recentering
                            WARNING: log_water_resources_h has very large values, consider rescaling  or recentering
                            WARNING: log_water_resources_i has very large values, consider rescaling  or recentering
                            WARNING: log_agric_productivity_i has very large values, consider rescaling  or recentering
                            WARNING: corruption_h has very large values, consider rescaling  or recentering
                            WARNING: corruption_i has very large values, consider rescaling  or recentering
                            WARNING: n_deals_h has very large values, consider rescaling  or recentering
                            WARNING: fit2 has very large values, consider rescaling  or recentering
                            Number of regressors excluded to ensure that the estimates exist: 0
                            Number of observations excluded: 0
                            WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
                            
                            . test fit2=0 //Test the significance of the additional regressor (this is equivalent to a t-t
                            > est on fit2)
                            
                             ( 1)  fit2 = 0
                            
                                       chi2(  1) =    3.30
                                     Prob > chi2 =    0.0692
                            Thanks a million!
                            Klaus



                            • #15
                              It looks to me as if the base category of your dummies is a perfect predictor; are you generating the dummies using xi with the noomit option, and including all of them in the model? If you are doing that, then just run it without the constant.
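
                              A minimal sketch of that setup, assuming the underlying region variables are called GEO_h and GEO_i (inferred from the dummy names, not confirmed in the thread):

                              Code:
                              * Create a full set of regional dummies, with no base category omitted
                              xi, prefix(_D) noomit i.GEO_h
                              xi, prefix(_E) noomit i.GEO_i

                              * Estimate without a constant so that the full sets of dummies can enter the model
                              ppml cum_sizes_concludedb $explan_var1 _D* _E*, noconstant

                              With two full sets of dummies, one dummy is still redundant even without the constant, so a collinearity note for a single dummy is to be expected.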

                              All the best,

                              Joao

