Dear all,
I've been struggling with a ppml application for a long while. It now seems to give quite good results, but I am a little confused by all the warnings about "very large values" and "overfitting".
Also, my RESET test rejects (p = 0.0245), so can I trust these results at all? I copy my output below.
What about dummies, should I try to avoid them?
As a robustness check I ran a simple OLS (with the logged dependent variable), and some of the results change substantially. So I assume something is not quite correct, right?
Thanks so much for your help!
Klaus
-----------------------------------------------------------------------------------------------------------------------
ppml cum_sizes_concludedb $explan_var1, keep
note: checking the existence of the estimates
WARNING: log_GDP_h has very large values, consider rescaling or recentering
WARNING: log_GDP_i has very large values, consider rescaling or recentering
(many more warnings)
Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0
note: starting ppml estimation
note: cum_sizes_concludedb has noninteger values
Iteration 1: deviance = 252057.3
Iteration 2: deviance = 141818
Iteration 3: deviance = 113417.2
Iteration 4: deviance = 107601.2
Iteration 5: deviance = 106889.8
Iteration 6: deviance = 106856.1
Iteration 7: deviance = 106855.3
Iteration 8: deviance = 106855.3
Iteration 9: deviance = 106855.3
Number of parameters: 29
Number of observations: 16940
Pseudo log-likelihood: -54214.637
R-squared: .19015636
Option strict is: off
WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
-------------------------------------------------------------------------------------
                    |               Robust
cum_sizes_conclu~db |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
       log_distance |  -.9528056   .1971872    -4.83   0.000    -1.339285   -.5663257
    common_language |   1.455636   .3174794     4.58   0.000     .8333883    2.077884
          log_GDP_h |   .6108688   .0999441     6.11   0.000      .414982    .8067556
          log_GDP_i |   .5874121   .0686404     8.56   0.000     .4528793    .7219449
       log_GDP_pc_h |  -.6919959   .2293051    -3.02   0.003    -1.141426   -.2425662
       log_GDP_pc_i |   .2883472   .1847046     1.56   0.118    -.0736673    .6503616
net_agric_imports~h |  -3.315186   2.274837    -1.46   0.145    -7.773785    1.143414
net_agric_imports~i |  -.3133903    .723114    -0.43   0.665    -1.730668    1.103887
log_agric_area_pc_h |   1.388383    .275652     5.04   0.000     .8481147    1.928651
log_agric_area_pc_i |    .277707   .1605227     1.73   0.084    -.0369118    .5923258
share_available_l~h |  -.0401704   .0126402    -3.18   0.001    -.0649447   -.0153961
share_available_l~i |  -.0149193   .0077438    -1.93   0.054    -.0300969    .0002583
log_water_resourc~h |   .9159727   .1532531     5.98   0.000     .6156022    1.216343
log_water_resourc~i |    .198225   .1343857     1.48   0.140    -.0651662    .4616162
log_agric_product~h |   .0325299   .1453248     0.22   0.823    -.2523014    .3173612
log_agric_product~i |    .011202   .1072409     0.10   0.917    -.1989862    .2213903
       corruption_h |  -.0028488   .0084842    -0.34   0.737    -.0194776      .01378
       corruption_i |   .0162642   .0096734     1.68   0.093    -.0026954    .0352237
          n_deals_h |   .0446845   .0057439     7.78   0.000     .0334265    .0559424
                rta |    1.01743   .4402138     2.31   0.021     .1546272    1.880233
             Africa |  -.1126869   .3363213    -0.34   0.738    -.7718645    .5464907
               Asia |  -3.619532   .7332476    -4.94   0.000    -5.056671   -2.182394
            America |  -1.455312   .5510493    -2.64   0.008    -2.535348   -.3752749
             Europe |  -.4128112   .4677604    -0.88   0.377    -1.329605    .5039823
          America_i |    2.13523   .6506217     3.28   0.001     .8600344    3.410425
             Asia_i |   4.363523   .9203239     4.74   0.000     2.559722    6.167325
           Africa_i |   2.683462   .8665773     3.10   0.002     .9850013    4.381922
           Europe_i |   2.383969   .7021059     3.40   0.001     1.007867    3.760071
              _cons |  -36.71535   4.897823    -7.50   0.000     -46.3149   -27.11579
-------------------------------------------------------------------------------------
. outreg2 using "Results/ppml_compare", excel append
Results/ppml_compare.xml
dir : seeout
.
.
. predict fit, xb //Get fitted values - RESET test according to http://privatewww.essex.ac.uk/~jmcss/reset.do
(13866 missing values generated)
. gen fit2=fit^2 //Square the fitted values
(13866 missing values generated)
. qui ppml cum_sizes_concludedb $explan_var1 fit2, keep // Estimate the model with the additional regressor
WARNING: log_GDP_h has very large values, consider rescaling or recentering
WARNING: log_GDP_i has very large values, consider rescaling or recentering
WARNING: share_available_land_h has very large values, consider rescaling or recentering
WARNING: share_available_land_i has very large values, consider rescaling or recentering
WARNING: log_water_resources_h has very large values, consider rescaling or recentering
WARNING: log_water_resources_i has very large values, consider rescaling or recentering
WARNING: log_agric_productivity_i has very large values, consider rescaling or recentering
WARNING: corruption_h has very large values, consider rescaling or recentering
WARNING: corruption_i has very large values, consider rescaling or recentering
WARNING: n_deals_h has very large values, consider rescaling or recentering
WARNING: fit2 has very large values, consider rescaling or recentering
Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0
WARNING: The model appears to overfit some observations with cum_sizes_concludedb=0
. test fit2=0 //Test the significance of the additional regressor (this is equivalent to a t-test on fit2)
( 1) fit2 = 0
chi2( 1) = 5.06
Prob > chi2 = 0.0245
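
A note on the "very large values" warnings in the log above: ppml suggests rescaling or recentering the flagged regressors. Below is a minimal sketch of what recentering could look like in Stata. It rests on assumptions: the c_ prefix for the new variables is made up for illustration, and only the two regressors named in the first warnings are listed. Because the model has an exponential mean, subtracting a constant from a regressor should shift only the constant term, leaving the slope coefficients unchanged.

```stata
* Sketch only: recenter flagged regressors around their sample means.
* The c_ prefix is a hypothetical naming choice, not part of ppml.
foreach v of varlist log_GDP_h log_GDP_i {
    quietly summarize `v', meanonly          // stores the mean in r(mean)
    generate double c_`v' = `v' - r(mean)    // centered copy of the regressor
}

* Check that the centered copies have (approximately) zero mean
summarize c_log_GDP_h c_log_GDP_i
```

The centered variables would then have to replace the originals in the regressor list (here held in the global explan_var1, whose contents are not shown in the post) before re-running ppml.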