Gravity model with ppml command

Ainhoa Oses replied

05 Feb 2019, 08:52
It is a bilateral model. The country fixed effects are dummies for each of the origin countries whose migrants go to the country I am analysing.

The problem is that as soon as I add population variables, the coefficients on these demographic variables become very large and make the model explosive. This happens mainly for the demographic variables that refer to the destination country I'm analysing. I suspect there might be collinearity issues. I need to understand this because it's really important for my research. Thanks again.

Best regards,

Ainhoa

Last edited by Ainhoa Oses; 05 Feb 2019, 09:52.
Joao Santos Silva replied

05 Feb 2019, 08:27
Sorry, what exactly do you mean by country FE? Are these origin and destination FE or pair FE?
Ainhoa Oses replied

05 Feb 2019, 04:18
Thanks a million for that, Joao. I've dropped the lag. One more point related to the model becoming explosive: my X variables include country FE, GDP per capita at origin and destination, and population structures by age group at origin and destination. When I exclude the population structures, the model looks very reasonable, including the predictions. However, when I include them, the model becomes really unstable. Do you think dropping these population structures could be justified? In a way, GDP per capita already depends partly on population assumptions.

Best wishes,

Ainhoa
Joao Santos Silva replied

05 Feb 2019, 02:54
Dear Ainhoa,

If I understand it correctly, you are explaining the stock of migrants by the stock in the previous period. Because the stock of migrants is likely to vary slowly, you are essentially using something to explain itself. Also, I do not know what kind of fixed effects you are using, but these are likely to be very collinear with the lagged stock, and this may make the model very unstable.

Best wishes,

Joao
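Joao's point about the fixed effects being very collinear with a slowly varying lagged stock can be checked with an auxiliary regression of the logged lag on the fixed-effect dummies: an R-squared close to 1 means the lag carries almost no information beyond the dummies. This is a minimal sketch on simulated data; `pair`, `year` and the data-generating process are hypothetical, while `Mig` and the `DUM_COUNTRY*` prefix only mirror the names used in this thread.

```stata
* Simulated data for illustration only: 20 origins, 10 years, one destination
clear
set seed 2019
set obs 200
generate pair = ceil(_n/10)
generate year = 2000 + mod(_n - 1, 10) + 1
* A slowly varying stock: a pair-specific level plus small yearly noise
generate double Mig = round(exp(5 + pair/4 + rnormal(0, 0.05)))
xtset pair year
* Logged lag of the stock (missing in the first year of each pair)
generate double lnMig_lag = ln(L.Mig)
* One dummy per origin, named like the DUM_COUNTRY* variables in the thread
tabulate pair, generate(DUM_COUNTRY)
* Auxiliary regression: share of the lag's variation explained by the dummies
* (regress omits one dummy because the model also has a constant)
regress lnMig_lag DUM_COUNTRY*
display "R-squared of the lagged stock on the fixed effects: " %6.4f e(r2)
```

With real data, the same regression on the actual lag and fixed effects shows how close to perfect collinearity the specification is.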
Ainhoa Oses replied

05 Feb 2019, 02:16
Dear Joao,

Could you give me some more insight on why you think the model is strange? If you could give me some advice on a specification that would make more sense, I would be really grateful.

Cheers,

Ainhoa
Joao Santos Silva replied

04 Feb 2019, 14:30
Dear Ainhoa,

I am afraid I have no suggestions, but I still think that your model is very strange and so I am not surprised by the strange results.

Best wishes,

Joao
Ainhoa Oses replied

04 Feb 2019, 08:58
Hi Joao,

Many thanks for all your help. Using the model I described above (the stock of migrants as a function of its lag, plus other demographic/economic variables and fixed effects), the predictions are generally explosive. Adding or removing even a few variables results in nonsensical (i.e., unrealistically high) predictions. Can you see any reason for this? I noticed that both the constant and the country dummies get very large coefficients compared with those on the time-varying variables. I know the question is rather general, but there might be something obvious that I'm getting wrong.

Thanks again,

Ainhoa
Joao Santos Silva replied

31 Jan 2019, 12:47
Dear Ainhoa,

If you want to include the lag, it makes sense to log it. My question is whether it makes sense to include the lag at all; I guess the answer depends on the purpose of the model.

Best wishes,

Joao
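The advice to log the lag can be motivated from the exponential conditional mean that ppml estimates. Writing α_i for the country fixed effect and β for the coefficient on the lag (notation introduced here for illustration only), entering the lag in logs makes β an elasticity, whereas entering it in levels puts last period's stock inside the exponential:

```latex
\begin{aligned}
\text{lag in logs:}\quad & E[Mig_{it}\mid\cdot] = \exp\left(\beta \ln Mig_{i,t-1} + \alpha_i\right) = e^{\alpha_i}\, Mig_{i,t-1}^{\beta},\\
\text{lag in levels:}\quad & E[Mig_{it}\mid\cdot] = \exp\left(\beta\, Mig_{i,t-1} + \alpha_i\right) = e^{\alpha_i}\, e^{\beta\, Mig_{i,t-1}}.
\end{aligned}
```

In the first form, β is the elasticity of the expected current stock with respect to last period's stock. In the second, with β > 0 the prediction grows exponentially in last period's stock, so predictions iterated forward can easily explode.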
Ainhoa Oses replied

31 Jan 2019, 10:45
Hi Joao,

Just to confirm that this ppml specification is correct:

Code:

generate lnMig_lag = ln(L.Mig)   // log of the previous period's stock; requires the data to be xtset
ppml Mig lnMig_lag DUM_COUNTRY*

Here Mig is the stock of migrants from different origins in a given country over several periods. The stock depends on its own lag, and fixed effects are included (DUM_COUNTRY*).
My only question is whether the lag of the dependent variable (lnMig_lag) should indeed be in logs or not.

Many thanks,

Ainhoa
Isabel Cour replied

01 May 2018, 18:03
Dear Joao,

First of all, thanks for your help. Second, I apologize if my message was not clear enough.
Concerning your second comment, exporter and importer GDP were already rescaled; I did not understand why I got the WARNING message when the coefficients are already small. I was using the log of millions of dollars.
I now understand why these variables are dropped; perhaps it happens because I am using a small database with few countries, but the model specification is correct for this research question. You made a good point, and thank you very much again for reminding me. I will try to add other control variables.
Joao Santos Silva replied

01 May 2018, 13:53
Dear Isabel,

I am not sure to have understood all your questions, but here is my attempt to help:

1 - The fact that you do not have zeros does not make it OK to use OLS in logs; indeed, the zeros are just a very minor problem. Therefore, I expect the OLS and PPML results to be very different and, of course, the PPML results are much more reliable.

2 - You do not have to rescale the variables, but you can do it. For example, instead of using log of GDP in thousands of dollars, you can use log of GDP in millions of dollars. If the estimator converges, there is no need to worry about this.

3 - If you only have one importer, distance and exporter GDP will be collinear with the exporter fixed effects and will have to be dropped; the same happens if you use OLS. Other variables may be dropped for the same reason, again just as in OLS. You need to think carefully about what you are doing because you risk interpreting coefficients that are meaningless.

Best wishes,

Joao
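Points 2 and 3 can be illustrated on simulated data shaped like Isabel's sample (six exporters, one importer). The sketch shows that changing GDP units from thousands to millions of dollars only shifts the logged variable by ln(1000), and it checks within-group variation: a variable with zero standard deviation within exporters (distance) or within years (importer GDP) is spanned by the exporter or year dummies, so ppml, like OLS, has to drop it. All variable names and the data-generating process are hypothetical.

```stata
* Simulated data for illustration only: 6 exporters, 1 importer, 10 years
clear
set seed 2018
set obs 60
generate exporter = ceil(_n/10)
generate year = 2000 + mod(_n - 1, 10) + 1
* With a single importer, distance varies only across exporters...
generate double lndist = 9 + exporter/10
* ...and importer GDP varies only across years
generate double lngdp_importer = 27 + 0.05*(year - 2000)
* Exporter GDP (assumed logged, in thousands of dollars) varies over both
generate double lngdp_exporter = 25 + exporter/5 + 0.03*(year - 2000) + rnormal(0, 0.02)

* Point 2: moving from thousands to millions of dollars subtracts ln(1000)
* from the logged variable; with a constant in the model, only the
* intercept changes, not the slope
generate double lngdp_exporter_m = ln(exp(lngdp_exporter)/1000)

* Point 3: a within-group standard deviation of (numerically) zero means the
* variable is spanned by that group's dummies and must be dropped
bysort exporter: egen double sd_dist = sd(lndist)
bysort year: egen double sd_gdp_importer = sd(lngdp_importer)
summarize sd_dist sd_gdp_importer
```

The same checks on the real data, with the actual exporter and year identifiers, show which regressors cannot be separated from the fixed effects before any estimation is run.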
Isabel Cour replied

01 May 2018, 05:21
Dear Joao,

I am running a pooled OLS regression and want to check robustness, so I use the PPML method. Total import and export flows have no zeros, but trade at the sector level does. When I use total import flows, I get a WARNING to rescale lngdp; there are two such independent variables, lngdp_exporter (6 countries) and lngdp_importer (1 country). How can I rescale such small coefficients?
After rescaling, I ran PPML again, and the results show that distance, one exporter dummy, the importer dummy, gdp_importer, and two year dummies are dropped. Although it is a control variable, gdp_importer is part of the research question, as is distance; why are they dropped?
Here is a sample of the data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double lnimpo float(lnintus lngdp1 lngdp2 lndist) byte(exporter_1 exporter_2 exporter_3 exporter_4 exporter_5 exporter_6 importer_1)
20.65069580078125 .5743148 26.37296 27.817717 9.857967 1 0 0 0 0 0 1
20.970928192138672 .9706464 26.31685 27.917883 9.857967 1 0 0 0 0 0 1
20.937944412231445 1.525122 25.34863 28.010767 9.857967 1 0 0 0 0 0 1
21.72722816467285 1.8245493 25.587696 28.13175 9.857967 1 0 0 0 0 0 1
21.903419494628906 1.9878744 25.934366 28.29461 9.857967 1 0 0 0 0 0 1
end

Any suggestions are very welcome; thanks in advance.

Kind regards
Joao Santos Silva replied

08 Sep 2017, 15:06
Dear Felipe,

First of all, forget the FGLS estimation because that is simply inadequate.

About your model, I think you should use clustered standard errors. Also, your sample is rather small, but maybe you could try to include the usual "fixed effects".

Best wishes,

Joao
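Both suggestions can be sketched as follows, assuming the user-written ppml (ssc install ppml) with its cluster() option and ppmlhdfe (ssc install ppmlhdfe, which relies on ftools and reghdfe). Reading "the usual fixed effects" as origin-year and destination-year effects is an assumption here; note that they absorb every regressor that varies only by origin-year or destination-year (GDP, remoteness, port and free-zone indicators), so only pair-specific variables such as distance can stay. The data, seed and variable names are hypothetical.

```stata
* Simulated panel for illustration only: 10 regions trading over 4 periods
clear
set seed 2017
set obs 10
generate origin = _n
expand 10
bysort origin: generate destination = _n
drop if origin == destination
expand 4
bysort origin destination: generate year = _n
egen pair = group(origin destination)
generate double lngdp_o = 10 + origin/5 + year/20
generate double lngdp_d = 10 + destination/5 + year/20
generate double lndist = ln(100 + 50*abs(origin - destination))
generate trade = rpoisson(exp(-10 + 0.8*lngdp_o + 0.8*lngdp_d - 0.9*lndist))

* (1) ppml with standard errors clustered by origin-destination pair
ppml trade lngdp_o lngdp_d lndist, cluster(pair)

* (2) origin-year and destination-year fixed effects; they absorb both GDP
*     terms, so only the pair-specific lndist remains among the regressors
egen origin_year = group(origin year)
egen destination_year = group(destination year)
ppmlhdfe trade lndist, absorb(origin_year destination_year) vce(cluster pair)
```

Clustering by pair allows for correlation across the periods of the same trading pair, which matters in a panel like Felipe's where each pair appears four times.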
JFelipe PinedaG replied

08 Sep 2017, 14:46
Hi Joao, I wonder if you could help me with some doubts I have about an intra-regional gravity model.

I have a panel with 4 periods, and my dependent variable is total trade in kilograms.

Here are the Stata code and results:

ppml L_KL_TOTALES_deptos L_PIBtotal2016pr_origen L_PIBtotal2016pr_destino L_Distancia_geodésica L_remoteness_origen L_remoteness_destino frontera_pais_origen Zonas_francas_destino puerto_marítimo_destino puerto_marítimo_origen d_frontera_depto Zonas_francas_origen

note: checking the existence of the estimates
WARNING: Zonas_francas_destino has very large values, consider rescaling or recentering
WARNING: Zonas_francas_origen has very large values, consider rescaling or recentering

Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0

note: starting ppml estimation
note: L_KL_TOTALES_deptos has noninteger values

Iteration 1: deviance = 400.8257
Iteration 2: deviance = 400.2885
Iteration 3: deviance = 400.2885
Iteration 4: deviance = 400.2885

Number of parameters: 12
Number of observations: 2648
Pseudo log-likelihood: -6217.08
R-squared: .66772627
Option strict is: off
------------------------------------------------------------------------------------------
L_KL_TOTALES_deptos | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
L_PIBtotal2016pr_origen | .0666277 .0027448 24.27 0.000 .0612479 .0720074
L_PIBtotal2016pr_destino | .0717089 .0025906 27.68 0.000 .0666313 .0767865
L_Distancia_geodésica | -.0539089 .0044969 -11.99 0.000 -.0627226 -.0450952
L_remoteness_origen | -.0295255 .0104363 -2.83 0.005 -.0499803 -.0090707
L_remoteness_destino | .0401392 .0090682 4.43 0.000 .0223658 .0579126
frontera_pais_origen | .0257149 .0067418 3.81 0.000 .0125013 .0389285
Zonas_francas_destino | .0025479 .0004794 5.31 0.000 .0016083 .0034875
puerto_marítimo_destino | .0307702 .0074366 4.14 0.000 .0161947 .0453457
puerto_marítimo_origen | .0699515 .0074102 9.44 0.000 .0554277 .0844752
d_frontera_depto | .0413737 .0064782 6.39 0.000 .0286767 .0540707
Zonas_francas_origen | .0055721 .0004222 13.20 0.000 .0047445 .0063997
_cons | 1.539765 .1010227 15.24 0.000 1.341764 1.737766
------------------------------------------------------------------------------------------

RESET TEST

. predict u, xb

. gen u2 = u^2

. ppml L_KL_TOTALES_deptos L_PIBtotal2016pr_origen L_PIBtotal2016pr_destino L_Distancia_geodésica L_remoteness_origen L_remoteness_destino frontera_pais_origen Zonas_francas_destino puerto_marítimo_destino puerto_marítimo_origen d_frontera_depto Zonas_francas_origen u2

note: checking the existence of the estimates
WARNING: Zonas_francas_destino has very large values, consider rescaling or recentering
WARNING: Zonas_francas_origen has very large values, consider rescaling or recentering

Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0

note: starting ppml estimation
note: L_KL_TOTALES_deptos has noninteger values

Iteration 1: deviance = 392.748
Iteration 2: deviance = 391.5719
Iteration 3: deviance = 391.5718
Iteration 4: deviance = 391.5718

Number of parameters: 13
Number of observations: 2648
Pseudo log-likelihood: -6212.7217
R-squared: .67972867
Option strict is: off
------------------------------------------------------------------------------------------
L_KL_TOTALES_deptos | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
L_PIBtotal2016pr_origen | .2947494 .0355748 8.29 0.000 .225024 .3644748
L_PIBtotal2016pr_destino | .3167968 .0382707 8.28 0.000 .2417876 .3918059
L_Distancia_geodésica | -.2412124 .0296801 -8.13 0.000 -.2993844 -.1830404
L_remoteness_origen | -.1296204 .0187523 -6.91 0.000 -.1663742 -.0928666
L_remoteness_destino | .1754634 .0233051 7.53 0.000 .1297862 .2211406
frontera_pais_origen | .1134619 .0151337 7.50 0.000 .0838005 .1431233
Zonas_francas_destino | .0113557 .0013858 8.19 0.000 .0086395 .0140718
puerto_marítimo_destino | .1392252 .0177827 7.83 0.000 .1043718 .1740786
puerto_marítimo_origen | .3095705 .0379202 8.16 0.000 .2352483 .3838927
d_frontera_depto | .1859924 .0219579 8.47 0.000 .1429558 .229029
Zonas_francas_origen | .025091 .0029905 8.39 0.000 .0192297 .0309524
u2 | -.6285101 .0954756 -6.58 0.000 -.8156388 -.4413813
_cons | 2.184017 .128346 17.02 0.000 1.932464 2.435571
------------------------------------------------------------------------------------------

-----------------------------------------------------------------
RESULTS OF FGLS ESTIMATOR

Cross-sectional time-series FGLS regression

Coefficients: generalized least squares
Panels: heteroskedastic
Correlation: no autocorrelation

Estimated covariances = 662 Number of obs = 2,648
Estimated autocorrelations = 0 Number of groups = 662
Estimated coefficients = 12 Time periods = 4
Wald chi2(11) = 158041.37
Log likelihood = -3046.99 Prob > chi2 = 0.0000

------------------------------------------------------------------------------------------
L_KL_TOTALES_deptos | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
L_PIBtotal2016pr_origen | .9365632 .0061036 153.45 0.000 .9246004 .9485259
L_PIBtotal2016pr_destino | 1.144922 .006901 165.91 0.000 1.131396 1.158448
L_Distancia_geodésica | -.9195397 .0123681 -74.35 0.000 -.9437806 -.8952987
L_remoteness_origen | -.5537694 .0335578 -16.50 0.000 -.6195415 -.4879974
L_remoteness_destino | 1.235415 .0236238 52.30 0.000 1.189113 1.281716
frontera_pais_origen | .4298494 .0238932 17.99 0.000 .3830196 .4766793
Zonas_francas_destino | .0321848 .0012136 26.52 0.000 .0298061 .0345635
puerto_marítimo_destino | .2281944 .0180926 12.61 0.000 .1927335 .2636553
puerto_marítimo_origen | 1.168645 .0224582 52.04 0.000 1.124627 1.212662
d_frontera_depto | .2722579 .0166337 16.37 0.000 .2396565 .3048593
Zonas_francas_origen | .0863585 .0006899 125.18 0.000 .0850063 .0877106
_cons | -4.642567 .3106725 -14.94 0.000 -5.251474 -4.033661
---------------------------------------------------------------------------------------

---------------------------------------------------------------------------------

I'm worried that the RESET test rejects the model (the coefficient on u2 is highly significant). Should I use the FGLS estimator instead? What do you think about the performance of that estimator?

Thank you very much.

Felipe
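The RESET check above adds the squared linear index (u2) but stops before the test itself; the RESET statistic is the Wald test that the coefficient on the squared index is zero. This is a minimal sketch on simulated data where the exponential mean is correctly specified, so the test should usually not reject; x1, x2, y, the seed and the coefficients are hypothetical, and ppml is the user-written command (ssc install ppml).

```stata
* Simulated cross-section for illustration only
clear
set seed 2017
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate y = rpoisson(exp(1 + 0.5*x1 - 0.3*x2))
* Step 1: estimate the model and recover the linear index x'b
ppml y x1 x2
predict double fit, xb
* Step 2: add the squared index as an extra regressor
generate double fit2 = fit^2
ppml y x1 x2 fit2
* Step 3: the RESET test is the test that its coefficient is zero;
* a small p-value signals misspecification of the conditional mean
test fit2 = 0
display "RESET p-value: " %6.4f r(p)
```

A rejection points to a misspecified conditional mean (for example, missing variables or fixed effects); it is not by itself a reason to switch estimator.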
JJ vdB replied

22 Aug 2017, 03:31
Dear Joao and Dias,

Thank you for your helpful and quick advice, I appreciate it.

I estimated some small subsets of my dataset using Dias's suggestion (poi2hdfe) and the ppml command, but ppml was faster, so I am now running regressions on different subsets (the first subset contains only 1 industry and the last contains all 14 industries of intermediate-input trade). I did not run regressions with the ppml_panel_sg command because I believe that with cross-sectional data I only need to include exporter and importer fixed effects.

Kind regards,

Joost.