Dear Joao,
First of all, thanks for your help. Second, I apologize if my message was not clear enough.
Regarding your second comment, exporter and importer GDP were already rescaled; I did not understand why the WARNING message appears when the coefficients are already small. I was using the log of GDP in millions of dollars.
I understand why these variables are dropped; perhaps in this case it happens because I am using a small database with few countries, but the model specification is correct for this research question. You made a good point, and thank you very much again for the reminder. I will try to add other control variables.

Dear Isabel,
I am not sure I have understood all your questions, but here is my attempt to help:
1. The fact that you do not have zeros does not make it OK to use OLS in logs; indeed, the zeros are just a very minor problem. Therefore, I expect the OLS and PPML results to be very different and, of course, the PPML results are much more reliable.
2. You do not have to rescale the variables, but you can do it. For example, instead of using the log of GDP in thousands of dollars, you can use the log of GDP in millions of dollars. If the estimator converges, there is no need to worry about this.
3. If you only have one importer, distance and exporter GDP will be collinear with the exporter fixed effects and need to be dropped; the same happens if you use OLS. There may be other variables being dropped for the same reason, again just like in OLS. You need to think carefully about what you are doing because you risk interpreting coefficients that are meaningless.
Best wishes,
Joao
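
As an illustration of points 2 and 3 above, here is a minimal Stata sketch on invented data. All variable names and values are hypothetical and are not Isabel's dataset. It shows that rescaling or recentering a logged regressor only shifts it by a constant, and that with a single importer distance takes one value per exporter, so it cannot be separated from the exporter dummies.

```stata
clear
set seed 2024
set obs 30
generate byte   exporter  = ceil(_n / 5)           // 6 hypothetical exporters
generate int    year      = 2000 + mod(_n - 1, 5)  // 5 years per exporter
generate double lndist    = 9 + exporter / 10      // invented: one distance per exporter
generate double lngdp_exp = 25 + rnormal()         // invented log GDP in dollars
generate double y         = rnormal()              // placeholder outcome

* Rescaling a logged variable subtracts a constant (here ln of one million)
generate double lngdp_exp_m = lngdp_exp - ln(1e6)

* Recentering around the sample mean
summarize lngdp_exp, meanonly
generate double lngdp_exp_c = lngdp_exp - r(mean)

* With one importer, distance does not vary within exporter
bysort exporter (lndist): assert lndist[1] == lndist[_N]

* So Stata has to omit one term when distance and exporter dummies are both included
regress y lndist i.exporter
```

Because both transformations are shifts by a constant, they should change only the intercept of a model that includes a constant, not the slope on GDP.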

Dear Joao,
I am running a pooled OLS, and I want to check robustness, so I am using the PPML method. Total import and export flows do not contain any zeros, but trade at the sector level does. When I use the total flow of imports I get a WARNING to rescale lngdp, for two independent variables: lngdp_exporter (6 countries) and lngdp_importer (1 country). How can I rescale such a small coefficient?
I rescaled and then ran PPML again; the results show that distance, one exporter dummy, the importer dummy, importer GDP, and two year dummies are dropped. Although it is a control variable, importer GDP is part of the research question, as is distance. Why are they dropped?
Here, I copy a sample of the data,
Code:
* Example generated by dataex. To install: ssc install dataex
clear
input double lnimpo float(lnintus lngdp1 lngdp2 lndist) byte(exporter_1 exporter_2 exporter_3 exporter_4 exporter_5 exporter_6 importer_1)
20.65069580078125 .5743148 26.37296 27.817717 9.857967 1 0 0 0 0 0 1
20.970928192138672 .9706464 26.31685 27.917883 9.857967 1 0 0 0 0 0 1
20.937944412231445 1.525122 25.34863 28.010767 9.857967 1 0 0 0 0 0 1
21.72722816467285 1.8245493 25.587696 28.13175 9.857967 1 0 0 0 0 0 1
21.903419494628906 1.9878744 25.934366 28.29461 9.857967 1 0 0 0 0 0 1
end

Dear Felipe,
First of all, forget the FGLS estimation because that is simply inadequate.
About your model, I think you should use clustered standard errors. Also, your sample is rather small, but maybe you could try to include the usual "fixed effects".
Best wishes,
Joao
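
For what it is worth, the RESET check in Felipe's post and the suggestion to cluster the standard errors can be combined along the following lines. This is only a sketch on simulated data: pair_id, trade, lngdp_o and lndist are hypothetical names, the data-generating process is invented, and it assumes ppml has already been installed from SSC.

```stata
* ssc install ppml        // run once if ppml is not installed
clear
set seed 101
set obs 400
generate int    pair_id = ceil(_n / 4)     // hypothetical pair identifier, 4 periods each
generate double lngdp_o = rnormal()
generate double lndist  = rnormal()
* trade in levels with a multiplicative error whose mean is 1
generate double trade = exp(1 + 0.8*lngdp_o - 0.5*lndist) * exp(rnormal(0, 0.5) - 0.125)

* Baseline PPML with standard errors clustered by pair
ppml trade lngdp_o lndist, cluster(pair_id)

* RESET: add the squared linear prediction and test its significance
predict double fit, xb
generate double fit2 = fit^2
ppml trade lngdp_o lndist fit2, cluster(pair_id)
test fit2
```

A small p-value from test fit2 would point to misspecification of the conditional mean; that is a reason to revisit the set of regressors or fixed effects rather than to switch estimator.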

Hi Joao, I wonder if you could help with some doubts that I have about an intra-regional gravity model.
I have a panel with 4 periods, and my dependent variable is total kilograms traded.
This is the Stata code and the results.
ppml L_KL_TOTALES_deptos L_PIBtotal2016pr_origen L_PIBtotal2016pr_destino L_Distancia_geodésica L_remoteness_origen L_remoteness_destino frontera_pais_origen Zonas_francas_destino puert
> o_marítimo_destino puerto_marítimo_origen d_frontera_depto Zonas_francas_origen
note: checking the existence of the estimates
WARNING: Zonas_francas_destino has very large values, consider rescaling or recentering
WARNING: Zonas_francas_origen has very large values, consider rescaling or recentering
Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0
note: starting ppml estimation
note: L_KL_TOTALES_deptos has noninteger values
Iteration 1: deviance = 400.8257
Iteration 2: deviance = 400.2885
Iteration 3: deviance = 400.2885
Iteration 4: deviance = 400.2885
Number of parameters: 12
Number of observations: 2648
Pseudo log-likelihood: 6217.08
R-squared: .66772627
Option strict is: off

-------------------------+----------------------------------------------------------------
                         |               Robust
     L_KL_TOTALES_deptos |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
-------------------------+----------------------------------------------------------------
 L_PIBtotal2016pr_origen |   .0666277   .0027448    24.27   0.000     .0612479    .0720074
L_PIBtotal2016pr_destino |   .0717089   .0025906    27.68   0.000     .0666313    .0767865
   L_Distancia_geodésica |  -.0539089   .0044969   -11.99   0.000    -.0627226   -.0450952
     L_remoteness_origen |  -.0295255   .0104363    -2.83   0.005    -.0499803   -.0090707
    L_remoteness_destino |   .0401392   .0090682     4.43   0.000     .0223658    .0579126
    frontera_pais_origen |   .0257149   .0067418     3.81   0.000     .0125013    .0389285
   Zonas_francas_destino |   .0025479   .0004794     5.31   0.000     .0016083    .0034875
 puerto_marítimo_destino |   .0307702   .0074366     4.14   0.000     .0161947    .0453457
  puerto_marítimo_origen |   .0699515   .0074102     9.44   0.000     .0554277    .0844752
        d_frontera_depto |   .0413737   .0064782     6.39   0.000     .0286767    .0540707
    Zonas_francas_origen |   .0055721   .0004222    13.20   0.000     .0047445    .0063997
                   _cons |   1.539765   .1010227    15.24   0.000     1.341764    1.737766
-------------------------+----------------------------------------------------------------

RESET TEST
. predict u, xb
. gen u2 = u^2
. ppml L_KL_TOTALES_deptos L_PIBtotal2016pr_origen L_PIBtotal2016pr_destino L_Distancia_geodésica L_remoteness_origen L_remoteness_destino frontera_pais_origen Zonas_francas_destino puert
> o_marítimo_destino puerto_marítimo_origen d_frontera_depto Zonas_francas_origen u2
note: checking the existence of the estimates
WARNING: Zonas_francas_destino has very large values, consider rescaling or recentering
WARNING: Zonas_francas_origen has very large values, consider rescaling or recentering
Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0
note: starting ppml estimation
note: L_KL_TOTALES_deptos has noninteger values
Iteration 1: deviance = 392.748
Iteration 2: deviance = 391.5719
Iteration 3: deviance = 391.5718
Iteration 4: deviance = 391.5718
Number of parameters: 13
Number of observations: 2648
Pseudo log-likelihood: 6212.7217
R-squared: .67972867
Option strict is: off

-------------------------+----------------------------------------------------------------
                         |               Robust
     L_KL_TOTALES_deptos |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
-------------------------+----------------------------------------------------------------
 L_PIBtotal2016pr_origen |   .2947494   .0355748     8.29   0.000      .225024    .3644748
L_PIBtotal2016pr_destino |   .3167968   .0382707     8.28   0.000     .2417876    .3918059
   L_Distancia_geodésica |  -.2412124   .0296801    -8.13   0.000    -.2993844   -.1830404
     L_remoteness_origen |  -.1296204   .0187523    -6.91   0.000    -.1663742   -.0928666
    L_remoteness_destino |   .1754634   .0233051     7.53   0.000     .1297862    .2211406
    frontera_pais_origen |   .1134619   .0151337     7.50   0.000     .0838005    .1431233
   Zonas_francas_destino |   .0113557   .0013858     8.19   0.000     .0086395    .0140718
 puerto_marítimo_destino |   .1392252   .0177827     7.83   0.000     .1043718    .1740786
  puerto_marítimo_origen |   .3095705   .0379202     8.16   0.000     .2352483    .3838927
        d_frontera_depto |   .1859924   .0219579     8.47   0.000     .1429558     .229029
    Zonas_francas_origen |    .025091   .0029905     8.39   0.000     .0192297    .0309524
                      u2 |  -.6285101   .0954756    -6.58   0.000    -.8156388   -.4413813
                   _cons |   2.184017    .128346    17.02   0.000     1.932464    2.435571
-------------------------+----------------------------------------------------------------


RESULTS OF FGLS ESTIMATOR
Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        heteroskedastic
Correlation:   no autocorrelation

Estimated covariances      =       662          Number of obs     =      2,648
Estimated autocorrelations =         0          Number of groups  =        662
Estimated coefficients     =        12          Time periods      =          4
                                                Wald chi2(11)     =  158041.37
Log likelihood             =   3046.99          Prob > chi2       =     0.0000

-------------------------+----------------------------------------------------------------
     L_KL_TOTALES_deptos |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
-------------------------+----------------------------------------------------------------
 L_PIBtotal2016pr_origen |   .9365632   .0061036   153.45   0.000     .9246004    .9485259
L_PIBtotal2016pr_destino |   1.144922    .006901   165.91   0.000     1.131396    1.158448
   L_Distancia_geodésica |  -.9195397   .0123681   -74.35   0.000    -.9437806   -.8952987
     L_remoteness_origen |  -.5537694   .0335578   -16.50   0.000    -.6195415   -.4879974
    L_remoteness_destino |   1.235415   .0236238    52.30   0.000     1.189113    1.281716
    frontera_pais_origen |   .4298494   .0238932    17.99   0.000     .3830196    .4766793
   Zonas_francas_destino |   .0321848   .0012136    26.52   0.000     .0298061    .0345635
 puerto_marítimo_destino |   .2281944   .0180926    12.61   0.000     .1927335    .2636553
  puerto_marítimo_origen |   1.168645   .0224582    52.04   0.000     1.124627    1.212662
        d_frontera_depto |   .2722579   .0166337    16.37   0.000     .2396565    .3048593
    Zonas_francas_origen |   .0863585   .0006899   125.18   0.000     .0850063    .0877106
                   _cons |  -4.642567   .3106725   -14.94   0.000    -5.251474   -4.033661
-------------------------+----------------------------------------------------------------


I'm worried about the fact that the RESET test rejects the specification. Should I use the FGLS estimator instead? What do you think about the performance of that estimator?
Thank you very much.
Felipe

Dear Joao and Dias,
Thank you for your helpful and quick advice, I appreciate it.
I estimated some small subsets of my dataset using the suggestion by Dias (poi2hdfe) and the ppml command, but the ppml command was faster, so I am now running regressions with different subsets (the subsets start with only 1 industry, and the last subset contains all 14 industries for intermediate input trade). I did not run regressions using the ppml_panel_sg command because I believe that with cross-sectional data I only need to include exporter and importer fixed effects.
Kind regards,
Joost.

Dear Joost,
I would just like to add one small thing to Joao's excellent advice.
ppml_panel_sg does not allow a model with only importer and exporter fixed effects. The smallest set of fixed effects it can handle is importer-time and exporter-time. This will drop all of your time-variant variables, including output.
But you can use another command written for the same purpose, poi2hdfe, as mentioned on the Log of Gravity webpage. Type ssc install poi2hdfe.
Best,
Dias

Dear Joost,
ppml will struggle to deal with such a massive number of dummies; you will need a very fast processor and a lot of memory to be able to do it, assuming that you do not go beyond Stata's limits. For these cases I suggest you try ppml_panel_sg (available from SSC), which should be much faster and also checks for the existence of the estimates. I recommend that you start with a small data set to make sure you get the same results with both commands.
About the problem with the OLS results, I prefer not to comment on that because the results are not reliable anyway.
Best wishes,
Joao

Dear Mr Santos Silva,
I have two questions about the PPML estimator and STATA. I make use of an Input Output table with trade data for the European Union at NUTSII level (249 regions), 14 different industries, 5 different final demand categories (about 14 million observations). My aim is to estimate the border effect for the whole dataset, intermediate input trade and final goods trade. My preferred estimation method is PPML. Furthermore, as a robustness check I run OLS and GPML. For all estimations I include origin and destination fixed effects. The commands for the whole dataset are:
tab(Exporting), gen(Exporting_)
tab(Importing), gen(Importing_)
reg lnTrade lnGDP_EX lnGDP_IM lnDistance_Head Home Exporting_* Importing_*, robust cluster(Distance_Head)
reg lnTrade lnGO_EX lnGO_IM lnDistance_Head Home Exporting_* Importing_*, robust cluster(Distance_Head)
ppml Trade lnGDP_EX lnGDP_IM lnDistance_Head Home Exporting_* Importing_*, robust cluster(lnDistance_Head)
ppml Trade lnGO_EX lnGO_IM lnDistance_Head Home Exporting_* Importing_*, robust cluster(lnDistance_Head)
glm Trade lnGDP_EX lnGDP_IM lnDistance_Head Home Exporting_* Importing_*, family(poisson) link(log) robust cluster(Distance_Head)
glm Trade lnGO_EX lnGO_IM lnDistance_Head Home Exporting_* Importing_*, family(poisson) link(log) robust cluster(Distance_Head)
When I run OLS with fixed effects I get the results within about an hour / two hours. However, when running the PPML estimation STATA only says "note: checking the existence of the estimates" but after many hours I haven't received any further output. When running PPML with a smaller subset (about 59 thousand observations) I also do not receive any output. Estimating PPML without fixed effects does provide me with results (after a few hours).
My questions are: [1] do you have any experience with estimating PPML (including origin and destination fixed effects) for such a large database and can I expect any results within a reasonable amount of time? Or should I search for a computer with more mathematical power? [2] In most cases I receive results in line with earlier studies and as expected. However, when I run OLS with fixed effects for a subset of the database (intermediate input trade) the mass variable for the exporting region is insignificant. The exporting mass variable is measured as gross sales for the whole region. When I run the same estimation but measuring the exporting mass variable as gross sales per region and industry the coefficient is significant and in line with my expectations. Do you have any suggestion for an explanation? I randomly checked several observations in STATA and they are all fine and the same as prior to importing.
Kind Regards,
Joost.

Dear Flora,
I think that you can use something like
Code:
predict yhat if e(sample)==1
Joao
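
Building on the suggestion above, a slightly fuller sketch of how the excluded observations might be flagged and listed. The data here are simulated, fdi and x are hypothetical names, and the ten missing values are planted only so that some observations fall outside the estimation sample; it assumes ppml is installed from SSC. The idea is that e(sample) marks the observations actually used, whatever the reason the others were left out.

```stata
* ssc install ppml        // run once if ppml is not installed
clear
set seed 7
set obs 200
generate double x   = rnormal()
generate double fdi = exp(0.5 + 0.3*x) * exp(rnormal(0, 0.4) - 0.08)
replace x = . in 1/10          // planted: 10 observations that cannot be used

ppml fdi x

* Flag the estimation sample immediately after estimation
generate byte used = e(sample)
count if !used                 // number of excluded observations
list fdi x if !used            // inspect them

* Fitted values only for the observations that entered the estimation
predict double yhat if used
```

Creating the flag right after the estimation command matters, because e(sample) is overwritten by the next estimation command.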

Dear Joao,
I wrote you a few comments some months ago regarding research I was doing then. I am now developing the paper. In short, I am studying the effect of governance indicators on inward FDI in Latin America, for 12 years, 18 target and 29 source countries. I used a gravity model with PPML and pair and year fixed effects; my results are consistent and pass the RESET test.
When I estimated the model, it excluded 2460 observations out of 6217. (I am aware of the problem of the non-existence of the ML estimates when using PPML.) I have a couple of questions regarding my results, just to make sure that I understand the process well. My problem is that when I used the command "predict yhat", I got predicted values for every observation, even for the ones that had previously been excluded. Does the command "predict yhat" apply the fitted model to calculate the predicted values?
Is there any convenient tool (command) to identify the observations that have been excluded?
Thank you!
Regards,
Flóra

PPML does not work with negative numbers, but neither does OLS in logs because you cannot take logs of negative numbers. Sorry, I cannot help much here.
Best wishes,
Joao

Dear Joao,
Thank you. I am trying to run the PPML but having a problem because some of the FDI figures are negative. How can I solve this problem?
Regards,
Muhammad Moiz

Dear Muhammad,
The problem with the OLS estimation of the log model is not the zeros but the fact that the nonlinear transformation generally leads to an inconsistent estimator. So, I would still recommend PPML, and it is as easy to use as OLS.
Best wishes,
Joao
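
To make the point about the log transformation concrete, here is a small simulation sketch. Everything in it is invented for illustration: the data-generating process, the variable names, and the particular form of heteroskedasticity. It uses Stata's built-in poisson with robust standard errors as a stand-in for ppml so that it runs without extra installs. The dependent variable has no zeros at all, yet the error variance depends on x, which is enough to bias OLS on the logged variable.

```stata
clear
set seed 12345
set obs 20000
generate double x   = rnormal()
generate double s2  = 2 * normal(x)                       // log-scale error variance rises with x
generate double eta = exp(sqrt(s2) * rnormal() - s2 / 2)  // multiplicative error, E[eta | x] = 1
generate double y   = exp(1 + x) * eta                    // true slope on x is 1, y is always positive
generate double lny = ln(y)

* OLS on the log: E[ln(eta) | x] = -s2/2 varies with x, so the slope is distorted
regress lny x, vce(robust)

* Poisson pseudo-maximum likelihood on the level of y
poisson y x, vce(robust)
```

In this design the OLS slope should come out noticeably below 1, while the Poisson slope should be close to 1.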