Gravity model with ppml command

Said Jafar replied

18 Aug 2017, 01:42
Dear Joost,

I would just like to add a small point to Joao's excellent advice.

ppml_panel_sg does not allow a model with only importer and exporter fixed effects. The minimal set of fixed effects it supports is importer-time and exporter-time. These will absorb any variable that varies only by importer-time or exporter-time, including output.

But you can use another command written for the same purpose, poi2hdfe, as mentioned in the Log of Gravity webpage. Type ssc install poi2hdfe.

Best,
Dias
Joao Santos Silva replied

17 Aug 2017, 13:07
Dear Joost,

ppml will struggle to deal with such a massive number of dummies; you will need a very fast processor and a lot of memory to be able to do it, assuming that you do not go beyond Stata's limits. For these cases I suggest you try ppml_panel_sg (available from SSC), which should be much faster and also checks for the existence of the estimates. I recommend that you start with a small data set to make sure you get the same results with both commands.

About the problem with the OLS results, I prefer not to comment on that because the results are not reliable anyway.

Best wishes,

Joao
JJ vdB replied

17 Aug 2017, 11:49
Dear Mr Santos Silva,

I have two questions about the PPML estimator and STATA. I make use of an Input Output table with trade data for the European Union at NUTSII level (249 regions), 14 different industries, 5 different final demand categories (about 14 million observations). My aim is to estimate the border effect for the whole dataset, intermediate input trade and final goods trade. My preferred estimation method is PPML. Furthermore, as a robustness check I run OLS and GPML. For all estimations I include origin and destination fixed effects. The commands for the whole dataset are:

tab(Exporting), gen(Exporting_)
tab(Importing), gen(Importing_)

reg lnTrade lnGDP_EX lnGDP_IM lnDistance_Head Home Exporting_* Importing_*, robust cluster(Distance_Head)
reg lnTrade lnGO_EX lnGO_IM lnDistance_Head Home Exporting_* Importing_*, robust cluster(Distance_Head)
ppml Trade lnGDP_EX lnGDP_IM lnDistance_Head Home Exporting_* Importing_*, robust cluster(lnDistance_Head)
ppml Trade lnGO_EX lnGO_IM lnDistance_Head Home Exporting_* Importing_*, robust cluster(lnDistance_Head)
glm Trade lnGDP_EX lnGDP_IM lnDistance_Head Home Exporting_* Importing_*, family(poisson) link(log) robust cluster(Distance_Head)
glm Trade lnGO_EX lnGO_IM lnDistance_Head Home Exporting_* Importing_*, family(poisson) link(log) robust cluster(Distance_Head)

When I run OLS with fixed effects I get the results within one to two hours. However, when running the PPML estimation Stata only says "note: checking the existence of the estimates", and after many hours I haven't received any further output. When running PPML on a smaller subset (about 59 thousand observations) I also do not receive any output. Estimating PPML without fixed effects does provide me with results (after a few hours).

My questions are:

[1] Do you have any experience with estimating PPML (including origin and destination fixed effects) on such a large dataset, and can I expect results within a reasonable amount of time? Or should I look for a computer with more computing power?

[2] In most cases I get results that are as expected and in line with earlier studies. However, when I run OLS with fixed effects for a subset of the data (intermediate input trade), the mass variable for the exporting region is insignificant. The exporting mass variable is measured as gross sales for the whole region. When I run the same estimation with the exporting mass variable measured as gross sales per region and industry, the coefficient is significant and in line with my expectations. Do you have any suggestion for an explanation? I randomly checked several observations in Stata and they are all fine and identical to the values prior to importing.

Kind Regards,

Joost.
Joao Santos Silva replied

03 Aug 2017, 15:32
Dear Flora,

I think that you can use something like

Code:

predict yhat if e(sample)==1
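To illustrate the difference, here is a minimal sketch. It assumes the built-in -poisson- command and the auto data shipped with Stata as stand-ins for -ppml- and the FDI data; the variable names in_sample, yhat_all and yhat_used are made up for the example.

```stata
* Sketch only: built-in -poisson- on the shipped auto data stands in for -ppml-
sysuse auto, clear

* rep78 is missing for some cars, so those observations are not used in estimation
poisson rep78 mpg weight, vce(robust)

* Flag the estimation sample right after fitting (1 = used, 0 = excluded)
generate byte in_sample = e(sample)

* An unrestricted -predict- applies the fitted coefficients to every observation
* with non-missing regressors, including observations that were excluded
predict yhat_all

* Restricting to e(sample) gives fitted values for the estimation sample only
predict yhat_used if e(sample)

* List the observations that were excluded from estimation
list make rep78 mpg weight if !in_sample
```

The flag has to be created before another estimation command is run, because e(sample) always refers to the most recent fit.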

Best wishes,

Joao
Flora Panna Biro replied

03 Aug 2017, 03:33
Dear Joao,

I wrote to you a few months ago regarding research I was doing then. I am now developing the paper. In short, I am studying the effect of governance indicators on inward FDI in Latin America, for 12 years, 18 target and 29 source countries. I used a gravity model with PPML and pair and year fixed effects; my results are consistent and pass the RESET test.

After estimation, 2460 of the 6217 observations had been excluded. (I am aware of the problem of non-existence of the ML estimates with PPML.) I have a couple of questions about my results, just to make sure I understand the process. When I used the command "predict yhat", I got predicted values for every observation, even those that had been excluded. Does "predict yhat" apply the fitted model to calculate the predicted values?
Is there any convenient tool (command) to identify the observations that have been excluded?

Thank you!

Regards,
Flóra
Muhammad Moiz replied

14 Jul 2017, 13:30
Thank you.

Regards,
Muhammad Moiz
Joao Santos Silva replied

14 Jul 2017, 12:17
PPML does not work with negative numbers, but neither does OLS in logs because you cannot take logs of negative numbers. Sorry, I cannot help much here.

Best wishes,

Joao
Muhammad Moiz replied

14 Jul 2017, 02:17
Dear Joao,

Thank you. I am trying to run the PPML but having a problem because some of the FDI figures are negative. How can I solve this problem?

Regards,
Muhammad Moiz
Joao Santos Silva replied

13 Jul 2017, 13:00
Dear Muhammad,

The problem with the OLS estimation of the log model is not the zeros but the fact that the non-linear transformation generally leads to an inconsistent estimator. So, I would still recommend PPML, and it is as easy to use as OLS.
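As a sketch of the argument, in notation that is assumed here rather than taken from the post (y_i is the flow, x_i the regressors, \eta_i a multiplicative error):

```latex
y_i = \exp(x_i\beta)\,\eta_i, \qquad E[\eta_i \mid x_i] = 1
\;\Longrightarrow\;
\ln y_i = x_i\beta + \ln\eta_i, \qquad
E[\ln\eta_i \mid x_i] \neq \ln E[\eta_i \mid x_i] = 0 \ \text{in general.}
```

OLS on the log equation recovers the slopes in \beta only if E[\ln\eta_i \mid x_i] does not depend on x_i. When the variance of \eta_i varies with the regressors, it generally does, and this holds whether or not the data contain zeros.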

Best wishes,

Joao
Muhammad Moiz replied

13 Jul 2017, 04:55
Dear Joao,

I am doing my master's in International Trade. I am testing the impact of RTAs on FDI to emerging economies. I gathered data for 4 host and 69 source countries over a 12-year period; I chose the BRICS countries except Russia, for which I couldn't find data. I have used the standard fixed effects model since there aren't many missing figures. I know you have suggested using PPML if there are a large number of zeros. Is it okay to estimate the results with just OLS?

Thank you.
Dilshat Obul replied

21 Jun 2017, 22:54
Dear Joao,

Thank you for the answer, it is a great help to me!

Regards,

Dilshat
Joao Santos Silva replied

21 Jun 2017, 12:54
Dear Dilshat,

I am afraid I cannot conceive any reason to prefer the truncated Poisson regression.

Best wishes,

Joao
Dilshat Obul replied

21 Jun 2017, 04:06
Dear Joao,

Thank you for your attention. I am doing research on interprovincial trade using the gravity model and want to use the PPML estimator that you recommend. Recently I saw a paper on the World Bank's website on the same topic, titled "Estimating the Gravity Model When Zero Trade Flows are Frequent and Economically Determined". The paper says that truncated PPML and PPML perform similarly, and that the truncated version sometimes performs better. Should we consider the truncated version of PPML? And what is the exact Stata code for truncated PPML?

Best regards,

Dilshat

Last edited by Dilshat Obul; 21 Jun 2017, 04:09.
Joao Santos Silva replied

01 May 2017, 09:52
Dear Killian,

Just use the -xi- or -tab- commands to create the variables. For example, if you create the variables with the prefix D_ you can then just include D_* in the regression.
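A minimal sketch of the -tab- route, assuming the auto data shipped with Stata and the built-in -poisson- command as stand-ins for the trade data and -ppml-; the D_ prefix is the one suggested above.

```stata
* Sketch only: rep78 plays the role of the fixed-effect identifier
sysuse auto, clear

* tabulate with generate() creates one 0/1 variable per category: D_1, D_2, ...
tabulate rep78, generate(D_)

* D_* expands to every variable whose name starts with D_, so nothing is typed by hand
* One dummy is redundant given the constant; Stata notes this and omits it
poisson price mpg D_*, vce(robust)
```

With several sets of fixed effects, giving each set its own prefix keeps the wildcards from overlapping.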

Best wishes,

Joao
Killian Foubert replied

01 May 2017, 09:04
Originally posted by Joao Santos Silva

Dear Adam,

Unfortunately, -ppml- is not compatible with factor variables. What you have to do is to create all the dummies yourself (for example using the -xi- command) and then run the model including the variables you created.

Hope this helps,

Joao

Dear Joao,

Thank you for your answers on this subject. I am facing exactly the same problem, and I was wondering how to include all the variables created for the fixed effects without typing their names one by one. My model has year, country origin, country destination*year, and country origin*country destination effects, so there are almost 6,500 variables once the dummies are created.

Thank you in advance,

Killian