Estimating the gravity model using PPML: general advice

Megan Ward

Join Date: Nov 2018

Posts: 17
#1

26 Nov 2018, 08:31

Hi,

For my thesis I intend to estimate a gravity model, looking specifically at the effect of the SAFTA trade agreement on trade in agri-food products. I would like to be able to estimate the extent of trade creation and trade diversion. I have a few specific questions I hope can be answered on this forum, and I would also welcome any general advice on this study.

Some questions...
1. The dataset I have built contains only import data. Would it be meaningful to do the same study using export data, and can I expect the results to be different? It took quite a while to create the import dataset, so I would like to know before I spend time creating one for export data.

2. I started with a simplified estimation:

. ppml foodimports ln_dist ln_gdp1 ln_gdp2

note: checking the existence of the estimates
WARNING: foodimports has very large values, consider rescaling
WARNING: ln_dist has very large values, consider rescaling or recentering
WARNING: ln_gdp1 has very large values, consider rescaling or recentering
WARNING: ln_gdp2 has very large values, consider rescaling or recentering

Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0

note: starting ppml estimation
note: foodimports has noninteger values

Iteration 1: deviance = 1.32e+11
Iteration 2: deviance = 1.10e+11
Iteration 3: deviance = 1.07e+11
Iteration 4: deviance = 1.07e+11
Iteration 5: deviance = 1.07e+11
Iteration 6: deviance = 1.07e+11
Iteration 7: deviance = 1.07e+11
Iteration 8: deviance = 1.07e+11
Iteration 9: deviance = 1.07e+11
Iteration 10: deviance = 1.07e+11

Number of parameters: 4
Number of observations: 418761
Pseudo log-likelihood: -5.361e+10
R-squared: 6.988e-08
Option strict is: off

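------------------------------------------------------------------------------
             |               Robust
 foodimports |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ln_dist |  -1.85e-20   7.07e-23  -261.78   0.000    -1.87e-20   -1.84e-20
     ln_gdp1 |   7.27e-07   1.47e-06     0.49   0.621    -2.15e-06    3.61e-06
     ln_gdp2 |  -2.34e-20   5.44e-23  -430.94   0.000    -2.35e-20   -2.33e-20
       _cons |   10.71151   .0123599   866.64   0.000     10.68729    10.73574
------------------------------------------------------------------------------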

The magnitudes of the coefficients are very small, as is the R-squared; I assume this is because I haven't used fixed effects and/or enough variables.
When I include more variables, for example a couple of different trade agreement dummies, the number of iterations goes beyond 50, at which point I cancel the estimation because it is taking too long. Is this usual and I should be patient, or is something wrong with my estimation?

ppml foodimports ln_dist ln_gdp1 ln_gdp2 comesa nafta safta

note: checking the existence of the estimates
WARNING: foodimports has very large values, consider rescaling
WARNING: ln_dist has very large values, consider rescaling or recentering
WARNING: ln_gdp1 has very large values, consider rescaling or recentering
WARNING: ln_gdp2 has very large values, consider rescaling or recentering
WARNING: comesa has very large values, consider rescaling or recentering
WARNING: nafta has very large values, consider rescaling or recentering
WARNING: safta has very large values, consider rescaling or recentering

Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0

note: starting ppml estimation
note: foodimports has noninteger values

Iteration 1: deviance = 1.02e+36
Iteration 2: deviance = 3.75e+35
Iteration 3: deviance = 1.38e+35
Iteration 4: deviance = 5.08e+34
Iteration 5: deviance = 1.87e+34
Iteration 6: deviance = 6.88e+33
CONTINUES FOREVER....
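A 0/1 dummy such as comesa, nafta or safta can never have "very large values", so these warnings are a strong hint that the data themselves are off. A few quick checks along these lines can catch that before estimation. This is a sketch assuming the variable names used above, run on the estimation data in memory; the bound on ln_dist assumes distance is measured in km or miles.

```stata
* Sanity checks before estimating (variable names as in the post above)
summarize foodimports ln_dist ln_gdp1 ln_gdp2 comesa nafta safta

* Trade flows enter PPML in levels, so they must be non-negative
assert foodimports >= 0 if !missing(foodimports)

* Agreement dummies should take only the values 0 and 1
foreach v of varlist comesa nafta safta {
    assert inlist(`v', 0, 1) if !missing(`v')
}

* ln(20,000) is about 9.9, so logged great-circle distance in km or miles
* should stay well below 12; larger values suggest it was never logged
assert ln_dist < 12 if !missing(ln_dist)

* Observations lost to a missing value in any model variable
count if missing(foodimports, ln_dist, ln_gdp1, ln_gdp2, comesa, nafta, safta)
```

If any assert stops with an error, the variable it names is the place to look (for example, a merge that attached the wrong columns).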

3. I would also like to control for country fixed effects, using the code:

. egen exporter = group (iso_o)
(32735 missing values generated)

. egen importer = group (iso_d)
(77313 missing values generated)

. ppml foodimports ln_dist ln_gdp1 ln_gdp2 i.exporter i.importer
factor variables and time-series operators not allowed
r(101);

I am unsure why this occurs.
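The error arises because ppml (from SSC) does not accept factor-variable notation, so i.exporter and i.importer are rejected. A minimal sketch of two workarounds, assuming the exporter and importer identifiers created above: build the dummies explicitly for ppml, or use Stata's built-in glm with a Poisson family and log link, which is the same pseudo-maximum-likelihood estimator and does accept i. notation, but does not run ppml's check for the existence of the estimates. The stub names exp_ and imp_ are illustrative. The "missing values generated" notes are also worth checking first, since egen group() returns missing wherever the iso code is empty.

```stata
* Why did egen group() generate missing values? Empty iso codes get a
* missing group number
count if missing(iso_o)
count if missing(iso_d)

* Option 1: explicit dummies, because ppml rejects i.exporter / i.importer
tabulate exporter, generate(exp_)
tabulate importer, generate(imp_)
* Drop one category of each set to avoid perfect collinearity with the constant
drop exp_1 imp_1
ppml foodimports ln_dist ln_gdp1 ln_gdp2 exp_* imp_*

* Option 2: built-in Poisson PML, which accepts factor variables
glm foodimports ln_dist ln_gdp1 ln_gdp2 i.exporter i.importer, family(poisson) link(log) vce(robust)
```

With several hundred thousand observations, both versions will be slow, which is one reason the thread later moves to ppml_panel_sg.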

4. Which method is best to control for country fixed effects, and is it necessary to include time fixed effects and country-pair fixed effects? I am using a panel dataset.

5. Please suggest any other things I should consider in order to obtain reliable results in my study!

Thank you very much in advance; any help is greatly appreciated.
Megan
Megan Ward

#2

26 Nov 2018, 08:55

6. Another question I have is how to make the trade diversion variable: if I'm using import data, does the dummy variable take the value 1 if the importer is in the trade agreement, or if the exporter is?
Megan Ward

#3

26 Nov 2018, 09:47

7. Another question is how to create the dummy variable for trade diversion. I want the variable to be 1 if the importer is part of a trade agreement, and I tried using the code:
generate saftad = .
replace saftad = 1 if (iso_o =="AFG"|"BGD"|"BTN"|"IND"|"MDV"|"NPL"|"PAK"|"LKA ") & (year >= 2004 )

to indicate that if the importer is one of these countries and the year is 2004 or later, the variable should equal one.
I get the error message
type mismatch
r(109);

type mismatch;
In an expression, you attempted to combine a string and numeric
subexpression in a logically impossible way. For instance, you
attempted to subtract a string from a number or you attempted
to take the substring of a number.

Does anyone know how I can resolve this?
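The type mismatch comes from `iso_o =="AFG"|"BGD"|...`: the `|` operator needs numeric operands, and while `iso_o=="AFG"` is numeric (0 or 1), `"BGD"` on its own is a string, not a second comparison with iso_o. inlist() expresses what was intended. Three further details: "LKA " has a trailing space and would never match; iso_o is the exporter (origin) code, whereas the post asks about the importer, i.e. iso_d; and starting from `generate saftad = .` leaves non-members missing instead of 0. The sketch below uses a few made-up rows to build membership indicators for both sides, then the three dummies commonly used to separate intra-bloc trade from import and export diversion. The variable names are hypothetical, and the member list and 2004 cutoff are simply those from the post.

```stata
* Toy data for illustration only (iso_o = exporter, iso_d = importer);
* with the real dataset in memory, skip the clear/input/end lines
clear
input str3 iso_o str3 iso_d int year
"IND" "PAK" 2005
"USA" "IND" 2005
"IND" "USA" 2005
"USA" "GBR" 2005
"IND" "PAK" 2000
end

* SAFTA membership of each side (member list and cutoff year from the post)
gen byte mem_o = inlist(iso_o, "AFG","BGD","BTN","IND","MDV","NPL","PAK","LKA") & year >= 2004
gen byte mem_d = inlist(iso_d, "AFG","BGD","BTN","IND","MDV","NPL","PAK","LKA") & year >= 2004

* Both partners are members: intra-bloc trade (trade creation)
gen byte safta_both = mem_o & mem_d
* Importer is a member, exporter is not: import diversion
gen byte safta_impdiv = mem_d & !mem_o
* Exporter is a member, importer is not: export diversion
gen byte safta_expdiv = mem_o & !mem_d

list
```

With exporter-year and importer-year fixed effects these dummies cannot all be estimated together: safta_impdiv equals mem_d minus safta_both, and mem_d varies only by importer and year, so it is absorbed by the importer-year effects (likewise safta_expdiv with the exporter-year effects). This is probably the same kind of collinearity that leads to a diversion dummy being omitted later in the thread.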
Joao Santos Silva

Join Date: Apr 2014

Posts: 3022
#4

26 Nov 2018, 15:05

Dear Megan Ward,

Let's go one step at a time. Your results are very strange and suggest that there is something wrong with your data. Are you sure imports are in levels (not logged) and that the regressors are logged?

Best wishes,

Joao
Megan Ward

#5

28 Nov 2018, 01:27

Hi Professor Santos Silva.

Imports are in levels and all regressors are logged.

Kind regards,
Megan
Megan Ward

#6

28 Nov 2018, 05:34

Hi,
I found that I had an issue with my dataset, which occurred when I merged two datasets.
I've fixed this problem now and ran the simple regression:

. ppml foodimports ln_dist ln_gdp1 ln_gdp2

note: checking the existence of the estimates
WARNING: foodimports has very large values, consider rescaling
WARNING: ln_gdp1 has very large values, consider rescaling or recentering
WARNING: ln_gdp2 has very large values, consider rescaling or recentering

Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0

note: starting ppml estimation
note: foodimports has noninteger values

Iteration 1: deviance = 2.93e+09
Iteration 2: deviance = 2.18e+09
Iteration 3: deviance = 2.09e+09
Iteration 4: deviance = 2.09e+09
Iteration 5: deviance = 2.09e+09
Iteration 6: deviance = 2.09e+09
Iteration 7: deviance = 2.09e+09
Iteration 8: deviance = 2.09e+09

Number of parameters: 4
Number of observations: 16607
Pseudo log-likelihood: -1.044e+09
R-squared: .70764527
Option strict is: off
------------------------------------------------------------------------------
             |               Robust
 foodimports |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ln_dist |  -1.022369    .019506   -52.41   0.000      -1.0606   -.9841378
     ln_gdp1 |   .8504187   .0142094    59.85   0.000     .8225687    .8782687
     ln_gdp2 |   .6869148   .0112485    61.07   0.000     .6648681    .7089615
       _cons |   .1208962   .2877725     0.42   0.674    -.4431275    .6849199
------------------------------------------------------------------------------
Joao Santos Silva

#7

28 Nov 2018, 11:38

Great; thanks for the update.

Best wishes,

Joao
Megan Ward

#8

29 Nov 2018, 04:27

I have managed to create the dummy variable to capture trade diversion; however, I am still stuck on exactly how to include fixed effects and which ones to choose. Do you think it is necessary to use time dummies?
Joao Santos Silva

#9

29 Nov 2018, 08:57

Dear Megan Ward,

The standard is to use time-varying importer and exporter fixed effects. Consider using the command ppml_panel_sg, which has a nice way to deal with the fixed effects.

Best wishes,

Joao
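The time-varying importer and exporter fixed effects recommended above can be sketched as follows, assuming the variable names used later in this thread (iso3_o, iso3_d, year, comlang_off). ppml_panel_sg (from SSC) builds the exporter-year and importer-year effects internally from its ex(), im() and y() options and, as far as I understand its help file, also adds country-pair effects by default, which the nopair option switches off. Any regressor that varies only by exporter-year or importer-year, such as a GDP term, is absorbed by these effects and cannot be estimated, so it is left out here.

```stata
* ssc install ppml_panel_sg   // community-contributed; install once

* Default: exporter-year, importer-year and pair fixed effects.
* Time-invariant pair variables (distance, common language, colonial ties)
* are absorbed by the pair effects, so only time-varying pair regressors remain.
ppml_panel_sg foodimports safta, ex(iso3_o) im(iso3_d) y(year)

* nopair: exporter-year and importer-year effects only, so time-invariant
* pair variables such as ln_dist can be estimated alongside safta.
ppml_panel_sg foodimports safta ln_dist comlang_off, ex(iso3_o) im(iso3_d) y(year) nopair
```

In the first specification the safta coefficient is identified only from pairs whose status changes over the sample period, which is worth keeping in mind when comparing the two.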
Megan Ward

#10

30 Nov 2018, 07:19

Hi Professor,
Thanks for all your help; ppml_panel_sg seems to do a good job.
I ran the code:
ppml_panel_sg foodimport divsafta col_to col_fr comcur ln_dist ln_gdp_both comlang_off safta , ex(iso3_o) im(iso3_d) y(year) nopair

I have put my results in a Word document to share, although some questions remain:

1. The R-squared is very high. Should this be a worry, or is it simply due to the fixed effects, which explain a lot of the variation in trade?
2. Another concern is that the dummy variable I created to capture trade diversion was dropped from the model, with this message:

note: divsafta omitted because of collinearity over lhs>0 (creates possible existence issue)

Any suggestions on how to change this, perhaps by respecifying the model? Knowing the trade diversion effects is not essential for my study, although I think it would be interesting.

3. These estimates come from a smaller dataset I have been using (only OECD and SAFTA countries, about 21,000 observations). I have another, larger dataset covering 180 countries (about 500,000 observations). Both datasets cover the years 1995 to 2010. Naturally I would expect the regression to run slowly, but when I run it, Stata doesn't seem to progress beyond this:

ppml_panel_sg foodimports ln_gdp_both ln_dist safta , ex(iso3_o) im(iso3_d) y(year) nopair
Initializing...
Checking for possible non-existence issues...
Iterating...

Do you have any recommendations on how to remedy this? I wouldn't mind using a smaller sample if it means I get results, but surely looking only at trade among a selected number of countries would cause bias?

Many thanks again,
Megan

Attached Files

panelfixed effects.doc (2.7 KB, 1 view)
Megan Ward

#11

30 Nov 2018, 07:25

4. Another question: why does the number of observations decrease, in my case to around 12,000, if PPML does not drop zero trade flows?
Joao Santos Silva

#12

30 Nov 2018, 07:56

Dear Megan Ward,

1 - The R2 is irrelevant and it is always high in models with all the fixed effects.
2 - Those dummies are collinear with the fixed effects and will always drop out. One possibility is to include the sum of the 2 dummies instead of the separate dummies, but you need to be careful with the interpretation (this imposes the restriction that the coefficients on the 2 dummies are the same).
3 - Estimation will take a long time; you need to wait.
4 - I am not sure what you mean by this, but some observations that are perfectly predicted will be dropped and that will reduce the sample size, but not dramatically.

Best wishes,

Joao
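To see where the observations go, a rough sketch, assuming the variable names from post #10 (tot_ey and tot_iy are hypothetical names): first count observations with a missing value in any model variable, which are excluded before estimation; then count observations in an exporter-year or importer-year cell where every flow is zero. For such a cell the fixed effect would have to go to minus infinity, so its observations are perfectly predicted, which is one case of the dropping described above.

```stata
* Observations with a missing value in any model variable are excluded
count if missing(foodimports, ln_gdp_both, ln_dist, safta)

* Total flows per exporter-year and per importer-year
* (egen total() treats missing values as zero)
bysort iso3_o year: egen double tot_ey = total(foodimports)
bysort iso3_d year: egen double tot_iy = total(foodimports)

* Observations in all-zero cells: perfectly predicted by the fixed effects
count if tot_ey == 0 | tot_iy == 0
```

The two counts will overlap for observations that are both missing and in an all-zero cell, so they give an indication rather than an exact decomposition of the drop.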
Megan Ward

#13

30 Nov 2018, 08:12

Thanks, I will wait patiently for the results to come through!
Megan Ward

#14

30 Nov 2018, 08:26

The results came through...

ppml_panel_sg foodimports ln_gdp_both ln_dist safta , ex(iso3_o) im(iso3_d) y(year) nopair
Initializing...
Checking for possible non-existence issues...
Iterating...
initial values not feasible
r(1400);

The error code read:
[P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 1400
numerical overflow;
You have attempted something that, in the midst of the
necessary calculations, has resulted in something too large
for Stata to deal with accurately. Most commonly, this is
an attempt to estimate a model (say with regress) with more
than 2,147,483,647 effective observations. This effective
number could be reached with far fewer observations if you
were running a frequency-weighted model.

(end of search)

I'm not entirely sure how I could resolve this without reducing the data. If the only solution is to reduce the data by looking at fewer countries, what would be the best way to do so? The literature doesn't seem to explain which countries are used in this type of study.
Joao Santos Silva

#15

30 Nov 2018, 11:18

Dear Megan Ward,

I suggest you contact Tom Zylkin who is the author of the command.

Best wishes,

Joao
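Before cutting the sample, one thing that may be worth trying is the rescaling suggested by the ppml warnings earlier in the thread. In a Poisson PML model with a constant or fixed effects, multiplying the dependent variable by a positive constant c changes only the intercept (by ln c) and leaves the slope coefficients unchanged, because the first-order conditions are simply multiplied by c. Rescaling does, however, keep the numbers the algorithm works with smaller. Whether this avoids the overflow in ppml_panel_sg is an assumption, not something established in this thread; foodimports_m is a hypothetical name, and the thread does not state the units of foodimports.

```stata
* Flows in millions of the original units (hypothetical variable name);
* slope estimates are unaffected, only the level of the effects shifts
gen double foodimports_m = foodimports / 1e6

* Same specification as in post #14, with the rescaled dependent variable
ppml_panel_sg foodimports_m ln_gdp_both ln_dist safta, ex(iso3_o) im(iso3_d) y(year) nopair
```

Running the same two commands on the smaller OECD/SAFTA dataset first is a quick way to confirm that the slope coefficients match the unscaled results.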