ppml Gravity Model Problem

Guest


09 Aug 2015, 09:45

I have some questions about estimating a gravity model of trade with ppml.

The dataset contains nearly 290,000 bilateral observations over 50 years and is provided by Rose (2005):
http://faculty.haas.berkeley.edu/arose/RecRes.htm

Unfortunately, I have only the log values for most of the variables, so the advantage of ppml in handling zero values may disappear. However, I am strongly concerned about heteroskedasticity in the data.

Reading Santos Silva and Tenreyro (2006), "The Log of Gravity" (http://personal.lse.ac.uk/tenreyro/LGW.html), was an eye-opener for me. As an undergraduate, I have to admit that implementing it in Stata seems a little tricky. I would like to know whether the results of Rose's paper change when I apply the Poisson pseudo-maximum likelihood (PPML) estimator. I use Stata 13.

The description of the variables:

. * Summary of the dataset
. sum

    Variable |        Obs       Mean  Std. Dev.        Min        Max
-------------+-------------------------------------------------------
        cty1 |     219573   292.6153   186.4372        111        964
        cty2 |     219573   565.7396    220.612        112        968
        year |     219573   1979.758   11.98733       1948       1997
    ctyname1 |          0
    ctyname2 |          0
-------------+-------------------------------------------------------
      pairid |     219573   11150.04   8554.216        765      32585
      ltrade |     219573   14.64697    3.35878   -11.4853   25.31005
  ltrade1to2 |     192720   14.80027    3.36609  -16.47211   25.19833
  ltrade2to1 |     182644   14.68049   3.482428  -13.54052   25.41054
       ldist |     219573   8.167161   .8075762   3.782556   9.421514
-------------+-------------------------------------------------------
       lrgdp |     219573   47.85111   2.665963    35.3876   58.01698
     lrgdppc |     219573   16.03824   1.449853    10.1211   20.89841
    regional |     219573    .012292   .1101862          0          1
      border |     219573   .0308371   .1728766          0          1
     comlang |     219573   .2266627   .4186735          0          1
-------------+-------------------------------------------------------
      comcol |     219573   .1015653   .3020765          0          1
     comctry |     219573   .0003051   .0174656          0          1
      colony |     219573   .0209953   .1433687          0          1
      curcol |     219573   .0020494   .0452243          0          1
    custrict |     219573   .0144326   .1192658          0          1
-------------+-------------------------------------------------------
       landl |     219573   .2388955   .4596647          0          2
      island |     219573   .3444595   .5413812          0          2
      lareap |     219573   24.21759   3.289929   9.638662   32.19601
      amount |       6775   1050.271   2834.907          4      29871
      defby1 |       6775   .0727675   .2597737          0          1
-------------+-------------------------------------------------------
       paris |     219573   .0075009   .0862826          0          1
         imf |     219573   .2911332   .5029937          0          2

First, I transformed the variables back to levels with exp().
Second, I re-scaled them because the first regression produced warnings.

gen trade_0 = exp(ltrade)/(1e12)
gen trade1to2_0 = exp(ltrade1to2)/(1e12)
gen trade2to1_0 = exp(ltrade2to1)/(1e12)
gen dist_0 = exp(ldist)/(1e12)
gen rgdp_0 = exp(lrgdp)/(1e12)
gen rgdppc_0 = exp(lrgdppc)/(1e12)
gen amount_0 = exp(amount)/(1e12)

Third, I generated the dummy variables by hand and dropped the xi: prefix after working through this thread: http://www.statalist.org/forums/foru...rgence-problem

gen island_0=1 if island==0
replace island_0=0 if island_0==.
gen island_1=1 if island==1
replace island_1=0 if island_1==.
gen island_2=1 if island==2
replace island_2=0 if island_2==.

gen landl_0=1 if landl==0
replace landl_0=0 if landl_0==.
gen landl_1=1 if landl==1
replace landl_1=0 if landl_1==.
gen landl_2=1 if landl==2
replace landl_2=0 if landl_2==.

gen imf_none=1 if imf==0
replace imf_none=0 if imf_none==.
gen imf_one=1 if imf==1
replace imf_one=0 if imf_one==.
gen imf_both=1 if imf==2
replace imf_both=0 if imf_both==.
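The gen/replace pairs above can be written more compactly. The following is only a sketch on a tiny made-up dataset (the four input rows are hypothetical, not taken from the Rose data). It relies on the fact that a logical expression such as (island == 1) evaluates to 1 or 0 in Stata. Like the original code, it codes a missing category as 0 in every dummy.

```stata
* Hypothetical toy data standing in for the three categorical variables
clear
input island landl imf
0 0 2
1 2 0
2 1 1
1 0 0
end

* One indicator per category: (var == k) is 1 when true and 0 otherwise
foreach v in island landl {
    forvalues k = 0/2 {
        gen byte `v'_`k' = (`v' == `k')
    }
}
gen byte imf_none = (imf == 0)
gen byte imf_one  = (imf == 1)
gen byte imf_both = (imf == 2)

list island island_0 island_1 island_2
```

Because the three indicators of each variable sum to one, they are collinear with the constant, so an estimation command will keep at most two per variable. That is consistent with the result tables showing only two dummies for each of island, landl and imf.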

Using:
ppml trade paris amount custrict dist comlang border regional ///
rgdp rgdppc comcol curcol colony comctry island_0 island_1 ///
island_2 landl_0 landl_1 landl_2 imf_none imf_one imf_both, cluster(pairid)

I get these results:

                           (1)
                       trade_0
paris                 0.957***    (12.67)
custrict              0.133        (1.41)
dist_0         -98447200.1***    (-17.40)
comlang             -0.0975*      (-2.30)
border                1.387***    (22.54)
regional              1.656***    (25.14)
rgdp_0             4.84e-13***    (20.65)
rgdppc_0             8608.7***    (33.25)
comcol               -3.257***   (-10.38)
curcol                0.237**      (2.63)
colony                0.991***    (17.76)
comctry              -1.445***    (-7.24)
island_0           -0.00664       (-0.12)
island_2             0.0830        (0.90)
landl_0               0.918***    (22.49)
landl_2              -0.384***    (-4.32)
imf_none              1.419***    (13.43)
imf_one               0.772***     (7.29)
_cons                -11.34***   (-91.22)
N                    219558

I am worried about the strange estimate for dist_0.

Without the re-scaling using:

ppml trade paris amount custrict dist comlang border regional ///
rgdp rgdppc comcol curcol colony comctry island_0 island_1 ///
island_2 landl_0 landl_1 landl_2 imf_none imf_one imf_both, cluster(pairid)

I get these results (which look reasonable to me):

                           (1)
                         trade
paris                 1.246***    (12.85)
amount            0.0000560***     (7.14)
custrict              0.693***     (4.84)
dist             -0.0000687       (-1.84)
comlang             -0.0886       (-0.45)
border                1.514***     (7.70)
regional              1.593***     (4.36)
rgdp               1.30e-24***    (11.94)
rgdppc             1.33e-08***     (7.23)
comcol               -1.363**     (-2.74)
colony                1.240***     (4.08)
island_0            -0.0737       (-0.35)
island_2              0.994        (1.28)
landl_0               2.802***    (12.87)
landl_1               1.788***     (7.92)
imf_one             -0.0568       (-1.10)
imf_both             -0.200       (-1.17)
_cons                 14.41***    (40.54)
N                      6760

Any help would be appreciated

Edit: To make the output more readable, I have attached pictures for now.


Last edited by sladmin; 27 Nov 2017, 09:28. Reason: anonymize poster


Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#2

09 Aug 2015, 11:41

Dear Guest,

Thank you for your interest in PPML. Using PPML should be as easy as using OLS, so let me see if I can help. I had a quick look at what you have done and spotted at least one mistake: you should not take the exponential of the regressors; for example, one of the regressors should be log distance, not distance. Also, I guess that there are other things wrong with what you are doing because your -ppml- results indicate a very small number of observations, but we'll get to that later.
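To spell out why the regressors stay in logs, here is a short sketch of the usual gravity specification. The symbols are assumptions introduced only for this illustration: $T_{ij}$ is trade in levels, $D_{ij}$ is distance and $Y_{ij}$ stands for the GDP term.

```latex
% Exponential conditional mean estimated by PPML, with regressors in logs
E[T_{ij} \mid x_{ij}]
  = \exp\!\left(\beta_0 + \beta_1 \ln D_{ij} + \beta_2 \ln Y_{ij} + \dots\right)
  = e^{\beta_0}\, D_{ij}^{\beta_1}\, Y_{ij}^{\beta_2} \cdots

% beta_1 is then the distance elasticity of trade
\beta_1 = \frac{\partial \ln E[T_{ij} \mid x_{ij}]}{\partial \ln D_{ij}}

% If distance enters in levels and is divided by 10^{12}, the model is instead
E[T_{ij} \mid x_{ij}] = \exp\!\left(\gamma_0 + \gamma_1 \frac{D_{ij}}{10^{12}} + \dots\right),
\qquad \gamma_1 = 10^{12} \times (\text{coefficient on } D_{ij})
```

Under the second form the coefficient is a semi-elasticity whose size depends on the units of distance. Dividing the reported dist_0 coefficient of -98447200.1 by 1e12 gives roughly -0.0001, the same order of magnitude as the -0.0000687 reported for unscaled dist, so the "strange" value appears to be mostly an artefact of the rescaling.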

So, my suggestion is that you do the following: create the variable trade in levels by taking the exponential of the log of trade. As you say, that will not create the zeros, but that is not a priority. Then run -ppml- exactly as you would run OLS; you should even start by using the -xi- prefix instead of creating the dummies yourself (only in very rare cases is that a source of problems). Please show us the results you get and we'll take it from there, OK?

All the best,

Joao

Guest
#3

09 Aug 2015, 13:34

Dear Prof. Santos Silva,

Thank you very much for your help.

I used:
gen trade = exp(ltrade)

xi: ppml trade ldist lrgdp lrgdppc paris amount i.imf custrict comlang ///
border regional i.landl i.island lareap comcol curcol colony comctry ///
, cluster(pairid)

These are the results:

Regarding the number of observations: dropping amount and using
xi: ppml trade ldist lrgdp lrgdppc paris custrict comlang ///
border regional lareap comcol curcol colony comctry i.landl i.island ///
i.imf, cluster(pairid)

the results are:

and using: "su amount"

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
amount | 6760 1047.603 2836.906 4 29871
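The drop to 6760 observations is what one would expect if the estimation sample is restricted to rows where every variable in the model is non-missing, which is how Stata estimation commands generally behave. The sketch below illustrates this on simulated data. All variable values are made up, and the built-in poisson command is used as a stand-in because ppml is a user-written command that may not be installed.

```stata
* Simulated data: amount is observed for only half of the rows
clear
set seed 2015
set obs 20
gen x = _n
gen y = rpoisson(3)
gen amount = _n^2 if mod(_n, 2) == 0

* How many rows lack amount, and how many rows are complete?
count if missing(amount)
count if !missing(y, x, amount)
scalar n_complete = r(N)

* The estimation sample matches the number of complete rows
poisson y x amount, vce(robust)
display "complete cases: " n_complete "   estimation sample: " e(N)
```

Running the same two count commands on the real data before estimating should show in advance how many observations a regressor such as amount will cost.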

Again many thanks for your help.

Kind regards

Joao Santos Silva

#4

09 Aug 2015, 13:50

Hello again,

Thanks for the update. All looks normal now, right? I do not know what "amount" is, but it looks like you are paying a heavy price for including it.

About the zeros, as far as I understand you do not have those observations in your dataset, right? Of course this is not ideal, but my experience is that omitting the zeros has reasonably small consequences (the results in the "Log of Gravity" illustrate that). So, you should include a reference to the absence of zeros but do not worry too much about that.

All the best,

Joao
Guest
#5

09 Aug 2015, 14:06

Hello Prof. Santos Silva,

Yes, at first glance I am fine with the results.

The variable "amount" is the amount of debt covered by a renegotiation. As you mention, including it comes at a heavy price, but it also changes the results, so I have to think about this in depth. In particular, the coefficient on the variable of interest, paris, turns from negative to positive and remains significant.

The dataset is from the Rose (2005) paper "One reason countries pay their debts: renegotiation and international trade". He links renegotiation (here paris) and trade, controlling for the typical gravity variables. The paper was published in 2005, one year before your contribution with "The Log of Gravity".

However, I still have two follow-up questions. Does ppml allow for lags, or is there a trap I should watch out for? And should I re-scale the variables?

Many thanks in advance

Joao Santos Silva

#6

09 Aug 2015, 15:05

Dear Guest,

There is no need to rescale if you are able to get convergence, but if you divide trade by 1e6 (or something like that) convergence may be quicker. About lags: lags of the regressors are fine, but lags of the dependent variable could be problematic, so I would avoid them. Finally, about amount, would it make sense to log it?
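Both points can be sketched on simulated data. Everything below is an assumption made for illustration: the panel is made up, only the names pairid, year, lrgdp and trade mirror the thread, and the built-in poisson command stands in for ppml. In an exponential-mean model, dividing the dependent variable by a constant only shifts the intercept, so the slope estimates should agree up to numerical tolerance.

```stata
* Simulated panel: 30 hypothetical country pairs observed for 10 years each
clear
set seed 2015
set obs 300
gen pairid = ceil(_n/10)
bysort pairid: gen year = 1987 + _n
gen lrgdp = 45 + rnormal()
gen trade = 1e6 * exp(1 + 0.5*(lrgdp - 45)) * (0.5 + runiform())

* Lag of a regressor: declare the panel, then use the L. operator
xtset pairid year
gen L1_lrgdp = L.lrgdp

* Rescaled dependent variable
gen double trade_m = trade / 1e6

* Same slope with and without rescaling; only the constant changes
poisson trade lrgdp, vce(cluster pairid)
scalar b_raw = _b[lrgdp]
poisson trade_m lrgdp, vce(cluster pairid)
scalar b_scaled = _b[lrgdp]
display "slope, raw: " b_raw "   slope, rescaled: " b_scaled
```

The lag is stored as an ordinary variable here because it is not assumed that ppml accepts time-series operators directly. The first year of each pair has no lag and therefore drops out of any regression that uses L1_lrgdp.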

All the best,

Joao

Guest
#7

10 Aug 2015, 02:49

Dear Prof. Santos Silva,

Thank you for this suggestion. As all other variables are in logs this might make sense.
Doing this changes the results slightly but not in a way I would be puzzled about.

On the other hand, I thought again about including ln(amount) (="lamount") into the regression and not losing all the observations.
I want to include the amount to control whether the impact of a renegotiation (paris==1) on bilateral trade depends on the ln(amount) that was treated.

The initial regression above seems to drop all observations whenever "amount==.". But this might be the case in all years after a debt renegotiation. Thus the regression would be unusable.

I came up with this to (at least partly) solve the issue:

First generating the "lamount" variable:

gen lamount = ln(amount)

Second using:
replace lamount=0 if (paris==0 & lamount==.)

Whenever paris==0 and lamount is missing, I set lamount to zero, assuming that lamount is zero in those cases.

Of course, this might cause serious measurement error because the zeros I assume could really be missing values. I have to look up the Paris Club source and talk to my adviser.
So far this is my best guess at how to deal with the problem.

I compared:

. su lamount

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
lamount | 219558 .1740202 1.013627 0 10.30464

. sum lamount if lamount==0

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
lamount | 212798 0 0 0 0

As a result, exactly 6760 observations have lamount > 0 and are thus unchanged by my operation.

Then I ran the regression using:

xi: ppml trade ldist lrgdp lrgdppc paris lamount custrict comlang ///
border regional lareap comcol curcol colony comctry i.landl i.island ///
i.imf, cluster(pairid)

Here are the results:

Best regards

Joao Santos Silva

#8

10 Aug 2015, 07:16

Dear Guest,

Indeed you should discuss this approach with your supervisor; at least, I would add a dummy identifying the observations you tweaked. Finally, do not include the values for ll and bic; they are irrelevant for models estimated by pseudo maximum likelihood.
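One possible way to build such a dummy is sketched below on a made-up five-row dataset. The name lamount_filled is hypothetical. The flag has to be created before the replace, because afterwards the filled-in zeros can no longer be told apart from the missing values they replaced.

```stata
* Hypothetical toy data: amount is missing for most rows
clear
input paris amount
0 .
0 .
1 50
1 .
1 1200
end

gen double lamount = ln(amount)

* Flag the rows that are about to be filled in, then fill them
gen byte lamount_filled = (paris == 0 & missing(lamount))
replace lamount = 0 if lamount_filled == 1

list paris amount lamount lamount_filled
```

Adding lamount_filled as a regressor next to lamount lets the filled-in rows have their own intercept shift. Note that the fourth row (paris==1 with amount missing) stays missing and would still be dropped from the estimation sample.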

All the best,

Joao
