Gravity model of trade estimation problems in Stata 13

Maks Wawrzyniak

Join Date: Jun 2017

Posts: 4
#1

11 Jun 2017, 11:15

Hello everyone,

I am a graduate student writing my thesis on the determinants of China’s exports to its 35 biggest trading partners. I am working with a three-dimensional balanced panel (country, sector, year), and including the sector dimension in the panel is imperative for my research. I am using Stata 13.

My problem is setting up a three-dimensional panel with xtset, as it does not accept three variables and does not allow repeated time values within a panel (there are 4 sectors for each country and 14 years of observations for each sector). I tried to circumvent this by grouping country and industry into combined country-industry dummies, yet this has implications for the fixed effects in such a model. Is there another way to set up the data so that I can run all the relevant tests (for unit roots and so on)?

Another question: assuming there is no other way than to include country-industry fe, the resulting xtunitroot tests (ips and dfuller) seem to imply that two of my variables, ln(exporter gdp) and ln(exporter gdp per capita), are nonstationary. The dependent variable, log(export), is stationary when 1 lag is included, as are importer gdp and importer gdp per capita. I am not sure how to proceed from here, as I know this violates OLS assumptions: I can’t regress an I(0) series on I(1) series. What should I do?

I have a few more questions, but this is a nice start.

Thank you VERY much to anyone who can help in any way.
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#2

12 Jun 2017, 00:22

Hi Maks,

If you are trying to do a FE linear model with more than 2 dimensions, I would tend to recommend reghdfe. However, I am not clear why using country-industry and time fes would be a problem here. That seems to me to be the most natural way to set up a panel-data regression in this case, especially if you want to exploit time variation for identification. But it depends on what variables you are really interested in and whether they vary over time.

Perhaps someone else who is more familiar with the time series aspects of what you are asking can give you more specific advice on those and on the above. But if China is the only exporter, shouldn't the exporter GDP and GDP per capita variables here be absorbed by using time dummies? If you are not already including these, I think this is something you should strongly consider.
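
A minimal sketch of that setup, assuming variables named importer, sector and year alongside lexp, lgdp, lgdppcd and rta (all of these names are assumptions, as is the choice of clustering). reghdfe is a community-contributed command from SSC. Combining country and sector into one identifier also gives xtset a single panel variable:

```stata
* Sketch only: importer, sector, year and the regressors are assumed names.
* Run once if needed: ssc install reghdfe

* One id per importer-sector pair, so each panel has unique time values
egen long imp_sec = group(importer sector)
xtset imp_sec year

* Importer-sector and year fixed effects, standard errors clustered by pair
reghdfe lexp lgdp lgdppcd rta, absorb(imp_sec year) vce(cluster imp_sec)
```

With a single exporter, exporter GDP and GDP per capita vary only by year, so they would be collinear with the year effects and dropped if added to this regression.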

Hope this is at least a little bit helpful...

Tom
Maks Wawrzyniak

Join Date: Jun 2017

Posts: 4
#3

20 Jul 2017, 12:52

I know I have not written in this thread for a while, but I have been doing extensive research on my gravity model and I ran into another set of questions.

Thank you, Tom Zylkin, for your suggestion: using dummies for years did help to bring the estimates closer to the coefficients I was expecting. The panel problem from my previous entry is no longer relevant, as I decided to run separate regressions within each sector instead. The model as it is right now employs gdp, differential gdp, an rta dummy, a common border dummy, distance, time dummies and destination dummies to predict trade. The data, as specified before, describe exports to 35 countries over 15 years.

I decided to use PPML as it proved to be robust even though there are no zeroes in my trade matrix. The new set of questions is as follows:
After respecifying and rerunning the models countless times and investigating the unit roots of each variable in depth, I found that most of them may actually be nonstationary, along with trade. Now, I heard that this is not a pertinent issue for my dataset, as my N is bigger than T. I am unsure about that conclusion, and I want to verify it. If I do need to correct for it with an ECM and the like, which commands do I use for the tests and the model? Last time I tried running xtpedroni or xtwest they wouldn't compute...

I have tried using ppml as well as xtpoisson to run the relevant regressions but both of them give me different coefficients. In the case of ppml I run:

ppml exp lgdp lgdppcd cb rta ldistance time_fe* importer_fe*

In the case of xtpoisson I run:

xtset importer time
xtpoisson exp lgdp lgdppcd cb rta ldistance i.time

In the case of xtpoisson, the coefficients are closer to expectations. Using ppml as specified yields a positive coefficient on distance, unlike xtpoisson, and the gdp differential coefficient is also more sensible with xtpoisson... I want to employ ppml in the end as it has robust standard errors.

In the beginning I also checked simpler panel data methods like xtreg, fe and xtreg, re. The hausman test yielded a p-value of 0, which means that the fe and re coefficients were systematically different. I was advised to use random effects in spite of that, as fixed effects drop a significant part of my independent variables. I am unsure if that is correct, however. When checking the same thing for xtpoisson, the hausman test cannot be computed, but even by eye the coefficients don't differ. Is it okay to run a random effects specification in that case?

Since I decided to run separate regressions for each sector, is it possible to compare coefficients between them in some manner, so as to draw conclusions on the effect of, say, gdp being stronger in one sector than in another? I know that might seem like a basic question, but I am unsure how to carry out such a between-regression comparison, as I've never done that before.

Thank you for taking the time to read this. I really need help with these questions, as I have not been able to arrive at an answer by myself so far and I'm expected to turn in the finished product next month for revision, so my deadline is really close. I'll be really grateful to anybody who can shed some light on my problems.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3063
#4

21 Jul 2017, 12:08

Dear Maks,

Here are some answers:

1 - I assume your N is much larger than T, so do not worry about non-stationarity.

2 - Your -xtpoisson- regression is RE; you need to include the FE option and then the results should be the same.

3 - Just ignore the estimates based on the linear models as they are not reliable (and be very skeptical of RE estimators anyway).

4 - If you mean testing the significance of the difference, I think that there is a command that allows you to do it, but I cannot recall which one it is.
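
For point 2, a sketch of the two commands that should give matching point estimates, reusing the variable names from the commands posted earlier in the thread. The time-invariant regressors (cb, ldistance) are left out because the importer fixed effects absorb them. The vce(robust) and cluster() choices are assumptions, not something stated in the thread:

```stata
* Fixed-effects Poisson: the fe option conditions out the importer effects
xtset importer time
xtpoisson exp lgdp lgdppcd rta i.time, fe vce(robust)

* PPML with explicit importer and time dummies (ssc install ppml)
ppml exp lgdp lgdppcd rta time_fe* importer_fe*, cluster(importer)
```

If the installed version of xtpoisson does not accept vce(robust) with fe, dropping that option leaves the coefficients unchanged and only affects the standard errors.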

Best wishes,

Joao
Maks Wawrzyniak

Join Date: Jun 2017

Posts: 4
#5

22 Jul 2017, 06:39

Originally posted by Joao Santos Silva

Dear Maks,

Here are some answers:

1 - I assume your N is much larger than T, so do not worry about non-stationarity.

2 - Your -xtpoisson- regression is RE; you need to include the FE option and then the results should be the same.

3 - Just ignore the estimates based on the linear models as they are not reliable (and be very skeptical of RE estimators anyway).

4 - If you mean testing the significance of the difference, I think that there is a command that allows you to do it, but I cannot recall which one it is.

Best wishes,

Joao

First and foremost, thank you so much for your answer, it has been very helpful.

From what I can gather of the advice, I should not try to use a random effects specification for gravity modelling. As you said, using FE in xtpoisson did indeed give the same result as the ppml command. The problem with that approach in my case is that it inevitably drops the time-invariant, country-specific variables, distance and common border, which are also part of my research questions. Would a random effects estimation be unreliable for finding estimates for those variables? If yes, are there any alternatives to this method that would include them?

The ppml model fits my data better than linear models anyway, as evidenced by the model specification test. I mention the linear models in my paper solely for comparison, and I rely on ppml for conclusions.

Thank You again,
Maks
Joao Santos Silva

Join Date: Apr 2014

Posts: 3063
#6

22 Jul 2017, 07:14

Maks,

The problem is not the estimation method but your data. Because you only have one exporter, the importer dummies do not allow you to estimate the effects of time-invariant characteristics of the pair. For that you need data from different importers and exporters. The alternative is to drop the dummies, but you gain nothing by using RE.
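
One way to see this in the data, sketched with the ldistw variable from the regression output in this thread and an importer identifier named importer (an assumed name): with a single exporter, distance is constant within each importer, so the importer dummies leave no variation from which to estimate its effect.

```stata
* Within-importer standard deviation of log distance
bysort importer: egen sd_ldistw = sd(ldistw)

* If min and max are both essentially 0, the importer dummies absorb distance
summarize sd_ldistw
```

The same check applies to the common border and common language dummies.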

Best wishes,

Joao

Maks Wawrzyniak

Join Date: Jun 2017

Posts: 4
#7

28 Jul 2017, 06:14

Originally posted by Joao Santos Silva

Maks,

The problem is not the estimation method but your data. Because you only have one exporter, the importer dummies do not allow you to estimate the effects of time-invariant characteristics of the pair. For that you need data from different importers and exporters. The alternative is to drop the dummies, but you gain nothing by using RE.

Best wishes,

Joao

Thank You for the extremely valuable input. I have been working on my paper for the last few days non-stop, so I couldn't properly write back but now all that's left is to carry through with the estimation results. I want to put my doubts to rest, so I have a few more questions.

I understand that dropping the pair fe in this case lets me produce results for time-invariant variables, but I am also aware that without those dummies I no longer have a theory-consistent proxy for the multilateral resistance terms. I have seen papers mentioning random intercept PPML regression, which would allow me to estimate the coefficients for those variables while also relaxing the random effects assumptions given a big enough sample. Is that methodology applicable here? If not, I can at best add time dummies to the PPML equation.
I decided to test different specifications for those models further. While PPML does test better under the RESET test, OLS seems to imply coefficients which are more in line with previous studies (importer GDP has an elasticity of 0.7 in OLS but only 0.25 in PPML). When I include fixed effects for country pairs (just country dummies) and time dummies, the two estimators yield similar coefficients for importer GDP, but both get surprisingly close to zero (0.03 and 0.07). Is something wrong with my data or method? As a reminder, I proxy exporter characteristics with time dummies. I include the regressions below for reference. I am sorry for the formatting, I am not sure how to append regressions here.

OLS:

Linear regression                               Number of obs   =      9382
                                                F(6, 9375)      =   2280.47
                                                Prob > F        =    0.0000
                                                R-squared       =    0.6424
                                                Root MSE        =    1.6584

------------------------------------------------------------------------------
             |               Robust
        lexp |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   .7090634   .0196851    36.02   0.000     .6704763    .7476505
     lgdppcd |   .1340517   .0142817     9.39   0.000     .1060566    .1620469
      ldistw |  -.5735783   .0421837   -13.60   0.000    -.6562674   -.4908892
  commonlang |    1.35763   .0666349    20.37   0.000     1.227011    1.488249
commonborder |   .3907141   .0737927     5.29   0.000     .2460645    .5353638
         rta |   1.234155   .0544715    22.66   0.000     1.127379    1.340931
       _cons |  -1.630703   .6881079    -2.37   0.018    -2.979544   -.2818626
------------------------------------------------------------------------------

PPML:

Number of parameters: 7
Number of observations: 9382
Pseudo log-likelihood: -8.887e+09
R-squared: .59787896
Option strict is: off
                                    (Std. Err. adjusted for 149 clusters in pc)
------------------------------------------------------------------------------
             |               Robust
         exp |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |    .242857   .0150617    16.12   0.000     .2133365    .2723774
     lgdppcd |   .3829119   .0760324     5.04   0.000     .2338913    .5319326
      ldistw |  -1.014807   .1268726    -8.00   0.000    -1.263472    -.766141
  commonlang |   .5475853   .2352185     2.33   0.020     .0865655    1.008605
commonborder |   .5565695   .2084685     2.67   0.008     .1479787    .9651603
         rta |   .6171878   .2163826     2.85   0.004     .1930856     1.04129
       _cons |   13.22188   1.148438    11.51   0.000     10.97098    15.47278
------------------------------------------------------------------------------

Country and time effects:

Fixed-effects (within) regression               Number of obs      =      9382
Group variable: pc                              Number of groups   =       149

R-sq:  within  = 0.6125                         Obs per group: min =         4
       between = 0.4598                                        avg =      63.0
       overall = 0.2940                                        max =        64

                                                F(18,148)          =    177.01
corr(u_i, Xb)  = 0.1691                         Prob > F           =    0.0000

                                    (Std. Err. adjusted for 149 clusters in pc)
------------------------------------------------------------------------------
             |               Robust
        lexp |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   .0771779   .0319479     2.42   0.017      .014045    .1403109
     lgdppcd |   .1065127   .0317846     3.35   0.001     .0437024     .169323
      ldistw |          0  (omitted)
  commonlang |          0  (omitted)
commonborder |          0  (omitted)
         rta |  -.1834136   .1122312    -1.63   0.104    -.4051961    .0383689
------------------------------------------------------------------------------

PPML with country and time fe

                                         (Std. Err. adjusted for clustering on pc)
------------------------------------------------------------------------------
             |               Robust
         exp |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   .0398499   .0172611     2.31   0.021     .0060187    .0736812
     lgdppcd |   .1853515   .0555009     3.34   0.001     .0765717    .2941313
         rta |   .0769348   .1165917     0.66   0.509    -.1515807    .3054503
             |
        time |
        2001 |   .0714634   .0107115     6.67   0.000     .0504693    .0924576
        2002 |   .2705849   .0181149    14.94   0.000     .2350803    .3060895
        2003 |   .5509846   .0320815    17.17   0.000     .4881059    .6138633
        2004 |   .8264636   .0560871    14.74   0.000     .7165348    .9363923
        2005 |   1.052256   .0762326    13.80   0.000      .902843    1.201669
        2006 |   1.275218   .0912789    13.97   0.000     1.096315    1.454122
        2007 |   1.491403   .1075612    13.87   0.000     1.280587    1.702219
        2008 |   1.630144   .1205517    13.52   0.000     1.393867    1.866421
        2009 |    1.47785    .114627    12.89   0.000     1.253185    1.702514
        2010 |   1.731811   .1224791    14.14   0.000     1.491756    1.971865
        2011 |   1.904045   .1268558    15.01   0.000     1.655412    2.152677
        2012 |   1.978931    .129065    15.33   0.000     1.725968    2.231893
        2013 |   2.053478   .1332393    15.41   0.000     1.792334    2.314623
        2014 |   2.095717   .1339651    15.64   0.000      1.83315    2.358284
        2015 |   2.095204   .1353254    15.48   0.000     1.829971    2.360437
------------------------------------------------------------------------------

I really hope somebody can help me answer these last two questions. I'm really grateful for all the help I have received so far!


Joao Santos Silva

Join Date: Apr 2014

Posts: 3063
#8

29 Jul 2017, 07:43

Maks,

1 - I do not think the RE models you mention can do that, so stay away from them.

2 - Having results in line with previous studies is not necessarily good, especially if earlier work is wrong! The dummies you include in the model must be almost collinear with GDP, hence the small coefficients.
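
A rough way to check this, assuming the pair identifier pc from the output above and a year variable named time (an assumed name): regress lgdp on the dummies alone and see how much variation is left.

```stata
* R-squared from regressing importer GDP on pair and time dummies
areg lgdp i.time, absorb(pc)

* Share of the variation in lgdp not explained by the dummies
display "Unexplained share of lgdp: " 1 - e(r2)
```

An unexplained share close to zero means the GDP coefficient is identified from very little variation once the dummies are in the model.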

Best wishes,

Joao
Romano Piras

Join Date: Mar 2020

Posts: 17
#9

09 Mar 2020, 04:48

Hello everyone,
I've just read in this post:

"I heard that it is not a pertinent issue in the case of my dataset, as my N is bigger than T." (Maks);
"I assume your N is much larger than T, so do not worry about non-stationarity" (Joao).

I'm working with a bilateral migration flows data set with N=96 and T=34, and given Maks's and Joao's statements, I deduce that I do not have to worry about unit roots and cointegration. Am I right? Could anyone give me some references on this topic?

Best regards
Romano
Joao Santos Silva

Join Date: Apr 2014

Posts: 3063
#10

10 Mar 2020, 13:08

Dear Romano Piras,

It all depends on how you do the asymptotics. If you are doing the asymptotics on N with T fixed, you do not need to worry about the time-series properties. However, if you do the asymptotics on T with N fixed, then you need to worry about the time-series properties.

Do you really have N=96 or do you have flows between 96 countries, which is almost 10,000 observations?

Best wishes,

Joao
Romano Piras

Join Date: Mar 2020

Posts: 17
#11

11 Mar 2020, 04:02

Dear Joao,
many thanks for your answer. Actually I have bilateral flows from 8 origin countries towards 12 destinations which correspond to 96 bilateral units of my gravity panel (that's why I wrote N=96). These flows are observed annually for 34 years (T), thus the total number of observations is N x T=3264.

Best regards,
Romano
Joao Santos Silva

Join Date: Apr 2014

Posts: 3063
#12

11 Mar 2020, 04:47

Dear Romano Piras,

Can't you get data for more countries? As it stands, I do not think you can ignore the time-series dimension of the problem.
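
As a starting point for looking at the time-series dimension, a sketch of a panel unit-root test with xtunitroot ips (the test used earlier in this thread). The variable names pair_id, year and lmig are hypothetical, and the lag and trend choices are only illustrative:

```stata
* pair_id: origin-destination pair, lmig: log migration flow (hypothetical names)
xtset pair_id year

* Im-Pesaran-Shin test; the null is that all panels contain a unit root
xtunitroot ips lmig, lags(1) trend
```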

Best wishes,

Joao
Romano Piras

Join Date: Mar 2020

Posts: 17
#13

11 Mar 2020, 05:22

Dear Joao,
actually what I'm studying are interregional flows across 20 Italian regions, and I want to concentrate my analysis on bilateral flows from the 8 Southern towards the 12 Centre-Northern regions. With the available data, I can also study the overall pattern of bilateral flows, considering each region as both a sending and a receiving region (excluding intra-regional flows). In such a case I would have N = 20 x 19 = 380 bilateral flows, and the total number of observations would be 380 x 34 (years) = 12920. In such a case, if I understand what you mean, I could safely ignore the time-series dimension of the problem, is that right?

Best Regards,
Romano
Joao Santos Silva

Join Date: Apr 2014

Posts: 3063
#14

12 Mar 2020, 07:42

That sounds much better.

Best wishes,

Joao
Romano Piras

Join Date: Mar 2020

Posts: 17
#15

03 Apr 2020, 03:52

Dear Joao
many thanks and sorry for the delay in responding to you.

romano