PPML, panel data - Statalist

Said Jafar

Join Date: Feb 2015

Posts: 113
#181

29 Nov 2017, 23:01

Dear Cengiz.

If your panelid already includes the HS fixed effects, there is no need to include them again. But note that including origin_id, dest_id, and h2 all together is not the same as including them separately, especially in xt-style commands.

All the best.
Dilshat Obul

Join Date: Jun 2017

Posts: 8
#182

20 Jan 2018, 20:27

Hi all,
I have industry-level trade flow data from 10 industries of country A to 10 industries in each of 30 countries, for 4 years. The trade flows run in both directions: from A to the 30 countries and from the 30 countries to A. It is somewhat like an input-output table. I want to estimate the effect of the Economic Integration Agreement between country A and several of those 30 countries, but I don't know how to set the fixed effects to get a precise estimate. If anyone can help me, I would really appreciate it.

All the best.

Last edited by Dilshat Obul; 20 Jan 2018, 20:31.
Marcelo Dolabella

Join Date: Jan 2018

Posts: 5
#183

25 Jan 2018, 07:37

Dear all,

I am estimating a gravity FE panel model of world trade for one particular product (HS 6-digit) and I have some questions.

1) Around 97% of my observations are zeros. Is PPML still a consistent estimator in this case? Santos Silva & Tenreyro (2011) argue so, but I am not sure whether their simulations considered this share of zeros. I am aware that this figure (97% of zeros) might not be very informative, because what I need to analyze is conditional over-dispersion. However, I am not sure how to assess it. Any comments on the topic would be much appreciated. (I am aware that negative binomial models might have difficulty converging, and I have not found many applications of zero-inflated or hurdle models for panel data.)

2) When using xtpoisson, fe or ppml_panel_sg, should all my continuous variables enter the regression in logs? (Unfortunately, I cannot run ppml because the number of country-pair dummies exceeds the maximum number of variables Stata 13 handles.)

3) I am worried about the endogeneity of tariffs. I am thinking of using the lagged value of tariffs as a proxy, and maybe using some instruments. For the latter, which command should I use? I am not sure whether ivpoisson is a fixed-effects estimator. Should I go with that one, or should I estimate a linear first stage with xtreg, fe and then use the results as the instrument in xtpoisson, fe?

Thank you in advance
Joao Santos Silva

Join Date: Apr 2014

Posts: 3027
#184

25 Jan 2018, 13:02

Dear Marcelo,

1) The percentage of zeros and conditional over-dispersion are totally irrelevant for the consistency of PPML.

2) That is generally the right approach.

3) ivpoisson is not valid with fixed effects and the approach you propose also appears to be invalid (I am not sure if I understood it).

Best wishes,

Joao
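
A brief sketch of the reasoning behind points 1) and 2), in notation that does not appear in the thread ($y_i$ for the trade flow, $x_i$ for the regressors, $\beta$ for the parameters). PPML solves a set of moment conditions that involve only the conditional mean, so, as long as that mean is correctly specified, nothing needs to be assumed about the share of zeros or about the conditional variance. With a regressor entered in logs, its coefficient can be read as an elasticity.

```latex
\begin{align}
  % First-order conditions solved by the PPML estimator
  \sum_{i=1}^{n} \bigl[\, y_i - \exp(x_i'\hat{\beta}) \,\bigr]\, x_i &= 0 \\
  % The only requirement for consistency: a correctly specified conditional mean
  E[\, y_i \mid x_i \,] &= \exp(x_i'\beta) \\
  % A regressor in logs has an elasticity interpretation
  E[\, y_i \mid x_i \,] = \exp(\beta_0 + \beta_1 \ln x_{1i}) &= e^{\beta_0}\, x_{1i}^{\beta_1}
\end{align}
```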
Dilshat Obul

Join Date: Jun 2017

Posts: 8
#185

26 Jan 2018, 20:27

Dear Joao,

I have industry-level trade flow data from 10 industries of country A to 10 industries in each of 30 countries, for 4 years, with intervals between them. The trade flows run in both directions: from A to the 30 countries and from the 30 countries to A. It is somewhat like an input-output table, but obviously it is not symmetric. If we treat each industry as the origin of a trade flow, there are 10 origins to 10*30 = 300 destinations, and 300 origins to 10 destinations, and there are a lot of zeros. I want to estimate the effect of the Economic Integration Agreement between country A and several of those 30 countries.
1. Can I still use a gravity model?
2. I don't know how to set the fixed effects to get a precise estimate. I tried exporter-year and importer-year fixed effects, but ended up with messages such as everything being collinear, not enough observations, etc.

Thank you!

All the best.

Dilshat
Joao Santos Silva

Join Date: Apr 2014

Posts: 3027
#186

27 Jan 2018, 12:02

Dear Dilshat,

Indeed, you can still use a gravity model and estimate it by PPML. Have a look at the ppml_panel_sg command and see if it works for you.

Best wishes,

Joao
Dilshat Obul

Join Date: Jun 2017

Posts: 8
#187

29 Jan 2018, 01:29

Dear Joao,

Thank you for your reply. I tried that command (ppml_panel_sg), but my bilateral dummy variable for the Economic Integration Agreement was omitted because of collinearity with the fixed effects. The fixed effects I used are origin industry-year and destination industry-year fixed effects.

One interesting thing is that the same fixed effects work with the (30*10)*(30*10) data, but not with the 10*(30*10) data.
How should we set the fixed effects in this case? Or should I estimate country A with each of the 30 countries separately?

Thank you!
Best,

Dilshat
Joao Santos Silva

Join Date: Apr 2014

Posts: 3027
#188

29 Jan 2018, 14:55

If the variable is dropped, that means its effect cannot be identified; I am afraid there is no way around it.

I am not familiar with what you are doing, but I would not estimate each country separately.

Best wishes,

Joao
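
One possible reading of why the dummy is dropped in the 10*(30*10) data, offered as a sketch and not as a diagnosis of the actual dataset: when country A is the only origin, the agreement dummy depends only on the partner and the year, so it is constant within every destination-industry-year group and is absorbed by those fixed effects. The toy data and all variable names below (partner, dest_ind, orig_ind, eia) are hypothetical.

```stata
* Toy data, hypothetical names throughout: flows from 10 industries of
* country A into 10 industries of each of 30 partners, over 4 years.
clear
set obs 30
gen partner = _n
expand 10
bysort partner: gen dest_ind = _n
expand 4
bysort partner dest_ind: gen year = _n
expand 10
bysort partner dest_ind year: gen orig_ind = _n

* Assumed for illustration: an agreement with partners 1-5 from year 3 on.
gen byte eia = (partner <= 5 & year >= 3)

* Destination-industry-year groups, matching the fixed effects described above.
egen long dest_ind_year = group(partner dest_ind year)

* Does eia vary inside any group? If it never does, the fixed effects absorb it.
bysort dest_ind_year: egen byte eia_min = min(eia)
bysort dest_ind_year: egen byte eia_max = max(eia)
count if eia_max != eia_min
```

In this toy setup the count is 0, which is the situation in which the dummy cannot be identified. In the (30*10)*(30*10) data each destination receives flows from several origin countries, so the dummy can vary within a destination-industry-year group.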
Dilshat Obul

Join Date: Jun 2017

Posts: 8
#189

31 Jan 2018, 03:02

Dear Joao,

Yes, OK! Thank you! I will work on this some more. There must be some solution.

Best,

Dilshat
Olivia Senlin

Join Date: Feb 2018

Posts: 19
#190

20 Feb 2018, 10:11

Dear Joao,

I am trying to estimate the determinants of trade in salmon. I am looking at the effect of the sanctions imposed on Norway by China in 2010.
My data consist of Norwegian export data to 140 countries, 1998-2016. (I have different categories, but I have omitted them for the models I am asking about now.) I also have Chinese import data from 40 countries, 1998-2016.

After reading "The Log of Gravity" and looking around, I thought I would use the PPML method, but I have some questions about these estimations and results, and I want to check that my data are in shape for the model. Sorry if my questions are very elementary.

These estimations are of exports from Norway to 140 countries.
In the first specification I do not include importer and year effects, whereas I do in the second. I find the results odd and have some questions.
The difference-in-differences variables POST, PD and did are included in both 1) and 2).

In 1), lgdp_imp (ln GDP of the importer) is 0.72, whereas lgdp_exp is 0.06 and lpop_imp (ln population of the importer) is -0.06. Are these not strange numbers? Especially the negative lpop_imp.

In 2), lgdp_imp is 1.73, lpop_imp is -1.39, and ldist has a p-value of 0.6. What can this indicate?

And my final question: is PPML suitable for difference-in-differences?

1)

PHP Code:

xi: ppml total_sum lgdp_imp lgdp_exp lpop_exp lpop_imp ldist imp_landlock contig POST PD did, cluster (importer)

PHP Code:

Number of parameters: 11
Number of observations: 4002
Pseudo log-likelihood: -6282172.7
R-squared: .49790244
Option strict is: off
                             (Std. Err. adjusted for 138 clusters in importer)
------------------------------------------------------------------------------
             |             Semirobust
   total_sum |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lgdp_imp |   .7259125   .1667171     4.35   0.000      .399153    1.052672
    lgdp_exp |   .0606086   .2429501     0.25   0.803    -.4155648     .536782
    lpop_exp |    6.30272   1.532978     4.11   0.000     3.298139    9.307301
    lpop_imp |  -.0615439   .2121635    -0.29   0.772    -.4773767     .354289
       ldist |  -1.163593   .2515042    -4.63   0.000    -1.656533   -.6706543
imp_landlock |  -1.831931   .3727096    -4.92   0.000    -2.562428   -1.101433
      contig |  -.2226264   .5300417    -0.42   0.674    -1.261489    .8162363
        POST |  -.1390643   .0414326    -3.36   0.001    -.2202707   -.0578579
          PD |  -.5181162   .5365442    -0.97   0.334    -1.569723    .5334911
         did |  -.7081487   .1474914    -4.80   0.000    -.9972265    -.419071
       _cons |  -98.94527   20.63708    -4.79   0.000    -139.3932   -58.49734

2)

PHP Code:

xi: ppml total_sum lgdp_imp lgdp_exp lpop_exp lpop_imp ldist imp_landlock contig POST PD did i.importer i.year, cluster (importer)

PHP Code:

Number of parameters: 164
Number of observations: 3857
Pseudo log-likelihood: -1053542.9
R-squared: .92469055
Option strict is: off
                               (Std. Err. adjusted for 133 clusters in importer)
--------------------------------------------------------------------------------
               |             Semirobust
     total_sum |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      lgdp_imp |    1.73636   .2992958     5.80   0.000     1.149751    2.322969
      lgdp_exp |   .1814279   .2833422     0.64   0.522    -.3739126    .7367683
      lpop_imp |  -1.391558   .8740295    -1.59   0.111    -3.104624    .3215088
         ldist |   1.279127   2.757464     0.46   0.643    -4.125404    6.683657
  imp_landlock |   3.593131   1.371672     2.62   0.009     .9047021    6.281559
        contig |   10.05969   3.719869     2.70   0.007     2.768878     17.3505
          POST |   .7288191   .1104002     6.60   0.000     .5124386    .9451996
            PD |    8.08265   .8095152     9.98   0.000     6.496029     9.66927
           did |  -1.643086    .316987    -5.18   0.000    -2.264369   -1.021803

Thank you in advance,
Best,

Olivia Eline
Joao Santos Silva

Join Date: Apr 2014

Posts: 3027
#191

21 Feb 2018, 13:38

Dear Olivia,

PPML is indeed appropriate to estimate these models, but I think you need to think carefully about the specification you use: e.g., is the Norwegian population a relevant factor for the exports of salmon? You need to see how you can specify your model to account for the fact that you are using data on the exports from a single country.

Best wishes,

Joao
Olivia Senlin

Join Date: Feb 2018

Posts: 19
#192

26 Feb 2018, 09:38

Dear Joao,
Thank you for your reply! I highly appreciate it. I also have some further questions if you don't mind.

Yes, I suppose it (the Norwegian population) doesn't add much to the model. So I want to examine the effects of the sanctions on trade.
I also have the trade data at the product level (different salmon products), but in my last post I ran the model with aggregate export figures.

1)
I have to ask you about the distance variable (ldist). It is expected to have a negative coefficient, right? In my second estimation I include the importer dummies; is that why ldist has a coefficient of 1.27 and a p-value of 0.643?
But shouldn't the same apply to imp_landlock, as this is also a time-invariant variable? That one has a p-value of 0.009.
And shouldn't both of these variables be dropped when I add the importer dummies, or am I misunderstanding something?

2) As I said, I have country-product-year data. Is it correct to set it up like this, and also to add product dummies?

PHP Code:

egen panelid = group(importer category)
xtset panelid year
xi: ppml ImportTons lgdp_imp lgdp_exp lgdpcap_imp lgdpcap_exp ldist imp_landlock contig POST PD did i.importer i.year i.category, cluster (importer)

PHP Code:

Number of parameters: 173
Number of observations: 23327
Pseudo log-likelihood: -4383575.5
R-squared: .89674007
Option strict is: off
                               (Std. Err. adjusted for 133 clusters in importer)
--------------------------------------------------------------------------------
               |             Semirobust
    ImportTons |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      lgdp_imp |   .3448101   .7483004     0.46   0.645    -1.121832    1.811452
      lgdp_exp |   .1814605   .2833128     0.64   0.522    -.3738223    .7367433
   lgdpcap_imp |    1.39151   .8740319     1.59   0.111    -.3215613    3.104581
         ldist |   .9888964   2.438403     0.41   0.685    -3.790286    5.768079
  imp_landlock |   3.234136   5.442267     0.59   0.552    -7.432512    13.90078
        contig |   10.39028   .2977781    34.89   0.000     9.806644    10.97391
          POST |   .7288256    .110402     6.60   0.000     .5124416    .9452096
            PD |   8.177748   5.143414     1.59   0.112    -1.903158    18.25865
           did |  -1.644468   .3171585    -5.19   0.000    -2.266087   -1.022848

3) If I want to look at the difference-in-differences effect on trade in one specific salmon product versus all others, would it be correct to estimate the model with imports of that category as the dependent variable?
And would I have to drop the other observations before doing this?

4) Is the R-squared important in PPML, or should it be disregarded?
5) Are there tests other than the RESET test that you would recommend for checking the specification?
When I run the RESET test following the instructions on your website, I get:

PHP Code:

 ( 1)  fit2 = 0

           chi2(  1) =   44.31
         Prob > chi2 =    0.0000

Thank you again,

Olivia
Joao Santos Silva

Join Date: Apr 2014

Posts: 3027
#193

26 Feb 2018, 14:39

Dear Olivia,

Most of your questions are about modeling decisions and you should discuss these with your advisor or other researchers in the project, but

4) I would say that the R2 is never important.

5) The RESET is probably the only test that may be relevant in your context and it looks as if your model fails it.

Joao
Olivia Senlin

Join Date: Feb 2018

Posts: 19
#194

27 Feb 2018, 07:32

Dear Joao,
Thank you for your answer, and sorry to bother you with irrelevant questions.

1) Regarding my first question, I just thought that the ppml command in Stata would automatically drop time-invariant variables when country dummies are included. Is this not correct?

2) When I estimate the model without the product categories, and set only importer as the panel id, I get the following RESET result:

PHP Code:

 ( 1)  fit2 = 0

           chi2(  1) =    0.82
         Prob > chi2 =    0.3660

What can this mean?

3) I just want to make sure I conducted the RESET test correctly the first time:

PHP Code:

xi: ppml ImportTons lgdp_imp lgdp_exp lgdpcap_imp lgdpcap_exp ldist imp_landlock contig POST PD did i.importer i.year i.category, cluster (importer)
predict fit, xb
gen fit2=fit^2
xi: ppml ImportTons lgdp_imp lgdp_exp lgdpcap_imp lgdpcap_exp ldist imp_landlock contig POST PD did i.importer i.year i.category fit2, cluster (importer)
test fit2=0

4) When I run different specifications I always get that

PHP Code:

WARNING: lgdp_imp has very large values, consider rescaling or recentering
WARNING: lgdp_exp has very large values, consider rescaling or recentering

and

PHP Code:

note: ImportTons has noninteger values

Is this of importance?

Thank you again so much,
Best,
Olivia
Joao Santos Silva

Join Date: Apr 2014

Posts: 3027
#195

27 Feb 2018, 13:17

Olivia,

1) The "ppml" command was not designed for count data, so it won't do that; "xtpoisson" will do it.

2) That model passes the test.

3) That looks right to me :-)

4) You can ignore the warnings unless you have convergence issues; the note you can always ignore!

All the best,

Joao
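
A minimal sketch of point 1), on simulated data with hypothetical values (the variable names only mimic those in the thread). It uses the built-in xtpoisson and poisson commands, not ppml, and assumes a Stata version in which xtpoisson, fe accepts vce(robust) (Stata 13 or later). The fixed-effects estimator omits the time-invariant ldist by itself. With explicit importer dummies, ldist is perfectly collinear with the dummies, so Stata has to omit either ldist or one additional dummy; if ldist is the one kept, its coefficient is only a relabelled importer effect and has no distance interpretation.

```stata
* Simulated panel: 50 hypothetical importers observed for 10 years.
clear
set seed 12345
set obs 50
gen importer = _n
gen ldist = rnormal()            // time-invariant by construction
gen alpha = rnormal()            // unobserved importer effect
expand 10
bysort importer: gen year = _n
gen lgdp_imp = rnormal()         // time-varying regressor
gen y = rpoisson(exp(1 + 0.5*lgdp_imp + 0.5*alpha))

* Fixed-effects Poisson: ldist is constant within importer and is omitted.
xtset importer year
xtpoisson y lgdp_imp ldist, fe vce(robust)

* Same model with explicit dummies: ldist is collinear with i.importer,
* so check the output to see which term was omitted.
poisson y lgdp_imp ldist i.importer, vce(cluster importer)
```

The coefficient on lgdp_imp should coincide across the two commands, since the fixed-effects Poisson estimator and the Poisson regression with a full set of unit dummies give the same slope estimates.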