PPML estimation, no standard errors reported

Giulia Sabbadini

Join Date: Jul 2017

Posts: 2
#16

10 Jul 2017, 09:33

Dear Statalist,
I am Giulia and this is my first post. I am working on a paper studying the effects of non-tariff measures (SPS and TBT) on processed food exports. I have 149 countries, 13 years and 56 product lines, for a total of 16,053,856 observations (exporter * importer * year * product). I constructed a dummy variable for each of the SPS_jikt and TBT_jikt notifications. Each dummy takes the value 1 from year t onwards if the importing country i imposed at least one SPS/TBT measure on product line k in year t. The vector of other variables includes the standard gravity covariates.
I tried to use the ppml_panel_sg command by typing:
ppml_panel_sg trade sps tbt, ex(iso_o) im(iso_d) y(year) ind(HS) sym cluster(id)
where id = group(iso_o iso_d HS) is the variable I used to set the panel dimension of the dataset.
What Stata returns is the following:
“Checking for possible non-existence issues...
note: sps omitted because of collinearity over lhs>0 (creates possible existence issue)
note: tbt omitted because of collinearity over lhs>0 (creates possible existence issue)
Error: all main covariates appear to be collinear with the implied set of fixed effects”
I have no idea how to solve this problem. Alternatively, I was thinking about a two-stage procedure (à la Helpman-Melitz-Rubinstein) to take all the zeros into account: a Probit estimation for the first stage, followed by the areg command for the second stage, absorbing the exporter-time, importer-time and pair FE.
Any suggestion would be greatly appreciated.
Thanks in advance,
Giulia
Andrew Chan

Join Date: Jun 2017

Posts: 17
#17

10 Jul 2017, 10:54

Thanks Joao.

Tom: thank you for the suggestion to use poi2hdfe, I tried the following

poi2hdfe y_cgt x_cgt _CG*, id1(id_GT) id2(id_CT) cluster(id_CG)

I am explicitly estimating coefficients for the smallest FE as you suggest (the country-group FE), but after over an hour of waiting the estimator failed to converge. I'm at a loss with what I can do to move forward. Any other suggestions would be most appreciated.

Thanks,
Andrew
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#18

10 Jul 2017, 13:31

Dear Andrew,

Unlike -ppml- and -ppml_panel_sg-, poi2hdfe is not guaranteed to converge. One option is to first use -ppml- to select the sample and variables to use and then run poi2hdfe. The -ppml- help file has an example of how to do this with a tobit; the procedure with -poi2hdfe- should be similar.

Best wishes,

Joao
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#19

10 Jul 2017, 13:43

Dear Giulia,

If I understand it correctly, your variables of interest are just characteristics of the importer, and therefore are dropped when you include destination dummies. If that is the case, you simply cannot estimate their effect.
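One way to check this point is to test whether the policy dummy ever differs across exporters within an importer-product-year cell. The following is a minimal sketch on made-up toy data; the values are hypothetical, and only the variable names (iso_o, iso_d, HS, year, sps) follow the command posted above.

```stata
* Toy data (hypothetical values): two importers, one product, two years,
* two exporters per cell. sps is set at the importer-product-year level.
clear
input str3 iso_o str3 iso_d HS year sps
"AAA" "XXX" 1 2000 0
"BBB" "XXX" 1 2000 0
"AAA" "XXX" 1 2001 1
"BBB" "XXX" 1 2001 1
"AAA" "YYY" 1 2000 0
"BBB" "YYY" 1 2000 0
"AAA" "YYY" 1 2001 0
"BBB" "YYY" 1 2001 0
end

* Within each importer-product-year cell, sort by sps and compare the
* smallest and largest values.
bysort iso_d HS year (sps): gen byte varies = sps[1] != sps[_N]

* A count of zero means sps is constant within every cell, so it is
* perfectly collinear with importer-product-year fixed effects.
count if varies
```

If the count is zero on the real data as well, no estimator that includes those fixed effects can identify the coefficient.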

You will have the same problem with the HMR approach. Anyway, I would strongly advise against that approach for reasons explained here.

Best wishes,

Joao
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#20

10 Jul 2017, 17:43

Originally posted by Andrew Chan

Thanks Joao.

Tom: thank you for the suggestion to use poi2hdfe, I tried the following

poi2hdfe y_cgt x_cgt _CG*, id1(id_GT) id2(id_CT) cluster(id_CG)

I am explicitly estimating coefficients for the smallest FE as you suggest (the country-group FE), but after over an hour of waiting the estimator failed to converge. I'm at a loss with what I can do to move forward. Any other suggestions would be most appreciated.

Thanks,
Andrew

Hi Andrew,

OK, sorry to hear that. I do have another suggestion that might work for you. Since your data have a three-way interacted fixed effects structure (similar to panel gravity), is it possible to treat your "group" id as though it were an "importer"/"destination" in a gravity setup?

If so, you could then run:

ppml_panel_sg y_cgt x_cgt , ex(id_C) im(id_G) year(id_T) cluster(id_CG)

which will give you "CT", "GT", and "CG" fixed effects (I think this is what you want, right?)

I think this should work so long as (c,g,t) uniquely describes your data. If not, you may need to run collapse (sum) beforehand. Anyway, fingers crossed, but I think this should work...
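The uniqueness condition can be checked before estimation. The sketch below uses made-up toy data with the variable names from the command above; taking the mean of x_cgt when collapsing is an assumption made for illustration, not part of the original suggestion.

```stata
* Toy data (hypothetical values) with one duplicated (c,g,t) cell.
clear
input id_C id_G id_T y_cgt x_cgt
1 1 1 5 0
1 1 1 3 0
1 2 1 0 1
2 1 1 7 1
end

* Report how many (c,g,t) cells appear more than once.
duplicates report id_C id_G id_T

* Sum the outcome within each cell. Taking the mean of x_cgt is an
* assumption for this sketch; it is harmless only if x_cgt is constant
* within a cell.
collapse (sum) y_cgt (mean) x_cgt, by(id_C id_G id_T)

* isid exits with an error unless the key now uniquely identifies rows.
isid id_C id_G id_T
```

If x_cgt varies within a cell, how to aggregate it is a modelling decision that this sketch does not settle.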

Regards,
Tom
Andrew Chan

Join Date: Jun 2017

Posts: 17
#21

11 Jul 2017, 07:59

Hi Tom,

Thanks for following through, your suggestion worked! I appreciate it.

I have one last question: is there a reason the -keep- option is not keeping groups with all zeros in the dep var? I realize -ppml_panel_sg- drops these observations normally, but shouldn't -keep- allow for estimation with all observations? Any advice why this is happening would be helpful. Thanks.

Regards,
Andrew

Last edited by Andrew Chan; 11 Jul 2017, 08:05.
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#22

11 Jul 2017, 17:57

Originally posted by Andrew Chan

Hi Tom,

Thanks for following through, your suggestion worked! I appreciate it.

I have one last question: is there a reason the -keep- option is not keeping groups with all zeros in the dep var? I realize -ppml_panel_sg- drops these observations normally, but shouldn't -keep- allow for estimation with all observations? Any advice why this is happening would be helpful. Thanks.

Regards,
Andrew

Hi Andrew,

Very happy to hear that it worked! As to your question, I think the dropped observations you are referring to must be the ones that are dropped because they belong to FE groups for which the LHS is always zero... an example would be if you have country-time fixed effects and you only observe zero values for a particular country and a particular year. It is not possible to include these observations as the FE that corresponds to them is not defined (technically it is negative infinity.)
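The reason is visible in the Poisson first-order condition for a fixed effect: the sum of the observed LHS within the group must equal the sum of the fitted values, and since fitted values are exponentials they can only sum to zero if the fixed effect goes to negative infinity. A minimal sketch for flagging such groups, on made-up toy data with the variable names used above (the tot_* and allzero names are hypothetical), could look like this:

```stata
* Toy data (hypothetical values).
clear
input id_C id_G id_T y_cgt
1 1 1 0
1 2 1 0
1 1 2 4
1 2 2 0
2 1 1 2
2 2 1 0
2 1 2 1
2 2 2 0
end

* Total of the dependent variable within each fixed-effect group.
bysort id_C id_T: egen double tot_CT = total(y_cgt)
bysort id_G id_T: egen double tot_GT = total(y_cgt)
bysort id_C id_G: egen double tot_CG = total(y_cgt)

* Flag observations that belong to at least one all-zero group.
gen byte allzero = (tot_CT == 0) | (tot_GT == 0) | (tot_CG == 0)
count if allzero
list id_C id_G id_T y_cgt if allzero
```

This is only a first pass: dropping the flagged observations can leave other groups with all zeros, so the check may need to be repeated until nothing more is flagged.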

Hope this helps.

Tom

PS: I got your original post where you said you reached the max number of iterations before convergence, so I had some suggestions for diagnostics. I see now from the edited version of your post this is no longer an issue. So for posterity I will just leave what I originally wrote here:

In case ppml_panel_sg does not converge within the max number of iterations...

- Have you tried running ppml_panel_sg with the "strict" option enabled? If so, do any of your main covariates drop? (This suggestion applies to -ppml- as well, since this option is taken from the original -ppml- command.)

- To check if it's actually converging, you can try running the following syntax:

ppml_panel_sg y_cgt x_cgt , ex(id_C) im(id_G) year(id_T) cluster(id_CG) verb(25) noaccel tol(50000)

where "verb(25)" will show output from every 25 iterations and tol(50000) can be used to toggle the max number of iterations. Does it look like it is converging?

- To figure out whether a particular variable may be causing the problem, you can run

hdfe x_cgt if y_cgt>0, absorb(id_CT id_GT id_CG) gen(test)
sum test* if y_cgt>0
corr test* if y_cgt>0
reg test* if y_cgt>0

This will check the degree of collinearity in x_cgt over y_cgt>0 after netting out fixed effects. (You can also do the same without the if y_cgt>0 for a more general collinearity check.)

Last edited by Tom Zylkin; 11 Jul 2017, 18:13.
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#23

11 Jul 2017, 20:36

^ Sorry - to amend the above "tol(50000)" should be "max(50000)"... early morning mistake there: the "tol()" option obviously refers to the tolerance.
Giulia Sabbadini

Join Date: Jul 2017

Posts: 2
#24

13 Jul 2017, 02:06

Dear Joao,

Sorry for the late reply and thanks for your comment. Basically, when I have an "industry" dimension, my fixed effects become importer-industry-time FE, which absorb my variables of interest. I will have to find another empirical strategy then. Thank you again,
Giulia
Andrew Chan

Join Date: Jun 2017

Posts: 17
#25

14 Jul 2017, 09:39

Sorry for the slow reply Tom.

I appreciate your detailed suggestions. I have experimented a bit with max() and will continue to see what I can do to solve my convergence problems. At the moment I have no questions, but I will follow up if I have any other problems.

Thanks again!
ana montes

Join Date: May 2018

Posts: 1
#26

09 May 2018, 10:35

Dear Santos-Silva,

I have read the posts on this topic. I am getting the same error message after making my sample a bit smaller: “variance matrix is nonsymmetric or highly singular”.
I am estimating a gravity model for migration with time*origin, time*destination and country-pair fixed effects, with 33 destination countries and initially 193 origins. I successfully estimated that model using ppml (it takes 6 hours on average).
The command eliminated observations and variables (fixed effects). I decided to delete some of these myself to make my sample more "balanced", but after that I can no longer reproduce my results. I get this message: “variance matrix is nonsymmetric or highly singular”. Deleting singletons does not solve the problem.
Is ppml sensitive to sample size? Or what could be the issue?

Thank you so much in advance if you read my post.

Best,
Ana
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#27

09 May 2018, 15:26

Dear Ana,

The most likely cause is that you are including variables that should be dropped; Stata sometimes has trouble identifying these. It may be that with the full sample all the coefficients are identified, but that may not be the case with the smaller sample. Note that you are using a lot of fixed effects and therefore it will be difficult to identify the coefficients of other regressors. One thing you can do is to try the ppml_panel_sg command, which is better at dealing with the fixed effects.

Best wishes,

Joao
Cirlene Matos

Join Date: May 2018

Posts: 4
#28

23 May 2018, 21:36

Dear Santos Silva,

I am running the ppml command on a cross-section gravity model of migration, but I get no standard errors; I get only dots instead.
I have 558 regions and used the code:

ppml migra ljaffe ldisteuclid contig uf do2-do558 dde2-dde558 , cluster(ldisteuclid)

migra is the number of migrations from region o to region d
ljaffe is a variable between 0 and 1 (in ln)
ldisteuclid is Euclidean distance (in ln)
contig is a dummy for sharing a common border
uf is a dummy for belonging to the same state
do2-do558 are dummy variables for origin
dde2-dde558 are dummy variables for destination

Stata dropped 785 regressors, which are origin and destination FE dummies, and returned the messages:

Warning: variance matrix is nonsymmetric or highly singular
WARNING: The model appears to overfit some observations with migra=0

Can I use the ppml_panel_sg command instead, even though I am working with a cross-section? And if I can, would the command below be correct?

ppml_panel_sg migra ljaffe ldisteuclid contig uf , ex(o) im(d) cluster(ldisteuclid) nopair

I would appreciate any advice on how I can solve this problem.
Thank you very much

Best
Cirlene
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#29

24 May 2018, 02:26

Dear Cirlene,

I do not know about ppml_panel_sg but try the following:

- Run the model without the constant but including all the dummies (i.e., do not exclude the first category)
- Check for "singletons"
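A minimal sketch of the second check, on made-up toy data. It assumes "singletons" means origins or destinations that appear in only one observation, and it additionally flags regions whose flows are all zero, which is a separate assumption motivated by the overfitting warning. The identifiers o and d follow the ppml_panel_sg command posted above; n_o, n_d, tot_o and tot_d are hypothetical helper names.

```stata
* Toy data (hypothetical values): origin o, destination d, flow migra.
clear
input o d migra
1 2 3
1 3 0
2 1 5
2 3 1
3 1 0
end

* Number of observations per origin and per destination.
bysort o: gen long n_o = _N
bysort d: gen long n_d = _N

* Total flow per origin and per destination.
bysort o: egen double tot_o = total(migra)
bysort d: egen double tot_d = total(migra)

* Regions observed only once: their dummy fits that observation exactly.
list o d migra if n_o == 1 | n_d == 1

* Regions with only zero flows: their fixed effect is not identified.
list o d migra if tot_o == 0 | tot_d == 0
```

Observations listed by either command are candidates for the dropped dummies, but whether to exclude them remains a judgment call.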

Best wishes,

Joao
Cirlene Matos

Join Date: May 2018

Posts: 4
#30

24 May 2018, 12:28

Dear Santos Silva,

I will do that. Thank you for your advice.

Best
Cirlene