PPML estimation, no standard errors reported

Zheng Wang

Join Date: Jan 2016

Posts: 5
#1

18 Jan 2016, 10:50

Dear Statalist,

I'm working on a research project that uses PPML to estimate a model in which the dependent variable is zero for some observations. However, I've run into a problem with the PPML estimation: no standard errors are reported. Has anybody had a similar experience? Any practical advice would be much appreciated.

Let me describe the data and results in more detail, if that helps.

Notation: j (country), k (industry), t (year).

Research question: I'm looking at how a policy variable "p_jkt" affects an economic outcome "y_jkt".

Data: 225 countries (j), 27 industries (k), 7 years (t).

Zeros in key variables (especially the key independent variable): 0s account for 11% of the observations of y_jkt, and 0s account for 96% (YES, 96%!) of the observations of p_jkt (in particular most jk cells contain only 0s in all years).

Additional controls: ${xlist}, mostly defined at the jt level.

Regressions and problems encountered as follows:

set matsize 11000, perm

xtset jk t

xi, prefix(_D) noomit i.j i.k i.t

xi, prefix(_E) noomit i.jk i.kt i.jt

* Reg 1 (OK)
ppml y_jkt p_jkt ${xlist} _Dj* _Dk* _Dt*

* Reg 2 (OK)
ppml y_jkt p_jkt ${xlist} _Ejt* _Ekt*

* Reg 3 (no standard errors reported for all regressors)
ppml y_jkt p_jkt ${xlist} _Ejk*

* Reg 4 (no standard errors reported for all regressors)
ppml y_jkt p_jkt ${xlist} _Ejk* _Dt*

* Reg 5 (no standard errors reported for all regressors)
ppml y_jkt p_jkt ${xlist} _Ejk* _Ekt*

So the problem arises in Reg 3 - 5, all of which include the jk dummies: PPML converges but reports no standard errors for any coefficient.

Any advice or suggestions are very much appreciated.

Last edited by Zheng Wang; 18 Jan 2016, 10:54.
Tags: poisson, PPML, zeros
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#2

18 Jan 2016, 15:04

Hello and thanks for this,

My guess is that some of your regressors are singletons. What happens if you estimate using -reg- with robust standard errors?

Joao
1 like
Comment
Zheng Wang

Join Date: Jan 2016

Posts: 5
#3

18 Jan 2016, 15:26

Hi Joao,

Thanks very much for your comment. (1) By singletons, do you mean jk dummies that equal 1 for only one observation? I found that every jk dummy equals 1 for at least 2 observations, so I'm not sure they qualify as singletons. (2) Even if there are singletons, won't they be dropped automatically by ppml? (3) I used "areg, absorb(jk) cluster(jk)" to absorb the jk dummies, and there didn't seem to be any problems with the estimation of the coefficients or the standard errors.

Last edited by Zheng Wang; 18 Jan 2016, 15:29.
Comment
Natasha Agarwal

Join Date: Jan 2016

Posts: 14
#4

18 Jan 2016, 20:43

Hello there,

I am also facing the same problem when it comes to singletons. I am not yet sure what "singletons" means. A Google search suggests that a singleton is a country-industry dummy that takes the value 1 for one observation and 0 for the other N-1 observations. If this is actually the case, then I do not see any singletons in my dataset, as each country-industry cell has at least 2 observations. Is there a way to identify singletons?
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#5

19 Jan 2016, 01:59

Singletons are dummies with N-1 observations that are equal. For example, N-1 zeros and one observation equal to 1. The problem with singletons is that they set some residuals to zero, and that creates problems when computing the robust covariance matrix estimator. You can look directly for singletons (that has been discussed in this forum), or you can just compute the residuals and see if you have any zeros there (or residuals so small that they are effectively zero).
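
Both checks could be sketched roughly as follows, using the variable names from post #1. This is only an illustration: it assumes the data from post #1 are in memory, that -predict, xb- is available after -ppml-, and the names n_jk, xb_hat and res as well as the 1e-8 tolerance are made up for the example.

```stata
* Check 1: look for singleton cells directly.
* Count the observations in each jk cell; a cell with a single
* observation means its dummy equals 1 exactly once.
bysort jk: gen long n_jk = _N
count if n_jk == 1

* Check 2: look for residuals that are effectively zero after a fit.
ppml y_jkt p_jkt ${xlist} _Ejk*
predict double xb_hat if e(sample), xb   // linear index (assumed to work after ppml)
gen double res = y_jkt - exp(xb_hat)     // raw residual: y minus fitted mean
count if abs(res) < 1e-8                 // arbitrary tolerance for "effectively zero"
list jk t y_jkt res if abs(res) < 1e-8
```

Observations listed by the last command are the ones fitted (almost) exactly, and the dummies that are nonzero for them are the natural suspects.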

Another possible cause of the problem you are facing is perfect collinearity. In principle Stata is able to detect those cases, but in some cases it lets in variables that should be dropped. You can try to use _rmcoll before you estimate the model.
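
One way this might look for Reg 4 in post #1 is sketched below. It assumes a Stata version in which _rmcoll accepts the forcedrop option and returns r(k_omitted) and r(varlist); the local macro name keepvars is made up for the example.

```stata
* Ask _rmcoll which regressors are collinear (a constant is included by default).
* forcedrop removes the collinear variables from the returned list.
_rmcoll p_jkt ${xlist} _Ejk* _Dt*, forcedrop
display "regressors omitted because of collinearity: " r(k_omitted)

* Keep the cleaned list and estimate with it.
local keepvars `r(varlist)'
ppml y_jkt `keepvars'
```

Because the dummies in post #1 were created with the noomit option, each full set of dummies sums to one and is collinear with the constant, so at least one dummy per set should be omitted here.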

Without having your data and understanding your model well, it is difficult for me to help much more on this, but a standard approach to this problem is to start by estimating a simpler model where that problem does not appear and then include more variables step by step to try to identify what is the variable causing the problem.

All the best,

Joao
Comment
Zheng Wang

Join Date: Jan 2016

Posts: 5
#6

20 Jan 2016, 16:24

Dear Joao,

Thanks for your guidance. I did find the singletons. However, I have another question regarding the data. Since the key RHS variable (in monetary values) has a large number of zeros (>90% of obs), how should I deal with it? According to the theory (a modified gravity model), it has to be logged, but obviously I would then lose all these zeros. I guess adding 1 is also not a good solution, for the same reasons that motivate using Poisson/PPML in the first place. So, is there a better way forward?

Thanks very much in advance.

Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#7

20 Jan 2016, 16:47

Dear Zheng,

I would say that PPML should be your starting point.

All the best,

Joao
Comment
Zheng Wang

Join Date: Jan 2016

Posts: 5
#8

21 Jan 2016, 02:28

Hi Joao,

Thanks. But the problem is that, even using PPML/Poisson, the theory requires me to log the RHS variable p_jkt. Could you suggest any way to deal with the zeros in p_jkt?

Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#9

21 Jan 2016, 16:05

Dear Zheng,

There is no right answer for this, but lots of wrong ones. One trick that is often used is as follows:

a) log p_jkt and replace the missing values by zeros;
b) create a dummy equal to 1 for the observations where p_jkt equals zero;
c) include both variables in the model.
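
In Stata, steps a) to c) might be written roughly as below. This is a sketch only: it assumes the data from post #1 are in memory, the variable names ln_p and p_zero are made up for the example, and the set of dummies is borrowed from Reg 1 purely for illustration.

```stata
* a) log p_jkt; ln(0) is missing in Stata, so set those cases to zero
gen double ln_p = ln(p_jkt)
replace ln_p = 0 if p_jkt == 0

* b) dummy equal to 1 where p_jkt is zero (left missing if p_jkt is missing)
gen byte p_zero = (p_jkt == 0) if !missing(p_jkt)

* c) include both variables in the model
ppml y_jkt ln_p p_zero ${xlist} _Dj* _Dk* _Dt*
```

With this coding, the coefficient on ln_p is identified only from the observations with positive p_jkt, while p_zero shifts the conditional mean for the observations where p_jkt is zero.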

All the best,

Joao
Comment
Zheng Wang

Join Date: Jan 2016

Posts: 5
#10

22 Jan 2016, 13:00

Thanks very much, Joao. This trick is actually very similar to what I have just read. I much appreciate your very useful and patient guidance so far.

Comment
Andrew Chan

Join Date: Jun 2017

Posts: 17
#11

07 Jul 2017, 14:01

Dear Statalist,

I have a similar problem to Zheng's: no standard errors are reported after estimating a PPML model. Here is some information about the data and the model I am estimating.
About 6500 observations: 35 countries, 150 groups each in 2 or more countries, 20 time periods.

Notation: country (c) group (g) time (t)

My outcome y_cgt has many zeros (~30%), and I'm interested in how x_cgt affects y_cgt.

I tried the following:

xi, prefix(_CG) noomit i.cg
xi, prefix(_CT) noomit i.ct
xi, prefix(_GT) noomit i.gt
ppml y_cgt x_cgt _CG*, cluster(cg)
[this reports standard errors]

ppml y_cgt x_cgt _CT*, cluster(cg)
[this does not report standard errors; warning = variance matrix is nonsymmetric or highly singular]

ppml y_cgt x_cgt _CG* _CT*
[this does report standard errors]

ppml y_cgt x_cgt _CG* _CT*, cluster(cg)
[this does not report standard errors; warning = variance matrix is nonsymmetric or highly singular]

ppml y_cgt x_cgt _CG* _CT* _GT*
[this does not report standard errors; warning = variance matrix is nonsymmetric or highly singular]

ppml y_cgt x_cgt _CG* _CT* _GT*, cluster(cg)
[this does not report standard errors; warning = variance matrix is nonsymmetric or highly singular]

I don't know how to make sense of this. I have no problem running these regressions when I use REGHDFE and absorb the large number of fixed effects.

I checked and, from what I can tell, every fixed-effect dummy equals one at least twice, so I don't think the problem is singletons. However, a lot of the fixed effects are dropped because of collinearity when using PPML. To make matters more confusing, REGHDFE doesn't drop anything and reports no singletons when I estimate the identical regression with REGHDFE instead of PPML. Any suggestions as to what is happening or what to check? I would really like to see the PPML estimates with standard errors. Thanks.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#12

07 Jul 2017, 16:00

Dear Andrew,

I suggest you try -ppml_panel_sg-; comparing with REGHDFE is not really helpful.

Best wishes,

Joao
Comment
Andrew Chan

Join Date: Jun 2017

Posts: 17
#13

08 Jul 2017, 10:40

Dear Joao,

Thank you for your response and suggestion. I have a related question about -ppml_panel_sg-, namely how to specify the fixed effects. As noted in my original post, I have (i) country-group, (ii) country-year and (iii) group-year fixed effects, and my unit of observation is a country-group-year. So I am not estimating a gravity model. How would you suggest I specify the fixed effects, given that -ppml_panel_sg- requires me to enter origin, destination, and time period fixed effects?

I tried the following, but I receive an error message telling me I have insufficient observations:

ppml_panel_sg y_cgt x_cgt, exporter(id_GT) importer(id_CG) year(id_CT)

where id_GT = group-time indicator, id_CG = country-group indicator and id_CT = country-time indicator.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#14

08 Jul 2017, 13:16

Dear Andrew,

The author of -ppml_panel_sg-, Tom Zylkin, frequently contributes to the forum and he may be able to help.

Best wishes,

Joao
Comment
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#15

09 Jul 2017, 03:25

Hi Andrew,

As you say, this is something that you unfortunately cannot currently do with ppml_panel_sg, since that command is set up to work with gravity models. Which is the smallest of your FE dimensions? What I was going to suggest was to use Paulo Guimaraes's command "poi2hdfe", where you might use the command's differencing algorithm to difference out the larger two FEs, but explicitly estimate coefficients for the smallest. (Specifically, I'm referring to the FEs you would assign to fe1() and fe2() with poi2hdfe.)

That might work if there are not too many fixed effects in the smaller dimension. Anyway, I hope that it does.

Regards,
Tom
Comment
