  • #16
    Dear Joao,
    Thank you very much for your helpful responses. I have one question about the RESET test. According to your paper, The Log of Gravity (2006), the RESET test checks the adequacy of the estimated models. I performed the test according to the instructions on your website. For example,
    Code:
    * estimate the model of interest and store the results
    xi: ppml FDI indvars i.year i.pair, cluster(dist)
    eststo ppml1
    * RESET test: add the square of the fitted linear index and test its significance
    predict fit, xb
    gen fit2 = fit^2
    xi: ppml FDI indvars i.year i.pair fit2, cluster(dist)
    test fit2 = 0
    I ran the test for all models, and in every case the p-values cannot reject the null hypothesis at the 5% (or even 10%) level. I take this to mean that my models are inadequate. But is this inadequacy because the estimation method (ppml) is unsuitable, or because my model specification is inappropriate?
    Thank you very much for your time!
    Best regards,
    Anh



    • #17
      Dear Anh,

      Good news: the null hypothesis of the RESET test is that the model is correctly specified; if you cannot reject it at any reasonable level you have nothing to worry about.

      Anyway, if the models fail the test, that suggests a problem with the model specification, not with the estimator.

      Best wishes,

      Joao



      • #18
        Dear Joao,
        According to your papers, it is recommended to estimate the nonlinear equation (logged continuous regressors and the dependent variable in levels) by PPML. However, my bilateral FDI equation contains a number of interaction variables that are equal to the product of two continuous variables. If I take the log of these interaction terms, each of them collapses into the sum of the logs of the single variables, so the interaction is no longer separately identified. Can I use PPML without taking the log of the continuous regressors? Would doing so have any significant impact?
        Thank you very much for your time!
        Best regards,
        Anh
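
        On the question in #18 about interactions, a minimal sketch of the point (x1 and x2 are hypothetical regressors, not from the thread; FDI is the dependent variable mentioned above): logging a product gives ln(x1*x2) = ln(x1) + ln(x2), which is collinear with the logged single variables, whereas interacting the logged regressors keeps a separate interaction term.

        Code:
        * hypothetical regressors x1 and x2; FDI stays in levels, as in the PPML approach discussed above
        gen log_x1x2 = ln(x1*x2)       // equals ln(x1) + ln(x2): collinear with the single logs
        * one common alternative: interact the logged regressors themselves
        gen lx1 = ln(x1)
        gen lx2 = ln(x2)
        gen lx1_lx2 = lx1*lx2
        ppml FDI lx1 lx2 lx1_lx2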



        • #19
          Dear Anh,

          You can use as regressors whatever you think is appropriate.

          Best wishes,

          Joao



          • #20
            A couple of comments about ppml versus xtpoisson, and about Joao's example. First, it is true that xtpoisson currently only allows clustering at the level of the panel cross-sectional identifier. This is unfortunate and should be fixed in future versions of Stata. I should note that Tim Simcoe, currently at the BU Business School, wrote the program xtpqml, which is a wrapper for xtpoisson. xtpqml allows clustering at different levels, like ppml. I believe Tim's program has been available since 2007, and I have used it several times.

            As Joao stated, xtpoisson will have a computational advantage in cases with a large cross-sectional dimension because it does not include a dummy variable for each cross-sectional unit.

            Joao claims that ppml does a better job at identifying variables that should be dropped, and illustrates this claim using the data set http://personal.lse.ac.uk/tenreyro/mock.dta. This data set seems a bit contrived and isn't anything close to a data set one would actually analyze. It has one usable cross-sectional unit, corresponding to w = 0, with 98 observations within it. So it's like having one country.

            Having said that, I am confused about the findings. Hopefully Joao can weigh in, as he claims that ppml properly drops the variable z while xtpoisson does not. I must be missing something. It's true that z does not vary much: of the 98 cases for w = 0 (the only usable ID), z = 0 for 96 cases and z = 1 for 2 cases. Now, it's true that the data have been generated so that y > 0 for z = 0 and y = 0 for z = 1, but I still don't see that z should be dropped. Generally, any variable that has variation within each identifier stays in Poisson FE estimation, and z does have some variation. In fact, z = 1 is actually a good predictor of y = 0. And, indeed, using xtpoisson, z has a very negative coefficient. Naturally, it is very statistically insignificant, but I don't see why it should have dropped out. The huge standard error is telling us we cannot have much faith in the coefficient estimate, but that's different from saying it should fall out. And so I'm puzzled that ppml drops z.

            To carry this a bit further, ppml and xtpoisson should give the same results as a plain poisson regression using only the observations with w = 0, as that is the only usable ID. When I use poisson, z again does not drop out, although again its standard error is very large. I don't think this is numerical error.

            Now, while ppml and poisson give the same estimate on x to the reported digits, xtpoisson does not, and so it seems subject to a bit of numerical imprecision. There is probably a way to fix that, but applying xtpoisson with a single usable cross section is not something one would ever do in practice.

            There are some mysteries to be solved here, and I wonder if the differences occur with more standard panel data sets with N bigger than a few (or one).


            Incidentally, note that the data set Joao generated doesn't allow clustering because there's only one cross-sectional unit that is useful for FE estimation.

            Code:
            .  xi: ppml y x z i.w
            i.w               _Iw_1-3             (_Iw_1 for w==-1 omitted)
            
            note: checking the existence of the estimates
            
            Number of regressors excluded to ensure that the estimates exist: 3
            Excluded regressors:  z _Iw_2 _Iw_3
            Number of observations excluded: 4
            
            note: starting ppml estimation
            note: y has noninteger values
            
            Iteration 1:   deviance =  126.4931
            Iteration 2:   deviance =  124.7082
            Iteration 3:   deviance =  124.7033
            Iteration 4:   deviance =  124.7033
            
            Number of parameters: 2
            Number of observations: 96
            Pseudo log-likelihood: -167.38815
            R-squared: .01191314
            Option strict is: off
            ------------------------------------------------------------------------------
                         |               Robust
                       y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                       x |   .1033029   .0986563     1.05   0.295    -.0900598    .2966657
                   _cons |   .5584825   .0958664     5.83   0.000     .3705878    .7463772
            ------------------------------------------------------------------------------
            
            . xtpoisson y x z, fe
            note: you are responsible for interpretation of non-count dep. variable
            note: 2 groups (2 obs) dropped because of only one obs per group
            
            Iteration 0:   log likelihood = -168.29669  
            Iteration 1:   log likelihood = -164.89046  
            Iteration 2:   log likelihood = -164.13606  
            Iteration 3:   log likelihood =  -163.9467  
            Iteration 4:   log likelihood = -163.91054  
            Iteration 5:   log likelihood = -163.90469  
            Iteration 6:   log likelihood = -163.90327  
            Iteration 7:   log likelihood = -163.90297  
            Iteration 8:   log likelihood = -163.90291  
            Iteration 9:   log likelihood = -163.90289  
            
            Conditional fixed-effects Poisson regression    Number of obs     =         98
            Group variable: w                               Number of groups  =          1
            
                                                            Obs per group:
                                                                          min =         98
                                                                          avg =       98.0
                                                                          max =         98
            
                                                            Wald chi2(2)      =       1.80
            Log likelihood  = -163.90289                    Prob > chi2       =     0.4068
            
            ------------------------------------------------------------------------------
                       y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                       x |   .1034529   .0771442     1.34   0.180     -.047747    .2546528
                       z |  -14.12826   637.0267    -0.02   0.982    -1262.678    1234.421
            ------------------------------------------------------------------------------
            
            . poisson y x z if w == 0
            note: you are responsible for interpretation of noncount dep. variable
            
            Iteration 0:   log likelihood = -167.85043  
            Iteration 1:   log likelihood = -167.38816  
            Iteration 2:   log likelihood = -167.38815  
            
            Poisson regression                              Number of obs     =         98
                                                            LR chi2(2)        =       8.79
                                                            Prob > chi2       =     0.0124
            Log likelihood = -167.38815                     Pseudo R2         =     0.0256
            
            ------------------------------------------------------------------------------
                       y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                       x |   .1033029    .077143     1.34   0.181    -.0478946    .2545004
                       z |  -19.86129   11195.73    -0.00   0.999    -21963.09    21923.37
                   _cons |   .5584826   .0775793     7.20   0.000     .4064299    .7105353
            ------------------------------------------------------------------------------
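
            For reference, the within-group pattern of z described in the text above can be checked directly; a minimal sketch, assuming the mock data set is still available at the URL given in the post:

            Code:
            * load the mock data set referenced above and inspect the variation of z within w = 0
            use "http://personal.lse.ac.uk/tenreyro/mock.dta", clear
            tabulate z if w == 0                  // 96 cases with z = 0 and 2 cases with z = 1
            count if w == 0 & z == 1 & y == 0     // the two z = 1 cases all have y = 0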



            • #21
              Dear Jeff,

              I guess that the key to the mystery can be found here:

              Originally posted by Jeff Wooldridge:
              Now, it's true that the data have been generated so that y > 0 for z = 0 and y = 0 for z = 1, but I still don't see that z should be dropped.
              Just like in a binary model, z should be dropped because it perfectly predicts a zero outcome. Indeed, z is also dropped if we run

              Code:
              logit y x z
              If we do not drop the perfect predictors, Stata will either iterate forever or produce a spurious estimate of the coefficient of z (the very negative and insignificant estimates you mention). In most cases it iterates forever, but in this particular example it does not; that is why the data set is so "contrived". So, the reason to drop z has nothing to do with the lack of variation within the identifier, which is apparently what you had in mind.

              The problems with perfect predictors are well known in the context of binary choice models and Stata automatically drops perfect predictors (as in the example above); ppml does the same thing for Poisson regression. The same problem also exists in other models such as the Tobit, and ppml can be used to solve that; please see the final example in the latest version of ppml.

              We have addressed these problems in a couple of short papers.

              Finally, about the different results for xtpoisson: as you know, the objective function of xtpoisson is different from the objective function of poisson (and ppml), and therefore small differences can be expected unless the convergence criteria used are very stringent. So I do not think the available evidence allows us to conclude that xtpoisson is subject to a bit of numerical imprecision; it may be the other way around.

              Please do let me know if any of the above is unclear or if I missed any of your questions.

              Best wishes,

              Joao
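
              On the convergence point in #21, one way to check whether the small discrepancies come from the convergence criteria is to tighten Stata's standard maximize options before comparing the estimators; a minimal sketch (the tolerances and the xtset call are illustrative assumptions, not from the thread):

              Code:
              * re-run both estimators with tighter convergence tolerances, then compare coefficients
              xtset w                                           // assumes w is the panel identifier, as in the example
              xtpoisson y x z, fe tolerance(1e-12) ltolerance(1e-12)
              poisson y x z if w == 0, tolerance(1e-12) ltolerance(1e-12)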



              • #22
                Yes, Joao, that makes sense. Thanks. I should've looked for your papers. It also explains why xtpoisson and poisson give different estimates for the coefficient on z.



                • #23
                  Joao Santos Silva, may I use ppmlhdfe without xtsetting my panel data?



                  • #24
                    Dear sybil arqi,

                    I think you can, but why would you want to do that?

                    Best wishes,

                    Joao
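
                    To make this concrete: ppmlhdfe does not need the data to be xtset, because the panel dimensions are passed through its absorb() option; a minimal sketch with hypothetical variable names (y, x1, x2, id, year):

                    Code:
                    * ppmlhdfe works without xtset; the fixed effects go into absorb()
                    ppmlhdfe y x1 x2, absorb(id year) vce(cluster id)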



                    • #25
                      Joao Santos Silva I need to use ppmlhdfe without xtsetting my data because I had previously mi set my data to cope with missing data, but later on I simply dropped the _mi_ variables. Now, when I try to xtset my data, it shows an error message about mi set data. I have tried everything to mi unset the data, including mi extract, but I could not unset it.

