Problems with RESET test p-value for PPML estimators

Marianne Sasing

Join Date: Nov 2021

Posts: 10
#16

27 Nov 2021, 11:01

Hello Prof. Joao Santos Silva, I encountered a problem similar to Chan's when I applied ppml to a gravity model augmented with outward FDI stock, the infrastructure level of the importing and exporting countries, and BRI participation. Here is my code, following the Advanced Guide to Trade Policy Analysis by Yotov et al. (2016):

egen exp_time=group(exporter year)
quietly tabulate exp_time, generate(EXPORTER_TIME_FE)
egen imp_time=group(importer year)
quietly tabulate imp_time, generate(IMPORTER_TIME_FE)
egen pair_id=group(exporter importer)
quietly tabulate pair_id, generate(PAIR_FE)
ppml tradeflow_baci ln_distcap ln_ofdi rta bri bri_ln_exporter_infra bri_ln_importer_infra bri_ln_ofdi ln_ofdi_ln_exporter_infra ln_ofdi_ln_importer_infra PAIR_FE* EXPORTER_TIME_FE* IMPORTER_TIME_FE*, cluster(pair_id)

"bri_ln_exporter_infra", "bri_ln_importer_infra", "bri_ln_ofdi", "ln_ofdi_ln_exporter_infra", and "ln_ofdi_ln_importer_infra" are all interaction variables. For example, "bri_ln_importer_infra" = bri*ln(importer_infra), where bri is a dummy variable indicating participation in the Belt and Road Initiative at time t and "importer_infra" is the importing country's infrastructure level at time t.

I got the result below with no coefficient estimates for the independent variables:

note: starting ppml estimation
note: tradeflow_baci has noninteger values

Iteration 1: deviance = 1.09e+09
Iteration 2: deviance = 1.05e+09
Iteration 3: deviance = 1.05e+09
Iteration 4: deviance = 1.05e+09
Iteration 5: deviance = 1.05e+09
Iteration 6: deviance = 1.05e+09

Number of parameters: 1
Number of observations: 86
Pseudo log-likelihood: -5.236e+08
R-squared: .
Option strict is: off
(Std. err. adjusted for 23 clusters in pair_id)
------------------------------------------------------------------------------
             |               Robust
tradeflow_~i | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   15.66561   .4062579    38.56   0.000     14.86936    16.46186
------------------------------------------------------------------------------

Alternatively, I tried using ppmlhdfe and got coefficient estimates for the independent variables and some of the fixed effects, but none of them had p-values, robust standard errors, z-scores, or confidence intervals. The ppmlhdfe output included the following warnings: "missing F statistic; dropped variables due to collinearity or too few clusters" and "variance matrix is nonsymmetric or highly singular". The ppml output said that it excluded 164 regressors to ensure that the estimates exist.

Would you happen to know what these could mean and how I should fix the code? Thank you so much!
Joao Santos Silva

Join Date: Apr 2014

Posts: 3010
#17

27 Nov 2021, 12:56

Dear Marianne Sasing,

I think that there are two problems here. One is that your sample is far too small to estimate the many parameters you are trying to estimate. The other is that you may be trying to include variables that are collinear with the fixed effects, and you may not be including the fixed effects correctly when you use ppmlhdfe. My suggestion is that you focus on ppmlhdfe and make sure you include the fixed effects correctly (e.g., you should not include PAIR_FE* and the fixed effect but should include pair_id instead).

Best wishes,

Joao
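
A minimal sketch of what the suggestion in post #17 might look like, reusing the variable names from post #16. It assumes the community-contributed ppmlhdfe command is installed; the choice to leave out ln_distcap is an assumption made here, on the grounds that bilateral distance does not vary over time within a pair and would therefore be absorbed by the pair fixed effect.

```stata
* Sketch only: variable names are taken from post #16.
* ppmlhdfe is community-contributed: ssc install ppmlhdfe (it relies on reghdfe and ftools).
egen exp_time = group(exporter year)
egen imp_time = group(importer year)
egen pair_id  = group(exporter importer)

* The group identifiers go straight into absorb(); no PAIR_FE*,
* EXPORTER_TIME_FE* or IMPORTER_TIME_FE* dummies are generated.
* ln_distcap is left out (assumption): it is constant within a pair.
ppmlhdfe tradeflow_baci ln_ofdi rta bri bri_ln_exporter_infra ///
    bri_ln_importer_infra bri_ln_ofdi ln_ofdi_ln_exporter_infra ///
    ln_ofdi_ln_importer_infra, ///
    absorb(exp_time imp_time pair_id) cluster(pair_id)
```

With only 86 observations and 23 pairs, this specification is likely to remain overparameterized, so the first problem raised in post #17 still applies.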
Marianne Sasing

Join Date: Nov 2021

Posts: 10
#18

27 Nov 2021, 13:54

Thank you so much for this, Prof. Joao Santos Silva! We will try to obtain more observations and consider a simpler model with fewer parameters as well. Sorry, but I'm quite new to Stata and to the ppmlhdfe command. Is it correct to revise the code to this?
ppmlhdfe tradeflow_baci ln_distcap ln_ofdi rta bri bri_ln_exporter_infra bri_ln_importer_infra bri_ln_ofdi ln_ofdi_ln_exporter_infra ln_ofdi_ln_importer_infra, absorb(pair_id)

Here are the results I got:

Finally, may I also ask two more questions? (1) To justify causality, do we still need to test for stationarity and cointegration if our panel dataset covers 2010-2018 only? And do we also need to conduct tests for multicollinearity and serial correlation? For stationarity, I tried using the xtunitroot fisher test on our unbalanced panel dataset, but it only returned error code 2000:

xtunitroot fisher tradeflow_baci, dfuller lags(1)
performing unit-root test on first panel using the syntax
dfuller tradeflow_baci, lags(1)
returned error code 2000

Lastly, (2) following Head and Mayer (2014), we initially planned to obtain just one of the importer-fixed effect, exporter-fixed effect, and country-pair fixed effect and then regress each of those on all the independent variables they absorbed to get within estimates of the coefficients of those variables which were redacted from the fixed effects model. But in running the absorb command for ppmlhdfe, there was no coefficient that we could use to generate predicted values of the fixed effect for each observation and with which to conduct the second-step regression. Would you know what we might have missed? Here is the second-step regression I am referring to (where u_ij=country-pair fixed effect; X_jt=importer-time fixed effect; and π_it=exporter-time fixed effect):

Thank you so much again for your time!!
Joao Santos Silva

Join Date: Apr 2014

Posts: 3010
#19

28 Nov 2021, 03:50

Dear Marianne Sasing,

I am afraid I cannot see the results you tried to post; please check the FAQs to see how you can post results.

On your other questions, if you have data for enough countries you can ignore the stationarity issue. You can also forget about multicollinearity and you can deal with serial correlation by using clustered standard errors.

Please check the ppmlhdfe help file to see how you can save the fixed effects.

Best wishes,

Joao
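
A minimal sketch of how the fixed effects might be saved for a second-step regression, under the assumption that the installed version of ppmlhdfe accepts the savefe suboption of absorb() as described in its help file. The shortened regressor list is for illustration only, and the names of the saved variables should be checked with describe rather than taken from this sketch.

```stata
* Sketch only: exp_time, imp_time and pair_id are the group identifiers from post #16.
* savefe asks ppmlhdfe to store the estimated fixed effects as new variables
* (assumed here to be created with the __hdfe prefix; see help ppmlhdfe).
ppmlhdfe tradeflow_baci ln_ofdi rta bri, ///
    absorb(exp_time imp_time pair_id, savefe) cluster(pair_id)

* List the variables that were created, then use them as dependent
* variables in the second-step regressions.
describe __hdfe*
```

Clustering on pair_id is also how the serial-correlation point in post #19 is handled in this sketch.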
Marianne Sasing

Join Date: Nov 2021

Posts: 10
#20

28 Nov 2021, 07:09

Dear Prof. Joao Santos Silva, sorry about that, here are the results! Your responses to the other questions are also duly noted and are very helpful, thank you!
Attached Files
Joao Santos Silva

Join Date: Apr 2014

Posts: 3010
#21

28 Nov 2021, 07:43

These are in line with what would be expected.
Marianne Sasing

Join Date: Nov 2021

Posts: 10
#22

28 Nov 2021, 08:48

Thank you so much for your help!!
Marianne Sasing

Join Date: Nov 2021

Posts: 10
#23

14 Feb 2022, 08:15

Originally posted by Joao Santos Silva

Dear Marianne Sasing,

I think that there are two problems here. One is that your sample is far too small to estimate the many parameters you are trying to estimate. The other is that you may be trying to include variables that are collinear with the fixed effects, and you may not be including the fixed effects correctly when you use ppmlhdfe. My suggestion is that you focus on ppmlhdfe and make sure you include the fixed effects correctly (e.g., you should not include PAIR_FE* and the fixed effect but should include pair_id instead).

Best wishes,

Joao

Good day, Prof. Joao Santos Silva! I apologize for getting back to you about this after some time, but may I ask, regarding your notes here: if we encounter a similar problem where estimates are produced but the model does not pass the RESET test, can this indication of misspecification also be attributed to a small sample size? Does this mean that a sample that is too small is reason enough for misspecification? May I also ask if, in employing PPML estimation, we should remove variables that are perfectly (or near perfectly) collinear with the dependent variable? For example, if we consider just one exporter for all partner importing countries and we have a variable for that exporter's infrastructure level across different time periods, then the value of the exporter's infrastructure level is the same for all countries in each time period, hence the perfectly collinear relationship. In such a case, can we safely remove that variable without risking omitted variable bias? Thank you so much!

Last edited by Marianne Sasing; 14 Feb 2022, 08:17.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3010
#24

14 Feb 2022, 08:39

Dear Marianne Sasing,

Yes, you can (and must!) drop variables that are collinear with others; this does not cause omitted variable bias.

As for whether a small sample size can lead to misspecification, what can happen is that the sample is not representative of the population. But it may also be the case that the test over-rejects in small samples and therefore the rejection is spurious. As you see, there are many problems with small samples...

Best wishes,

Joao
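
The single-exporter case described in post #23 can be checked directly in the data. A minimal sketch, assuming a sample with one exporter, year fixed effects in the model, and a hypothetical variable name ln_exporter_infra for the exporter's log infrastructure level:

```stata
* Hypothetical variable name: ln_exporter_infra.
* Compute the standard deviation of the variable within each year.
bysort year: egen double sd_infra = sd(ln_exporter_infra)
summarize sd_infra

* If the maximum of sd_infra is 0, the variable takes a single value per year.
* It is then a linear combination of the year dummies and has to be dropped;
* its effect cannot be separated from the year fixed effects.
```

If the check confirms that the variable is constant within each year, dropping it leaves the fitted model unchanged, which is consistent with the answer in post #24 that dropping collinear variables does not cause omitted variable bias.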