PPML, panel data - Statalist

Peter Christoforou

Join Date: Apr 2021

Posts: 2
#316

07 Apr 2021, 08:51

Hello Professor Joao Santos Silva,

I have a question about the answer you gave to dupont john on using PPML rather than OLS even when the dataset has no zero trade flows. In that scenario (no zero trade flows), we would expect the error to be less heteroskedastic, so I do not understand why the results from OLS and PPML would not be almost the same.

I am trying to estimate the impact of migration on trade, and I am wondering why the PPML and OLS estimates differ (they have opposite signs and both are significant). I do not have zero trade flows. Does that imply that I have misspecified the model, or should I disregard one of the results? The sign of the PPML estimate makes more sense.

I know you cannot answer the second question based on the information I have given you. I am only concerned about the first question (paragraph one).

Best regards

P. Christoforou, undergraduate student
Joao Santos Silva

Join Date: Apr 2014

Posts: 3008
#317

07 Apr 2021, 10:47

Dear Peter Christoforou,

Even without zeros, trade data is very heteroskedastic, so I am not surprised that the results are different and that the ones obtained with PPML make more sense.

Best wishes,

Joao
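
The reasoning behind this reply can be sketched with a short derivation, following the multiplicative-error setup of The Log of Gravity (Santos Silva and Tenreyro, 2006). The lognormal error is only an illustrative assumption, and the symbols x_i, beta, eta_i, and sigma^2(x_i) are notation introduced here rather than taken from the thread.

```latex
% Multiplicative gravity model with a mean-one error term:
T_i = \exp(x_i \beta)\,\eta_i, \qquad E[\eta_i \mid x_i] = 1 .

% Taking logs (feasible when every T_i > 0):
\ln T_i = x_i \beta + \ln \eta_i .

% OLS on the log equation is consistent for the slopes only if
% E[\ln \eta_i \mid x_i] does not depend on x_i.
% Illustration: if \eta_i is lognormal with conditional variance \sigma^2(x_i),
E[\ln \eta_i \mid x_i] = -\tfrac{1}{2}\,\ln\!\bigl(1 + \sigma^2(x_i)\bigr),
% which varies with x_i whenever the variance does, even with no zeros in T_i.
```

Under this sketch, the absence of zeros removes only the problem of undefined logs; it does not remove the correlation between the log error and the regressors.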
Peter Christoforou

Join Date: Apr 2021

Posts: 2
#318

15 Apr 2021, 08:12

Dear Professor Joao Santos Silva,

Thanks for your earlier advice; it was very helpful. I would also like to ask a new question. I have read your article (The Log of Gravity, 2006; a very good article, by the way) and I have some questions.
I understand how the log transformation causes inconsistency in the presence of heteroskedasticity, but I am not sure why the Poisson estimator is the ideal estimator in this circumstance.

I understand that Poisson gives consistent estimates if the conditional mean is correctly specified (Gourieroux, Monfort, and Trognon).

Model

Stage 1: T = a0 * Yi^a1 * Yj^a2 * D^a3, which is equivalent to
Stage 2: lnT = b0 + a1*lnYi + a2*lnYj + a3*lnD, where b0 = ln(a0)
Stage 3: T = exp(b0 + a1*lnYi + a2*lnYj + a3*lnD)

Q1. Is the third stage where the model is parametrized so that the conditional mean is correctly specified, making the Poisson estimator consistent under the result of Gourieroux, Monfort, and Trognon?
Q2. Is the interpretation of the estimates the same as with OLS because logarithms were taken in the second stage?

I know that my questions are not related to Stata. Sorry for asking so many questions!

Best wishes,

Peter Christoforou
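
For reference, the link between Stage 3 and the Poisson estimator can be written out as a sketch. The subscript ij for a country pair and the vector notation x_ij are assumptions introduced here; the parameters b_0, a_1, a_2, a_3 follow the stages above.

```latex
% Stage 3 as a conditional mean, with x_{ij} = (1, \ln Y_i, \ln Y_j, \ln D_{ij}):
E[T_{ij} \mid x_{ij}] = \exp(x_{ij}\beta), \qquad \beta = (b_0, a_1, a_2, a_3)^{\prime} .

% First-order conditions of the Poisson pseudo-maximum-likelihood estimator:
\sum_{i,j} \bigl( T_{ij} - \exp(x_{ij}\hat{\beta}) \bigr)\, x_{ij}^{\prime} = 0 .

% Coefficients on logged regressors are elasticities of the conditional mean:
\frac{\partial \ln E[T_{ij} \mid x_{ij}]}{\partial \ln Y_i} = a_1 .
```

The moment conditions in the second line hold in expectation whenever the conditional mean is correctly specified, so the data do not need to be counts or Poisson distributed. The last line shows why coefficients on logged regressors can be read as elasticities, as in the log-linear model.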
Ridwan Sheikh

Join Date: Apr 2021

Posts: 167
#319

20 May 2021, 05:10

Dear Joao,

I have panel data on bilateral imports of 84 developing countries from 24 OECD countries at the three-digit ISIC industry level for 2000-2015. Zero trade flows usually occur at a more disaggregated level (perhaps the 4-digit or 5-digit product level). However, in my case I also have zero trade values in my dependent variable at the three-digit ISIC sectoral level. To account for them properly, I am trying to estimate the parameters of my model using PPML. I am using Stata 14.2; what commands should I use to estimate my panel with PPML? It is important to mention that I am using the Baier and Bergstrand (2009) gravity model approach, which does not require any country, time, or industry fixed effects. They argue that the reduced-form gravity model (which they derive using a Taylor series approximation) can be estimated by OLS, giving results identical to those obtained from FE models that approximate multilateral resistance. However, OLS performs well only if there are no zero-valued trade flows in the dependent variable.
I have also estimated the parameters of my model using OLS (as suggested by Baier et al.), but I want to check robustness using PPML.
Please advise which commands will let me estimate the parameters of my three-dimensional panel data model in Stata 14.2.
Thanks
Joao Santos Silva

Join Date: Apr 2014

Posts: 3008
#320

20 May 2021, 05:35

Dear Ridwan Sheikh,

The standard commands would be ppml and ppmlhdfe; I am not entirely sure they work on your version of Stata, but hopefully at least one of them does. Estimating the model with fixed effects is very easy if you can use ppmlhdfe.

Best wishes,

Joao
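
A minimal sketch of how the two commands mentioned above might be called. All variable names (imports, ln_gdp_exp, ln_gdp_imp, ln_dist, contig, exp_id, imp_id, industry_id, pair_id, year) are hypothetical placeholders, and the set of fixed effects is only one possible choice, not a recommendation from the thread.

```stata
* One-time installation of the community-contributed commands from SSC.
* ppmlhdfe relies on reghdfe and ftools, so those are installed as well.
ssc install ppml
ssc install ftools
ssc install reghdfe
ssc install ppmlhdfe

* PPML without fixed effects; observations with zero imports are kept.
ppml imports ln_gdp_exp ln_gdp_imp ln_dist contig, cluster(pair_id)

* PPML with exporter-year, importer-year, and industry fixed effects.
* Country-level regressors such as GDP are absorbed by the country-year effects.
ppmlhdfe imports ln_dist contig, absorb(exp_id#year imp_id#year industry_id) vce(cluster pair_id)
```

Whether these packages install on a given Stata version should be checked against their help files.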
Dani Rojas

Join Date: May 2021

Posts: 4
#321

20 May 2021, 09:43

Dear Joao Santos Silva, when I use xtpoisson with fixed effects, Stata deletes all the observations for which my dependent variable is zero. Is that fine? Does this mean that Stata is not using that information in the estimation? Observations with zeros are not deleted when I use ppml, but I need to include fixed effects, which is why I am using xtpoisson. I hope you can help me.

Best regards
Joao Santos Silva

Join Date: Apr 2014

Posts: 3008
#322

20 May 2021, 10:11

Dear Dani Rojas,

That is absolutely fine because those observations are not informative (it does not delete all zeros, but only the observations for the pairs that always have zeros). However, I suggest you use ppmlhdfe.

Best wishes,

Joao
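
The behaviour described above can be checked with a short sketch. The variable names (pair_id, year, trade, tariff) are hypothetical and assume that the fixed effects are at the country-pair level.

```stata
* Declare the panel structure: country pair by year.
xtset pair_id year

* Count observations belonging to pairs whose trade is zero in every year.
* These are the observations the fixed-effects Poisson estimator drops.
bysort pair_id: egen double max_trade = max(trade)
count if max_trade == 0

* Fixed-effects Poisson; pairs with all-zero outcomes are dropped because
* they carry no information about the slope coefficients.
xtpoisson trade tariff, fe vce(robust)

* The same pair fixed effects estimated with ppmlhdfe.
ppmlhdfe trade tariff, absorb(pair_id) vce(cluster pair_id)
```

The two estimation commands should report the same slope estimates; pairs with some zero and some positive flows stay in the sample in both cases.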
Dani Rojas

Join Date: May 2021

Posts: 4
#323

20 May 2021, 10:43

Thank you very much, Joao Santos Silva; you don't know how much I appreciate your help.
Ridwan Sheikh

Join Date: Apr 2021

Posts: 167
#324

21 May 2021, 06:24

Thank you very much, Joao Santos Silva; I really appreciate you getting back to me.
I have two last questions. (1) Can we still use PPML when we do not have country-pair, importer, exporter, or industry fixed effects in the model? At the moment I am not including any fixed effects because I am approximating multilateral resistance using a first-order Taylor series expansion, following Baier and Bergstrand (2009, 2010). However, because of heteroskedasticity and zero values in my dependent variable, I prefer PPML to OLS. My confusion is whether we can use PPML when the model has no fixed effects.
(2) As you suggested, I tried running:
ppml Trade_Values lnGDP lnPOP lnDistance CommLng CommCol (othercovariates) if industry== 1, cluster(distwces)
I get a warning that the dependent variable (Trade_Values) and some independent variables have large values and that I should consider rescaling or recentering. However, Stata does produce parameter estimates after some iterations. I do not know whether those results are correct, given that you argued in one of your papers (Poisson: some convergence issues) that Stata may find it difficult to compute PPML estimates when some variables have large values, because the estimation is sensitive to numerical problems. What rescaling should I apply to the variables (dependent and independent) that trigger this warning?
Please get back to me about these two questions; I would be very thankful.
I am new to Stata and very sorry if I am asking too much.
Regards,
Ridwan
Joao Santos Silva

Join Date: Apr 2014

Posts: 3008
#325

21 May 2021, 06:45

Dear Ridwan Sheikh,

1) Yes, PPML does not require fixed effects.
2) If you get results, you can safely ignore those warnings.

Best wishes and best of luck,

Joao
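
For anyone who prefers to silence the warning, a minimal sketch of one possible rescaling follows. The new variable trade_mn and the factor of one million are illustrative assumptions; the remaining names come from the command in the question, with the unspecified other covariates left out.

```stata
* Express the dependent variable in millions (the scale factor is arbitrary).
generate double trade_mn = Trade_Values / 1e6

* Re-estimate with the rescaled dependent variable.
ppml trade_mn lnGDP lnPOP lnDistance CommLng CommCol if industry == 1, cluster(distwces)
```

Because the conditional mean is exponential, dividing the dependent variable by a constant c changes only the intercept (by minus ln c) and leaves the slope coefficients unchanged, so comparing the two sets of slopes is a simple check that the original results are numerically sound.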
Ridwan Sheikh

Join Date: Apr 2021

Posts: 167
#326

21 May 2021, 11:42

Thank you very much, Joao Santos Silva, for the clarification; I find it very useful.
Regards,
Ridwan
Felix Dornseifer

Join Date: Jan 2021

Posts: 3
#327

17 Jun 2021, 08:51

Dear all,
Although I am using ppmlhdfe, I hope this is the right place to ask my question. I am estimating a gravity model on a panel data set (6 years, 26k observations of FDI, with around 100 origin and 160 destination countries) to explain FDI, and I want to include an interaction term:

Code:

ppmlhdfe fdi log_sci c.log_sci#c.log_entry_time_d $controls, a(from_country_iso#year in_country_iso#year year) cluster(from_country_iso#year in_country_iso#year year)

log_entry_time_d measures the time in days needed to open a business in the destination country, and log_sci is a bilateral measure of social connections; both are count variables, entered in logs. $controls is a vector of several bilateral controls.

My question is: how can I interpret the coefficient on the interaction term as descriptively as possible? I found some articles that interact bilateral variables with unilateral variables taking values between 0 and 1, which is relatively straightforward to interpret (M. Bailey, A. Gupta, S. Hillenbrand, T. Kuchler, R. Richmond, J. Stroebel; International trade and social connectedness, Journal of International Economics 129, 103418). However, I am not really sure how to do that when interacting a bilateral variable with a unilateral count variable. I guess the interaction has to be in logs (as I did), but the interpretation gives me some headaches. Ideally, I would like to formulate a statement like: if it takes x days to open a business in country A, then the elasticity of FDI w.r.t. SCI is y.
Thanks for the help!
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#328

18 Jun 2021, 05:55

Hi Felix,
If I understand correctly, you can calculate the elasticity as a function of the number of days. In Stata syntax, it looks like this:

Code:

y = _b[log_sci] + x * _b[c.log_sci#c.log_entry_time_d]

A natural way to present this would be to calculate y at the mean value of x. Or, you can demean x beforehand; then the coefficient on log_sci gives you the elasticity at the mean value of x. Hopefully this is clear, but the second coefficient tells you how much a change in the number of days affects the elasticity of fdi with respect to SCI.

Regards,
Tom
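
A minimal sketch of how the elasticity formula above could be evaluated with a standard error, to be run after the ppmlhdfe command from the question. The value of 10 days and the new variable log_entry_time_c are illustrative assumptions; because the regressor is in logs, x in the formula is the log of the number of days.

```stata
* Elasticity of fdi with respect to SCI when it takes 10 days to open a
* business; 10 is an arbitrary value, converted to logs to match the regressor.
local lx = ln(10)
lincom log_sci + `lx' * c.log_sci#c.log_entry_time_d

* Alternative: centre the log variable on its estimation-sample mean, so that
* after re-estimation the coefficient on log_sci is the elasticity at that mean.
summarize log_entry_time_d if e(sample), meanonly
generate double log_entry_time_c = log_entry_time_d - r(mean)
```

After centring, the model would be re-estimated with c.log_sci#c.log_entry_time_c in place of the original interaction term.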
Felix Dornseifer

Join Date: Jan 2021

Posts: 3
#329

18 Jun 2021, 07:17

Hi Tom,
Thanks for your answer, you have confirmed my guess. Conversely, this means that the coefficient of log_sci by itself has no meaningful interpretation, since the value 0 for the number of days to open a business is not realistic. That point puzzled me the most.

Best
Felix
alessio lombini

Join Date: Dec 2020

Posts: 98
#330

28 Jun 2021, 08:47

Dear Professor Joao Santos Silva,

I am estimating a gravity model on a panel dataset of 59 countries over 11 years. My dependent variable is foreign value added in gross exports and, although it contains few zeros, I will run a Poisson pseudo-likelihood regression. I will also include importer-time and exporter-time fixed effects. I have two questions that I did not see answered in previous posts:

1. Do I need to perform any diagnostic tests before or after estimation? For instance, is the PPML estimator consistent in the presence of cross-sectional dependence and/or serial correlation?
2. Do I need to take the log of variables that are expressed as percentages, or can I include them as percentages?

Thank you in advance for your attention, and best regards.