PPML Model Estimation with Dyadic Data and Three-way Fixed Effects

Mauricio Carvalho

Join Date: May 2018

Posts: 22
#1


16 Apr 2024, 06:20

Dear Statalist community,

I am currently working on a research project that involves estimating a Poisson Pseudo-Maximum Likelihood (PPML) model in Stata. My dataset is structured in a panel dyadic format, representing origin-to-destination firm migration.

Let "i" denote the origin municipality and "j" the destination. To apply the recommended three-way fixed effects structure (e.g., Yotov et al., 2016), which includes pair fixed effects, one needs explanatory variables that vary at the "ij,t" level. Contrary to what is common in the trade literature, I do not have such variables. Instead, I aim to estimate the effect of a variable in differences, i.e., the difference between the time required to access motorways at the destination and at the origin. For example:

delta_X_itj = x_jt - x_it

However, upon estimating the model using ppmlhdfe, the result yields zeros (omitted). My intuition suggests that this outcome makes sense, as my variable is after all a linear combination of both "jt" and "it" variables, potentially leading it to be absorbed by the fixed effects.

So, the command is:

Code:

ppmlhdfe y tt_mtw_ar_diff ///
    log_distance_rn_rdist control1_diff control2_diff control3_diff control4_diff, ///
    absorb(origin#year destination#ano origin#destination) vce(cluster origin#destination) d
predict lambda
matrix beta = e(b)
ppml_fe_bias y tt_mtw_ar_diff, i(ldest) j(ldest_destino) t(ano) lambda(lambda) beta(beta)

Here, tt_mtw_ar_diff is the variable of interest, and log_distance_rn_rdist is the distance by road (time-variant) between origin i and destination j.

However, upon running the ppml_fe_bias command, which incorporates the correction proposed by Weidner and Zylkin (2021), I obtain a negative coefficient (different from zero). In other words, it seems peculiar to me that with this correction, the variable of interest can be estimated, whereas without it, it cannot. Is this an expected outcome? Am I doing something wrong?

Thank you for your help and insights.

Kind Regards,
Mauricio.
Tags: gravity model, panel data, PPML, ppml_fe_bias, three-way fixed effects
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#2

16 Apr 2024, 22:47

Dear Mauricio Carvalho,

I believe you are not using ppml_fe_bias correctly. Try using in the options i, j, and t exactly the same variables that are absorbed in ppmlhdfe.

Best wishes,

Joao
Comment
Mauricio Carvalho

Join Date: May 2018

Posts: 22
#3

17 Apr 2024, 03:00

Dear Prof. Joao Santos Silva,

Apologies for my mistake. Indeed, I am using the same fixed effects absorbed in ppmlhdfe. I just attempted to make it more reader-friendly, but I forgot to change it throughout. To clarify, "ldest" corresponds to "origin," and "ldest_destino" corresponds to "destination" -- and "ano" corresponds to year. Once again, I've attached the estimation below. Here, to simplify, I am regressing y_ijt on x_itj only (that is, tot_reloc on tt_mtw_ar_diff) with three-way fixed effects.

Code:

ppmlhdfe tot_reloc tt_mtw_ar_diff, ///
    absorb(ldest#ano ldest_destino#ano ldest#ldest_destino, save) ///
    vce(cluster ldest#ldest_destino) d
predict lambda
matrix beta = e(b)
ppml_fe_bias tot_reloc tt_mtw_ar_diff, i(ldest) j(ldest_destino) t(ano) lambda(lambda) beta(beta)

Thank you very much, Professor!
Comment
Mauricio Carvalho

Join Date: May 2018

Posts: 22
#4

17 Apr 2024, 10:32

I have a related question regarding #1 and #2. As I mentioned earlier, since my variable of interest does not vary at the "ijt" level, I am using the difference of this variable, i.e., destination minus origin. The "i" and "j" represent units at the municipality level. To keep using the three-way fixed effects (best practice), I am considering whether it would be reasonable to "level up" the regional level of the fixed effects. For example, instead of controlling for municipality_origin#time, municipality_destination#time, and municipality_origin#municipality_destination, I would use nuts3_origin#time, nuts3_destination#time, and nuts3_origin#nuts3_destination. Here, nuts3 refers to the NUTS3 regions as defined by the EU, which, in my case, are larger than individual municipalities.

In this latter case, would the three-way FE-PPML still be subject to asymptotic bias?

Perhaps Professor Tom Zylkin could offer some guidance here as well?

Thank you in advance!
Comment
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#5

17 Apr 2024, 20:47

Dear Mauricio,
Since your initial estimate is not identified in the first step, it is not appropriate to apply a bias correction in the second step, and this should not be expected to give you meaningful results. I agree it is strange it should give you a result at all given how you defined the variable. That is something I can look into for you if you want. As another tip, when using ppml_fe_bias, please keep in mind you need to include all the same x variables as the original model.

For the second question, assuming there are a reasonable number of municipalities in each region we should be less concerned about asymptotic bias in that case. You will probably want to add controls for variables that vary by municipality, such as log population and log income (if you have such data).
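As a rough illustration of the region-level specification discussed here, the call below is a minimal sketch and not a recommended final model. The names nuts3_orig and nuts3_dest (NUTS3 identifiers for origin and destination) and the ln_* controls are hypothetical placeholders; the remaining names follow the code in #3. Clustering by municipality pair is carried over from the original command as an assumption.

```stata
* Sketch only: nuts3_orig, nuts3_dest and the ln_* controls are hypothetical names.
* Fixed effects are defined at the NUTS3 level, so the municipality-level
* difference tt_mtw_ar_diff still varies within each region-year cell.
ppmlhdfe tot_reloc tt_mtw_ar_diff ln_pop_orig ln_pop_dest ln_inc_orig ln_inc_dest, ///
    absorb(nuts3_orig#ano nuts3_dest#ano nuts3_orig#nuts3_dest) ///
    vce(cluster ldest#ldest_destino)
```

If tt_mtw_ar_diff is still reported as omitted in such a run, that would suggest it does not vary within the region-level cells in the data at hand.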

Regards,
Tom
Comment
Mauricio Carvalho

Join Date: May 2018

Posts: 22
#6

18 Apr 2024, 04:02

Dear Tom Zylkin,

Thank you very much!

Yes, you are right. I should not apply the bias correction in the second step since my parameter of interest was not identified in the first step. Although it might seem obvious, I just realized that my variable of interest (delta_X_itj = x_jt - x_it) is a linear combination of "jt" and "it" variables after obtaining a "0/omitted" result in the first step.
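One way to confirm this kind of collinearity directly, sketched here under the assumption that reghdfe is installed, is to residualize the regressor on the origin-year and destination-year effects and inspect what is left. The variable resid_diff is a hypothetical name; the other names follow the code in #3.

```stata
* Sketch only: resid_diff is a hypothetical new variable.
* Project the regressor on the origin-year and destination-year fixed effects.
reghdfe tt_mtw_ar_diff, absorb(ldest#ano ldest_destino#ano) residuals(resid_diff)

* If the residual is numerically zero everywhere, the regressor lies in the
* span of the fixed effects and its coefficient cannot be identified.
summarize resid_diff, detail
```

A residual with essentially zero variance would be consistent with the "omitted" message from ppmlhdfe.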

Regarding the second question, I have 23 NUTS3 regions and each region has an average of 12 municipalities. Do you think this is enough? Overall, I have a 275 × 274 matrix of relocation flows (origin-destination pairs) observed over three time periods, resulting in a panel of 226,050 observations. If you allow me, can I ask one more question? Could you recommend some references, or offer some intuition, to help me understand why we can be less concerned with asymptotic bias when the number of municipalities in each region is large enough?

Thank you for your tip regarding including the same X variables in the "ppml_fe_bias" command and for including some controls regarding my second question!

I think I will go for estimating my model with three-way FE-PPML with the higher regional level (NUTS3) and including controls, as you suggested.

Kind Regards,
Mauricio.
Comment
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#7

18 Apr 2024, 07:12

Hi Mauricio,
For some intuition, please read Section 2 of the Weidner and Zylkin paper along with Appendix A.8 in our online appendix. In short, what matters are (i) how many observations there are to estimate each fixed effect, which determines the order of the bias, and (ii) the order of the standard error. If you are assuming clustering by pair and treating T as fixed, the order of the standard errors is 1/N. The order of the bias can be thought of as 1/(R*N), where R is the average number of municipalities per region (this is based on an asymptotic where both R and N become large). Therefore, if we can treat R as large, the bias will be small relative to the standard error. R=12 is not necessarily a large number, but we can still expect the bias to be (approximately!) 12x less, which is a big difference.
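The orders stated above can be put side by side as a back-of-the-envelope comparison. This is only an illustration of the stated rates, ignoring constants, and not a derivation from the paper:

```latex
\[
\frac{\text{bias}}{\text{s.e.}} \;\sim\; \frac{1/(R\,N)}{1/N} \;=\; \frac{1}{R},
\qquad
R = 12 \;\Rightarrow\; \frac{1}{R} \approx 0.083 .
\]
```

With municipality-level fixed effects the same comparison corresponds to R = 1, so the ratio is of order one, which is where the "approximately 12x less" figure comes from.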

Also please note that if a variable is perfectly collinear, its coefficient estimate is not 0 but rather it is undetermined.
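To see why the coefficient is undetermined rather than zero, consider a sketch of the conditional mean with municipality-level three-way fixed effects. The symbols α, γ, η, β, and b are notation introduced here for illustration only:

```latex
\[
\mu_{ijt} = \exp\!\big(\alpha_{it} + \gamma_{jt} + \eta_{ij} + \beta\,(x_{jt} - x_{it})\big).
\]
For any other value $b$, define
\[
\tilde{\alpha}_{it} = \alpha_{it} + (b - \beta)\,x_{it},
\qquad
\tilde{\gamma}_{jt} = \gamma_{jt} - (b - \beta)\,x_{jt}.
\]
Then
\[
\tilde{\alpha}_{it} + \tilde{\gamma}_{jt} + \eta_{ij} + b\,(x_{jt} - x_{it})
= \alpha_{it} + \gamma_{jt} + \eta_{ij} + \beta\,(x_{jt} - x_{it}),
\]
so every value of $b$ yields exactly the same $\mu_{ijt}$.
```

Because the fit is identical for every b, the data carry no information about the coefficient, and the "0 (omitted)" in the output is a reporting convention rather than an estimate.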
Regards,
Tom
Comment
Mauricio Carvalho

Join Date: May 2018

Posts: 22
#8

19 Apr 2024, 01:46

Dear Professor Tom Zylkin,

Thank you very much for your invaluable assistance!

Kind regards,
Mauricio.
Comment
