  • PPML Model Estimation with Dyadic Data and Three-way Fixed Effects

    Dear Statalist community,

    I am currently working on a research project that involves estimating a Poisson Pseudo-Maximum Likelihood (PPML) model in Stata. My dataset is structured in a panel dyadic format, representing origin-to-destination firm migration.

    Considering the origin municipality as "i" and the destination as "j," to apply the most recommended level of fixed effects in a three-way framework (e.g., Yotov et al., 2016), namely pair fixed effects, one needs to have explanatory variables that vary at the "ij,t" level. Contrary to what is common in trade literature, I do not have such variables. Instead, I aim to estimate a variable in differences, i.e., the difference between the time required to access motorways at the destination and that at the origin. For example:

    delta_X_itj = x_jt - x_it

    However, upon estimating the model using ppmlhdfe, the coefficient is reported as zero (omitted). My intuition suggests that this outcome makes sense, as my variable is after all a linear combination of "jt" and "it" variables, potentially leading it to be absorbed by the fixed effects.
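
    The absorption argument can be written out explicitly. The following is a sketch that assumes the standard three-way conditional mean; the symbols alpha, gamma, eta, and beta are labels introduced here for illustration and do not come from the estimation output.

    ```latex
    \begin{align*}
    \mu_{ijt}
      &= \exp\!\left(\alpha_{it} + \gamma_{jt} + \eta_{ij}
         + \beta\,(x_{jt} - x_{it})\right) \\
      &= \exp\!\left(\underbrace{(\alpha_{it} - \beta x_{it})}_{\tilde{\alpha}_{it}}
         + \underbrace{(\gamma_{jt} + \beta x_{jt})}_{\tilde{\gamma}_{jt}}
         + \eta_{ij}\right)
    \end{align*}
    ```

    Under this assumption, every value of beta produces the same fitted means once the origin-time and destination-time effects are redefined, so the data cannot pin beta down.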

    So, the command is:

    Code:
    ppmlhdfe y tt_mtw_ar_diff ///
    log_distance_rn_rdist control1_diff control2_diff control3_diff control4_diff , absorb(origin#year destination#ano origin#destination) vce(cluster origin#destination) d
    predict lambda 
    matrix beta = e(b) 
    ppml_fe_bias y tt_mtw_ar_diff, i(ldest) j(ldest_destino) t(ano) lambda(lambda) beta(beta)
    Here, tt_mtw_ar_diff is the variable of interest, and log_distance_rn_rdist is the (time-variant) distance by road between origin i and destination j.

    However, upon running the ppml_fe_bias command, which incorporates the correction proposed by Weidner and Zylkin (2021), I obtain a negative coefficient (different from zero). In other words, it seems peculiar to me that with this correction, the variable of interest can be estimated, whereas without it, it cannot. Is this an expected outcome? Am I doing something wrong?


    [Attached image: bias_correction.png]

    Thank you for your help and insights.

    Kind Regards,
    Mauricio.

  • #2
    Dear Mauricio Carvalho,

    I believe you are not using ppml_fe_bias correctly. Try using in the options i, j, and t exactly the same variables that are absorbed in ppmlhdfe.

    Best wishes,

    Joao



    • #3
      Dear Prof. Joao Santos Silva,

      Apologies for my mistake. Indeed, I am using the same fixed effects absorbed in ppmlhdfe. I just attempted to make it more reader-friendly, but I forgot to change it throughout. To clarify, "ldest" corresponds to "origin," and "ldest_destino" corresponds to "destination" -- and "ano" corresponds to year. Once again, I've attached the estimation below. Here, to simplify, I am regressing y_ijt on x_itj only (that is, tot_reloc on tt_mtw_ar_diff) with three-way fixed effects.

      Code:
      ppmlhdfe tot_reloc tt_mtw_ar_diff , absorb(ldest#ano ldest_destino#ano ldest#ldest_destino, save) vce(cluster ldest#ldest_destino) d
      predict lambda 
      matrix beta = e(b) 
      ppml_fe_bias tot_reloc tt_mtw_ar_diff, i(ldest) j(ldest_destino) t(ano) lambda(lambda) beta(beta)
      [Attached image: stata.png]




      Thank you very much, Professor!



      • #4
        I have a related question regarding #1 and #2. As I mentioned earlier, since my variable of interest does not vary at the "ijt" level, I am using the difference of this variable, i.e., destination minus origin. The "i" and "j" represent units at the municipality level. I am considering whether it would be reasonable, in order to continue using three-way fixed effects (best practice), to "level up" the regional level of the fixed effects. For example, instead of controlling for municipality_origin#time, municipality_destination#time, and municipality_origin#municipality_destination, I would use nuts3_origin#time, nuts3_destination#time, and nuts3_origin#nuts3_destination. Here, nuts3 refers to the NUTS3 regions as defined by the EU, which, in my case, are larger than individual municipalities.

        In this latter case, would the three-way FE-PPML still be subject to asymptotic bias?

        Perhaps Professor Tom Zylkin could offer some guidance here as well?

        Thank you in advance!



        • #5
          Dear Mauricio,
          Since your initial estimate is not identified in the first step, it is not appropriate to apply a bias correction in the second step, and this should not be expected to give you meaningful results. I agree it is strange it should give you a result at all given how you defined the variable. That is something I can look into for you if you want. As another tip, when using ppml_fe_bias, please keep in mind you need to include all the same x variables as the original model.
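
          As an illustration of the last tip, here is a minimal sketch that passes the same regressor list to both commands. It assumes that log_distance_rn_rdist varies at the ij,t level (as described in #1) and is therefore identified under three-way fixed effects; the local macro name xvars is introduced here for illustration only.

          ```stata
          * Assumption: log_distance_rn_rdist varies at the ij,t level, so it is
          * identified in the first step. Keep every identified regressor in one
          * local and reuse that local in both commands.
          local xvars log_distance_rn_rdist

          ppmlhdfe tot_reloc `xvars', ///
              absorb(ldest#ano ldest_destino#ano ldest#ldest_destino) ///
              vce(cluster ldest#ldest_destino) d
          predict lambda
          matrix beta = e(b)

          * Same dependent variable, same regressors, same i/j/t identifiers
          ppml_fe_bias tot_reloc `xvars', i(ldest) j(ldest_destino) t(ano) ///
              lambda(lambda) beta(beta)
          ```

          Holding the list in one local is only a convenience: it makes it harder for the two regressor lists to drift apart when the specification changes.
          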

          For the second question, assuming there are a reasonable number of municipalities in each region we should be less concerned about asymptotic bias in that case. You will probably want to add controls for variables that vary by municipality, such as log population and log income (if you have such data).
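
          A sketch of what the region-level specification might look like, using the variable names from this thread. The identifiers nuts3_origin and nuts3_destination follow the naming in #4, while log_pop_orig and log_pop_dest are hypothetical names for municipality-level controls and would need to be replaced by whatever is in the data.

          ```stata
          * Hypothetical control names: log_pop_orig, log_pop_dest.
          * Fixed effects are defined at the NUTS3 level; clustering stays at the
          * municipality-pair level, as in the original command.
          ppmlhdfe tot_reloc tt_mtw_ar_diff log_pop_orig log_pop_dest, ///
              absorb(nuts3_origin#ano nuts3_destination#ano nuts3_origin#nuts3_destination) ///
              vce(cluster ldest#ldest_destino) d
          ```

          With fixed effects at the region level, tt_mtw_ar_diff is no longer a linear combination of the absorbed effects, provided access times differ across municipalities within a region.
          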

          Regards,
          Tom



          • #6
            Dear Tom Zylkin,

            Thank you very much!

            Yes, you are right. I should not apply the bias correction in the second step since my parameter of interest was not identified in the first step. Although it might seem obvious, I just realized that my variable of interest (delta_X_itj = x_jt - x_it) is a linear combination of "jt" and "it" variables after obtaining a "0/omitted" result in the first step.
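
            The collinearity can also be checked directly on simulated data. The sketch below is not the actual dataset: the variable names (orig, dest, yr, x_it, x_jt, dx) and the toy dimensions are all made up for illustration. It regresses the differenced variable on origin-by-period and destination-by-period dummies and inspects the residuals.

            ```stata
            * Toy dyadic panel: 10 municipalities, 3 periods (all names hypothetical)
            clear
            set obs 10
            gen orig = _n                        // origin id
            expand 10
            bysort orig: gen dest = _n           // destination id
            expand 3
            bysort orig dest: gen yr = _n        // period
            drop if orig == dest                 // no within-municipality flows

            * One municipality-by-period attribute, evaluated at origin and destination
            gen double x_it = sin(7*orig + 13*yr)
            gen double x_jt = sin(7*dest + 13*yr)
            gen double dx   = x_jt - x_it

            * Regress the difference on the two sets of fixed effects
            regress dx i.orig#i.yr i.dest#i.yr
            predict double res, residuals
            summarize res                        // zero up to rounding error
            ```

            The residuals vanish because dx is an exact sum of an origin-period term and a destination-period term, which is the situation that leads to the omitted coefficient.
            
            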

            Regarding the second question, I have 23 NUTS3 regions, and each region has an average of 12 municipalities. Do you think this is enough? Overall, I have a 275 × 274 matrix of relocation flows (origin-destination pairs) over three time periods, resulting in a panel of 226,050 observations. If I may, could I ask one more question? Could you recommend some references, or offer some intuition, to help me understand why we can be less concerned about asymptotic bias if the number of municipalities in each region is large enough?

            Thank you for your tip regarding including the same X variables in the "ppml_fe_bias" command and for including some controls regarding my second question!

            I think I will go for estimating my model with three-way FE-PPML with the higher regional level (NUTS3) and including controls, as you suggested.

            Kind Regards,
            Mauricio.



            • #7
              Hi Mauricio,
              For some intuition, please read Section 2 of the Weidner and Zylkin paper along with Appendix A.8 in our online appendix. In short, what matters are (i) how many observations there are to estimate each fixed effect, which determines the order of the bias, and (ii) the order of the standard error. If you are assuming clustering by pair and treating T as fixed, the order of the standard errors is 1/N. The order of the bias can be thought of as 1/(R*N), where R is the average number of municipalities per region (this is based on an asymptotic where both R and N become large). Therefore, if we can treat R as large, the bias will be small relative to the standard error. R=12 is not necessarily a large number, but we can still expect the bias to be (approximately!) 12 times smaller, which is a big difference.
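
              Written as a ratio, the comparison above looks as follows. This is only a restatement of the orders given in this post, not exact magnitudes; constants are ignored.

              ```latex
              \begin{align*}
              \text{bias} &= O\!\left(\frac{1}{R\,N}\right), \qquad
              \text{s.e.} = O\!\left(\frac{1}{N}\right) \\[4pt]
              \frac{\text{bias}}{\text{s.e.}} &= O\!\left(\frac{1}{R}\right),
              \qquad R = 12 \;\Rightarrow\; \frac{1}{R} = \frac{1}{12} \approx 0.083
              \end{align*}
              ```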

              Also please note that if a variable is perfectly collinear, its coefficient estimate is not 0 but rather it is undetermined.
              Regards,
              Tom



              • #8
                Dear Professor Tom Zylkin,

                Thank you very much for your invaluable assistance!

                Kind regards,
                Mauricio.
