  • Understanding multi-way clustering results from PPMLHDFE gravity estimation

    Dear all,

    Having read that ignoring multi-way clustering when estimating a gravity model leads to misleading inference (Egger and Tarlea, 2015), I tried to add multi-way clustering to my estimation.

    The command I used is as follows:
    ppmlhdfe y x1#x2, absorb(panelID prod_sector_year Inv_year) vce(cluster Inv_Country ProdCountry year) // here I wanted to cluster at the source-country, destination-country, and year levels

    However, I do not understand how the number of clusters for which the standard errors were adjusted was determined (the last line in the picture).
    [Image: Capture.JPG, estimation output with multi-way clustering]


    And how should I interpret the extremely small residual df here? I am very confused by this number, especially in comparison with the attempt where I use vce(cluster panelID#year), because I think that this second form of clustering is more demanding:
    ppmlhdfe y x1#x2, absorb(panelID prod_sector_year Inv_year) vce(cluster panelID#year) // here I wanted to cluster at every combination of source country X destination country X year
    [Image: Capture.JPG, estimation output with clustering on panelID#year]


    Is there any problem in my code? Did I accidentally estimate something different from what I wanted?

    I appreciate any help and comments!

    Best regards,
    Lishu

  • #2
    Dear lishu zhang,

    When you cluster, the effective number of observations is the number of clusters; with multi-way clustering, it is the smallest number of clusters across the dimensions. So, in the first case, if you have 20 years, it is as if you have 20 observations. In the second case, your clusters are very small (one ID in one year), and therefore you have many of them.
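
    To see which dimension drives the reported number of clusters, one can count the distinct values of each clustering variable in the estimation sample. This is only a sketch: it assumes it is run immediately after the first ppmlhdfe command from post #1, so that e(sample) marks the observations actually used, and that each variable has few enough distinct values for tabulate to handle.

    ```stata
    * Count clusters per dimension in the estimation sample (run right after ppmlhdfe)
    foreach v of varlist Inv_Country ProdCountry year {
        quietly tabulate `v' if e(sample)
        display "`v': " r(r) " clusters"
    }
    ```

    The smallest of the three counts should correspond to the number reported in the last line of the output.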

    Best wishes,

    Joao


    • #3
      Dear Joao Santos Silva,

      Thank you very much for the information.

      If I understood correctly, the first line of code does not partition my sample successively along each cluster dimension but instead imposes the dimension with the smallest number of groups.

      If I may ask a follow-up question: is vce(cluster unit1 unit2 unit3) a correct way of implementing multi-way clustering? To me it looks like one-way clustering on the strongest dimension. Or should I take my second line of code as the implementation of true multi-way clustering?

      Thank you in advance for your help!

      Best regards,
      Lishu


      • #4
        Dear lishu zhang,

        I do not think that is right: I assume that it clusters along all dimensions and reports the size of the smallest one. Your first line is the right way of doing multi-way clustering.
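
        For reference, the standard multi-way cluster-robust variance estimator (Cameron, Gelbach and Miller's inclusion-exclusion form) combines one-way estimators rather than picking a single dimension. The thread does not confirm that ppmlhdfe uses exactly this formula, so treat it as an assumption; the symbols below (i for source country, j for destination country, t for year) are introduced only for illustration.

        ```latex
        % Two-way clustering on dimensions i and j:
        \hat{V}_{i,j} = \hat{V}_{i} + \hat{V}_{j} - \hat{V}_{i \cap j}

        % Three-way clustering on i, j and t:
        \hat{V}_{i,j,t} = \hat{V}_{i} + \hat{V}_{j} + \hat{V}_{t}
                        - \hat{V}_{i \cap j} - \hat{V}_{i \cap t} - \hat{V}_{j \cap t}
                        + \hat{V}_{i \cap j \cap t}
        ```

        Here each term is a one-way cluster-robust variance computed on the indicated grouping, and an intersection such as i ∩ j groups observations that share both identifiers. The term on the full intersection is what vce(cluster panelID#year) alone would use, which is why that specification is less demanding, not more.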

        Best wishes,

        Joao


        • #5
          Dear Joao Santos Silva,

          I see. Thank you for the explanation!

          Best regards,
          Lishu


          • #6
            Dear Joao Santos Silva,

            In a structural gravity model estimated using ppmlhdfe, I want to implement all the diagnostics mentioned in Egger and Tarlea (2015), that is:

            (a) Huber–White-type robust standard errors without clustering
            (b) Standard errors clustered at (and may be correlated over time within) country pairs
            (c) Standard errors clustered at (and may be correlated within) base groups (importer, exporter, and year), as well as every combination of the three.
            (d) Same as (c), except for country-pairs being dyadic (symmetric for ij and ji).

            For (a) and (b), I did the following

            Code:
            egen idt_ci = group(iso_i year)
            egen idt_cj = group(iso_j year)
            egen id_ci_cj = group(iso_i iso_j)
            
            * (a) Huber-White-type Standard errors without clustering
            
            ppmlhdfe trade x1, a(idt_ci idt_cj id_ci_cj) vce(robust)
            
            * (b) Standard errors are clustered at country pairs
            
            ppmlhdfe trade x1, a(idt_ci idt_cj id_ci_cj) vce(cluster id_ci_cj)

            I am not sure how to implement (c) and (d), or what a base group means.

            Egger and Tarlea (2015) further refer to case (c) as "multi-way clustering assuming asymmetric pair-wise components" and case (d) as "multi-way clustering assuming symmetric pairwise (dyadic) components".

            I would be very thankful if you could provide Stata code showing how to create these base groups and the symmetric and asymmetric country pairs, and more generally how to implement (c) and (d) using ppmlhdfe.

            Thank you,
            (Ridwan)
            Last edited by Ridwan Sheikh; 25 Apr 2026, 02:30.


            • #7
              Dear Ridwan Sheikh,

              At least the current (2026) version of ppmlhdfe allows multi-way clustering, so it should be easy to do all of that.
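
              A minimal sketch of how this might look, using the variable names from post #6. The helper variables cl_i, cl_j, pair_lo, pair_hi and id_dyad are hypothetical, and the mapping to cases (c) and (d) is one possible reading that should be checked against Egger and Tarlea (2015).

              ```stata
              * Numeric cluster identifiers for exporter and importer (hypothetical names)
              egen cl_i = group(iso_i)
              egen cl_j = group(iso_j)

              * (c) multi-way clustering on exporter, importer and year;
              *     pairs are directed here, so ij and ji are distinct
              ppmlhdfe trade x1, a(idt_ci idt_cj id_ci_cj) vce(cluster cl_i cl_j year)

              * (d) undirected (dyadic) pair identifier: ij and ji share one id.
              *     Ordering the two codes works whether iso_i/iso_j are string or numeric.
              gen pair_lo = cond(iso_i < iso_j, iso_i, iso_j)
              gen pair_hi = cond(iso_i < iso_j, iso_j, iso_i)
              egen id_dyad = group(pair_lo pair_hi)

              * Clustering on the symmetric pair, as a starting point for (d)
              ppmlhdfe trade x1, a(idt_ci idt_cj id_ci_cj) vce(cluster id_dyad)
              ```

              The dyad identifier is built from the ordered country codes rather than from separately generated group numbers, because group numbers for iso_i and iso_j need not agree when the sets of exporters and importers differ. How id_dyad should be combined with the other cluster dimensions to reproduce case (d) exactly is a modelling choice not settled in this thread.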

              Best wishes,

              Joao
