
  • ppmlhdfe and multiple, heterogeneous slopes

    Dear Stata Community,

    I am wondering why ppmlhdfe returns error r(3201) when multiple heterogeneous slopes are absorbed. For example, running
    Code:
    use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
    egen imp = group(isoimp)
    egen exp = group(isoexp)
    ppmlhdfe trade fta, a(year#c.(imp exp) imp#exp) cluster(imp#exp)
    returns
    Code:
    . ppmlhdfe trade fta, a(year#c.(imp exp) imp#exp) cluster(imp#exp)
    reghdfe_panel_precondition():  3201  vector required
    FixedEffects::load_weights():     -  function returned error
             fixed_effects():     -  function returned error
    GLM::init_fixed_effects():     -  function returned error
                     <istmt>:     -  function returned error
    r(3201);
    In contrast, asking reghdfe to perform a somewhat similar estimation by running
    Code:
    use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
    egen imp = group(isoimp)
    egen exp = group(isoexp)
    gen ltrade = ln(trade)
    reghdfe ltrade fta, a(year#c.(imp exp) imp#exp) cluster(imp#exp)
    returns
    Code:
    . reghdfe ltrade fta, a(year#c.(imp exp) imp#exp) cluster(imp#exp)
    (MWFE estimator converged in 4 iterations)
    
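    HDFE Linear regression                            Number of obs   =      5,932
    Absorbing 2 HDFE groups                           F(   1,   1189) =      13.62
    Statistics robust to heteroskedasticity           Prob > F        =     0.0002
                                                      R-squared       =     0.9308
                                                      Adj R-squared   =     0.9133
                                                      Within R-sq.    =     0.0042
    Number of clusters (imp#exp) =      1,190         Root MSE        =     0.5901
    
                                (Std. Err. adjusted for 1,190 clusters in imp#exp)
    ------------------------------------------------------------------------------
                 |               Robust
          ltrade |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             fta |   .1877008   .0508547     3.69   0.000     .0879259    .2874757
           _cons |   13.16649     .01622   811.74   0.000     13.13467    13.19831
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
      year#c.imp |         5           0           5    ?|
      year#c.exp |         5           0           5    ?|
         imp#exp |      1190        1190           0    *|
    -----------------------------------------------------+
    ? = number of redundant parameters may be higher
    * = FE nested within cluster; treated as redundant for DoF computation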
    I am fully aware that year#c.(imp exp) is not equivalent to year#imp year#exp. I am just wondering why ppmlhdfe returns the error whereas reghdfe does not.

    Thank you,

    Mihai

  • #2
    Hopefully Tom Zylkin will be able to help.



    • #3
      Hi Mihai Paraschiv,

      ppmlhdfe does not support the syntax you are using in the first example. I am actually a bit confused why you would want to use "c.exp" and "c.imp" here, since exp and imp surely are just numerical IDs, not continuous variables per se.

      A more reasonable specification would be
      Code:
      ppmlhdfe trade fta, a(c.year#imp c.year#exp imp#exp) cluster(imp#exp)
      which treats "year" as a continuous variable (i.e., a time trend) and allows its effects to differ by exporter and importer. The output you get is

      Code:
      Iteration 1:   deviance = 3.6104e+09  eps = .         iters = 4    tol = 1.0e-04  min(eta) =  -3.96  P   
      Iteration 2:   deviance = 1.1115e+09  eps = 2.25e+00  iters = 3    tol = 1.0e-04  min(eta) =  -5.23      
      Iteration 3:   deviance = 7.1214e+08  eps = 5.61e-01  iters = 3    tol = 1.0e-04  min(eta) =  -6.35      
      Iteration 4:   deviance = 6.5477e+08  eps = 8.76e-02  iters = 3    tol = 1.0e-04  min(eta) =  -7.34      
      Iteration 5:   deviance = 6.4821e+08  eps = 1.01e-02  iters = 2    tol = 1.0e-04  min(eta) =  -8.18      
      Iteration 6:   deviance = 6.4741e+08  eps = 1.23e-03  iters = 2    tol = 1.0e-04  min(eta) =  -8.78      
      Iteration 7:   deviance = 6.4724e+08  eps = 2.73e-04  iters = 2    tol = 1.0e-04  min(eta) =  -9.04      
      Iteration 8:   deviance = 6.4714e+08  eps = 1.51e-04  iters = 2    tol = 1.0e-04  min(eta) =  -9.09      
      Iteration 9:   deviance = 6.4706e+08  eps = 1.16e-04  iters = 2    tol = 1.0e-04  min(eta) =  -9.10      
      Iteration 10:  deviance = 6.4700e+08  eps = 9.32e-05  iters = 2    tol = 1.0e-04  min(eta) =  -9.10      
      Iteration 11:  deviance = 6.4654e+08  eps = 7.20e-04  iters = 5    tol = 1.0e-05  min(eta) =  -9.16      
      Iteration 12:  deviance = 6.4653e+08  eps = 3.72e-06  iters = 2    tol = 1.0e-05  min(eta) =  -9.16      
      Iteration 13:  deviance = 6.4643e+08  eps = 1.64e-04  iters = 5    tol = 1.0e-06  min(eta) =  -9.19   S  
      Iteration 14:  deviance = 6.4643e+08  eps = 6.35e-07  iters = 4    tol = 1.0e-06  min(eta) =  -9.19      
      Iteration 15:  deviance = 6.4639e+08  eps = 5.58e-05  iters = 42   tol = 1.0e-07  min(eta) =  -9.21   S  
      Iteration 16:  deviance = 6.4639e+08  eps = 1.08e-09  iters = 5    tol = 1.0e-07  min(eta) =  -9.21   S  
      Iteration 17:  deviance = 6.4639e+08  eps = 1.18e-06  iters = 172  tol = 1.0e-08  min(eta) =  -9.22   S  
      Iteration 18:  deviance = 6.4639e+08  eps = 6.44e-13  iters = 2    tol = 1.0e-09  min(eta) =  -9.22   S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
      Converged in 18 iterations and 262 HDFE sub-iterations (tol = 1.0e-08)
      
      HDFE PPML regression                              No. of obs      =      5,950
      Absorbing 3 HDFE groups                           Residual df     =      1,189
      Statistics robust to heteroskedasticity           Wald chi2(1)    =       0.85
      Deviance             =  646391033.2               Prob > chi2     =     0.3563
      Log pseudolikelihood = -323240197.1               Pseudo R2       =     0.9895
      
      Number of clusters (imp#exp)=      1,190
                                  (Std. Err. adjusted for 1,190 clusters in imp#exp)
      ------------------------------------------------------------------------------
                   |               Robust
             trade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               fta |    .076812   .0832684     0.92   0.356     -.086391     .240015
             _cons |   16.51015   .0431315   382.79   0.000     16.42561    16.59469
      ------------------------------------------------------------------------------
      
      Absorbed degrees of freedom:
      -----------------------------------------------------+
       Absorbed FE | Categories  - Redundant  = Num. Coefs |
      -------------+---------------------------------------|
        imp#c.year |        35           0          35    ?|
        exp#c.year |        35           0          35    ?|
           imp#exp |      1190        1190           0    *|
      -----------------------------------------------------+
      ? = number of redundant parameters may be higher
      * = FE nested within cluster; treated as redundant for DoF computation
      Also note that the model you estimated via reghdfe is equivalent to

      Code:
      reghdfe ltrade fta, a(year#c.imp year#c.exp imp#exp) cluster(imp#exp)
      Thus, the syntax you are using does not seem necessary. I hope I do not misunderstand anything.

      Regards, Tom
      Last edited by Tom Zylkin; 19 May 2021, 10:03.



      • #4
        Dear Joao Santos Silva and Tom Zylkin,

        thank you very much for the help and support in this matter.

        The reason for my inquiry into year#c.(imp exp) syntax involves an exercise that uses year#c.(v1 v2 ... vn), where v1 through vn are symmetric country dummies (i.e., =1 if country is in the pair; without distinguishing between its status as an exporter or importer). In essence, and as part of this exercise, I was trying to find a way to incorporate country-year fixed effects as opposed to exporter-year and importer-year fixed effects into the estimation.

        I apologize for the ambiguity of my initial post -- I should have made the above clear from the very beginning. Nevertheless, the reply above is very helpful.

        Thank you once more,
        Mihai
        Last edited by Mihai Paraschiv; 19 May 2021, 11:27.



        • #5
          Dear Mihai Paraschiv,

          If I understand you correctly, you wish to have a time trend that varies by symmetric pair?

          In that case, you should have a single variable called "v" that gives a different ID for each pair. There is no need to have v1 to vn. Set the ID equal to 0 for any pairs that are not in the group you are describing. This will ensure any such pairs have a common slope parameter, which is probably what you want.
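
          A minimal sketch of building such a single pair ID, assuming string country codes named isoexp and isoimp as in the example data from the first post. The toy rows and the helper names c1, c2 and v are made up for illustration:
          Code:
          * toy data: two symmetric pairs, each observed in both directions
          clear
          input str3 isoexp str3 isoimp
          "AUS" "AUT"
          "AUT" "AUS"
          "AUS" "BEL"
          "BEL" "AUS"
          end
          * order the two codes alphabetically so that direction no longer matters
          gen str3 c1 = cond(isoexp < isoimp, isoexp, isoimp)
          gen str3 c2 = cond(isoexp < isoimp, isoimp, isoexp)
          * one ID per unordered pair: AUS-AUT and AUT-AUS share the same v
          egen v = group(c1 c2)
          list isoexp isoimp v
          Pairs outside the group of interest can then be recoded to a common ID of 0 with replace, using whatever condition defines that group in the application.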

          In addition, there are two other points I should make:

          1. Since your model has a time trend associated with v1 ... vn, you should make sure your model also has a fixed effect for v1 ... vn. Otherwise your "slope" coefficient is not being calculated relative to a corresponding "intercept" and thus could be very misleading. Indeed, your estimates will actually depend on how you define the trend (e.g., using the raw "year" variable vs. subtracting the first year), and this is definitely not what you want.

          2. When ppml is used with both fixed effects and time trends, it is important to be aware of a possible incidental parameter problem. Please see Weidner and Zylkin (2021); in particular, check out our Appendix A.8 where we try to give a general characterization for PPML and some heuristics you can use. You can also check out our Section 2, which provides some additional intuition, especially for the estimation of gravity models.
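
          Point 1 can be seen with a short calculation. This is only a sketch: the symbols are introduced here for illustration, with \beta_v the slope for group v, \alpha_v a fixed effect for group v, and t_0 an arbitrary base year subtracted from the trend.

          \[
          \beta_v \,(\text{year} - t_0) \;=\; \beta_v \,\text{year} \;-\; \beta_v t_0
          \]

          The extra term -\beta_v t_0 is constant within v. If \alpha_v is in the model, it absorbs that term, so the slope and the fitted values do not depend on the choice of t_0. Without \alpha_v, the raw and the re-based trend define different models, which is why the estimates change with the definition of the trend.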

          Again, hope this helps.

          Regards,
          Tom



          • #6
            Dear Tom Zylkin,

            thank you very much for the follow up as well as for the two points above, both of which are immensely helpful.

            I was not after absorbing/estimating a time trend that varies by symmetric pair per se. Instead, I was looking for a way to have ppmlhdfe absorb symmetric country#1-year and country#2-year fixed effects, where country#1 and country#2 are the two countries of the dyad. This approach does not distinguish between country#1 (and country#2) as the exporting or importing country.

            Nevertheless, and as you have indicated, the two approaches are related. Specifically, absorbing a symmetric country-pair (e.g., same id for AUS - AUT as for AUT - AUS) fixed effect interacted with the year variable does produce estimates that are very similar to those obtained when absorbing the symmetric country#1-year and country#2-year fixed effects; all while dropping the regressors that vary along the country-pair*year dimensions.

            Thank you once more for your time and help,
            Mihai



            • #7
              Dear Mihai Paraschiv,

              Glad I could help. One thing I will add is that if you are only looking to use fixed effects, you should be careful not to use the "c." prefix as that specifies a continuous variable, such as a time trend.
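
              To make the contrast concrete, here is a sketch using the example data from the first post. It assumes ppmlhdfe is installed and the data URL is reachable. The first call absorbs importer-year and exporter-year fixed effects (no "c." prefix); the second is the trend specification from #3:
              Code:
              use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
              egen imp = group(isoimp)
              egen exp = group(isoexp)
              * fixed effects: one intercept per importer-year and per exporter-year cell
              ppmlhdfe trade fta, a(imp#year exp#year imp#exp) cluster(imp#exp)
              * trends: one slope on year per importer and per exporter
              ppmlhdfe trade fta, a(c.year#imp c.year#exp imp#exp) cluster(imp#exp)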

              As maybe one last thing, if you want the specification I think you are describing, what you can do is first take the average trade flow within each pair (e.g., (AUS-AUT + AUT-AUS)/2) as your dependent variable. Then there is no distinction between exporter-time and importer-time fixed effects. But I'm glad what I suggested earlier was satisfactory.
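
              A minimal sketch of that averaging step, assuming a symmetric pair ID v (one value per unordered pair) has already been built. The toy numbers and the variable name trade_sym are made up for illustration:
              Code:
              * toy data: two pairs, both directions observed in one year
              clear
              input v year trade
              1 1990 10
              1 1990 30
              2 1990  5
              2 1990 15
              end
              * mean of the two directional flows within each pair and year
              bysort v year: egen trade_sym = mean(trade)
              list
              Here trade_sym would take the place of trade as the dependent variable.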

              Regards,
              Tom



              • #8
                Thank you for the insight and suggestions Tom Zylkin. This is very much appreciated.

                Best,
                Mihai

