ppmlhdfe and multiple, heterogeneous slopes

Mihai Paraschiv

17 May 2021, 15:50

Dear Stata Community,

I am wondering why ppmlhdfe returns error (3201) when multiple, heterogeneous slopes are absorbed. For example, running

Code:

use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
egen imp = group(isoimp)
egen exp = group(isoexp)
ppmlhdfe trade fta, a(year#c.(imp exp) imp#exp) cluster(imp#exp)

returns

Code:

. ppmlhdfe trade fta, a(year#c.(imp exp) imp#exp) cluster(imp#exp)
reghdfe_panel_precondition():  3201  vector required
FixedEffects::load_weights():     -  function returned error
         fixed_effects():     -  function returned error
GLM::init_fixed_effects():     -  function returned error
                 <istmt>:     -  function returned error
r(3201);

In contrast, asking reghdfe to perform a somewhat similar estimation by running

Code:

use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
egen imp = group(isoimp)
egen exp = group(isoexp)
gen ltrade = ln(trade)
reghdfe ltrade fta, a(year#c.(imp exp) imp#exp) cluster(imp#exp)

returns

Code:

. reghdfe ltrade fta, a(year#c.(imp exp) imp#exp) cluster(imp#exp)
(MWFE estimator converged in 4 iterations)

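HDFE Linear regression                            Number of obs   =      5,932
Absorbing 2 HDFE groups                           F(   1,   1189) =      13.62
Statistics robust to heteroskedasticity           Prob > F        =     0.0002
                                                  R-squared       =     0.9308
                                                  Adj R-squared   =     0.9133
                                                  Within R-sq.    =     0.0042
Number of clusters (imp#exp) =      1,190         Root MSE        =     0.5901

                            (Std. Err. adjusted for 1,190 clusters in imp#exp)
------------------------------------------------------------------------------
             |               Robust
      ltrade |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         fta |   .1877008   .0508547     3.69   0.000     .0879259    .2874757
       _cons |   13.16649     .01622   811.74   0.000     13.13467    13.19831
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
  year#c.imp |         5           0           5    ?|
  year#c.exp |         5           0           5    ?|
     imp#exp |      1190        1190           0    *|
-----------------------------------------------------+
? = number of redundant parameters may be higher
* = FE nested within cluster; treated as redundant for DoF computation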

I am fully aware that year#c.(imp exp) is not equivalent to year#imp year#exp. I am just wondering why ppmlhdfe returns the error whereas reghdfe does not.

Thank you,

Mihai


Joao Santos Silva

#2

19 May 2021, 06:43

Hopefully Tom Zylkin will be able to help.

Tom Zylkin


19 May 2021, 10:00

Hi Mihai Paraschiv,

ppmlhdfe does not support the syntax you are using in the first example. I am actually a bit confused why you would want to use "c.exp" and "c.imp" here, since exp and imp surely are just numerical IDs, not continuous variables per se.
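To see why the c. prefix on the IDs is odd, it may help to write out the model that the shorthand in the first post asks ppmlhdfe to fit. This is a sketch in assumed notation: i is the exporter, j the importer, t the year, and x_i and m_j are the integer codes created by egen group() for the exporter and importer.

$$
\mathbb{E}[\mathit{trade}_{ijt}] = \exp\!\left(\beta\,\mathit{fta}_{ijt} + \lambda_t\, m_j + \kappa_t\, x_i + \mu_{ij}\right)
$$

Here year#c.imp contributes one slope lambda_t per year on the importer code itself, and year#c.exp one slope kappa_t per year on the exporter code, while imp#exp supplies the pair intercepts mu_ij. No country gets a parameter of its own, and because the codes are just an arbitrary (alphabetical) numbering, renumbering the countries would generally change the fit. The same terms enter the linear index of the reghdfe version for ltrade.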

A more reasonable specification would be

Code:

ppmlhdfe trade fta, a(c.year#imp c.year#exp imp#exp) cluster(imp#exp)

which treats "year" as a continuous variable (i.e., a time trend) and allows its effects to differ by exporter and importer. The output you get is

Code:

Iteration 1:   deviance = 3.6104e+09  eps = .         iters = 4    tol = 1.0e-04  min(eta) =  -3.96  P   
Iteration 2:   deviance = 1.1115e+09  eps = 2.25e+00  iters = 3    tol = 1.0e-04  min(eta) =  -5.23      
Iteration 3:   deviance = 7.1214e+08  eps = 5.61e-01  iters = 3    tol = 1.0e-04  min(eta) =  -6.35      
Iteration 4:   deviance = 6.5477e+08  eps = 8.76e-02  iters = 3    tol = 1.0e-04  min(eta) =  -7.34      
Iteration 5:   deviance = 6.4821e+08  eps = 1.01e-02  iters = 2    tol = 1.0e-04  min(eta) =  -8.18      
Iteration 6:   deviance = 6.4741e+08  eps = 1.23e-03  iters = 2    tol = 1.0e-04  min(eta) =  -8.78      
Iteration 7:   deviance = 6.4724e+08  eps = 2.73e-04  iters = 2    tol = 1.0e-04  min(eta) =  -9.04      
Iteration 8:   deviance = 6.4714e+08  eps = 1.51e-04  iters = 2    tol = 1.0e-04  min(eta) =  -9.09      
Iteration 9:   deviance = 6.4706e+08  eps = 1.16e-04  iters = 2    tol = 1.0e-04  min(eta) =  -9.10      
Iteration 10:  deviance = 6.4700e+08  eps = 9.32e-05  iters = 2    tol = 1.0e-04  min(eta) =  -9.10      
Iteration 11:  deviance = 6.4654e+08  eps = 7.20e-04  iters = 5    tol = 1.0e-05  min(eta) =  -9.16      
Iteration 12:  deviance = 6.4653e+08  eps = 3.72e-06  iters = 2    tol = 1.0e-05  min(eta) =  -9.16      
Iteration 13:  deviance = 6.4643e+08  eps = 1.64e-04  iters = 5    tol = 1.0e-06  min(eta) =  -9.19   S  
Iteration 14:  deviance = 6.4643e+08  eps = 6.35e-07  iters = 4    tol = 1.0e-06  min(eta) =  -9.19      
Iteration 15:  deviance = 6.4639e+08  eps = 5.58e-05  iters = 42   tol = 1.0e-07  min(eta) =  -9.21   S  
Iteration 16:  deviance = 6.4639e+08  eps = 1.08e-09  iters = 5    tol = 1.0e-07  min(eta) =  -9.21   S  
Iteration 17:  deviance = 6.4639e+08  eps = 1.18e-06  iters = 172  tol = 1.0e-08  min(eta) =  -9.22   S  
Iteration 18:  deviance = 6.4639e+08  eps = 6.44e-13  iters = 2    tol = 1.0e-09  min(eta) =  -9.22   S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 18 iterations and 262 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =      5,950
Absorbing 3 HDFE groups                           Residual df     =      1,189
Statistics robust to heteroskedasticity           Wald chi2(1)    =       0.85
Deviance             =  646391033.2               Prob > chi2     =     0.3563
Log pseudolikelihood = -323240197.1               Pseudo R2       =     0.9895

Number of clusters (imp#exp)=      1,190
                            (Std. Err. adjusted for 1,190 clusters in imp#exp)
------------------------------------------------------------------------------
             |               Robust
       trade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         fta |    .076812   .0832684     0.92   0.356     -.086391     .240015
       _cons |   16.51015   .0431315   382.79   0.000     16.42561    16.59469
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
  imp#c.year |        35           0          35    ?|
  exp#c.year |        35           0          35    ?|
     imp#exp |      1190        1190           0    *|
-----------------------------------------------------+
? = number of redundant parameters may be higher
* = FE nested within cluster; treated as redundant for DoF computation
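For comparison, the specification suggested above can be written the same way (again assumed notation: i exporter, j importer, t year):

$$
\mathbb{E}[\mathit{trade}_{ijt}] = \exp\!\left(\beta\,\mathit{fta}_{ijt} + \gamma_j\,\mathit{year}_t + \theta_i\,\mathit{year}_t + \mu_{ij}\right)
$$

c.year#imp supplies one trend gamma_j per importer and c.year#exp one trend theta_i per exporter, which is why the absorbed-DoF table reports 35 slopes for each; imp#exp again supplies the pair intercepts mu_ij. With the c. prefix on year rather than on the IDs, each country gets its own parameter and the result does not depend on how the IDs are numbered.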

Also note that the model you estimated via reghdfe is equivalent to

Code:

reghdfe ltrade fta, a(year#c.imp year#c.exp imp#exp) cluster(imp#exp)

Thus, the syntax you are using does not seem necessary. I hope I am not misunderstanding anything.
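One way to check the equivalence claim above is to run both reghdfe specifications on the same data and compare the fta coefficient. A minimal sketch, assuming reghdfe (and its dependency ftools) is installed from SSC and the example dataset URL from the first post is reachable; the scalar names b_joint and b_split are illustrative only.

```stata
* Load the example data used throughout the thread
use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
egen imp = group(isoimp)
egen exp = group(isoexp)
gen ltrade = ln(trade)

* Shorthand from the first post: one absvar carrying two sets of year-specific slopes
reghdfe ltrade fta, a(year#c.(imp exp) imp#exp) cluster(imp#exp)
scalar b_joint = _b[fta]

* The same model with each set of slopes written as its own absvar
reghdfe ltrade fta, a(year#c.imp year#c.exp imp#exp) cluster(imp#exp)
scalar b_split = _b[fta]

* Both absorb the same columns, so the coefficients should agree up to solver tolerance
display "fta: joint = " b_joint "   split = " b_split
```

Both commands absorb the same set of year-by-ID slope columns, so any difference should come only from the iterative solver's convergence tolerance.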

Regards, Tom

Last edited by Tom Zylkin; 19 May 2021, 10:03.


Mihai Paraschiv

#4

19 May 2021, 11:22

Dear Joao Santos Silva and Tom Zylkin,

thank you very much for the help and support in this matter.

The reason for my inquiry into the year#c.(imp exp) syntax is an exercise that uses year#c.(v1 v2 ... vn), where v1 through vn are symmetric country dummies (i.e., equal to 1 if the country is in the pair, without distinguishing between its status as exporter or importer). In essence, as part of this exercise, I was trying to incorporate country-year fixed effects, as opposed to exporter-year and importer-year fixed effects, into the estimation.
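The symmetric country dummies described above can be built from the ISO codes in the example data. With them, a year-specific coefficient phi_ct on dummy v_c contributes phi_it + phi_jt to pair (i, j) in year t, i.e., one effect per country and year whether that country is the exporter or the importer, which is what year#c.(v1 ... vn) is meant to capture. A minimal sketch: it assumes every country appears in isoexp, which holds in this example (35 exporters and 35 importers forming 1,190 directed pairs, with no domestic flows), and the names v1, v2, ... simply follow the post.

```stata
use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear

* One symmetric dummy per country: v`k' = 1 if country k is either partner
levelsof isoexp, local(countries)
local k = 0
foreach c of local countries {
    local ++k
    gen byte v`k' = (isoexp == "`c'" | isoimp == "`c'")
}

* Each directed pair involves exactly two distinct countries,
* so the dummies should sum to 2 on every observation
egen vsum = rowtotal(v1-v`k')
summarize vsum
```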

I apologize for the ambiguity of my initial post -- I should have made the above clear from the very beginning. Nevertheless, the reply above is very helpful.

Thank you once more,
Mihai

Last edited by Mihai Paraschiv; 19 May 2021, 11:27.

Tom Zylkin

#5

19 May 2021, 12:08

Dear Mihai Paraschiv,

If I understand you correctly, you wish to have a time trend that varies by symmetric pair?

In that case, you should have a single variable called "v" that gives a different ID for each pair. There is no need to have v1 to vn. Set the ID equal to 0 for any pairs not in the group you are describing. This will ensure all such pairs share a common slope parameter, which is probably what you want.
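A minimal sketch of the single-ID approach, assuming ppmlhdfe is installed from SSC. It builds a symmetric pair ID from the ISO codes (so AUS-AUT and AUT-AUS share one ID), sets it to 0 outside the group of interest, and gives each remaining pair its own trend. The indicator ingroup and the choice of "pairs involving AUS" are hypothetical placeholders for whatever group is actually meant, as are the helper names ctry1 and ctry2. The pair fixed effect imp#exp is kept so that each trend has a matching intercept.

```stata
use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
egen imp = group(isoimp)
egen exp = group(isoexp)

* Symmetric pair ID: order the two ISO codes so AUS-AUT and AUT-AUS coincide
gen ctry1 = cond(isoexp < isoimp, isoexp, isoimp)
gen ctry2 = cond(isoexp < isoimp, isoimp, isoexp)
egen sympair = group(ctry1 ctry2)

* Hypothetical group of interest: here, pairs that involve AUS
gen byte ingroup = (ctry1 == "AUS" | ctry2 == "AUS")

* One ID per symmetric pair inside the group; 0 for all other pairs,
* which therefore share a single common slope
gen long v = cond(ingroup, sympair, 0)

* Pair-specific trends for the group, with imp#exp supplying the intercepts
ppmlhdfe trade fta, a(v#c.year imp#exp) cluster(imp#exp)
```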

In addition, there are two other points I should make:

1. Since your model has a time trend associated with v1 ... vn, you should make sure your model also has a fixed effect for v1 ... vn. Otherwise your "slope" coefficient is not being estimated relative to a corresponding "intercept" and thus could be very misleading. Indeed, your estimates will actually depend on how you define the trend (e.g., using the raw "year" variable vs. subtracting the first year), and this is definitely not what you want.

2. When ppml is used with both fixed effects and time trends, it is important to be aware of a possible incidental parameter problem. Please see Weidner and Zylkin (2021); in particular, check out our Appendix A.8 where we try to give a general characterization for PPML and some heuristics you can use. You can also check out our Section 2, which provides some additional intuition, especially for the estimation of gravity models.
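Point 1 can be made concrete with a short calculation. In assumed notation, let a_v be the fixed effect of group v, b_v its trend, and t_0 an arbitrary origin such as the first sample year. Shifting the origin of the trend gives

$$
a_v + b_v\,(t - t_0) = \underbrace{(a_v - b_v\, t_0)}_{\tilde a_v} + b_v\, t
$$

When a_v is estimated freely, the term -b_v t_0 is absorbed into the new intercept, so b_v and all other coefficients are unaffected by the choice of t_0. Without a_v (and unless some other fixed effect in the model already nests v), -b_v t_0 is a group-specific constant that nothing in the model can absorb, so changing t_0 changes the model being fitted and the estimates will generally change.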

Again, hope this helps.

Regards,
Tom

Mihai Paraschiv

#6

21 May 2021, 11:23

Dear Tom Zylkin,

thank you very much for the follow-up as well as for the two points above, both of which are immensely helpful.

I was not after absorbing/estimating a time trend that varies by symmetric pair per se. Instead, I was looking for a way to have ppmlhdfe absorb symmetric country#1-year and country#2-year fixed effects, where country#1 and country#2 are the two countries of the dyad. This approach does not distinguish whether country#1 or country#2 is the exporting or the importing country.

Nevertheless, as you have indicated, the two approaches are related. Specifically, absorbing a symmetric country-pair fixed effect (e.g., the same ID for AUS-AUT as for AUT-AUS) interacted with the year variable produces estimates that are very similar to those obtained when absorbing the symmetric country#1-year and country#2-year fixed effects, all while dropping the regressors that vary along the country-pair*year dimensions.

Thank you once more for your time and help,
Mihai

Tom Zylkin

#7

21 May 2021, 11:43

Dear Mihai Paraschiv,

Glad I could help. One thing I will add is that if you are only looking to use fixed effects, you should be careful not to use the "c." prefix as that specifies a continuous variable, such as a time trend.
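The difference between the two prefixes is easiest to see side by side. A sketch using the example data, assuming ppmlhdfe is installed from SSC: the first command absorbs importer-year and exporter-year fixed effects (year treated as categorical), while the second, the specification from earlier in the thread, absorbs importer- and exporter-specific linear trends (year treated as continuous).

```stata
use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
egen imp = group(isoimp)
egen exp = group(isoexp)

* Fixed effects: one parameter per importer-year and per exporter-year cell
ppmlhdfe trade fta, a(imp#year exp#year imp#exp) cluster(imp#exp)

* Slopes: one linear trend in year per importer and per exporter
ppmlhdfe trade fta, a(c.year#imp c.year#exp imp#exp) cluster(imp#exp)
```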

As maybe one last thing, if you want the specification I think you are describing, you can first take the average trade flow within each pair (e.g., (AUS-AUT + AUT-AUS)/2) as your dependent variable. Then there is no distinction between exporter-time and importer-time fixed effects. But I'm glad what I suggested earlier was satisfactory.
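A minimal sketch of the averaging idea, under assumptions not stated in the thread: both directions of each pair are kept, the pair ID is built from the ISO codes (ctry1, ctry2, and sympair are illustrative names), and fta takes the same value in both directions, as it would for a reciprocal agreement. Because the dependent variable is then identical for AUS-AUT and AUT-AUS, one would expect a country's exporter-year and importer-year effects to coincide up to normalization, which is one reading of "no distinction" above.

```stata
use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
egen imp = group(isoimp)
egen exp = group(isoexp)

* Symmetric pair ID: AUS-AUT and AUT-AUS get the same value
gen ctry1 = cond(isoexp < isoimp, isoexp, isoimp)
gen ctry2 = cond(isoexp < isoimp, isoimp, isoexp)
egen sympair = group(ctry1 ctry2)

* Average of the two directed flows, e.g. (AUS-AUT + AUT-AUS)/2, per year
* (egen mean() ignores missing values, so a lone reported direction is used as is)
bysort sympair year: egen avgtrade = mean(trade)

* Standard exporter-year, importer-year, and pair fixed effects on the averaged flow
ppmlhdfe avgtrade fta, a(imp#year exp#year imp#exp) cluster(imp#exp)
```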

Regards,
Tom

Mihai Paraschiv

#8

21 May 2021, 14:08

Thank you for the insight and suggestions, Tom Zylkin. This is very much appreciated.

Best,
Mihai