
  • PPMLHDFE dropping linear time trend

    Hi all,

    I'm trying to run the following code:

    Code:
     ppmlhdfe died year `controls' if year!=2020, offset(ln_pop) absorb(facility)
    My outcome is the number of people who died in a given facility in a given year, and `controls' are facility-year-varying covariates. When I run this code, I get a message saying that "year" is dropped because of collinearity with the fixed effects. The problem is that when I run the model without any other variables (no fixed effects, no controls), I still get a message that year is dropped due to collinearity. This doesn't happen if I run it with reghdfe, or with ppml without fixed effects. Note that I am able to run this model with year dummies (i.year) successfully!

    Any ideas why this might be happening? There are 4 years of data, 129 facilities each year.

  • #2
    Dear Yevgeniy Feyman,

    Please show us the results with the different estimators so that we can comment on it.

    Best wishes,

    Joao



    • #3
      Thanks for the response Joao Santos Silva

      I've included the results from the log file below. Because this is using protected data, I can't share all coefficients and variable names, but I've included what I could.

      PPMLHDFE, with facility FE
      Code:
      . ppmlhdfe died year `controls' if year!=2020, offset(ln_survivor) absorb(facility)
      note: year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)
      Iteration 1:   deviance = 1.2555e+04  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -0.96  P   
      Iteration 2:   deviance = 7.2185e+03  eps = 7.39e-01  iters = 1    tol = 1.0e-04  min(eta) =  -1.25      
      Iteration 3:   deviance = 7.2001e+03  eps = 2.55e-03  iters = 1    tol = 1.0e-04  min(eta) =  -1.27      
      Iteration 4:   deviance = 7.2001e+03  eps = 5.71e-08  iters = 1    tol = 1.0e-04  min(eta) =  -1.27      
      Iteration 5:   deviance = 7.2001e+03  eps = 8.60e-16  iters = 1    tol = 1.0e-05  min(eta) =  -1.27   S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
      Converged in 5 iterations and 5 HDFE sub-iterations (tol = 1.0e-08)
      
      PPML regression                                   No. of obs      =        516
                                                        Residual df     =        464
                                                        Wald chi2(51)   =    7495.94
      Deviance             =  7200.080291               Prob > chi2     =     0.0000
      Log pseudolikelihood = -6093.557735               Pseudo R2       =     0.9747
      ---------------------------------------------------------------------------------
                      |               Robust
                 died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ----------------+----------------------------------------------------------------
                 year |          0  (omitted)
        ...
                _cons |  -2.073217   .5057867    -4.10   0.000    -3.064541   -1.081893
      ---------------------------------------------------------------------------------
      No facility fixed effects, or any other variables

      Code:
      . ppmlhdfe died year if year!=2020, offset(ln_survivor)
      note: year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)
      Iteration 1:   deviance = 5.8633e+04  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -1.30  P   
      Iteration 2:   deviance = 5.3110e+04  eps = 1.04e-01  iters = 1    tol = 1.0e-04  min(eta) =  -1.36      
      Iteration 3:   deviance = 5.3105e+04  eps = 8.73e-05  iters = 1    tol = 1.0e-04  min(eta) =  -1.36      
      Iteration 4:   deviance = 5.3105e+04  eps = 6.64e-11  iters = 1    tol = 1.0e-05  min(eta) =  -1.36      
      Iteration 5:   deviance = 5.3105e+04  eps = 1.17e-16  iters = 1    tol = 1.0e-06  min(eta) =  -1.36   S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
      Converged in 5 iterations and 5 HDFE sub-iterations (tol = 1.0e-08)
      
      PPML regression                                   No. of obs      =        516
                                                        Residual df     =        515
                                                        Wald chi2(0)    =          .
      Deviance             =  53105.10649               Prob > chi2     =          .
      Log pseudolikelihood = -29046.07084               Pseudo R2       =     0.8795
      ------------------------------------------------------------------------------
                   |               Robust
              died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              year |          0  (omitted)
             _cons |  -3.185151   .0093252  -341.57   0.000    -3.203428   -3.166874
       ln_survivor |          1  (offset)
      ------------------------------------------------------------------------------
      Year included as dummies

      Code:
      . ppmlhdfe died i.year `controls' if year!=2020, offset(ln_survivor) absorb(facility)
      Iteration 1:   deviance = 7.2930e+03  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -0.70  P   
      Iteration 2:   deviance = 1.4683e+03  eps = 3.97e+00  iters = 1    tol = 1.0e-04  min(eta) =  -1.08      
      Iteration 3:   deviance = 1.4040e+03  eps = 4.58e-02  iters = 1    tol = 1.0e-04  min(eta) =  -1.17      
      Iteration 4:   deviance = 1.4040e+03  eps = 5.14e-05  iters = 1    tol = 1.0e-04  min(eta) =  -1.17      
      Iteration 5:   deviance = 1.4040e+03  eps = 1.73e-10  iters = 1    tol = 1.0e-05  min(eta) =  -1.17      
      Iteration 6:   deviance = 1.4040e+03  eps = 1.38e-16  iters = 1    tol = 1.0e-06  min(eta) =  -1.17   S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
      Converged in 6 iterations and 6 HDFE sub-iterations (tol = 1.0e-08)
      
      HDFE PPML regression                              No. of obs      =        516
      Absorbing 1 HDFE group                            Residual df     =        333
                                                        Wald chi2(54)   =     479.64
      Deviance             =  1403.959419               Prob > chi2     =     0.0000
      Log pseudolikelihood = -3195.497299               Pseudo R2       =     0.9867
      ---------------------------------------------------------------------------------
                      |               Robust
                 died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ----------------+----------------------------------------------------------------
                 year |
                2017  |  -.0861966   .0293674    -2.94   0.003    -.1437556   -.0286377
                2018  |  -.1131845   .0330952    -3.42   0.001      -.17805    -.048319
                2019  |  -.1211958   .0395994    -3.06   0.002    -.1988092   -.0435824
      ...
                _cons |  -2.009282   5.636472    -0.36   0.721    -13.05656       9.038
      ---------------------------------------------------------------------------------
      No fixed effects, using -ppml-

      Code:
      . ppml died year `controls' if year!=2020, offset(ln_survivor)
      
      note: checking the existence of the estimates
      WARNING: year has very large values, consider rescaling  or recentering
      WARNING: drivedistancepc has very large values, consider rescaling  or recentering
      
      Number of regressors excluded to ensure that the estimates exist: 0
      Number of observations excluded: 0
      
      note: starting ppml estimation
      
      Iteration 1:   deviance =   12241.7
      Iteration 2:   deviance =  7012.154
      Iteration 3:   deviance =  6994.436
      Iteration 4:   deviance =  6994.435
      Iteration 5:   deviance =  6994.435
      
      Number of parameters: 53
      Number of observations: 516
      Pseudo log-likelihood: -15702158
      R-squared: 4.496e-06
      Option strict is: off
      ---------------------------------------------------------------------------------
                      |               Robust
                 died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ----------------+----------------------------------------------------------------
                 year |  -.0240183   .0057341    -4.19   0.000     -.035257   -.0127796
          ...
                _cons |   46.21975   11.53539     4.01   0.000     23.61081    68.82869
      ---------------------------------------------------------------------------------



      • #4
        Dear Yevgeniy Feyman,

        What is the offset variable? Do you still have the problem without the offset? Also, this may have nothing to do with it, but the R2 in PPML is incredibly low; is there a reason for that (not that the R2 is important, but such a low value may indicate a problem somewhere)?

        Best wishes,

        Joao



        • #5
          Hi Joao Santos Silva ,

          The offset variable is the natural log of the number of patients alive at that facility at the beginning of that calendar year (patients are attributed to a facility).
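
          In code, that is along these lines (the name of the underlying count variable is illustrative):

          Code:
          gen double ln_survivor = ln(n_alive)   // n_alive: hypothetical name for patients alive at the start of the year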

          Removing it still makes year drop out.

          That's an interesting point about the R2 in PPML. I'm not sure why it would be that low. It's possible that the fixed effects do a LOT of the heavy lifting. (That's fine for my purposes because I'm using this to generate observed/expected ratios rather than inferring causality for any one variable.)



          • #6
            Hi Sergio Correia, any thoughts on why this might be happening? Could it be a bug with PPMLHDFE?



            • #7
              Hi Yevgeniy,

              First, what versions of ppmlhdfe and reghdfe are you using? ("which <packagename>" does the trick).
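
              For example:

              Code:
              which ppmlhdfe
              which reghdfe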

              Second, if the offset() does not matter, then this also causes the error, no?


              Code:
              . ppmlhdfe died year if year!=2020
              I would first try to remove the -if- condition to ensure there's no problem with the sample selection:

              Code:
              drop if year==2020
              tab1 died year, m
              ppmlhdfe died year
              Here, it would be useful to have an idea of what the results of the tabulation are, to be sure everything looks OK.

              Also, you can try racking up the tolerance

              Code:
              ppmlhdfe died year, tol(1e-10)
              I also noted that the message "<var> is probably collinear with the fixed effects" is actually from reghdfe, which gets called by ppmlhdfe. So you can try

              Code:
              reghdfe died year, tol(1e-10)
              to see if you get the same error.

              Lastly, the other thing I can think of is to run the ppmlhdfe command with the "verbose(1)" or "verbose(2)" options, to see if anything stands out. Usually the output of verbose is just technical stuff about variable parsing, so there shouldn't be any problem in sharing this.
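
              For example, mirroring the commands above:

              Code:
              ppmlhdfe died year, verbose(2)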

              -Sergio

              PS: apologies in advance for the non-straightforward debugging advice. It's really hard to remotely debug something with confidential data, because without an example I can reproduce I can't really tell if it's a bug or not.



              • #8
                Thanks for the quick response Sergio Correia !

                The version of ppmlhdfe is 2.3.0 from Feb 25 2021.
                The version of reghdfe is 6.12.1 from June 27 2021.

                Second, if the offset() does not matter, then this also causes an error no?
                That's right. The error is produced with and without the offset variable.

                I would first try to remove the -if- condition to ensure there's no problem with the sample selection:
                Good idea! This doesn't change anything unfortunately.

                Here, it would be useful to have an idea of what are the results of the tabulation, to be sure everything looks ok.
                This produces a fairly large table of unique values of "died." I can share that but it's quite substantial. It is a balanced panel (130 facilities each year).

                ppmlhdfe gives the same error message here too.

                Also, you can try racking up the tolerance
                Same error here.

                I also noted that the message "<var> is probably collinear with the fixed effects" is actually from reghdfe, which gets called by ppmlhdfe. So you can try
                Reghdfe does successfully estimate this! No error messages.

                Lastly, I've pasted the ppmlhdfe verbose(2) log below:

                Code:
                . ppmlhdfe died year, tol(1e-10) verbose(2)
                
                - Techniques used for detecting and fixing separation: fe simplex relu
                
                ## Parsing varlist: died year
                
                macros:
                           r(basevars) : "died year"
                          r(indepvars) : "year"
                          r(fe_format) : "%9.0g"
                             r(depvar) : "died"
                
                ## Parsing vce()
                
                macros:
                       s(num_clusters) : "0"
                            s(vcetype) : "unadjusted"
                
                - Parsing absorb() and creating HDFE object:
                
                - Parsing absorb() and creating HDFE object:
                
                ## Parsing absvars and HDFE options
                
                macros:
                       s(precondition) : "1"
                           s(poolsize) : "."
                        s(compute_rre) : "0"
                     s(dofadjustments) : "pairwise clusters continuous"
                    s(report_constant) : "1"
                                  s(G) : "1"
                      s(has_intercept) : "1"
                        s(save_any_fe) : "0"
                        s(save_all_fe) : "0"
                            s(absvars) : " """
                              s(ivars) : "_cons"
                              s(cvars) : " """
                            s(targets) : " """
                         s(intercepts) : "1"
                         s(num_slopes) : "0"
                
                ## Initializing Mata object for 1 fixed effects
                
                   +-----------------------------------------------------------------------------------+
                   |  i | g |  Name | Int? | #Slopes |    Obs.   |   Levels   | Sorted? | #Drop Singl. |
                   |----+---+-------+------+---------+-----------+------------+---------+--------------|
                   |  1 | 1 |       | Yes  |    0    |       520 |          1 |     Yes |          0   |
                   +-----------------------------------------------------------------------------------+
                
                ## Initializing panelsetup() for each fixed effect
                
                   - panelsetup()
                ## Loading weights [iweight=died]
                ## Sorting weights for each absvar
                   - loading iweight weight from variable died
                   - sorting weight for factor 
                
                ## Saving e(sample)
                
                - Loading regression variables into Mata
                
                macros:
                        r(not_omitted) : "1"
                            r(varlist) : "year"
                     r(fullvarlist_bn) : "year"
                        r(fullvarlist) : "year"
                 @@ Standardizing variables
                 @@ Removing collinear variables
                 $$ - Finding separated variables
                ## Loading weights [iweight=died]
                ## Sorting weights for each absvar
                   - loading iweight weight from variable died
                   - sorting weight for factor 
                
                 $$ No boundary observations (y=0), no separation tests required.
                 @@ Starting GLM::solve
                 @@ Setting initial values
                ## Loading weights [aweight=<placeholder for mu>]
                ## Sorting weights for each absvar
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@ Starting IRLS
                    Target HDFE tolerance:1.00e-11 
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                   - Running solver (acceleration=none, transform=symmetric_kaczmarz tol=1.0e-04)
                   - Iterating:note: year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)
                 @@@ reghdfe_solve_ols()
                  0
                 @@@ updating eta/mu/deviance
                Iteration 1:   deviance = 4.9777e+05  eps = .         iters = 1    tol = 1.0e-04  min(eta) =   0.55  P   
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                 @@@ reghdfe_solve_ols()
                  0
                 @@@ updating eta/mu/deviance
                Iteration 2:   deviance = 4.9517e+05  eps = 5.25e-03  iters = 1    tol = 1.0e-04  min(eta) =   0.51      
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                 @@@ reghdfe_solve_ols()
                  0
                 @@@ updating eta/mu/deviance
                Iteration 3:   deviance = 4.9517e+05  eps = 2.12e-06  iters = 1    tol = 1.0e-04  min(eta) =   0.51      
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                 @@@ reghdfe_solve_ols()
                  0
                 @@@ updating eta/mu/deviance
                Iteration 4:   deviance = 4.9517e+05  eps = 3.66e-13  iters = 1    tol = 1.0e-05  min(eta) =   0.51      
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                 @@@ reghdfe_solve_ols()
                
                ## Solving least-squares regression of partialled-out variables
                
                                  1              2
                    +-------------------------------+
                  1 |             0   -5.95983e-14  |
                    +-------------------------------+
                 @@@ updating eta/mu/deviance
                Iteration 5:   deviance = 4.9517e+05  eps = 2.01e-16  iters = 1    tol = 1.0e-07  min(eta) =   0.51   S O
                ------------------------------------------------------------------------------------------------------------
                (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
                Converged in 5 iterations and 5 HDFE sub-iterations (tol = 1.0e-10)
                 @@ Computing DoF
                
                ## Estimating degrees-of-freedom absorbed by the fixed effects
                
                   - there are 1 fixed intercepts and slopes in the 1 absvars
                 @@ Computing final betas and standard errors
                
                ## Solving least-squares regression of partialled-out variables
                
                
                ## Estimating Robust Variance-Covariance Matrix of the Estimators (VCE)
                
                   - VCE type: robust
                   - Weight type: aweight
                   - Small-sample-adjustment: q = N / (N-df_m-df_a) = 520 / (520 - 0 - 1) = 1.00192678
                
                ## Adding _cons to varlist
                
                ## Saving e(sample)
                
                PPML regression                                   No. of obs      =        520
                                                                  Residual df     =        519
                                                                  Wald chi2(0)    =          .
                Deviance             =  495173.2955               Prob > chi2     =          .
                Log pseudolikelihood = -250094.1298               Pseudo R2       =    -0.0000
                ------------------------------------------------------------------------------
                             |               Robust
                        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                        year |          0  (omitted)
                       _cons |   7.982076   .0262533   304.04   0.000     7.930621    8.033532
                ------------------------------------------------------------------------------



                • #9
                  Thanks for going through this. Grasping at straws here, but just to be sure, how many years are there in the sample? This almost looks like there is only one year (e.g. 2019), but the fact that reghdfe estimates this correctly puzzles me (can you show the output of "reghdfe year"?).

                  Also, can you share just year and randomized data on died?

                  E.g., if you create fake data on died and send that fake data along with the actual data on year, maybe I can get an idea of what is going on (this assumes the list of years is not confidential).
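
                  Something along these lines would do (just a sketch; the rpoisson() draw is an arbitrary placeholder so the fake variable still looks like counts):

                  Code:
                  preserve
                  keep year died
                  replace died = rpoisson(100)   // fake counts, arbitrary mean
                  export delimited using fake_died_year.csv, replace
                  restore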



                  • #10
                    A few more thoughts:
                    1. This seems to happen when ppmlhdfe calls HDFE._partial_out() with IRLS weights. So it's not surprising that no collinearity is detected with reghdfe (which does not use weights) but it is with ppmlhdfe (which does).
                    2. I was surprised that your note still showed a tolerance of 1e-6 even though you specified tol(1e-10) ("year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)"). If I run something similar on my computer (on the same versions of ppmlhdfe/reghdfe), the warning I get correctly states tol=1e-10 instead of tol=1e-6.
                    3. As a simple workaround, what if you directly edit the code and see what happens? More specifically, in Stata type "which reghdfe5.mata". Then, open that file in your favorite editor, and look for the line "kept2 = (diagonal(cross(y, y))' :/ kept2) :> (collinear_tol)" (line 1276). Replace that line with "kept2 = (diagonal(cross(y, y))' :/ kept2) :> 0" (i.e. replace collinear_tol with zero). This should effectively disable this check, so it shouldn't drop year anymore. Of course, be mindful that regressors that should be dropped might then stay in the regression, although this is usually very obvious to spot (you see regressors with huge betas like 1e+10).
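
                    In code-block form, the one-line edit in point 3 is:

                    Code:
                    // reghdfe5.mata, around line 1276 (original):
                    kept2 = (diagonal(cross(y, y))' :/ kept2) :> (collinear_tol)
                    // replacement (disables the relative collinearity check):
                    kept2 = (diagonal(cross(y, y))' :/ kept2) :> 0
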
                    Best,
                    S



                    • #11
                      As a simple workaround, what if you directly edit the code and see what happens? More specifically, in Stata type "which reghdfe5.mata". Then, open that file in your favorite editor, and look for the line "kept2 = (diagonal(cross(y, y))' :/ kept2) :> (collinear_tol)" (line 1276). Replace that line with "kept2 = (diagonal(cross(y, y))' :/ kept2) :> 0" (i.e. replace collinear_tol with zero). This should effectively disable this check, so it shouldn't drop year anymore. Of course, be mindful that regressors that should be dropped might then stay in the regression, although this is usually very obvious to spot (you see regressors with huge betas like 1e+10).
                      This option worked great! Thanks so much.

                      PPML and PPMLHDFE both give the same coefficient for year now (~0.02). Surprising that it was being dropped though because that seems like a very small coefficient.



                      • #12
                         As a simple workaround, what if you directly edit the code and see what happens? More specifically, in Stata type "which reghdfe5.mata". Then, open that file in your favorite editor, and look for the line "kept2 = (diagonal(cross(y, y))' :/ kept2) :> (collinear_tol)" (line 1276). Replace that line with "kept2 = (diagonal(cross(y, y))' :/ kept2) :> 0" (i.e. replace collinear_tol with zero). This should effectively disable this check, so it shouldn't drop year anymore. Of course, be mindful that regressors that should be dropped might then stay in the regression, although this is usually very obvious to spot (you see regressors with huge betas like 1e+10).
                        Dear Sergio Correia,
                        I also have the problem that the trend variable gets dropped when using ppmlhdfe.
                        I'm trying to run the following code:
                        Code:
                        ppmlhdfe F6CitesAcc Cites_2YearL died c.year , absorb(i.paper)  vce(cluster paper author_id_Sole)
                        The variables have the following meaning:
                        • F6CitesAcc is the number of citations received by a paper from year t to year t+6,
                        • died is a dummy variable taking the value 1 if the author of the paper is dead and 0 otherwise,
                        • Cites_2YearL is the number of citations received by a paper from year t-2 to year t-1,
                        • paper is an identifier for each paper,
                        • author_id_Sole is an identifier for each author.
                        I get the following result
                        Code:
                        . ppmlhdfe F6CitesAcc Cites_2YearL died c.year , absorb(i.paper)  vce(cluster paper author_id_Sole)
                        (dropped 3307 observations that are either singletons or separated by a fixed effect)
                        note: year is probably collinear with the fixed effects (all partialled-out values are close to zero;
                        > tol = 1.0e-06)
                        Iteration 1:   deviance = 1.3292e+04  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -3.61  P
                        >    
                        Iteration 2:   deviance = 4.2797e+03  eps = 2.11e+00  iters = 1    tol = 1.0e-04  min(eta) =  -4.52  
                        >    
                        Iteration 3:   deviance = 3.2794e+03  eps = 3.05e-01  iters = 1    tol = 1.0e-04  min(eta) =  -5.38  
                        >    
                        Iteration 4:   deviance = 3.1633e+03  eps = 3.67e-02  iters = 1    tol = 1.0e-04  min(eta) =  -6.11  
                        >    
                        Iteration 5:   deviance = 3.1549e+03  eps = 2.65e-03  iters = 1    tol = 1.0e-04  min(eta) =  -6.54  
                        >    
                        Iteration 6:   deviance = 3.1547e+03  eps = 6.20e-05  iters = 1    tol = 1.0e-04  min(eta) =  -6.66  
                        >    
                        Iteration 7:   deviance = 3.1547e+03  eps = 1.12e-07  iters = 1    tol = 1.0e-05  min(eta) =  -6.67  
                        >    
                        Iteration 8:   deviance = 3.1547e+03  eps = 8.76e-13  iters = 1    tol = 1.0e-06  min(eta) =  -6.67  
                        > S O
                        ------------------------------------------------------------------------------------------------------
                        > ------
                        (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
                        Converged in 8 iterations and 8 HDFE sub-iterations (tol = 1.0e-08)
                        
                        HDFE PPML regression                              No. of obs      =      4,794
                        Absorbing 1 HDFE group                            Residual df     =        169
                        Statistics robust to heteroskedasticity           Wald chi2(2)    =     136.90
                        Deviance             =  3154.701608               Prob > chi2     =     0.0000
                        Log pseudolikelihood = -8405.615872               Pseudo R2       =     0.9478
                        
                        Number of clusters (paper)  =      1,006
                        Number of clusters (author_id_Sole)=       170
                                         (Std. Err. adjusted for 170 clusters in paper author_id_Sole)
                        ------------------------------------------------------------------------------
                                     |               Robust
                          F6CitesAcc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        Cites_2YearL |   .0021969   .0002726     8.06   0.000     .0016627    .0027311
                                died |   .4135046   .0481128     8.59   0.000     .3192052     .507804
                                year |          0  (omitted)
                               _cons |   4.552471   .0307828   147.89   0.000     4.492138    4.612804
                        ------------------------------------------------------------------------------
                        
                        Absorbed degrees of freedom:
                        -----------------------------------------------------+
                         Absorbed FE | Categories  - Redundant  = Num. Coefs |
                        -------------+---------------------------------------|
                               paper |      1006        1006           0    *|
                        -----------------------------------------------------+
                        * = FE nested within cluster; treated as redundant for DoF computation
                         I tried to apply the solution you suggested in post #10, which worked for Yevgeniy Feyman. However, it did not work for me, although I may be doing something wrong. Following your suggestion, I typed "which reghdfe5.mata" (without the quotes) at the Stata command line, but I got the following error message:
                        file reghdfe5.mata not found along ado-path
                        r(111);
                        Further comments: The problem does not appear if the fixed effects are dropped – i.e., with no absorb(i.paper).
                         I did follow the suggestions made in post #7 by increasing the tolerance, but the problem persists. The problem does not appear if year dummies are included instead of the trend (i.e., i.year; see the sketch at the end of this post).
                         I am using Stata/MP 14.2.
                         The version of ppmlhdfe is 2.2.0 from 02 Aug 2019.
                        I would be happy to provide more information if needed.
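
                         For reference, the dummy-variable specification mentioned above (the one that runs without the note) is presumably the same command with c.year replaced by i.year:

                         Code:
                         ppmlhdfe F6CitesAcc Cites_2YearL died i.year, absorb(i.paper) vce(cluster paper author_id_Sole)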



                        • #13
                          Well, I am running into a similar problem of a low R2 when using PPMLHDFE. I understand that the R2 is not important, but many papers report a high R2 for their PPML or PPMLHDFE results, so I wonder whether I did something wrong. I would also like to understand the possible reasons, because it seems odd.



                          • #14
                            Dear Karen Jyo,

                             An obvious thing to check is that you are not using a dependent variable in logs (that happens sometimes!). Also, I suggest that you compute the R2 as the square of the correlation between the dependent variable and its fitted values. Is this R2 still low? Finally, it would help if you could share your estimation results so we can try to figure out what is going on.
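
                             For example, after ppmlhdfe, something along these lines (a minimal sketch, assuming the dependent variable is called y; mu is the fitted mean):

                             Code:
                             predict double yhat, mu      // fitted mean of the dependent variable
                             correlate y yhat
                             display "R2 = " r(rho)^2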

                            Best wishes,

                            Joao

