
  • PPMLHDFE dropping linear time trend

    Hi all,

    I'm trying to run the following code:

    Code:
     ppmlhdfe died year `controls' if year!=2020, offset(ln_pop) absorb(facility)
    My outcome is the number of people who died in a given facility in a given year, and `controls' are facility-year-varying covariates. When I run this code, I get a message saying that "year" is dropped because of collinearity with the fixed effects. The problem is that when I run the model without any other variables (no fixed effects, no controls), I still get a message that year is dropped due to collinearity. This doesn't happen if I run it with reghdfe, or with ppml without fixed effects. Note that I am able to run this model with year dummies (i.year) successfully!

    Any ideas why this might be happening? There are 4 years of data, 129 facilities each year.

  • #2
    Dear Yevgeniy Feyman,

    Please show us the results with the different estimators so that we can comment on it.

    Best wishes,

    Joao



    • #3
      Thanks for the response Joao Santos Silva

      I've included the results from the log file below. Because this is using protected data, I can't share all coefficients and variable names, but I've included what I could.

      PPMLHDFE, with facility FE
      Code:
      . ppmlhdfe died year `controls' if year!=2020, offset(ln_survivor) absorb(facility)
      note: year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)
      Iteration 1:   deviance = 1.2555e+04  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -0.96  P   
      Iteration 2:   deviance = 7.2185e+03  eps = 7.39e-01  iters = 1    tol = 1.0e-04  min(eta) =  -1.25      
      Iteration 3:   deviance = 7.2001e+03  eps = 2.55e-03  iters = 1    tol = 1.0e-04  min(eta) =  -1.27      
      Iteration 4:   deviance = 7.2001e+03  eps = 5.71e-08  iters = 1    tol = 1.0e-04  min(eta) =  -1.27      
      Iteration 5:   deviance = 7.2001e+03  eps = 8.60e-16  iters = 1    tol = 1.0e-05  min(eta) =  -1.27   S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
      Converged in 5 iterations and 5 HDFE sub-iterations (tol = 1.0e-08)
      
      PPML regression                                   No. of obs      =        516
                                                        Residual df     =        464
                                                        Wald chi2(51)   =    7495.94
      Deviance             =  7200.080291               Prob > chi2     =     0.0000
      Log pseudolikelihood = -6093.557735               Pseudo R2       =     0.9747
      ---------------------------------------------------------------------------------
                      |               Robust
                 died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ----------------+----------------------------------------------------------------
                 year |          0  (omitted)
        ...
                _cons |  -2.073217   .5057867    -4.10   0.000    -3.064541   -1.081893
      ---------------------------------------------------------------------------------
      No facility fixed effects, or any other variables

      Code:
      . ppmlhdfe died year if year!=2020, offset(ln_survivor)
      note: year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)
      Iteration 1:   deviance = 5.8633e+04  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -1.30  P   
      Iteration 2:   deviance = 5.3110e+04  eps = 1.04e-01  iters = 1    tol = 1.0e-04  min(eta) =  -1.36      
      Iteration 3:   deviance = 5.3105e+04  eps = 8.73e-05  iters = 1    tol = 1.0e-04  min(eta) =  -1.36      
      Iteration 4:   deviance = 5.3105e+04  eps = 6.64e-11  iters = 1    tol = 1.0e-05  min(eta) =  -1.36      
      Iteration 5:   deviance = 5.3105e+04  eps = 1.17e-16  iters = 1    tol = 1.0e-06  min(eta) =  -1.36   S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
      Converged in 5 iterations and 5 HDFE sub-iterations (tol = 1.0e-08)
      
      PPML regression                                   No. of obs      =        516
                                                        Residual df     =        515
                                                        Wald chi2(0)    =          .
      Deviance             =  53105.10649               Prob > chi2     =          .
      Log pseudolikelihood = -29046.07084               Pseudo R2       =     0.8795
      ------------------------------------------------------------------------------
                   |               Robust
              died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              year |          0  (omitted)
             _cons |  -3.185151   .0093252  -341.57   0.000    -3.203428   -3.166874
       ln_survivor |          1  (offset)
      ------------------------------------------------------------------------------
      Year included as dummies

      Code:
      . ppmlhdfe died i.year `controls' if year!=2020, offset(ln_survivor) absorb(facility)
      Iteration 1:   deviance = 7.2930e+03  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -0.70  P   
      Iteration 2:   deviance = 1.4683e+03  eps = 3.97e+00  iters = 1    tol = 1.0e-04  min(eta) =  -1.08      
      Iteration 3:   deviance = 1.4040e+03  eps = 4.58e-02  iters = 1    tol = 1.0e-04  min(eta) =  -1.17      
      Iteration 4:   deviance = 1.4040e+03  eps = 5.14e-05  iters = 1    tol = 1.0e-04  min(eta) =  -1.17      
      Iteration 5:   deviance = 1.4040e+03  eps = 1.73e-10  iters = 1    tol = 1.0e-05  min(eta) =  -1.17      
      Iteration 6:   deviance = 1.4040e+03  eps = 1.38e-16  iters = 1    tol = 1.0e-06  min(eta) =  -1.17   S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
      Converged in 6 iterations and 6 HDFE sub-iterations (tol = 1.0e-08)
      
      HDFE PPML regression                              No. of obs      =        516
      Absorbing 1 HDFE group                            Residual df     =        333
                                                        Wald chi2(54)   =     479.64
      Deviance             =  1403.959419               Prob > chi2     =     0.0000
      Log pseudolikelihood = -3195.497299               Pseudo R2       =     0.9867
      ---------------------------------------------------------------------------------
                      |               Robust
                 died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ----------------+----------------------------------------------------------------
                 year |
                2017  |  -.0861966   .0293674    -2.94   0.003    -.1437556   -.0286377
                2018  |  -.1131845   .0330952    -3.42   0.001      -.17805    -.048319
                2019  |  -.1211958   .0395994    -3.06   0.002    -.1988092   -.0435824
      ...
                _cons |  -2.009282   5.636472    -0.36   0.721    -13.05656       9.038
      ---------------------------------------------------------------------------------
      No fixed effects, using -ppml-

      Code:
      . ppml died year `controls' if year!=2020, offset(ln_survivor)
      
      note: checking the existence of the estimates
      WARNING: year has very large values, consider rescaling  or recentering
      WARNING: drivedistancepc has very large values, consider rescaling  or recentering
      
      Number of regressors excluded to ensure that the estimates exist: 0
      Number of observations excluded: 0
      
      note: starting ppml estimation
      
      Iteration 1:   deviance =   12241.7
      Iteration 2:   deviance =  7012.154
      Iteration 3:   deviance =  6994.436
      Iteration 4:   deviance =  6994.435
      Iteration 5:   deviance =  6994.435
      
      Number of parameters: 53
      Number of observations: 516
      Pseudo log-likelihood: -15702158
      R-squared: 4.496e-06
      Option strict is: off
      ---------------------------------------------------------------------------------
                      |               Robust
                 died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ----------------+----------------------------------------------------------------
                 year |  -.0240183   .0057341    -4.19   0.000     -.035257   -.0127796
          ...
                _cons |   46.21975   11.53539     4.01   0.000     23.61081    68.82869
      ---------------------------------------------------------------------------------



      • #4
        Dear Yevgeniy Feyman,

        What is the offset variable? Do you still have the problem without the offset? Also, this may have nothing to do with it, but the R2 in PPML is incredibly low; is there a reason for that (not that the R2 is important, but such a low value may indicate a problem somewhere)?

        Best wishes,

        Joao



        • #5
          Hi Joao Santos Silva ,

          The offset variable is the natural log of the number of patients alive at that facility at the beginning of that calendar year (patients are attributed to a facility).
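
          In code, that is along these lines (the name of the underlying count variable is illustrative):

          Code:
          gen double ln_survivor = ln(n_alive)   // n_alive: hypothetical name for patients alive at the start of the year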

          Removing it still makes year drop out.

          That's an interesting point about the R2 in PPML. I'm not sure why it would be that low. It's possible that the fixed effects do a LOT of the heavy lifting. (That's fine for my purposes because I'm using this to generate observed/expected ratios rather than inferring causality for any one variable.)



          • #6
            Hi Sergio Correia, any thoughts on why this might be happening? Could it be a bug with PPMLHDFE?



            • #7
              Hi Yevgeniy,

              First, what versions of ppmlhdfe and reghdfe are you using? ("which <packagename>" does the trick).
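
              For example:

              Code:
              which ppmlhdfe
              which reghdfe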

              Second, if the offset() does not matter, then this also causes the error, no?


              Code:
              . ppmlhdfe died year if year!=2020
              I would first try to remove the -if- condition to ensure there's no problem with the sample selection:

              Code:
              drop if year==2020
              tab1 died year, m
              ppmlhdfe died year
              Here, it would be useful to have an idea of what the results of the tabulation are, to be sure everything looks OK.

              Also, you can try racking up the tolerance

              Code:
              ppmlhdfe died year, tol(1e-10)
              I also noted that the message "<var> is probably collinear with the fixed effects" is actually from reghdfe, which gets called by ppmlhdfe. So you can try

              Code:
              reghdfe died year, tol(1e-10)
              to see if you get the same error.

              Lastly, the other thing I can think of is to run the ppmlhdfe command with the "verbose(1)" or "verbose(2)" options, to see if anything stands out. Usually the output of verbose is just technical stuff about variable parsing, so there shouldn't be any problem in sharing this.
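
              For example, mirroring the commands above:

              Code:
              ppmlhdfe died year, verbose(2)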

              -Sergio

              PS: apologies in advance for the non-straightforward debugging advice. It's really hard to remotely debug something with confidential data, because without an example I can reproduce I can't really tell if it's a bug or not.



              • #8
                Thanks for the quick response Sergio Correia !

                The version of ppmlhdfe is 2.3.0 from Feb 25 2021.
                The version of reghdfe is 6.12.1 from June 27 2021.

                Second, if the offset() does not matter, then this also causes an error no?
                That's right. The error is produced with and without the offset variable.

                I would first try to remove the -if- condition to ensure there's no problem with the sample selection:
                Good idea! This doesn't change anything unfortunately.

                Here, it would be useful to have an idea of what are the results of the tabulation, to be sure everything looks ok.
                This produces a fairly large table of unique values of "died." I can share that but it's quite substantial. It is a balanced panel (130 facilities each year).

                ppmlhdfe gives the same error message here too.

                Also, you can try racking up the tolerance
                Same error here.

                I also noted that the message "<var> is probably collinear with the fixed effects" is actually from reghdfe, which gets called by ppmlhdfe. So you can try
                Reghdfe does successfully estimate this! No error messages.

                Lastly, I've pasted the ppmlhdfe verbose(2) log below:

                Code:
                . ppmlhdfe died year, tol(1e-10) verbose(2)
                
                - Techniques used for detecting and fixing separation: fe simplex relu
                
                ## Parsing varlist: died year
                
                macros:
                           r(basevars) : "died year"
                          r(indepvars) : "year"
                          r(fe_format) : "%9.0g"
                             r(depvar) : "died"
                
                ## Parsing vce()
                
                macros:
                       s(num_clusters) : "0"
                            s(vcetype) : "unadjusted"
                
                - Parsing absorb() and creating HDFE object:
                
                - Parsing absorb() and creating HDFE object:
                
                ## Parsing absvars and HDFE options
                
                macros:
                       s(precondition) : "1"
                           s(poolsize) : "."
                        s(compute_rre) : "0"
                     s(dofadjustments) : "pairwise clusters continuous"
                    s(report_constant) : "1"
                                  s(G) : "1"
                      s(has_intercept) : "1"
                        s(save_any_fe) : "0"
                        s(save_all_fe) : "0"
                            s(absvars) : " """
                              s(ivars) : "_cons"
                              s(cvars) : " """
                            s(targets) : " """
                         s(intercepts) : "1"
                         s(num_slopes) : "0"
                
                ## Initializing Mata object for 1 fixed effects
                
                   +-----------------------------------------------------------------------------------+
                   |  i | g |  Name | Int? | #Slopes |    Obs.   |   Levels   | Sorted? | #Drop Singl. |
                   |----+---+-------+------+---------+-----------+------------+---------+--------------|
                   |  1 | 1 |       | Yes  |    0    |       520 |          1 |     Yes |          0   |
                   +-----------------------------------------------------------------------------------+
                
                ## Initializing panelsetup() for each fixed effect
                
                   - panelsetup()
                ## Loading weights [iweight=died]
                ## Sorting weights for each absvar
                   - loading iweight weight from variable died
                   - sorting weight for factor 
                
                ## Saving e(sample)
                
                - Loading regression variables into Mata
                
                macros:
                        r(not_omitted) : "1"
                            r(varlist) : "year"
                     r(fullvarlist_bn) : "year"
                        r(fullvarlist) : "year"
                 @@ Standardizing variables
                 @@ Removing collinear variables
                 $$ - Finding separated variables
                ## Loading weights [iweight=died]
                ## Sorting weights for each absvar
                   - loading iweight weight from variable died
                   - sorting weight for factor 
                
                 $$ No boundary observations (y=0), no separation tests required.
                 @@ Starting GLM::solve
                 @@ Setting initial values
                ## Loading weights [aweight=<placeholder for mu>]
                ## Sorting weights for each absvar
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@ Starting IRLS
                    Target HDFE tolerance:1.00e-11 
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                   - Running solver (acceleration=none, transform=symmetric_kaczmarz tol=1.0e-04)
                   - Iterating:note: year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)
                 @@@ reghdfe_solve_ols()
                  0
                 @@@ updating eta/mu/deviance
                Iteration 1:   deviance = 4.9777e+05  eps = .         iters = 1    tol = 1.0e-04  min(eta) =   0.55  P   
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                 @@@ reghdfe_solve_ols()
                  0
                 @@@ updating eta/mu/deviance
                Iteration 2:   deviance = 4.9517e+05  eps = 5.25e-03  iters = 1    tol = 1.0e-04  min(eta) =   0.51      
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                 @@@ reghdfe_solve_ols()
                  0
                 @@@ updating eta/mu/deviance
                Iteration 3:   deviance = 4.9517e+05  eps = 2.12e-06  iters = 1    tol = 1.0e-04  min(eta) =   0.51      
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                 @@@ reghdfe_solve_ols()
                  0
                 @@@ updating eta/mu/deviance
                Iteration 4:   deviance = 4.9517e+05  eps = 3.66e-13  iters = 1    tol = 1.0e-05  min(eta) =   0.51      
                 @@@ HDFE.update_sorted_weights()
                   - loading aweight weight from variable <placeholder for mu>
                   - sorting weight for factor 
                 @@@ HDFE._partial_out()
                 @@@ reghdfe_solve_ols()
                
                ## Solving least-squares regression of partialled-out variables
                
                                  1              2
                    +-------------------------------+
                  1 |             0   -5.95983e-14  |
                    +-------------------------------+
                 @@@ updating eta/mu/deviance
                Iteration 5:   deviance = 4.9517e+05  eps = 2.01e-16  iters = 1    tol = 1.0e-07  min(eta) =   0.51   S O
                ------------------------------------------------------------------------------------------------------------
                (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
                Converged in 5 iterations and 5 HDFE sub-iterations (tol = 1.0e-10)
                 @@ Computing DoF
                
                ## Estimating degrees-of-freedom absorbed by the fixed effects
                
                   - there are 1 fixed intercepts and slopes in the 1 absvars
                 @@ Computing final betas and standard errors
                
                ## Solving least-squares regression of partialled-out variables
                
                
                ## Estimating Robust Variance-Covariance Matrix of the Estimators (VCE)
                
                   - VCE type: robust
                   - Weight type: aweight
                   - Small-sample-adjustment: q = N / (N-df_m-df_a) = 520 / (520 - 0 - 1) = 1.00192678
                
                ## Adding _cons to varlist
                
                ## Saving e(sample)
                
                PPML regression                                   No. of obs      =        520
                                                                  Residual df     =        519
                                                                  Wald chi2(0)    =          .
                Deviance             =  495173.2955               Prob > chi2     =          .
                Log pseudolikelihood = -250094.1298               Pseudo R2       =    -0.0000
                ------------------------------------------------------------------------------
                             |               Robust
                        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                        year |          0  (omitted)
                       _cons |   7.982076   .0262533   304.04   0.000     7.930621    8.033532
                ------------------------------------------------------------------------------



                • #9
                  Thanks for going through this. Grasping at straws here, but just to be sure, how many years are there in the sample? This almost looks like there is only one year (e.g. 2019), but the fact that reghdfe estimates this correctly puzzles me (can you show the output of "reghdfe year"?).

                  Also, can you share just year and randomized data on died?

                  E.g., if you create fake data on died and send that fake data along with the actual data on year, maybe I can get an idea of what is going on (this assumes the list of years is not confidential).
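
                  Something along these lines would do (just a sketch; the rpoisson() draw is an arbitrary placeholder so the fake variable still looks like counts):

                  Code:
                  preserve
                  keep year died
                  replace died = rpoisson(100)   // fake counts, arbitrary mean
                  export delimited using fake_died_year.csv, replace
                  restore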



                  • #10
                    A few more thoughts:
                    1. This seems to happen when ppmlhdfe calls HDFE._partial_out() with IRLS weights. So it's not surprising that no collinearity is detected with reghdfe (which does not use weights) but it is with ppmlhdfe (which does).
                    2. I was surprised that your note still showed a tolerance of 1e-6 even though you specified tol(1e-10) ("year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)"). If I run something similar on my computer (on the same versions of ppmlhdfe/reghdfe), the warning I get correctly states tol=1e-10 instead of tol=1e-6.
                    3. As a simple workaround, what if you directly edit the code and see what happens? More specifically, in Stata type "which reghdfe5.mata". Then, open that file in your favorite editor, and look for the line "kept2 = (diagonal(cross(y, y))' :/ kept2) :> (collinear_tol)" (line 1276). Replace that line with "kept2 = (diagonal(cross(y, y))' :/ kept2) :> 0" (i.e. replace collinear_tol with zero). This should effectively disable this check, so it shouldn't drop year anymore. Of course, be mindful that regressors that should be dropped might then stay in the regression, although this is usually very obvious to spot (you see regressors with huge betas like 1e+10).
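
                    In code-block form, the one-line edit in point 3 is:

                    Code:
                    // reghdfe5.mata, around line 1276 (original):
                    kept2 = (diagonal(cross(y, y))' :/ kept2) :> (collinear_tol)
                    // replacement (disables the relative collinearity check):
                    kept2 = (diagonal(cross(y, y))' :/ kept2) :> 0
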
                    Best,
                    S



                    • #11
                      As a simple workaround, what if you directly edit the code and see what happens? More specifically, in Stata type "which reghdfe5.mata". Then, open that file in your favorite editor, and look for the line "kept2 = (diagonal(cross(y, y))' :/ kept2) :> (collinear_tol)" (line 1276). Replace that line with "kept2 = (diagonal(cross(y, y))' :/ kept2) :> 0" (i.e. replace collinear_tol with zero). This should effectively disable this check, so it shouldn't drop year anymore. Of course, be mindful that regressors that should be dropped might then stay in the regression, although this is usually very obvious to spot (you see regressors with huge betas like 1e+10).
                      This option worked great! Thanks so much.

                      PPML and PPMLHDFE both give the same coefficient for year now (~0.02). Surprising that it was being dropped though because that seems like a very small coefficient.



                      • #12
                         As a simple workaround, what if you directly edit the code and see what happens? More specifically, in Stata type "which reghdfe5.mata". Then, open that file in your favorite editor, and look for the line "kept2 = (diagonal(cross(y, y))' :/ kept2) :> (collinear_tol)" (line 1276). Replace that line with "kept2 = (diagonal(cross(y, y))' :/ kept2) :> 0" (i.e. replace collinear_tol with zero). This should effectively disable this check, so it shouldn't drop year anymore. Of course, be mindful that regressors that should be dropped might then stay in the regression, although this is usually very obvious to spot (you see regressors with huge betas like 1e+10).
                        Dear Sergio Correia,
                        I also have the problem that the trend variable gets dropped when using ppmlhdfe.
                        I'm trying to run the following code:
                        Code:
                        ppmlhdfe F6CitesAcc Cites_2YearL died c.year , absorb(i.paper)  vce(cluster paper author_id_Sole)
                        The variables have the following meaning:
                        • F6CitesAcc is the number of citations received by a paper from year t to year t+6,
                        • died is a dummy variable taking the value 1 if the author of the paper is dead and 0 otherwise,
                        • Cites_2YearL is the number of citations received by a paper from year t-2 to year t-1,
                        • paper is an identifier for each paper,
                        • author_id_Sole is an identifier for each author.
                        I get the following result
                        Code:
                        . ppmlhdfe F6CitesAcc Cites_2YearL died c.year , absorb(i.paper)  vce(cluster paper author_id_Sole)
                        (dropped 3307 observations that are either singletons or separated by a fixed effect)
                        note: year is probably collinear with the fixed effects (all partialled-out values are close to zero;
                        > tol = 1.0e-06)
                        Iteration 1:   deviance = 1.3292e+04  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -3.61  P
                        >    
                        Iteration 2:   deviance = 4.2797e+03  eps = 2.11e+00  iters = 1    tol = 1.0e-04  min(eta) =  -4.52  
                        >    
                        Iteration 3:   deviance = 3.2794e+03  eps = 3.05e-01  iters = 1    tol = 1.0e-04  min(eta) =  -5.38  
                        >    
                        Iteration 4:   deviance = 3.1633e+03  eps = 3.67e-02  iters = 1    tol = 1.0e-04  min(eta) =  -6.11  
                        >    
                        Iteration 5:   deviance = 3.1549e+03  eps = 2.65e-03  iters = 1    tol = 1.0e-04  min(eta) =  -6.54  
                        >    
                        Iteration 6:   deviance = 3.1547e+03  eps = 6.20e-05  iters = 1    tol = 1.0e-04  min(eta) =  -6.66  
                        >    
                        Iteration 7:   deviance = 3.1547e+03  eps = 1.12e-07  iters = 1    tol = 1.0e-05  min(eta) =  -6.67  
                        >    
                        Iteration 8:   deviance = 3.1547e+03  eps = 8.76e-13  iters = 1    tol = 1.0e-06  min(eta) =  -6.67  
                        > S O
                        ------------------------------------------------------------------------------------------------------
                        > ------
                        (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
                        Converged in 8 iterations and 8 HDFE sub-iterations (tol = 1.0e-08)
                        
                        HDFE PPML regression                              No. of obs      =      4,794
                        Absorbing 1 HDFE group                            Residual df     =        169
                        Statistics robust to heteroskedasticity           Wald chi2(2)    =     136.90
                        Deviance             =  3154.701608               Prob > chi2     =     0.0000
                        Log pseudolikelihood = -8405.615872               Pseudo R2       =     0.9478
                        
                        Number of clusters (paper)  =      1,006
                        Number of clusters (author_id_Sole)=       170
                                         (Std. Err. adjusted for 170 clusters in paper author_id_Sole)
                        ------------------------------------------------------------------------------
                                     |               Robust
                          F6CitesAcc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        Cites_2YearL |   .0021969   .0002726     8.06   0.000     .0016627    .0027311
                                died |   .4135046   .0481128     8.59   0.000     .3192052     .507804
                                year |          0  (omitted)
                               _cons |   4.552471   .0307828   147.89   0.000     4.492138    4.612804
                        ------------------------------------------------------------------------------
                        
                        Absorbed degrees of freedom:
                        -----------------------------------------------------+
                         Absorbed FE | Categories  - Redundant  = Num. Coefs |
                        -------------+---------------------------------------|
                               paper |      1006        1006           0    *|
                        -----------------------------------------------------+
                        * = FE nested within cluster; treated as redundant for DoF computation
                         I tried to apply the solution you suggested in post #10, which worked for Yevgeniy Feyman. However, it did not work for me, although I may be doing something wrong. Following your suggestion, I typed "which reghdfe5.mata" (without the quotes) at the Stata command line, but I got the following error message:
                        file reghdfe5.mata not found along ado-path
                        r(111);
                        Further comments: The problem does not appear if the fixed effects are dropped – i.e., with no absorb(i.paper).
                         I did follow the suggestions made in post #7 by increasing the tolerance, but the problem persists. The problem does not appear if year dummies are included instead of the trend (i.e., i.year; see the sketch at the end of this post).
                         I am using Stata/MP 14.2.
                         The version of ppmlhdfe is 2.2.0 from 02 Aug 2019.
                        I would be happy to provide more information if needed.
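
                         For reference, the dummy-variable specification mentioned above (the one that runs without the note) is presumably the same command with c.year replaced by i.year:

                         Code:
                         ppmlhdfe F6CitesAcc Cites_2YearL died i.year, absorb(i.paper) vce(cluster paper author_id_Sole)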



                        • #13
                          Well, I am running into a similar problem of a low R2 when using PPMLHDFE. I understand that the R2 is not important, but many papers report a high R2 for their PPML or PPMLHDFE results, so I wonder whether I did something wrong. I would also like to understand the possible reasons, because it seems odd.



                          • #14
                            Dear Karen Jyo,

                             An obvious thing to check is that you are not using a dependent variable in logs (that happens sometimes!). Also, I suggest that you compute the R2 as the square of the correlation between the dependent variable and its fitted values. Is this R2 still low? Finally, it would help if you could share your estimation results so we can try to figure out what is going on.
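
                             For example, after ppmlhdfe, something along these lines (a minimal sketch, assuming the dependent variable is called y; mu is the fitted mean):

                             Code:
                             predict double yhat, mu      // fitted mean of the dependent variable
                             correlate y yhat
                             display "R2 = " r(rho)^2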

                            Best wishes,

                            Joao

