  • fracreg pseudo R square

    Dear community,

    I am fitting a fractional response regression to aggregated country-level data.
    A pairwise correlation of the independent variable (per capita alcohol consumption) and the outcome variable (percentage of deaths attributable to alcoholic cardiomyopathy among all cardiomyopathic deaths) yields the following:

    Code:
    . pwcorr  apc_total acd_perc100
    
                 | apc_to~l acd_~100
    -------------+------------------
       apc_total |   1.0000
     acd_perc100 |   0.4925   1.0000
    In an ordinary linear regression with this single predictor, R² would therefore be about .25 (.4925² ≈ .24). In the fractional response model, however, the pseudo R² is below .1. More precisely:

    Code:
    . fracreg logit acd_perc100 apc_total
    
    Iteration 0:   log pseudolikelihood = -28.239363  
    Iteration 1:   log pseudolikelihood =  -10.12332  
    Iteration 2:   log pseudolikelihood = -9.7813189  
    Iteration 3:   log pseudolikelihood =  -9.775209  
    Iteration 4:   log pseudolikelihood = -9.7752052  
    Iteration 5:   log pseudolikelihood = -9.7752052  
    
    Fractional logistic regression                  Number of obs     =         39
                                                    Wald chi2(1)      =      31.27
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -9.7752052               Pseudo R2         =     0.0606
    
    ------------------------------------------------------------------------------
                 |               Robust
     acd_perc100 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       apc_total |   .2229683   .0398757     5.59   0.000     .1448134    .3011232
           _cons |  -4.902157   .4217877   -11.62   0.000    -5.728846   -4.075468
    ------------------------------------------------------------------------------
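    The .25 benchmark rests on the fact that, with a single predictor, the OLS R² equals the squared Pearson correlation. A quick check of that identity, using Stata's 401k example data purely for illustration (Jakob's data are not available; r2_ols and r2_corr are illustrative scalar names):

    ```stata
    webuse 401k, clear
    * OLS with a single regressor
    quietly regress prate mrate
    scalar r2_ols = e(r2)
    * squared Pearson correlation of the same two variables
    quietly correlate prate mrate
    scalar r2_corr = r(rho)^2
    display "R2 from regress = " scalar(r2_ols) "   rho^2 = " scalar(r2_corr)
    * Jakob's benchmark: .4925^2 = .24255625
    display .4925^2
    ```

    McFadden's pseudo R² is a ratio of log likelihoods rather than a share of explained variance, so it is not expected to reproduce this number.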
    Since I could not figure out which pseudo R² fracreg calculates (and fitstat does not work after fracreg), I find this figure hard to interpret, and I could not find any details in the documentation.
    This leads to a more general question: is there any way to find out which formulas a command uses, or do I have to search the references cited in its documentation?

    Many thanks!
    Jakob

  • #2
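    It is the same pseudo R2 that Stata almost always uses. The only trick is that ll0, the log pseudolikelihood of the constant-only model, is not obvious in the output (in particular, it is not the Iteration 0 value).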

    Code:
    webuse 401k, clear
    * get ll0
    fracreg probit prate
    scalar ll0 = -1787.5477
    * get LL
    fracreg probit prate mrate c.ltotemp##c.ltotemp c.age##c.age i.sole
    scalar ll = -1674.6232
    di "pseudo r2 = " 1 - (ll/ ll0)
    Results:

    Code:
    . webuse 401k, clear
    
    . 
    . * get ll0
    
    . 
    . fracreg probit prate
    
    Iteration 0:   log pseudolikelihood = -1810.0397  
    Iteration 1:   log pseudolikelihood = -1787.5784  
    Iteration 2:   log pseudolikelihood = -1787.5477  
    Iteration 3:   log pseudolikelihood = -1787.5477  
    
    Fractional probit regression                    Number of obs     =      4,075
    Log pseudolikelihood = -1787.5477               Pseudo R2         =     0.0000
    
    ------------------------------------------------------------------------------
                 |               Robust
           prate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   .9969556   .0121009    82.39   0.000     .9732383    1.020673
    ------------------------------------------------------------------------------
    
    . 
    . scalar ll0 = -1787.5477
    
    . 
    . * get LL
    
    . 
    . fracreg probit prate mrate c.ltotemp##c.ltotemp c.age##c.age i.sole
    
    Iteration 0:   log pseudolikelihood = -1769.6832  
    Iteration 1:   log pseudolikelihood = -1675.2763  
    Iteration 2:   log pseudolikelihood = -1674.6234  
    Iteration 3:   log pseudolikelihood = -1674.6232  
    Iteration 4:   log pseudolikelihood = -1674.6232  
    
    Fractional probit regression                    Number of obs     =      4,075
                                                    Wald chi2(6)      =     815.88
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -1674.6232               Pseudo R2         =     0.0632
    
    -------------------------------------------------------------------------------------
                        |               Robust
                  prate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
                  mrate |   .5859715   .0387616    15.12   0.000     .5100002    .6619429
                ltotemp |  -.6102767   .0615052    -9.92   0.000    -.7308246   -.4897288
                        |
    c.ltotemp#c.ltotemp |   .0313576    .003975     7.89   0.000     .0235667    .0391484
                        |
                    age |   .0273266   .0031926     8.56   0.000     .0210691     .033584
                        |
            c.age#c.age |  -.0003159   .0000875    -3.61   0.000    -.0004874   -.0001443
                        |
                   sole |
             only plan  |   .0683196   .0272091     2.51   0.012     .0149908    .1216484
                  _cons |    3.25991   .2323929    14.03   0.000     2.804429    3.715392
    -------------------------------------------------------------------------------------
    
    . 
    . scalar ll = -1674.6232
    
    . 
    . di "pseudo r2 = " 1 - (ll/ ll0)
    pseudo r2 = .06317286
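    Because the constant-only model predicts the same mean for every observation, its fitted mean is simply the sample mean of prate, so ll0 can also be reproduced by hand as N*[ybar*ln(ybar) + (1 - ybar)*ln(1 - ybar)]. A sketch, assuming fracreg maximizes the Bernoulli quasi-log-likelihood sum of y*ln(mu) + (1 - y)*ln(1 - mu) (ybar, nobs and ll0_hand are illustrative names):

    ```stata
    webuse 401k, clear
    * constant-only model: the fitted mean is the same for every observation
    quietly summarize prate
    scalar ybar = r(mean)
    scalar nobs = r(N)
    * Bernoulli quasi-log-likelihood evaluated at the sample mean
    scalar ll0_hand = scalar(nobs)*(scalar(ybar)*ln(scalar(ybar)) + (1 - scalar(ybar))*ln(1 - scalar(ybar)))
    quietly fracreg probit prate
    display "ll0 by hand      = " scalar(ll0_hand)
    display "ll0 from fracreg = " e(ll)
    * the probit constant maps back to the sample mean of prate
    display "normal(_b[_cons]) = " normal(_b[_cons]) "   mean of prate = " scalar(ybar)
    ```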
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      FYI, e(ll) and e(ll_0) are stored among the ereturned results, so you don't have to run two models as I did to get the numbers.

      Code:
      . webuse 401k, clear
      
      . fracreg probit prate mrate c.ltotemp##c.ltotemp c.age##c.age i.sole, nolog
      
      
      Fractional probit regression                    Number of obs     =      4,075
                                                      Wald chi2(6)      =     815.88
                                                      Prob > chi2       =     0.0000
      Log pseudolikelihood = -1674.6232               Pseudo R2         =     0.0632
      
      -------------------------------------------------------------------------------------
                          |               Robust
                    prate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      --------------------+----------------------------------------------------------------
                    mrate |   .5859715   .0387616    15.12   0.000     .5100002    .6619429
                  ltotemp |  -.6102767   .0615052    -9.92   0.000    -.7308246   -.4897288
                          |
      c.ltotemp#c.ltotemp |   .0313576    .003975     7.89   0.000     .0235667    .0391484
                          |
                      age |   .0273266   .0031926     8.56   0.000     .0210691     .033584
                          |
              c.age#c.age |  -.0003159   .0000875    -3.61   0.000    -.0004874   -.0001443
                          |
                     sole |
               only plan  |   .0683196   .0272091     2.51   0.012     .0149908    .1216484
                    _cons |    3.25991   .2323929    14.03   0.000     2.804429    3.715392
      -------------------------------------------------------------------------------------
      
      . di "pseudo r2 = " 1 - (e(ll)/ e(ll_0))
      pseudo r2 = .06317286
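      To see everything fracreg leaves behind, not just e(ll) and e(ll_0), the stored results can be listed, and which shows where the ado-file lives so its source can be read with viewsource. A short sketch on the same example:

      ```stata
      webuse 401k, clear
      quietly fracreg probit prate mrate c.ltotemp##c.ltotemp c.age##c.age i.sole
      * list all stored e() results; e(N), e(ll), e(ll_0) and e(r2_p) are among them
      ereturn list
      * show the path of the fracreg ado-file
      which fracreg
      ```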
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology



      • #4
        Thanks a lot.
        Based on your response, I assume it is simply a comparison of the fitted model's log likelihood with that of the null (constant-only) model, i.e., McFadden's pseudo R².
        However, where is this kind of information documented?



        • #5
          I can't find the documentation either. I would have expected to find it in the .pdf manual in the section "methods and formulas", but it is not there.

          An imperfect solution is to look inside the code. I typed viewsource fracreg.ado and found that it refers to _fractional_estimates.ado, so I then typed viewsource _fractional_estimates.ado. Most of the estimation is done in Mata, but I found these lines:

          Code:
                  if !missing(e(ll_0)) {
                          ereturn scalar r2_p = 1 - e(ll)/e(ll_0)
                  }
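          The quoted lines can be checked directly: after fracreg, the stored e(r2_p) should coincide with 1 - e(ll)/e(ll_0). A sketch using the 401k example from earlier in the thread:

          ```stata
          webuse 401k, clear
          quietly fracreg probit prate mrate c.ltotemp##c.ltotemp c.age##c.age i.sole
          * stored pseudo R2 versus the formula found in _fractional_estimates.ado
          display "e(r2_p)           = " e(r2_p)
          display "1 - e(ll)/e(ll_0) = " 1 - e(ll)/e(ll_0)
          * assert stops with an error if the two disagree beyond rounding
          assert reldif(e(r2_p), 1 - e(ll)/e(ll_0)) < 1e-12
          ```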
          By the way, I assume you are familiar with the ecological fallacy, and that you won't draw a conclusion about an individual's chance of death due to drinking based on such aggregate data.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            Thanks Maarten.
            I think this is a very user-unfriendly way to obtain the information, but it helps me for now. In my opinion, the Stata documentation should tell users how estimates are computed. For pseudo R² statistics in particular, this matters for correct interpretation.
            Thanks also for your comment on the substance of this analysis. The model results will not be used for inferences about individuals' risk, but to estimate these deaths from the given covariates for countries that lack this information.



            • #7
              It is documented in the help for maximize. Of course; where else would it be? ;-) In Stata 14.2, see p. 1483 of r.pdf.

              Let L1 be the log likelihood of the full model (that is, the log-likelihood value shown on the output), and let L0 be the log likelihood of the “constant-only” model. The likelihood-ratio χ2 model test is defined as 2(L1 − L0). The pseudo-R2 (McFadden 1974) is defined as 1 − L1/L0. This is simply the log likelihood on a scale where 0 corresponds to the “constant-only” model and 1 corresponds to perfect prediction for a discrete model (in which case the overall log likelihood is 0).
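              The last sentence of the quote matters for fractional outcomes: a pseudo R² of 1 requires a log likelihood of 0, which is reachable only when the outcome is 0/1. With a fractional outcome, even a model whose fitted mean equalled every observed proportion would have a negative log pseudolikelihood, so McFadden's pseudo R² cannot reach 1. A sketch that computes this ceiling for the 401k example, assuming fracreg's log pseudolikelihood is the Bernoulli quasi-log-likelihood sum of y*ln(mu) + (1 - y)*ln(1 - mu) (llsat_i and ll_best are illustrative names); the same lines could be run on other data after their own fracreg call:

              ```stata
              webuse 401k, clear
              quietly fracreg probit prate mrate c.ltotemp##c.ltotemp c.age##c.age i.sole
              * per-observation quasi-log-likelihood if the fitted mean equalled prate exactly
              * observations with prate equal to 0 or 1 contribute 0 (0*ln(0) taken as 0)
              generate double llsat_i = cond(prate > 0 & prate < 1, prate*ln(prate) + (1 - prate)*ln(1 - prate), 0) if e(sample)
              quietly summarize llsat_i
              scalar ll_best = r(sum)
              display "log pseudolikelihood of fitted model = " e(ll)
              display "best attainable log pseudolikelihood = " scalar(ll_best)
              display "largest attainable pseudo R2         = " 1 - scalar(ll_best)/e(ll_0)
              ```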
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology



              • #8
                In fairness to Stata, pseudo R2 appears in countless programs, so I can see why they don't repeat the formula over and over. I wouldn't have guessed that pseudo R2 would be covered as part of maximize, but if you look at the subject index it does tell you it is covered there. The index is in i.pdf.
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology



                • #9
                  I would just like to add that this page introduces a few other pseudo R²s: https://stats.idre.ucla.edu/other/mu...do-r-squareds/
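                  As a sketch of two such alternatives after fracreg: the squared correlation between observed and fitted values (closest in spirit to the OLS benchmark) and Efron's R², 1 - sum((y - yhat)^2)/sum((y - ybar)^2). The 401k example is used only for illustration, and phat, dev2, res2 are illustrative names:

                  ```stata
                  webuse 401k, clear
                  quietly fracreg probit prate mrate c.ltotemp##c.ltotemp c.age##c.age i.sole
                  * fitted conditional mean E(prate | x), the default prediction after fracreg
                  predict double phat if e(sample)
                  * (1) squared correlation between observed and fitted values
                  quietly correlate prate phat if e(sample)
                  scalar r2_corr = r(rho)^2
                  * (2) Efron's R2 = 1 - sum of squared residuals / total sum of squares
                  quietly summarize prate if e(sample)
                  generate double dev2 = (prate - r(mean))^2 if e(sample)
                  generate double res2 = (prate - phat)^2 if e(sample)
                  quietly summarize dev2
                  scalar sst = r(sum)
                  quietly summarize res2
                  scalar ssr = r(sum)
                  scalar r2_efron = 1 - scalar(ssr)/scalar(sst)
                  display "McFadden (e(r2_p)) = " e(r2_p)
                  display "squared corr       = " scalar(r2_corr)
                  display "Efron              = " scalar(r2_efron)
                  ```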

