Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Stata's log-likelihood and likelihood ratio test for gamma GLM

    I'm trying to make sense of the log-likelihood reported by Stata for a glm with family(gamma), which doesn't match that produced by R/statsmodels and seems to give a strange likelihood ratio test result.

    As a minimal example, I am generating some random gamma-distributed response values y, whose mean varies with the predictor variable x:

    Code:
    . set obs 1000
    . generate x = mod(_n, 2)
    . set seed 12345
    . generate y = rgamma(500, 2)
    . replace y = rgamma(500, 2.1) if x == 1
    The means are significantly different according to x: a t-test is significant, and fitting a log-normal GLM shows a significant coefficient for x, and a significant overall likelihood ratio test:

    Code:
    . ttest y, by(x)
    
    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           0 |     500    1001.813     1.94138    43.41059     997.999    1005.628
           1 |     500    1049.878    2.138717    47.82317    1045.676     1054.08
    ---------+--------------------------------------------------------------------
    combined |   1,000    1025.846    1.631506    51.59274    1022.644    1029.047
    ---------+--------------------------------------------------------------------
        diff |           -48.06466    2.888437               -53.73277   -42.39655
    ------------------------------------------------------------------------------
        diff = mean(0) - mean(1)                                      t = -16.6404
    Ho: diff = 0                                     degrees of freedom =      998
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
    
    . glm y x, family(gaussian) link(log)
    [...]
    Generalized linear models                         Number of obs   =      1,000
    Optimization     : ML                             Residual df     =        998
                                                      Scale parameter =   2085.767
    Deviance         =   2081595.82                   (1/df) Deviance =   2085.767
    Pearson          =   2081595.82                   (1/df) Pearson  =   2085.767
    
    Variance function: V(u) = 1                       [Gaussian]
    Link function    : g(u) = ln(u)                   [Log]
    
                                                      AIC             =   10.48277
    Log likelihood   = -5239.383583                   BIC             =    2074702
    
    ------------------------------------------------------------------------------
                 |                 OIM
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .0468623    .002818    16.63   0.000     .0413391    .0523854
           _cons |   6.909567   .0020387  3389.14   0.000     6.905571    6.913563
    ------------------------------------------------------------------------------
    
    . estimates store m1
    . glm y, family(gaussian) link(log)
    [...]
    . estimates store m2
    . lrtest m1 m2
    
    Likelihood-ratio test                                 LR chi2(1)  =    244.87
    (Assumption: m2 nested in m1)                         Prob > chi2 =    0.0000
    However, curiously, if I fit a gamma GLM, even though the coefficient for x is significant, the likelihood ratio test is very much not:

    Code:
    . glm y x, family(gamma) link(log)
    [...]
    Generalized linear models                         Number of obs   =      1,000
    Optimization     : ML                             Residual df     =        998
                                                      Scale parameter =   .0019763
    Deviance         =  1.968478435                   (1/df) Deviance =   .0019724
    Pearson          =  1.972334751                   (1/df) Pearson  =   .0019763
    
    Variance function: V(u) = u^2                     [Gamma]
    Link function    : g(u) = ln(u)                   [Log]
    
                                                      AIC             =      15.87
    Log likelihood   = -7932.998033                   BIC             =  -6891.971
    
    ------------------------------------------------------------------------------
                 |                 OIM
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .0468622   .0028138    16.65   0.000     .0413472    .0523772
           _cons |   6.909567   .0019897  3472.72   0.000     6.905667    6.913466
    ------------------------------------------------------------------------------
    
    . estimates store m1
    . glm y, family(gamma) link(log)
    [...]
    . estimates store m2
    . lrtest m1 m2
    
    Likelihood-ratio test                                 LR chi2(1)  =      0.55
    (Assumption: m2 nested in m1)                         Prob > chi2 =    0.4587
    If I perform the same regression in R or statsmodels, the likelihood ratio test is significant (p < 0.001).

    I am wondering if this is to do with what is on page 33 of the glm manual, which says Stata calculates the log-likelihood for the j-th observation (for the gamma family) as -(yj/μj) + ln(1/μj).

    It's my understanding that the true log-likelihood for the gamma family would be -(k*yj/μj) + k*ln(k*yj/μj) - ln(Γ(k)) - ln(yj), given the shape parameter k, which is the formula statsmodels uses. So Stata's formula would be the case for k = 1. But because k in this case is >> 1 (it is 500), this would suggest that Stata's likelihood ratio statistic is underestimated. When I change the setup to instead have k ≈ 0, the opposite happens, and Stata's likelihood ratio test indicates significance when R/statsmodels does not, even when I draw values for y independently of x.

    Have I missed something here? I could not find any previous discussion about likelihood ratio tests on Stata's gamma GLMs.

  • #2
    You're right: glm here defines likelihood only in terms of the mean. The other parameter is treated as ancillary. So, your R calculation jumps the other way and is not directly comparable.

    Comment

    Working...
    X