Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bug when linktest is used after glm

    There is a bug when the Stata command linktest is used after fitting a logistic regression model with the glm command. If used after the logit command it works correctly, calculating the linear predictor, squaring this and then fitting a new LOGISTIC regression model with the linear predictor and its square as predictors. However, if exactly the same logistic regression model is fitted using glm then linktest gives a different result. The result is erroneous, it relates to a LINEAR regression of the binary outcome on the linear predictor and its square from the logistic regression model. This is an inappropriate way to test for an incorrectly specified link function in the logistic regression model. This is all illustrated below. Can this be fixed?

    use https://www.stata-press.com/data/r17/auto, clear

    logit foreign mpg weight, nolog
    linktest, nolog

    predict xb, xb
    gen xb2= xb*xb
    logit foreign xb xb2, nolog
    * correctly agrees with linktest


    use https://www.stata-press.com/data/r17/auto, clear

    glm foreign mpg weight, family(binomial) nolog
    linktest, nolog

    predict xb, xb
    gen xb2= xb*xb
    glm foreign xb xb2, family(binomial) nolog
    * does not agree with linktest following glm
    * does agree with linktest following logit

    glm foreign xb xb2, family(gaussian) nolog
    * agrees with linktest following glm
    * but this is linear not logistic regression!

    Chris Frost

  • #2
    This isn't a bug, but it's also not immediately apparent what happened. To see that the results do agree, see the reproducible example below. Running the -linktest- following logistic regression automatically takes the link function into consideration such that you only need to type -linktest-. However, fitting the same model with -glm- requires that both the model and -linktest- be told which link function and family are required, because -linktest- does not automatically pick this up.

    The PDF documentation state, under the Technical Note for -linktest-

    You should specify the same options with linktest that you do with the estimation command [...]

    If you are not sure which options are important, duplicate exactly what you specified on the
    estimation command. (edits are my own)
    Which does tell you, indirectly, of the need to specify things like link functions and family specifications.

    To demonstrate this, I estimate a logistic regression model below using -glm-, and then run -linktest- with and without the family and link options specified. When they are omitted, it defaults to plain linear regression using -glm-, because -glm- was the estimation command and nothing else was inferred by -linktest- about it's syntax.

    Code:
    clear *
    cls
    
    sysuse auto, clear
    
    logit foreign mpg weight, nolog
    predict logit_xb, xb
    linktest, nolog
    logit foreign c.logit_xb##c.logit_xb, nolog
    reg foreign c.logit_xb##c.logit_xb  // wrong model
    
    
    glm foreign mpg weight, nolog fam(binomial) link(logit)
    predict glm_xb, xb
    assert float(logit_xb)==float(glm_xb)
    
    linktest, nolog // perceived error
    linktest, nolog fam(binomial) link(logit) // correct model specification
    glm foreign c.glm_xb##c.glm_xb, nolog fam(binomial) link(logit) // double check
    Selected results:

    Code:
    . logit foreign mpg weight, nolog
    
    Logistic regression                                     Number of obs =     74
                                                            LR chi2(2)    =  35.72
                                                            Prob > chi2   = 0.0000
    Log likelihood = -27.175156                             Pseudo R2     = 0.3966
    
    ------------------------------------------------------------------------------
         foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -.1685869   .0919175    -1.83   0.067    -.3487418     .011568
          weight |  -.0039067   .0010116    -3.86   0.000    -.0058894    -.001924
           _cons |   13.70837   4.518709     3.03   0.002     4.851859    22.56487
    ------------------------------------------------------------------------------
    
    . linktest, nolog
    
    Logistic regression                                     Number of obs =     74
                                                            LR chi2(2)    =  36.83
                                                            Prob > chi2   = 0.0000
    Log likelihood = -26.615714                             Pseudo R2     = 0.4090
    
    ------------------------------------------------------------------------------
         foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            _hat |   .8438531   .2738759     3.08   0.002     .3070662     1.38064
          _hatsq |  -.1559115   .1568642    -0.99   0.320    -.4633596    .1515366
           _cons |   .2630557   .4299598     0.61   0.541      -.57965    1.105761
    ------------------------------------------------------------------------------
    
    . glm foreign mpg weight, nolog fam(binomial) link(logit)
    
    Generalized linear models                         Number of obs   =         74
    Optimization     : ML                             Residual df     =         71
                                                      Scale parameter =          1
    Deviance         =   54.3503124                   (1/df) Deviance =   .7654974
    Pearson          =  52.17160139                   (1/df) Pearson  =   .7348113
    
    Variance function: V(u) = u*(1-u)                 [Bernoulli]
    Link function    : g(u) = ln(u/(1-u))             [Logit]
    
                                                      AIC             =   .8155448
    Log likelihood   =  -27.1751562                   BIC             =  -251.2383
    
    ------------------------------------------------------------------------------
                 |                 OIM
         foreign | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -.1685869   .0919175    -1.83   0.067    -.3487418     .011568
          weight |  -.0039067   .0010116    -3.86   0.000    -.0058894    -.001924
           _cons |   13.70837   4.518709     3.03   0.002     4.851859    22.56487
    ------------------------------------------------------------------------------
    
    . linktest, nolog // perceived error
    
    Generalized linear models                         Number of obs   =         74
    Optimization     : ML                             Residual df     =         71
                                                      Scale parameter =   .1268094
    Deviance         =  9.003470464                   (1/df) Deviance =   .1268094
    Pearson          =  9.003470464                   (1/df) Pearson  =   .1268094
    
    Variance function: V(u) = 1                       [Gaussian]
    Link function    : g(u) = u                       [Identity]
    
                                                      AIC             =   .8125032
    Log likelihood   = -27.06261708                   BIC             =  -296.5852
    
    ------------------------------------------------------------------------------
                 |                 OIM
         foreign | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            _hat |   .1833813   .0320402     5.72   0.000     .1205836     .246179
          _hatsq |   .0160177   .0068068     2.35   0.019     .0026765    .0293588
           _cons |   .4747777   .0523313     9.07   0.000     .3722103    .5773451
    ------------------------------------------------------------------------------
    
    . linktest, nolog fam(binomial) link(logit) // correct model specification
    
    Generalized linear models                         Number of obs   =         74
    Optimization     : ML                             Residual df     =         71
                                                      Scale parameter =          1
    Deviance         =  53.23142752                   (1/df) Deviance =   .7497384
    Pearson          =  48.06852432                   (1/df) Pearson  =   .6770215
    
    Variance function: V(u) = u*(1-u)                 [Bernoulli]
    Link function    : g(u) = ln(u/(1-u))             [Logit]
    
                                                      AIC             =   .8004247
    Log likelihood   = -26.61571376                   BIC             =  -252.3572
    
    ------------------------------------------------------------------------------
                 |                 OIM
         foreign | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            _hat |   .8438531   .2738759     3.08   0.002     .3070662     1.38064
          _hatsq |  -.1559115   .1568642    -0.99   0.320    -.4633596    .1515366
           _cons |   .2630557   .4299598     0.61   0.541      -.57965    1.105761
    ------------------------------------------------------------------------------

    Comment


    • #3
      Dear Leonardo

      Thank you for pointing out how linktest needs to be used after glm. I also now see that this issue is mentioned under options and in a technical note in the manual.

      However, I still think this is a bug. At the very least it is an unnecessary trap for the unwary (and clearly this includes me). The whole purpose of a link test is to test the fit of the model that has just been fitted. Using the linear predictor from a GLM and its square in a GLM with a different link function is clearly not a valid test of the adequacy of the GLM that has just been fitted. It serves no purpose at all. The default (arguably the only) option for linktest should be to use the same link function and family options as used in the model just fitted. When one uses other post-estimation commands such as predict (or test or lincom or margins or any number of other commands) after glm one doesn't have to respecify the link function, as this is (sensibly) deemed unnecessary. Why should it be required when using linktest?

      Chris

      Comment


      • #4
        I agree with you that this behaviour is confusing and the default should be to pick the same link function and family following glm. However, I don't think it qualifies as a bug because this behaviour is documented and expected. It is also hinted in the documentation that this is a goodness of link test, so the link function should be flexible, especially where binary models are concerned (e.g., logit, probit and cloglog).

        Comment

        Working...
        X