  • Margins after GLM with gamma distribution and log link

    Dear all,
    I ran the following GLM on cost data, and I'm interested in the marginal effect of a categorical variable.

    glm totalcost i.y1 y2 i.y3 , family(gamma) link(log) robust

    where y1 is a categorical variable with three levels (A, B, C).

    Code:
    glm totalcost i.y1  y2  i.y3 , family(gamma) link(log) robust
    
    Iteration 0:   log pseudolikelihood = -2154.0474  
    Iteration 1:   log pseudolikelihood = -1983.0575  
    Iteration 2:   log pseudolikelihood = -1979.9279  
    Iteration 3:   log pseudolikelihood = -1979.8941  
    Iteration 4:   log pseudolikelihood = -1979.8941  
    
    Generalized linear models                          No. of obs      =       250
    Optimization     : ML                              Residual df     =       245
                                                       Scale parameter =   10.3413
    Deviance         =  975.3459595                    (1/df) Deviance =  3.981004
    Pearson          =  2533.619027                    (1/df) Pearson  =   10.3413
    
    Variance function: V(u) = u^2                      [Gamma]
    Link function    : g(u) = ln(u)                    [Log]
    
                                                       AIC             =  15.87915
    Log pseudolikelihood = -1979.894074                BIC             =  -377.412
    
    -------------------------------------------------------------------------------------
                        |               Robust
    totalcost           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
                     y1 |
                     B  |   .1712582   .4287108     0.40   0.690    -.6689995    1.011516
                     C  |   1.384857   .5072588     2.73   0.006     .3906478    2.379066
                        |
                     y2 |   .0417224    .014174     2.94   0.003     .0139419    .0695029
                        |
                     y3 |
                   yes  |  -.0345493    .436378    -0.08   0.937    -.8898344    .8207358
                  _cons |    4.20057   .8834935     4.75   0.000     2.468954    5.932185
    -------------------------------------------------------------------------------------
    
    . margins  y1,
    
    Predictive margins                                Number of obs   =        250
    Model VCE    : Robust
    
    Expression   : Predicted mean costototalepaziente, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              y1 |
              A  |   895.8442   289.5275     3.09   0.002     328.3806    1463.308
              B  |   1063.185   323.9021     3.28   0.001      428.349    1698.022
              C  |   3578.229   1460.978     2.45   0.014     714.7648    6441.693
    ------------------------------------------------------------------------------
    The difference between C and A is statistically significant in the glm model. But the confidence interval for the predictive margin of C is wide and largely overlaps with the interval for A, so the difference between C and A does not seem to be statistically significant. Is there something wrong in what I've done? Or am I misinterpreting the results?
    Thanks

  • #2
    Marcella,
    welcome to the list.
    The fact that two CIs overlap doesn't necessarily imply the absence of evidence of a statistically significant difference.
    If you're interested in this topic, you can take a look at the valuable: van Belle G. Statistical Rules of Thumb. 2nd ed. Hoboken, NJ: Wiley, 2008: 38-40.
    Kind regards,
    Carlo
    (Stata 19.0)
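
    A minimal numerical sketch of this point, using made-up numbers rather than anything from this thread: two independent estimates of 10 and 13, each with a standard error of 1. Their 95% intervals overlap (roughly 8.0 to 12.0 and 11.0 to 15.0), yet the test of the difference gives z of about 2.12 and a two-sided p of about 0.03. All scalar names below are hypothetical.

    ```stata
    * Hypothetical numbers, not taken from this thread
    scalar est1    = 10
    scalar est2    = 13
    * Same SE for both estimates, which are assumed to be independent
    scalar se_each = 1
    scalar zcrit   = invnormal(0.975)

    * Individual 95% confidence intervals
    display "CI 1: " (est1 - zcrit*se_each) " to " (est1 + zcrit*se_each)
    display "CI 2: " (est2 - zcrit*se_each) " to " (est2 + zcrit*se_each)

    * Test of the difference: under independence the variances add
    scalar se_diff = sqrt(se_each^2 + se_each^2)
    scalar z_diff  = (est2 - est1)/se_diff
    display "z = " z_diff ",  two-sided p = " 2*normal(-abs(z_diff))
    ```

    The standard error of the difference is sqrt(2), which is smaller than the sum of the two standard errors, and that is why overlap of the individual intervals is a conservative check. When the estimates are correlated, as predictive margins from one model are, a covariance term enters as well.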



    • #3
      Another reference for the phenomenon Carlo describes is:

      Andrew Gelman and Hal Stern (2006) The difference between "significant" and "not significant" is not itself statistically significant. The American Statistician, 60(4): 328-331.
      http://www.stat.columbia.edu/~gelman...ed/signif4.pdf
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        In addition to Carlo's reply, the significance of the marginal effects depends on more than the significance of the regression coefficients:

        http://www.stata.com/statalist/archi.../msg00142.html

        The standard error of the marginal effect is obtained by the delta method, which means that the standard error for the marginal effect of one independent variable involves the whole variance-covariance matrix from the estimation together with the appropriate entries from the Jacobian. In other words, a lot goes into the calculation beyond the standard error of that variable's coefficient, and these other terms can push the p-value of the marginal effect above 0.05 even when the p-value of the coefficient is below 0.05.
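
        A rough sketch of that calculation, assuming the glm from post #1 has just been fit and that y1 is coded 1 = A, 2 = B, 3 = C (a guess based on the margins output, not something stated in the thread). It evaluates the C minus A difference in predicted mean cost at the sample mean of y2 and the base level of y3, which is an illustrative choice and not the same quantity as the average marginal effect reported by margins. The names `y2bar` and `C_minus_A` are hypothetical.

        ```stata
        * Assumes the glm from post #1 is the active estimation result and
        * that y1 is coded 1 = A, 2 = B, 3 = C (an assumption)

        * The full variance-covariance matrix that the delta method draws on
        matrix list e(V)

        * Hold y2 at its estimation-sample mean, y3 at its base level
        summarize y2 if e(sample), meanonly
        local y2bar = r(mean)

        * Difference in predicted mean cost, C minus A, with a delta-method SE
        nlcom (C_minus_A: exp(_b[_cons] + _b[3.y1] + _b[y2]*`y2bar') ///
                        - exp(_b[_cons] + _b[y2]*`y2bar'))
        ```

        Because the difference is a nonlinear function of `_cons`, `y2`, and `3.y1` together, its standard error picks up the variances and covariances of all three coefficients, not just that of `3.y1`. The sketch treats the mean of y2 as a fixed number.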



        • #5
          What you wanted to type is

          Code:
          margins, dydx(y1)
          What you typed gives the predicted means for each level of y1, with the other variables averaged out. You want the difference in means. Overlapping CIs do not mean the difference is not statistically significant.
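
          As a sketch of how the difference in means can be requested directly, assuming the glm from post #1 is the active estimation result: the first command reports discrete changes from the base level of y1, and the second reports every pairwise difference between the predictive margins.

          ```stata
          * Assumes the glm from post #1 is the active estimation result

          * Discrete change in predicted mean cost relative to the base level of y1
          margins, dydx(y1)

          * All pairwise differences in predictive margins (B vs A, C vs A, C vs B),
          * each with its own delta-method SE, z statistic, and confidence interval
          margins y1, pwcompare(effects)
          ```

          Either way the test is carried out on the difference itself, so there is no need to compare the two separate confidence intervals by eye.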



          • #6
            Thank you all for the answers.


            As suggested by Jeff Wooldridge, I've tried this

            Code:
            . margins ,dydx(y1)
            
            Average marginal effects                          Number of obs   =        250
            Model VCE    : Robust
            
            Expression   : Predicted mean costototalepaziente, predict()
            dy/dx w.r.t. : 2.B 3.Ct
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                      y1 |
                      B  |   167.3412   419.4134     0.40   0.690    -654.6941    989.3764
                      C  |   2682.385   1474.415     1.82   0.069     -207.415    5572.184
            ------------------------------------------------------------------------------
            Note: dy/dx for factor levels is the discrete change from the base level.
            The difference between C and A is not statistically significant at the 5% level (p = 0.069), in contrast with the result for the glm coefficient (p = 0.006).
            I think the problem is related to the calculation, as suggested by Scott Merryman. Is there an alternative way of calculating these values?
            Thanks
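
            One possible alternative, offered only as a sketch and not as a recommendation from anyone in this thread, is to report the comparison on the scale on which the model is linear. With a log link, the exponentiated coefficient is a ratio of mean costs, so the C coefficient of 1.384857 corresponds to roughly a fourfold mean cost relative to A. The commands below assume the glm from post #1 is the active estimation result.

            ```stata
            * Assumes the glm from post #1 is the active estimation result

            * Replay the model with exponentiated coefficients: with a log link,
            * exp(b) is the ratio of mean costs relative to the base level of y1
            glm, eform

            * Contrasts on the linear-predictor (log) scale; for a factor variable
            * these reproduce the coefficients and their tests from the glm table
            margins, dydx(y1) predict(xb)
            ```

            The two scales answer different questions (a ratio of means versus a difference in means), so the tests need not agree, and neither result is wrong.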

