  • Predicted margins are uninterpretable negative values after glm with binomial distribution

    Hello,

    My dependent variable (methylation level) is expressed as a proportion, so I am fitting a glm model with family(binomial) link(logit) and robust variance estimation, based on the method proposed by Papke and Wooldridge (1996). For ease of interpretation, I would like to present the adjusted means and the adjusted mean difference between levels of my independent variable (normal BMI vs. overweight/obese), coded as 0 and 1. When I run the glm model followed by the margins command, one of the returned predicted means is negative, which is not possible for my data. My questions are: 1) Am I using the correct postestimation command? 2) Should I use margins r.over_ob to calculate the adjusted mean difference in proportions of my dependent variable? I have posted my code below. Thank you very much for your help!

    glm methylation i.over_ob (other covariates), family(binomial) link(logit) robust

    margins over_ob (here is where I get a negative value for over_ob = 1)

    margins r.over_ob, contrast

  • #2
    Welcome to Statalist, Megan! Please read FAQ 14, on how to write questions (required reading!) which asks 1) that you show all results, not just commands; and 2) that you put these between CODE delimiters. First, please run and show the results of:
    Code:
    tab methylation
    sum methylation
    tab over_ob methylation, row
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Thank you for your reply, and I apologize for not reading carefully enough! In my question above I used the variable name "methylation" for simplicity, but I really have over 500 sites that I am testing individually. Below is a representative example where the variable CAB39_177 is the methylation proportion for that particular gene/position. Most of my genes have a similar overall distribution of methylation values (highly skewed).

      Code:
       tab CAB39_177
      
              177 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         22       22.00       22.00
             .001 |         51       51.00       73.00
             .002 |         13       13.00       86.00
             .003 |          3        3.00       89.00
             .004 |          1        1.00       90.00
             .008 |          1        1.00       91.00
             .015 |          1        1.00       92.00
             .078 |          1        1.00       93.00
             .122 |          1        1.00       94.00
             .131 |          1        1.00       95.00
              .25 |          1        1.00       96.00
             .388 |          1        1.00       97.00
             .401 |          1        1.00       98.00
             .727 |          1        1.00       99.00
             .853 |          1        1.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
      
      
      sum CAB39_177
      
          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
         CAB39_177 |       100      .03063    .1261203          0       .853
      
      
      tab over_ob CAB39_177, row
      
      
                 |                                     177
         over_ob |      .078       .122       .131        .25       .388       .401       .727 |     Total
      -----------+-----------------------------------------------------------------------------+----------
               0 |         1          1          1          1          1          1          1 |        60 
                 |      1.67       1.67       1.67       1.67       1.67       1.67       1.67 |    100.00 
      -----------+-----------------------------------------------------------------------------+----------
               1 |         0          0          0          0          0          0          0 |        40 
                 |      0.00       0.00       0.00       0.00       0.00       0.00       0.00 |    100.00 
      -----------+-----------------------------------------------------------------------------+----------
           Total |         1          1          1          1          1          1          1 |       100 
                 |      1.00       1.00       1.00       1.00       1.00       1.00       1.00 |    100.00 
      
      
                 |    177
         over_ob |      .853 |     Total
      -----------+-----------+----------
               0 |         1 |        60 
                 |      1.67 |    100.00 
      -----------+-----------+----------
               1 |         0 |        40 
                 |      0.00 |    100.00 
      -----------+-----------+----------
           Total |         1 |       100 
                 |      1.00 |    100.00



      • #4
        I agree with Steve. It will help to see output. As a sidelight, I will add that, if you are using Stata 14, you can use the new fracreg command, e.g.

        Code:
        fracreg logit methylation i.over_ob (other covariates)
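
        A minimal sketch of how the fracreg route might look end to end, assuming Stata 14 or later. The outcome and covariate names are borrowed from the glm call shown in #6 and stand in for whatever the real model contains. fracreg reports robust standard errors by default, so no extra variance option is needed here.

        ```stata
        * Sketch only: variable names are taken from the glm call in #6
        fracreg logit CAB39_177 i.over_ob age parity smoking mom_race mr_babygender termwks

        * Adjusted mean proportion at each level of over_ob
        margins over_ob

        * Adjusted difference in mean proportions (1 vs 0)
        margins r.over_ob
        ```

        The point estimates should match those from glm with family(binomial) link(logit), since both fit the same fractional logit mean model.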
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        WWW: https://www3.nd.edu/~rwilliam



        • #5
          My post crossed with Megan's. Megan, now please show the output from your glm and margins commands.

          Not that it necessarily matters, but I take it much of the output from the tab command was deleted?
          Last edited by Richard Williams; 19 Oct 2015, 20:25.
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Sure, please see the code and output below. CAB39_177 is my dependent variable; in this case it represents the methylation proportion for the CAB39 gene at position 177. Looking through my old Stata code, I think I may originally have been trying to predict means from a linear model, which gave me the negative predicted values. If I run the code below, I get numbers that make sense, but is my interpretation correct? Is the adjusted mean methylation proportion 0.078 in my normal weight group (over_ob = 0) and 0.00056 in my overweight/obese group (over_ob = 1)? Similarly, can I interpret -0.077 as the adjusted mean difference in methylation proportion? Also, just for my understanding, how would you interpret the exponentiated beta coefficient for over_ob in this model?

            Thank you again for your help, I really appreciate it!
            Code:
            glm CAB39_177 i.over_ob age parity smoking mom_race mr_babygender termwks, link(logit) family(binomial) robust nolog
            
            note: CAB39_177 has noninteger values
            
            Generalized linear models                          No. of obs      =        94
            Optimization     : ML                              Residual df     =        86
                                                               Scale parameter =         1
            Deviance         =  11.08930411                    (1/df) Deviance =  .1289454
            Pearson          =  14.27334541                    (1/df) Pearson  =  .1659691
            
            Variance function: V(u) = u*(1-u/1)                [Binomial]
            Link function    : g(u) = ln(u/(1-u))              [Logit]
            
                                                               AIC             =  .3596707
            Log pseudolikelihood = -8.904521402                BIC             =  -379.634
            
            -------------------------------------------------------------------------------
                          |               Robust
                CAB39_177 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
                1.over_ob |  -5.335388   .9859778    -5.41   0.000    -7.267869   -3.402907
                      age |     .25576   .1020443     2.51   0.012     .0557568    .4557632
                   parity |  -.3729316   1.058553    -0.35   0.725    -2.447657    1.701793
                  smoking |   .0190477   .4628575     0.04   0.967    -.8881363    .9262316
                 mom_race |  -2.017673   1.034765    -1.95   0.051    -4.045775    .0104291
            mr_babygender |  -1.309167   .7618347    -1.72   0.086    -2.802335     .184002
                  termwks |  -.0148079   .0226223    -0.65   0.513    -.0591469     .029531
                    _cons |   -8.12149   2.203107    -3.69   0.000     -12.4395    -3.80348
            -------------------------------------------------------------------------------
            
            margins over_ob
            
            Predictive margins                                Number of obs   =         94
            Model VCE    : Robust
            
            Expression   : Predicted mean CAB39_177, predict()
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                 over_ob |
                      0  |   .0775873   .0320744     2.42   0.016     .0147226    .1404519
                      1  |   .0005566   .0002227     2.50   0.012       .00012    .0009932
            ------------------------------------------------------------------------------
            
            margins r.over_ob, contrast
            
            Contrasts of predictive margins
            Model VCE    : Robust
            
            Expression   : Predicted mean CAB39_177, predict()
            
            ------------------------------------------------
                         |         df        chi2     P>chi2
            -------------+----------------------------------
                 over_ob |          1        5.74     0.0166
            ------------------------------------------------
            
            --------------------------------------------------------------
                         |            Delta-method
                         |   Contrast   Std. Err.     [95% Conf. Interval]
            -------------+------------------------------------------------
                 over_ob |
               (1 vs 0)  |  -.0770307   .0321437     -.1400312   -.0140301
            --------------------------------------------------------------



            • #7
              I don't know if it all makes substantive sense, but it certainly makes more sense than your original claim of a negative mean. I think your interpretations are correct.
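
              To double-check the reading of the contrast in #6, here is a short sketch that assumes the glm model from #6 is still the active estimation result. For a factor variable, margins with dydx() reports the discrete change from the base level, which should agree with the r.over_ob contrast. The last line simply subtracts the two margins reported in #6.

              ```stata
              * Adjusted mean proportion at each level of over_ob
              margins over_ob

              * Discrete change (1 vs 0) for the factor variable over_ob
              margins, dydx(over_ob)

              * Difference of the two margins shown in #6
              display .0005566 - .0775873
              ```

              The subtraction gives -.0770307, the same value as the (1 vs 0) contrast in #6.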
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              WWW: https://www3.nd.edu/~rwilliam



              • #8
                Incidentally, this is why it is so important to post code and output. Sometimes what people think they did or got is not what they actually did or got. Also, what you wanted to do may not be what you actually asked for.
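
                As an illustration only: the command that produced the negative value was never posted, so the following sketch shows two hypothetical ways a negative "predicted mean" can come out of margins. It assumes the variables from #6. After glm, the default margins prediction is the mean, which stays between 0 and 1; asking for the linear predictor, or fitting a linear model, removes that bound.

                ```stata
                * Fractional logit as in #6
                glm CAB39_177 i.over_ob age parity smoking mom_race mr_babygender termwks, ///
                    family(binomial) link(logit) vce(robust) nolog

                * Default prediction: mean proportion, bounded between 0 and 1
                margins over_ob

                * Linear predictor (log-odds scale): negative whenever the mean is below .5
                margins over_ob, predict(xb)

                * A linear model places no bounds on its predictions at all
                regress CAB39_177 i.over_ob age parity smoking mom_race mr_babygender termwks, vce(robust)
                margins over_ob
                ```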
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                WWW: https://www3.nd.edu/~rwilliam
