
  • Estimating probability using margins after mixed-effects logistic regression

    Dear Statalisters,

    I'm trying to estimate the probability of outcome y (people reporting that they received a certain kind of healthcare) for each region over the years.

    I am using the commands:

    melogit y i.region age i.sex i.urban, or
    margins i.region, by(year)

    and get (Region A only):

      year#region |   Margin   std. err.   [95% conf. interval]
    --------------+---------------------------------------------
    2011#Region A | 0.176543   0.00068     0.175211   0.177875
    2013#Region A | 0.174814   0.000676    0.173489   0.17614
    2015#Region A | 0.177897   0.00068     0.176565   0.17923
    2018#Region A | 0.226306   0.000822    0.224695   0.227917

    I also tried using:

    melogit y i.region age i.sex i.urban if year==2013, or
    margins i.region

    I get the following:

           Region |   Margin   std. err.   [95% conf. interval]
    --------------+---------------------------------------------
         Region A | 0.137029   0.001171    0.134733   0.139324

    There are 4 regions in total, but to avoid confusion I'm only showing the results for Region A. As you can see, the two approaches give very different results for Region A in 2013 (0.1748 vs. 0.1370). How do I know which one I should be using? Thanks.


    Edit: I'm using Stata 17.0.
    Last edited by Angeline Wong; 05 Sep 2023, 05:47.

  • #2
    The samples in the two regressions differ. Here

    melogit y i.region age i.sex i.urban if year==2013, or
    you are restricting the sample to only the year 2013 whereas here

    melogit y i.region age i.sex i.urban, or
    you are using all the data. So the estimated coefficients differ across the regressions. The margins are just the predicted values based on these estimates. See the following for how to calculate the margins "by hand":

    Code:
    sysuse auto, clear
    *SUBSAMPLE
    regress mpg weight i.foreign if rep78==3
    margins foreign if rep78==3, by(rep78) atmeans
    di (_b[1.foreign]*1)+_b[_cons]+ (_b[weight]*3299)
    
    *FULL SAMPLE
    regress mpg weight i.foreign
    margins foreign if rep78==3, by(rep78) atmeans
    di (_b[1.foreign]*1)+_b[_cons]+ (_b[weight]*3299)
    Res.:

    Code:
    . *SUBSAMPLE
    
    .
    . regress mpg weight i.foreign if rep78==3
    
          Source |       SS           df       MS      Number of obs   =        30
    -------------+----------------------------------   F(2, 27)        =     27.92
           Model |  335.255272         2  167.627636   Prob > F        =    0.0000
        Residual |  162.111394        27  6.00412572   R-squared       =    0.6741
    -------------+----------------------------------   Adj R-squared   =    0.6499
           Total |  497.366667        29  17.1505747   Root MSE        =    2.4503
    
    ------------------------------------------------------------------------------
             mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |  -.0051142   .0007429    -6.88   0.000    -.0066385   -.0035899
                 |
         foreign |
        Foreign  |  -2.991368   1.831882    -1.63   0.114     -6.75008    .7673443
           _cons |   36.60429   2.600289    14.08   0.000     31.26893    41.93964
    ------------------------------------------------------------------------------
    
    .
    . margins foreign if rep78==3, by(rep78) atmeans
    
    Adjusted predictions                                        Number of obs = 30
    Model VCE: OLS
    
    Expression: Linear prediction, predict()
    Over:       rep78
    At: weight    = 3299 (mean)
        0.foreign =   .9 (mean)
        1.foreign =   .1 (mean)
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         foreign |
       Domestic  |   19.73247   .4834206    40.82   0.000     18.74057    20.72437
        Foreign  |    16.7411   1.708312     9.80   0.000     13.23594    20.24627
    ------------------------------------------------------------------------------
    
    .
    . di (_b[1.foreign]*1)+_b[_cons]+ (_b[weight]*3299)
    16.741102
    
    .
    .
    .
    . *FULL SAMPLE
    
    .
    . regress mpg weight i.foreign
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(2, 71)        =     69.75
           Model |   1619.2877         2  809.643849   Prob > F        =    0.0000
        Residual |  824.171761        71   11.608053   R-squared       =    0.6627
    -------------+----------------------------------   Adj R-squared   =    0.6532
           Total |  2443.45946        73  33.4720474   Root MSE        =    3.4071
    
    ------------------------------------------------------------------------------
             mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |  -.0065879   .0006371   -10.34   0.000    -.0078583   -.0053175
                 |
         foreign |
        Foreign  |  -1.650029   1.075994    -1.53   0.130      -3.7955    .4954422
           _cons |    41.6797   2.165547    19.25   0.000     37.36172    45.99768
    ------------------------------------------------------------------------------
    
    .
    . margins foreign if rep78==3, by(rep78) atmeans
    
    Adjusted predictions                                        Number of obs = 30
    Model VCE: OLS
    
    Expression: Linear prediction, predict()
    Over:       rep78
    At: weight    = 3299 (mean)
        0.foreign =   .9 (mean)
        1.foreign =   .1 (mean)
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         foreign |
       Domestic  |   19.94627   .4726151    42.20   0.000      19.0039    20.88863
        Foreign  |   18.29624   .9591353    19.08   0.000     16.38377     20.2087
    ------------------------------------------------------------------------------
    
    .
    . di (_b[1.foreign]*1)+_b[_cons]+ (_b[weight]*3299)
    18.296236
    
    .
    Notice that the calculation is identical in both cases (foreign = 1 and weight at its rep78==3 mean of 3299); the only difference between the two estimated margins is the values of the coefficients.


    How do I know which one I should be using?
    You probably want to use the coefficients estimated from the full sample.
    Last edited by Andrew Musau; 05 Sep 2023, 06:55.



    • #3
      I see, thank you for the explanation. My years are separate cross-sections, so I'm thinking it would make more sense to restrict the sample to each individual year when estimating margins, rather than pooling them, right?

      Some further questions about what you posted, Andrew: what is the difference between the default (predict) and the atmeans option you're using here? I assume atmeans would not be the right option since I'm using a logistic regression?

      Also, if I want to assess the trend over the years for each region, I read in the manual that I can use the contrast option, but what method does Stata use to assess the trend?

      Code:
      . margins i.rep78, by(year) contrast
      
      Contrasts of predictive margins                             Number of obs = 59
      Model VCE: OIM
      
      Expression: Predicted mean, predict()
      Over:       year
      
      ------------------------------------------------
                   |         df        chi2     P>chi2
      -------------+----------------------------------
        rep78@year |
             2005  |          2       16.24     0.0003
             2006  |          2        9.37     0.0092
             2008  |          2       10.76     0.0046
             2009  |          2       14.06     0.0009
             2010  |          2       18.17     0.0001
             2011  |          2       14.04     0.0009
             2012  |          2       18.16     0.0001
             2013  |          2        9.22     0.0100
             2014  |          2       10.91     0.0043
             2015  |          2        9.00     0.0111
             2016  |          2       12.16     0.0023
             2017  |          2        6.59     0.0370
             2018  |          2        4.96     0.0836
             2020  |          2        2.82     0.2446
             2021  |          2        2.99     0.2246
             2022  |          2        2.41     0.2998
             2023  |          2        3.89     0.1431
            Joint  |          5       18.16     0.0028
      ------------------------------------------------
      Sorry for the messy output I pasted earlier; I didn't realize I could use code tags.



      • #4
        Originally posted by Angeline Wong
        I see, thank you for the explanation. My years are separate cross-sections, so I'm thinking it would make more sense to restrict the sample to each individual year when estimating margins, rather than pooling them, right?
        For the regression, use the full sample. For the margins, you may compute these by year if such comparisons are of interest.
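        A minimal sketch of that workflow, using Stata's nlswork example data as a stand-in for the survey (union plays the role of y, race of region, and year of year; this mapping is an assumption made only for illustration, and webuse needs an internet connection). The model is fitted once on the pooled data; margins then averages the predictions within each year, and the by-hand lines reproduce the 1988 margin for race = 2. The names insample, M88, and hand88 are arbitrary.

Code:
```stata
* Stand-in data (illustration only): union ~ y, race ~ region, year ~ survey year
webuse nlswork, clear

* Fit the model once, on the pooled (full) sample
logit union age i.race i.collgrad, or
generate byte insample = e(sample)

* Predictive margins of race, averaged separately within each year
margins race, by(year)

* The same margins for a single year (1988 wave)
margins race if year == 88
matrix M88 = r(b)

* By hand for race = 2 (black): set race = 2 for everyone, predict,
* then average over the 1988 observations in the estimation sample
preserve
replace race = 2
predict double p_black, pr
summarize p_black if insample & year == 88, meanonly
scalar hand88 = r(mean)
restore

display "margins: " %9.7f el(M88,1,2) "   by hand: " %9.7f scalar(hand88)
```

        Nothing is re-estimated per year here: the year restriction only changes which observations the predictions are averaged over.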

        what is the difference between the default (predict) and the atmeans option you're using here? I assume atmeans would not be the right option since I'm using a logistic regression?
        Richard Williams illustrates the calculations here and outlines the merits and demerits of each.
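        The two calculations can be sketched with the lbw data used below (an illustrative choice; webuse needs an internet connection, and the scalar and matrix names are arbitrary). The default sets race to a given level for every observation, predicts each probability, and averages them; atmeans makes a single prediction with age and smoke held at their sample means.

Code:
```stata
* Default margins vs. atmeans after a logit (lbw example data)
webuse lbw, clear
logit low age smoke i.race
generate byte insample = e(sample)

* Means of the other covariates over the estimation sample
summarize age if insample, meanonly
scalar m_age = r(mean)
summarize smoke if insample, meanonly
scalar m_smoke = r(mean)

* (1) Default: average predictive margin for race = 2 (black)
margins race
matrix M_avg = r(b)
preserve
replace race = 2                  // treat every observation as black
predict double p_black, pr        // individual predicted probabilities
summarize p_black if insample, meanonly
scalar avg_black = r(mean)        // average of those probabilities
restore

* (2) atmeans: one probability evaluated at the covariate means
margins race, atmeans
matrix M_atm = r(b)
scalar atm_black = invlogit(_b[_cons] + _b[age]*scalar(m_age) + _b[smoke]*scalar(m_smoke) + _b[2.race])

display "average of predictions: " %9.7f scalar(avg_black) "   margins: " %9.7f el(M_avg,1,2)
display "prediction at means:    " %9.7f scalar(atm_black) "   margins: " %9.7f el(M_atm,1,2)
```

        Because invlogit() is nonlinear, the average of the predicted probabilities and the probability at the average covariate values are generally not the same number; which one to report is what the linked notes discuss.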

        Also, if I want to assess the trend over the years for each region, I read in the manual that I can use the contrast option, but what method does Stata use to assess the trend?
        If I understand you correctly, you are asking which test Stata performs. It is a standard Wald test (here, of the hypothesis that the margins are equal across categories), which you can replicate using the test command.

        Code:
        webuse lbw, clear
        logit low age smoke i.race
        margins race, contrast
        
        *USING TEST
        margins race, post
        test (1.race=2.race) (1.race=3.race)
        Res.:

        Code:
        . margins race, contrast
        
        Contrasts of predictive margins                            Number of obs = 189
        Model VCE: OIM
        
        Expression: Pr(low), predict()
        
        ------------------------------------------------
                     |         df        chi2     P>chi2
        -------------+----------------------------------
                race |          2        9.07     0.0107
        ------------------------------------------------
        
        . 
        . 
        . 
        . *USING TEST
        
        . 
        . margins race, post
        
        Predictive margins                                         Number of obs = 189
        Model VCE: OIM
        
        Expression: Pr(low), predict()
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                race |
              white  |   .2151541   .0405659     5.30   0.000     .1356464    .2946619
              black  |   .4129277   .0922105     4.48   0.000     .2321984     .593657
              other  |   .4231076   .0616934     6.86   0.000     .3021908    .5440244
        ------------------------------------------------------------------------------
        
        . 
        . test (1.race=2.race) (1.race=3.race)
        
         ( 1)  1bn.race - 2.race = 0
         ( 2)  1bn.race - 3.race = 0
        
                   chi2(  2) =    9.07
                 Prob > chi2 =    0.0107
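        The test line can be unpacked one step further. A sketch, assuming the same lbw model, that computes the Wald statistic directly from the posted margins b and their delta-method covariance V as W = (Rb)'(RVR')^-1(Rb), with one row of R per restriction and rows(R) degrees of freedom (the matrix and scalar names are arbitrary).

Code:
```stata
webuse lbw, clear
logit low age smoke i.race
margins race, post

* Wald test of H0: Pr(white) = Pr(black) = Pr(other)
test (1.race = 2.race) (1.race = 3.race)
scalar chi2_test = r(chi2)

* Same statistic by hand
matrix b = e(b)                     // posted margins: 1bn.race 2.race 3.race
matrix V = e(V)                     // delta-method variance of the margins
matrix R = (1, -1, 0 \ 1, 0, -1)    // rows: white - black, white - other
matrix Rb = R*b'
matrix W = Rb' * invsym(R*V*R') * Rb
scalar chi2_hand = el(W,1,1)

display "test: " %6.2f scalar(chi2_test) "   by hand: " %6.2f scalar(chi2_hand)
display "p-value: " %6.4f chi2tail(rowsof(R), scalar(chi2_hand))
```

        Both statistics should agree with the chi2(2) = 9.07 reported by contrast and test.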

