  • Predictive margins

    Dear fellow members

    I wish to have some clarification regarding the estimation and interpretation of predictive margins after logit.
    Example:
    Code:
    webuse margex
    *case 1
    logit outcome i.sex
    margins i.sex // gives adjusted predictions as we have only one regressor
    
    *case 2
    logit outcome i.sex age
    margins i.sex // gives predictive margins but says nothing about adjustment. Does it take age into account? I think not. I am asking because age was used as a control in the initial logit estimation.
    margins i.sex, at(age)  // gives predictive margins that differ in magnitude and are labeled as adjusted predictions.
    My query is: should we adjust for the controls used in the regression specification when estimating predictive margins with margins?
    The reason I am asking is that my multivariable logit regression includes a categorical variable for 100 districts. The data are representative at the district level, with around 250,000 observations in total.

    My purpose is to estimate the probability of the outcome (y) at each wealth decile.

    Which type of margins (adjusted or unadjusted) should one report? Do we need to use sampling weights as well?

    Code:
     logit y i.sex i.urban i.wealthdecile i.district[pw=wt] // y is a binary (0 or 1)  outcome variable
    case 1.A
    Code:
    margins i.wealthdecile // or
    case 2.A
    Code:
    margins i.wealthdecile, at(sex urban district)
    Below are the results from case 1.A and 2.A

    Code:
    *Case 1.A
    
    margins i.wealthdecile
    
    Predictive margins                                  Number of obs = XXXXX
    Model VCE: Robust
    
    Expression: Pr(y), predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      wealthdecile |
              1  |   .5457612   .0038229   142.76   0.000     .5382684     .553254
              2  |   .5752248   .0035628   161.45   0.000     .5682419    .5822077
              3  |   .5786737   .0035924   161.08   0.000     .5716327    .5857148
              4  |   .5867294   .0035887   163.49   0.000     .5796956    .5937632
              5  |   .5775975   .0037508   153.99   0.000      .570246    .5849489
              6  |   .5815214   .0038654   150.44   0.000     .5739452    .5890975
              7  |   .5560213   .0040913   135.90   0.000     .5480025      .56404
              8  |   .5411535   .0048161   112.36   0.000     .5317141    .5505929
              9  |   .4778415   .0047144   101.36   0.000     .4686013    .4870816
             10  |   .3870589   .0046531    83.18   0.000      .377939    .3961789
    ------------------------------------------------------------------------------
    
    *Case 2.A
    . margins i.wealthdecile, at(sex urban district)
    
    Adjusted predictions                                Number of obs = XXXXXX
    Model VCE: Robust

    Expression: Pr(y), predict()

    At: 1.sex        = .5216951 (mean)
        2.sex        = .4783049 (mean)
        1.urban      = .2847464 (mean)
        2.urban      = .7152536 (mean)
        1.district   = .0005732 (mean)
        2.district   = .0003696 (mean)
        3.district   = .0000688 (mean)
        4.district   = .0000826 (mean)
        .
        .
        .
        100.district = .0000826 (mean)

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    wealthdecile |
              1  |   .5519841   .0044273   124.68   0.000     .5433068    .5606615
              2  |   .5859534   .0040909   143.23   0.000     .5779353    .5939714
              3  |   .5899154   .0041158   143.33   0.000     .5818487    .5979822
              4  |   .5991553   .0041073   145.88   0.000     .5911052    .6072055
              5  |   .5886794   .0043027   136.82   0.000     .5802463    .5971125
              6  |    .593184   .0044396   133.61   0.000     .5844826    .6018854
              7  |   .5638349   .0047301   119.20   0.000     .5545641    .5731057
              8  |   .5466562   .0055888    97.81   0.000     .5357024    .5576099
              9  |   .4733113   .0054759    86.44   0.000     .4625788    .4840438
             10  |   .3692428   .0052384    70.49   0.000     .3589758    .3795099
    ------------------------------------------------------------------------------


    *The difference in magnitude is small, but the computation time is large.

    Thank you!
    Stata 18.0
    Last edited by Mukesh Punia; 25 Jun 2025, 04:58.
    Best regards,
    Mukesh

  • #2
    If you want to show the predictions while keeping control variables constant, then you have to keep those control variables constant. This is what you do with the at() option. You need to do a bit more than what you showed in your example: it does not make sense to fix categorical variables at the mean.
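    A minimal sketch of what that could look like with the variables from #1. The levels chosen here (sex = 1, urban = 2, district = 1) are purely illustrative assumptions; pick a profile that is meaningful for your research question.
    Code:
    * assumes the logit model from #1 has just been fit
    * hypothetical reference profile: sex level 1, urban level 2, district level 1
    margins i.wealthdecile, at(sex = 1 urban = 2 district = 1)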
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thank you, dear Maarten Buis, for your response and clarification. 'You need to do a bit more...' Should I go for 100 district dummies?

      Code:
      logit y i.sex i.urban i.wealthdecile i.district[pw=wt]
      Code:
       margins i.wealthdecile
      What is the interpretation of the predictive margin obtained in this way? For example, for wealth decile 1 using the above specification:
      Code:
      Predicted Margin is  .5457612
      Best regards,
      Mukesh



      • #4
        Adding on to @Maarten's suggestions, it's not clear why you show only a single predicted margin for the margins call you posted. However, I think it might be helpful to take a step back and make sure you understand what margins is doing. So building off some of the great work of Richard Williams and Maarten, I put the following together, working with the simple case 2 you showed in your first post.
        Code:
        webuse margex, clear
        ** Case 2
        logit outcome i.sex age
        margins i.sex // gives predictive margins but says nothing about adjustment. Does it take age into account?
        * Look under the hood at what margins is doing
        clonevar cv_sex = sex
        logit outcome i.cv_sex age
        margins i.cv_sex 
        /*
        Predictive margins                              Number of obs     =      3,000
        Model VCE    : OIM
        
        Expression   : Pr(outcome), predict()
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 sex |
               male  |   .1108003   .0089957    12.32   0.000      .093169    .1284315
             female  |   .2073327   .0089077    23.28   0.000      .189874    .2247914
        ------------------------------------------------------------------------------
        */
        replace cv_sex = 0
        predict adj_male_pred 
        replace cv_sex = 1
        predict adj_fem_pred 
        sum adj_male_pred adj_fem_pred
        /*------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              cv_sex |
               male  |   .1108003   .0089957    12.32   0.000      .093169    .1284315
             female  |   .2073327   .0089077    23.28   0.000      .189874    .2247914
        ------------------------------------------------------------------------------
        */
        So margins is getting the male predictive margin by assigning everyone to sex == 0 and leaving their age alone, and then re-assigning everyone to sex == 1, again leaving age alone, and getting a female estimate. Age is adjusted for in the initial regression model and margins leaves everyone's age as it truly is. What about when you specify that age is some value? In your case, at(age) sets age at the mean value in the sample.
        Code:
        drop cv_sex 
        clonevar cv_sex = sex 
        clonevar cv_age = age 
        qui logit outcome i.cv_sex cv_age
        margins i.cv_sex, at(cv_age)  // gives predictive margins which are different in magnitude and shown as adjusted predictions.
        /*
        Adjusted predictions                            Number of obs     =      3,000
        Model VCE    : OIM
        
        Expression   : Pr(outcome), predict()
        at           : cv_age          =      39.799 (mean)
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              cv_sex |
               male  |   .0699539   .0069337    10.09   0.000     .0563641    .0835438
             female  |   .1512261   .0108367    13.95   0.000     .1299865    .1724658
        ------------------------------------------------------------------------------
        */
        replace cv_sex = 0
        sum cv_age
        replace cv_age = r(mean)
        predict adj_male_pred2
        replace cv_sex = 1
        predict adj_fem_pred2
        sum *2
        /*    Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
        adj_male_p~2 |      3,000    .0699539           0   .0699539   .0699539
        adj_fem_pr~2 |      3,000    .1512261           0   .1512261   .1512261
        */
        In this case, you are going through the same process as before, but you are forcing every case to take on the mean age value. So are you adjusting for age? Yes, because just as before, you adjust for age in the model. However, the adjusted predictions given in the margins estimates provide the male and female probabilities of outcome when everyone is assigned the mean age. You should probably only take that approach if that value is particularly noteworthy or meaningful. You could instead look at how the male and female predictions change across the age spectrum in your sample and get a nice graph of it:
        Code:
        logit outcome i.sex age
        margins i.sex, at(age = (24(4)56))
        marginsplot
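        If it helps to see both versions side by side, margins accepts more than one at() option. A minimal sketch, assuming the same margex model as above: the first at() leaves every covariate as observed (the predictive margins), and the second fixes age at its sample mean (the adjusted predictions).
        Code:
        webuse margex, clear
        logit outcome i.sex age
        * _at 1: all covariates as observed; _at 2: age held at its sample mean
        margins i.sex, at((asobserved) _all) at((mean) age)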



        • #5
          Dear Erik Ruzek thank you for your response!

          I think I am unable to raise my point clearly.

          My query is: as you showed in the first part of your response,

          Code:
           logit outcome i.sex age
          Code:
          margins i.sex
          So, the interpretation will be that the predicted probability of the outcome (i.e. obesity) is 0.11 for males and 0.21 for females after controlling for age. Or that the probability of obesity is 0.11 if all are male and 0.21 if all are female, after controlling for age.

          I am following this interpretation from a chapter https://www.perraillon.com/ph/Perrai..._March2025.pdf Last paragraph of section 6.7 (page 116.) "Predictive margins are also called adjusted predictions. In this example, we controlled for age and sex, although we did not hold these covariates fixed at any specific value."

          Hope this clarifies.

          Thank you - Mukesh
          Last edited by Mukesh Punia; 25 Jun 2025, 11:02.
          Best regards,
          Mukesh



          • #6
            The author made a mistake. It happens.

            What that margins call does is compute a prediction for each female and then compute the average prediction, and then compute a prediction for each male and compute the average.

            Say you controlled for work in your logit model. Women are less likely to be in paid employment. The logit model controlled for that, but the margins command ignores that: people not in paid employment will get different predictions, just as the model says. So part of the difference in average predicted outcome between men and women is due to differences in work. This is exactly what you don't want, when you control for work.
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------



            • #7
              I greatly appreciate and thank Maarten Buis and Erik Ruzek for jumping in with wonderful clarifications and leads. This raised doubts in my mind about what I thought margins (predictive margins) does and what it actually does. I went back through the sources referred to and the Stata user manual. May I ping Richard Williams here!

              Now, using the Stata margex example dataset, I can clarify what I wanted to do with my original data.

              Code:
              webuse margex, clear
              logistic outcome i.sex i.group age
              margins, over(sex) at(age = (20(10)60) sex = (0 1))
              marginsplot, name(oversex_sex01) // gives predictions for [male male; male female; female male; female female]
              margins, over(sex) at(age = (20(10)60) sex = (0))
              marginsplot, name(oversex_sex0) // gives predictions for [male; female]
              margins, over(sex) at(age = (20(10)60) sex = (1))
              marginsplot, name(oversex_sex1) // gives predictions for [male; female]
              I want to predict the marginal probability (predictive margins) for male and female considering male as male and female as female. Will any of the above work for that?
              Last edited by Mukesh Punia; 26 Jun 2025, 00:31.
              Best regards,
              Mukesh



              • #8
                If you want "considering male as male and female as female", then you do not want to control for other variables. What you want is often called the raw or naive effects.

                Don't worry about the negative connotations of those words; they are just the symbols used for those effects. We could define at the beginning of your paper the symbol "BIG BEAUTIFUL EFFECT" (in all capitals) for this effect, and then use that symbol instead. (Even though it is true, I am not seriously suggesting you should do that. This joke is more to illustrate how irrelevant the emotional value of those words should be. Instead, I recommend just using the conventional terms.)

                There is a case to be made for raw/naive effects. Controlling for variables is by its very nature counterfactual: what is the effect of being female when all control variables remain constant (ceteris paribus, as the economists tend to say)? In real life the control variables do not remain constant; that is why we want to add them to our model. However, there is also a good case to be made for controlling for variables. So you need to make a decision: do you want to control for variables (and live with the counterfactual nature of your effect), or do you want the raw effects (and live with the fact that you did not control for anything)? An effect cannot be both. You can report both effects, but then you need to do something with both in your text. The extra statistics need to serve a purpose in your argument, i.e. the extra text and space in a table need to be helpful in answering your research question and not (as is often the case) unnecessary fluff.

                If you are serious about wanting the raw effect, then you don't need a regression model. All you want is just a cross-tabulation:

                Code:
                table (outcome) (sex), stat(percent, across(outcome))
                We do so much with regression models, that we often think everything serious needs to be done with regression models, like if you have a hammer, all problems look like nails. Sometimes it is good to take a step back, and consider if you really need all that machinery to solve your problem. Our work is scientific because others can understand and replicate what steps we took to come to our conclusions. So removing unnecessary complications is really improving the scientific value of your paper rather than diminishing it.
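                As a rough check of that point, here is a minimal sketch using the margex example data from earlier in the thread (substitute your own variables). Because the model includes i.sex, averaging the logit predictions within each observed sex group should reproduce the raw proportions, so the regression adds nothing for this purpose.
                Code:
                webuse margex, clear
                logit outcome i.sex i.group age
                * average prediction within each observed sex group, covariates as observed
                margins, over(sex)
                * raw proportion with outcome == 1 by sex, no model involved
                tabulate sex, summarize(outcome) means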
                Last edited by Maarten Buis; 26 Jun 2025, 02:58.
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------



                • #9
                  I agree with Maarten Buis about the value of descriptive statistics. We don't need a regression for everything. That said, if you really want to get margins based only on the sex a person actually was, there is a way to do it:
                  Code:
                  webuse margex, clear
                  logistic outcome i.sex i.group age
                  
                  margins if sex == 0, at(sex = (0 1) age = (20(10)60)) saving(male_only, replace)
                  margins if sex == 1, at(sex = (0 1) age = (20(10)60)) saving(female_only, replace)
                  
                  use male_only, clear
                  gen male = 1
                  append using female_only
                  replace male = 0 if male==.
                  * keep only the predictions at each group's own sex (0 = male, 1 = female)
                  drop if _at1==1 & male==1
                  drop if _at1==0 & male==0
                  * markers plus connecting lines; O and T are arbitrary symbol choices
                  twoway scatter _margin _at3 if male==0, msymbol(O) || ///
                          scatter _margin _at3 if male==1, msymbol(T) || ///
                          line _margin _at3 if male==0, sort || ///
                          line _margin _at3 if male==1, sort ///
                          legend(order(1 "female" 2 "male"))
                  You can add confidence interval bands to the graph and beautify it in other ways that you like.
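                  A minimal sketch of the confidence bands, assuming the combined dataset built above is still in memory and that _at3 holds age as in the code above. The variables _ci_lb and _ci_ub are the confidence limits that margins writes to the saving() file; the 30% opacity is only a styling choice.
                  Code:
                  * shaded 95% bands first, then the lines on top
                  twoway rarea _ci_lb _ci_ub _at3 if male==0, sort color(%30) || ///
                          rarea _ci_lb _ci_ub _at3 if male==1, sort color(%30) || ///
                          line _margin _at3 if male==0, sort || ///
                          line _margin _at3 if male==1, sort ///
                          legend(order(3 "female" 4 "male"))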
