  • margins: difference between at() and subpop(if...)

    Hello,

    I would be grateful if you could explain to me why I get different results from margins when I use the at() option and when I use the subpop(if...) option.

    I am running logistic regression with four dummy variables as predictors (and their interactions). My syntax is

    margins r.dummy1, at (dummy2 = 1)
    margins, dydx(dummy1) subpop(if dummy2==1)

    I am trying to find out how dummy1 changes the probability of the outcome, but only for records that are '1' on dummy2.

    I have read various materials and I thought it is done with the at() option, but then I stumbled upon the subpop(if..) option and now I am well confused.

    I want to get Average Marginal Effects rather than Marginal Effects at the Means / marginal effects at representative values.

    Many thanks in advance!





  • #2
    The difference is:

    1. With subpop(if...) (or, equivalently, with over()) you are using only the observations that have dummy2 == 1 to calculate the margins. All other observations are excluded.

    2. With at(dummy2 = 1), the entire regression estimation sample is used to calculate the margins, but with the value of dummy2 temporarily reset to 1 in every observation.

    In statistical language, with subpop(if...) you are getting margins conditional on dummy2 == 1. With at(dummy2 = 1) you are getting margins adjusted to dummy2 = 1. They are only rarely the same.

    In addition to that, you have another difference between your commands. In your first command you are calculating predicted margins, and in the second you are calculating marginal effects. Those are very different things, so even if you did them both with at() or both with subpop(), you would get different results.

    I think the clearest presentation of how -margins- works is in the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.
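
    The distinction above can be sketched on a shipped example dataset. This is only an illustration under assumptions: it uses Stata's margex data, with sex and treatment standing in for dummy1 and dummy2, and a model that is not the original poster's.

    ```stata
    * Assumed stand-ins: margex's sex and treatment play the roles of dummy1 and dummy2
    webuse margex, clear
    logit outcome i.sex##i.treatment age

    * Adjusted: treatment is set to 1 in every observation of the estimation sample
    margins, dydx(sex) at(treatment = 1)

    * Conditional: only the observations that actually have treatment == 1 are used
    margins, dydx(sex) subpop(if treatment == 1)

    * over() reports the conditional effect separately for each level of treatment
    margins, dydx(sex) over(treatment)

    * Predictive margins (probabilities), not marginal effects (differences)
    margins sex, at(treatment = 1)
    ```

    The first two commands differ whenever the other covariates (here, age) are distributed differently among the treatment == 1 observations than in the full sample.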



    • #3
      Thank you very much Clyde for the clarification. I have decided that what I need is

      margins, dydx(dummy1) at (dummy2=1)

      The only remaining questions I have are: (1) should the results be interpreted as Average Marginal Effects or as Marginal Effects at Representative values? In the Stata output it says 'average marginal effects' but from what I have read it is in fact MER? (2) Is it correct to interpret the results in this way: 'On average, for people who are dummy2=1, the probability of outcome=1 is x percentage points higher(lower) if they are dummy1=1 than if they are dummy1=0'?

      Thanks again!



      • #4
        Your interpretation of the results is correct.

        As for what to call it, if dummy1 and dummy2 are the only variables in the original logistic regression, then it is a marginal effect at a representative value (MER). But if there are also other variables, then it is a hybrid with no simple name: it is averaged over the other variables but at a representative value of dummy2.
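
        The "hybrid" can be made concrete by reproducing it by hand. The sketch below is an assumption-laden illustration on Stata's margex data (sex and treatment stand in for dummy1 and dummy2, and age is the "other variable"); it is not the original poster's model.

        ```stata
        webuse margex, clear
        logit outcome i.sex##i.treatment age
        margins, dydx(sex) at(treatment = 1)

        * Manual version: fix treatment at 1, toggle sex, keep each observation's own age
        preserve
        quietly replace treatment = 1      // the representative value, imposed on everyone
        quietly replace sex = 0
        predict double p0, pr
        quietly replace sex = 1
        predict double p1, pr
        generate double effect = p1 - p0
        summarize effect                   // the mean should match the dy/dx from margins
        restore
        ```

        The averaging happens over the observed values of age, while treatment is held at a single value, which is why neither "AME" nor "MER" describes it fully.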



        • #5
          I have an additional question. I am also running a probit model and ivprobit. I ran the following command to get the coefficients

          svy, subpop(indicatorvariable): probit outcome predictor1 predictor2 predictor3

          I want to display the average marginal effects using margins, dydx(*). However, I get an error (not estimable). But when I run

          margins, dydx(*) subpop(indicatorvariable)

          I get some estimates. Are these the right estimates for my probit regression?
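
          For reference, the two calls can be laid out on a shipped survey dataset. This is a sketch under assumptions, not a diagnosis of the "not estimable" message: it uses Stata's nhanes2 data, with female standing in for indicatorvariable and highbp, age, bmi, and black standing in for the outcome and predictors. The vce(unconditional) option is the one the margins documentation suggests for survey data.

          ```stata
          * Assumed stand-ins from nhanes2; the file ships with its svyset design declared
          webuse nhanes2, clear
          svy, subpop(female): probit highbp age bmi i.black

          * Average marginal effects restricted to the same subpopulation
          margins, dydx(*) subpop(female) vce(unconditional)
          ```

          Repeating the subpopulation in margins keeps the averaging on the same group the model was estimated for.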



          • #6
            All -

            I have a few questions on some similar topics, and thought I'd post them here instead of starting a brand new post.

            I am conducting a choice experiment and using Stata's CM commands. I am new to both, so I am learning a tremendous amount from Statalist posts such as those above, and from other sources as I progress. I've collected data, with 360 observations from males and 456 from females. In each, they were asked to choose one of two alternatives and the chosen alternative was recorded in Choice as "1" if chosen, "0" if not chosen.

            I would like to know the effect of sex on a respondent's likelihood of choosing each alternative, and my hypothesis is that there is no difference in the likelihood to choose Alt#1 vs Alt#2 for either males or females.

            I've taken the steps below and (1) can't explain the difference between output from Steps 1-4 (which appear to support my hypothesis) and the output of the margins commands in Step 5 and 7 (which does not), (2) am unclear why the manual calculation in Step 9 does not replicate the margins output but Step 10 does, and therefore (3) unsure how to interpret the manual output in Step 9, which is very similar to Steps 1-4 (and also appears to support my hypothesis.)

            My specific questions are at the end.

            Code:
            . * Step 1
            . table Sex Choice Alt_Num, row column
            
            ------------------------------------------------------------------------------------------------------
                      | Alternative number in any given Choice Set (1 or 2) 
            Sex (1    |                                                                                
            Male 2    | -------------------- 1 --------------------    -------------------- 2 --------------------
            Female)   | 0. Not Chosen      1. Chosen          Total    0. Not Chosen      1. Chosen          Total
            ----------+-------------------------------------------------------------------------------------------
              1. Male |           189            171            360              171            189            360
            2. Female |           229            227            456              227            229            456
                      | 
                Total |           418            398            816              398            418            816
            ------------------------------------------------------------------------------------------------------
            Overall, Alt#1 was chosen in 398/816=48.8% of cases and Alt#2 in 51.2% of cases. Males chose Alt#1 in 171/360=47.5% of cases and Alt#2 in 52.5% of cases. Females chose Alt#1 in 227/456=49.8% of cases and Alt#2 in 50.2% of cases.
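
            The percentages above follow directly from the Step 1 counts and can be checked with display (a small arithmetic sketch, nothing model-based):

            ```stata
            * Share choosing Alt#1, as a percentage, from the Step 1 table
            display %4.1f 100*398/816    // overall: 48.8
            display %4.1f 100*171/360    // males:   47.5
            display %4.1f 100*227/456    // females: 49.8
            ```

            With only two alternatives, the Alt#2 shares are the complements: 51.2, 52.5, and 50.2.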

            Running the cm command for a conditional logit choice model:
            Code:
            . * Step 2
            . cmclogit Choice <vars omitted>, casevars( Sex <remaining vars omitted>)
            And subsequently the margins command:
            Code:
            . * Step 3
            . margins
            
            Predictive margins                              Number of obs     =      1,632
            Model VCE    : OIM
            
            Expression   : Pr(Alt_Num|1 selected), predict()
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                _outcome |
                      1  |   .4877451   .0141982    34.35   0.000     .4599172     .515573
                      2  |   .5122549   .0141982    36.08   0.000      .484427    .5400828
            ------------------------------------------------------------------------------
            Confirms the overall values calculated manually above. Using the contrast(outcomecontrast(r)) option, this difference is not significant (z=0.86, p=0.388).

            Including the over(Sex) option:
            Code:
            . * Step 4
            . margins, over(Sex)
            
            Predictive margins                              Number of obs     =      1,632
            Model VCE    : OIM
            
            Expression   : Pr(Alt_Num|1 selected), predict()
            over         : Sex
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            _outcome#Sex |
              1#1. Male  |       .475   .0212878    22.31   0.000     .4332767    .5167233
            1#2. Female  |    .497807   .0190548    26.13   0.000     .4604604    .5351537
              2#1. Male  |       .525   .0212878    24.66   0.000     .4832767    .5667233
            2#2. Female  |    .502193   .0190548    26.36   0.000     .4648463    .5395396
            ------------------------------------------------------------------------------
            Confirms the Male and Female values calculated manually above. Again using the contrast option, these differences are not significant (Male: z=1.17, p=0.240; Female: z=0.12, p=0.908).

            I was going to stop here, believing that I had demonstrated that there was no difference due to Sex, but when I read this post and some others, I continued. Calculating the AAPs:
            Code:
            . * Step 5
            . margins Sex
            
            Predictive margins                              Number of obs     =      1,632
            Model VCE    : OIM
            
            Expression   : Pr(Alt_Num|1 selected), predict()
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            _outcome#Sex |
              1#1. Male  |   .7662833   .0117093    65.44   0.000     .7433335    .7892331
            1#2. Female  |   .2799825   .0122283    22.90   0.000     .2560155    .3039496
              2#1. Male  |   .2337167   .0117093    19.96   0.000     .2107669    .2566665
            2#2. Female  |   .7200175   .0122283    58.88   0.000     .6960504    .7439845
            ------------------------------------------------------------------------------
            Produces results that I don't expect. I interpret this output as follows: the probability that Males choose Alt#1 is 76.6% vs 23.4% for Alt#2, and the probability that Females choose Alt#1 is 28.0% vs 72.0% for Alt#2. This does not make sense intuitively given the results in Step 3 and 4 above, which appear to show otherwise.

            To confirm these results, following Richard Williams' approach (link, p.34) to calculating AAP/AME without using the margins command, I rerun the model using a new xSex variable:

            Code:
            . * Step 6
            . clonevar xSex = Sex
            . cmclogit Choice <vars omitted>, casevars( xSex <remaining vars omitted>)
            Confirm that the AAPs are unchanged:

            Code:
            . * Step 7
            . margins xSex
            
            Predictive margins                              Number of obs     =      1,632
            Model VCE    : OIM
            
            Expression   : Pr(Alt_Num|1 selected), predict()
            
            -------------------------------------------------------------------------------
                          |            Delta-method
                          |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
            _outcome#xSex |
               1#1. Male  |   .7662833   .0117093    65.44   0.000     .7433335    .7892331
             1#2. Female  |   .2799825   .0122283    22.90   0.000     .2560155    .3039496
               2#1. Male  |   .2337167   .0117093    19.96   0.000     .2107669    .2566665
             2#2. Female  |   .7200175   .0122283    58.88   0.000     .6960504    .7439845
            -------------------------------------------------------------------------------
            And calculate the AME:

            Code:
            . * Step 8
            . margins, dydx(xSex)
            
            Average marginal effects                        Number of obs     =      1,632
            Model VCE    : OIM
            
            Expression   : Pr(Alt_Num|1 selected), predict()
            dy/dx w.r.t. : 2.xSex
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            1.xSex       |  (base outcome)
            -------------+----------------------------------------------------------------
            2.xSex       |
                _outcome |
                      1  |  -.4863007   .0190004   -25.59   0.000    -.5235408   -.4490607
                      2  |   .4863007   .0190004    25.59   0.000     .4490607    .5235408
            ------------------------------------------------------------------------------
            Note: dy/dx for factor levels is the discrete change from the base level.
            Continuing manually, I calculate AAPs for Male (xSex==1) and Female (xSex==2) and the AME:

            Code:
            . * Step 9
            . replace xSex = 1
            (912 real changes made)
            
            . predict adjpredmale
            (option pr assumed; Pr(Alt_Num))
            
            . replace xSex = 2
            (1,632 real changes made)
            
            . predict adjpredfemale
            (option pr assumed; Pr(Alt_Num))
            
            . gen mexSex = adjpredfemale - adjpredmale
            
            . sum adjpredfemale adjpredmale mexSex
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
            adjpredfem~e |      1,632          .5    .3947444   .0000137   .9999863
             adjpredmale |      1,632          .5    .4196069   .0000136   .9999864
                  mexSex |      1,632   -7.80e-10    .5650355  -.9355675   .9355675
            Which does not replicate the output from the margins commands. Intuitively, this output is closer to what I expected originally, as I interpret it to mean that - Respondents' sex does not have a substantial effect on likelihood to choose Alt#1 vs Alt#2 - the mean likelihood is 0.5 for both female and male. Subject to further tests of course.

            Code:
            . * Step 10
            . bysort Alt_Num: sum adjpredfemale adjpredmale mexSex
            
            --------------------------------------------------------------------------------------------------------------------------------------
            -> Alt_Num = 1
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
            adjpredfem~e |        816    .2799825    .3277982   .0000137   .9879042
             adjpredmale |        816    .7662833    .3243211   .0122354   .9999864
                  mexSex |        816   -.4863007    .2875454  -.9355675  -.0120822
            
            --------------------------------------------------------------------------------------------------------------------------------------
            -> Alt_Num = 2
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
            adjpredfem~e |        816    .7200175    .3277982   .0120958   .9999863
             adjpredmale |        816    .2337167    .3243211   .0000136   .9877646
                  mexSex |        816    .4863007    .2875454   .0120823   .9355675
            However, Step 10 does replicate Steps 5, 7, and 8.
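
            One thing that may bear on the Step 9 vs Step 10 comparison: in a choice model the predicted probabilities within a case sum to 1, so with two alternatives per case the mean over all 1,632 rows is 0.5 whatever value xSex is set to, and the differences average to 0. The sketch below illustrates this on Stata's carchoice example data. It is an assumption-based stand-in (gender plays the role of Sex, and there are four alternatives rather than two), not a rerun of the model above.

            ```stata
            * Assumed stand-in data and model from the cmclogit examples
            webuse carchoice, clear
            cmset consumerid car
            cmclogit purchase dealers, casevars(i.gender income)

            margins gender                     // every case set to each gender in turn (as in Step 5)
            margins, over(gender)              // observed subgroups (as in Step 4)

            * Predicted probabilities sum to 1 within each case
            predict double p if e(sample), pr
            egen double casetotal = total(p), by(consumerid)
            summarize casetotal if e(sample)

            * Averages are informative only when taken alternative by alternative (as in Step 10)
            bysort car: summarize p if e(sample)
            ```

            Pooling rows across alternatives, as in Step 9, averages away exactly the quantity that margins reports per outcome.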

            I have a couple of questions:
            (1) Am I misusing or misunderstanding the margins command, given my hypothesis?
            (2) Am I misinterpreting the output in Step 1-4?
            (3) How do I interpret Step 5 output given the output from Step 1-4?
            (4) Why does Step 9 output differ from Step 7 and 8?
            (5) Am I misinterpreting the output in Step 9?

            Thank you for any guidance you can provide or recommendations for additional resources I should study.

            Regards,
            Marc



            • #7
              Nice!

