  • difference between margins group vs. margins, over(group)

    Hello, I am running a model:
    Code:
    reg outcome c.IV i.group c.IV#i.group
    I would like to produce marginsplots and tried two different margins commands, which gave two different plots, as shown below. I don't know why the two commands produce different plots or which one is correct. I was wondering if anyone could help me out. Thank you!
    The first code is:
    Code:
    margins, at(IV=(-2.1(0.5)1.9)) over(group)
    The corresponding graph is:
    [Image: 1.png]

    The second code is:
    Code:
    margins group, at(IV=(-2.1(0.5)1.9))
    The corresponding graph is:
    [Image: 2.png]

  • #2
    -margins group- computes the marginal means of each level inside "group", setting all other independent variables at the overall mean.

    -margins, over(group)- computes the marginal means, setting all other independent variables at the mean within each level of "group" instead of at the overall mean.

    For example, if "group" has two levels (yes and no) and mean age is 35 (mean for yes is 30, mean for no is 37):

    margins group gets you the means for yes and no, with age set at 35 for both.

    margins, over(group) gets you the means for yes and no, with age set at 30 for "group = yes" and at 37 for "group = no".

    Here is a data example for further illustration:
    Code:
    clear
    sysuse auto
    
    * Overall mean ----------
    quietly sum weight
    scalar m1 = r(mean)
    
    * Compute the means manually
    reg mpg i.foreign weight
    display _b[_cons] + `=m1'*_b[weight]
    display _b[_cons] + `=m1'*_b[weight] + _b[1.foreign]
    * Same as:
    margins i.foreign
    
    * Group specific mean ----------
    quietly sum weight if foreign == 0
    scalar m20 = r(mean)
    quietly sum weight if foreign == 1
    scalar m21 = r(mean)
    
    * Compute the means manually
    reg mpg i.foreign weight
    display _b[_cons] + `=m20'*_b[weight]
    display _b[_cons] + `=m21'*_b[weight] + _b[1.foreign]
    * Same as:
    margins, over(foreign)



    • #3
      Originally posted by Ken Chui
      Thank you so much, Ken! It is very helpful!



      • #4
        Ken Chui has it almost right, but not quite. Neither -margins group- nor -margins, over(group)- sets other variables to their overall or within-group means unless the -atmeans- option is specified.

        -margins group- leaves all variables but group at their observed values. It then iterates over the levels of variable group. At each level, it replaces the group variable by that level in the entire data set and calculates the modeled outcomes. The resulting predicted values are averaged over the entire data set. The average predicted outcome is then reported as the predictive margin for that level of group.

        -margins, over(group)- also leaves all variables, including group, at their observed values. It then iterates over the levels of variable group. At each level it calculates the modeled outcomes. The resulting predicted values are averaged over only the observations with the current level of group. The average predicted outcome is then reported as the predictive margin for that level of group.

        Putting it somewhat more succinctly, -margins, over(group)- produces average model-predicted outcomes restricted to the observations having each value of group. -margins group- produces model-predicted outcomes, averaged over the entire data set, as if every observation had its value of group changed to the current level while everything else was held the same.

        Another way to say it is that the results of -margins, over(group)- are conditional on the value of group and are averaged over the joint distribution of the other variables conditional on that value of group. -margins group- produces results that are adjusted to the entire sample's joint distribution of the other variables.

        Yet another way to say it is that the results of -margins group- are adjusted for group differences in the other variables' distributions in the sample, whereas the results of -margins, over(group)- are not. In fact, the results of -margins, over(group)- take no account of group differences in the other variables' distributions.
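
        As a rough illustration of the two computations described above, here is a minimal sketch that reproduces each set of margins by hand. It uses the auto data from #2 rather than the original poster's data, and the model (mpg on foreign, weight, and their interaction) and the variable names yhat0, yhat1, and yhat are assumptions chosen only for illustration.

        ```stata
        sysuse auto, clear
        regress mpg i.foreign c.weight i.foreign#c.weight

        * -margins foreign-: set foreign to each level for every observation,
        * predict, and average the predictions over the whole data set
        margins foreign
        preserve
        replace foreign = 0
        predict double yhat0
        replace foreign = 1
        predict double yhat1
        summarize yhat0 yhat1      // means should match -margins foreign-
        restore

        * -margins, over(foreign)-: leave everything as observed, predict,
        * and average the predictions within each observed level of foreign
        margins, over(foreign)
        predict double yhat
        tabstat yhat, by(foreign) statistics(mean)   // should match -margins, over(foreign)-
        ```

        Because this model is linear in weight, the average of the predictions equals the prediction at the average weight, so the numbers here coincide with an at-means calculation. In a nonlinear model such as a logit, that would generally no longer hold.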

        Added: crossed with #3.



        • #5
          Originally posted by Clyde Schechter
          Hi Clyde, thank you so much! Could I ask you a follow-up question? Since I would like to compare the effect of IV across groups, I should use -margins group- (since it is adjusted to the entire sample's joint distribution of other variables), right?



          • #6
            "Since I would like to compare the effect of IV across groups, I should use -margins group- (since it is adjusted to the entire sample's joint distribution of other variables), right?"
            In most contexts, yes, -margins group- would be what you want. In particular, if you are hoping to get something like a causal effect of group, you definitely want adjustment for group differences in other variables' distributions across groups.

            But if, for example, you are not looking for (an approximation to) a causal effect but are most interested in predicting what outcomes will be observed in the different groups, then you would want to use -margins, over(group)- because the actual outcomes in the groups will, in fact, differ, in part, due to the distributions of other variables in your model.

            So it really depends on what you're trying to do. Most research, I think, is trying to get at causal effects, so -margins group- is most often what is needed. But not all research is about that. And as I didn't see any explanation in the thread about what the purpose of your research is, I have to leave it up in the air for you to resolve.

            To put it in terms of the original question posed in #1, "...which one is correct," the answer is that they are both correct for different purposes. They are correct answers to different questions.
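
            For the specific goal of comparing the effect of IV across groups, one possible approach is sketched below. It is only an illustration: it uses the auto data, with weight standing in for IV and foreign standing in for group, and the grid of weight values in at() is an arbitrary choice.

            ```stata
            sysuse auto, clear
            * weight plays the role of IV and foreign the role of group (stand-ins)
            regress mpg c.weight##i.foreign

            * Slope of weight within each level of foreign
            margins foreign, dydx(weight)

            * Difference in slopes relative to the base level of foreign
            margins r.foreign, dydx(weight)

            * Adjusted predictions along weight for each level of foreign, then plot
            margins foreign, at(weight=(2000(500)4500))
            marginsplot
            ```

            The dydx() results speak to whether the slope differs between groups, while the at() results and the plot show the predicted outcomes along the range of the continuous variable.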



            • #7
              Originally posted by Clyde Schechter

              Got it! Thank you sooo much!



              • #8
                Originally posted by Clyde Schechter
                Hello Clyde, thanks for the correction!
                Last edited by Ken Chui; 22 Jul 2021, 17:13.
