  • Are margins an appropriate tool for measuring interaction effects?

    Hello people,

    I ran a linear regression model to explore the influence of some independent variables on the output of firms. I also included an interaction term to show whether the effect of variable X (continuous) depends on the rural/urban location of a firm (dichotomous). I got significant coefficients, so I want to take a closer look at this interaction effect; in particular, I want to visualize it. I read that margins is a good option for that, but most sources relate to non-linear or logit regression, so I am not sure whether it really is an appropriate tool for a linear regression.

    After fitting the regression model, I ran the margins command
    Code:
    margins urban_rural, at(X=(0(10)100))
    and let Stata draw the graph via
    Code:
    marginsplot, x(X)
    This yields the following result:

    [Image: Testgraph.png]

    I would interpret the relationship roughly like this: an increase in X leads to an increase in y. The location affects the effect of X: the effect X has on y is stronger in rural areas.

    I would appreciate it if someone could give me short feedback on whether this is an appropriate tool and, if so, whether my interpretation is correct.

    Thanks in advance.
    KR

  • #2
    I would appreciate it if someone could give me short feedback on whether this is an appropriate tool and, if so, whether my interpretation is correct.
    Yes, and maybe.

    My reservation about the interpretation is that the confidence intervals around the urban margins are very wide, overlap the rural margins, and indeed often extend nearly to the upper end of the rural confidence intervals. Now, that, in itself, is not definitive. But what would be important to look at, and is not found in the output of -margins-, is the confidence interval around the interaction term coefficient. That will give you the range of values of the difference between the rural and urban outcome:X slopes that are compatible with the data and model. It may well be that that confidence interval is also very wide and covers both positive and negative territory (as well as zero). Or not. So take a look at that before reaching your conclusion, and, if appropriate, modify it. And when you disseminate your findings to others, be sure to show the interaction coefficient and its confidence interval.
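
    A minimal sketch of how one might pull out that confidence interval, assuming Stata's shipped auto dataset as hypothetical stand-in data (price for the outcome, foreign for urban_rural, mpg for X; none of these are the poster's variables):

    ```stata
    * Stand-in data only: foreign plays the role of urban_rural, mpg the role of X
    sysuse auto, clear

    * ## includes both main effects plus the interaction
    regress price i.foreign##c.mpg

    * Report just the interaction coefficient with its standard error and 95% CI;
    * this is the estimated difference between the two groups' slopes on mpg
    lincom 1.foreign#c.mpg
    ```

    The same interval also appears in the regression table itself, in the row labelled foreign#c.mpg; -lincom- merely isolates it.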


    • #3
      The thing that worries me is that there is no difference between urban and rural at X=0. This suggests to me that you forgot to include the main effect of the urban/rural variable.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Thank you both for your replies.

        The confidence interval for the interaction rural*X is [0.0198; 0.0714] and for urban*X [-0.0372; 0.7119]. So the sign changes for the second interaction. I guess that is what you, Mr Schechter, saw in the graph. As a result, I would assume that there's no interaction effect, and that the fit of the model only makes it seem so?

        Mr Buis, you're right. But when I run the same commands with that variable included, I get the table containing the margins, and when I try to plot it, I get the following error message: "invalid at() dimension information;
        using variable urban_rural as a factor variable and a regular variable is not supported"


        • #5
          The confidence interval for the interaction rural*X is [0.0198; 0.0714] and for urban*X [-0.0372; 0.7119]. So the sign changes for the second interaction. I guess that is what you, Mr Schechter, saw in the graph. As a result, I would assume that there's no interaction effect, and that the fit of the model only makes it seem so?
          Something is wrong. If urban_rural is a dichotomous variable, there should be only one interaction term in the regression output. I think what you are showing me is the confidence intervals of the marginal effects in the -margins- output. That is of interest in its own right, but does not address the issue of how much the two slopes differ.

          using variable urban_rural as a factor variable and a regular variable is not supported
          You have to introduce urban_rural as i.urban_rural, otherwise, by default, Stata interprets it as a continuous variable. When it then sees urban_rural#c.X, it, by default in an interaction term, interprets urban_rural as a factor variable. So it has contradictory information about urban_rural.

          Your regression should look like this:
          Code:
          regress outcome_variable i.urban_rural##c.X // AND PERHAPS OTHER VARIABLES
          margins urban_rural, at(X = (0(10)100))
          marginsplot  // GRAPHS OF MODEL PREDICTED OUTCOMES
          
          margins urban_rural, dydx(X) // "SIMPLE SLOPES"  MARGINAL EFFECTS OF X IN URBAN & RURAL AREAS
          You should examine the coefficient of 1.urban_rural#c.X and its confidence interval in the regression output to determine the range of compatible values for the difference between urban and rural outcome:X slopes.
          Last edited by Clyde Schechter; 24 Sep 2022, 13:19.


          • #6
            Ok, I think the question no longer arises. My regression looked like this

            Code:
            regress outcome_variable i.urban_rural#c.X
            So I only put one hashtag between both variables and got both, 1.urban_rural x X and 0.urban_rural x X, in the output, where 0 = rural and 1 = urban. I interpreted the coefficient 0.urban_rural x X, which was significant, and I think that was wrong.

            With both hashtags I only get the coefficient of 1.urban_rural x X, and it's not significant. So I think one can assume that there's no significant interaction effect.


            • #7
              Your current model, as noted by Clyde and Maarten, does not allow for a "main effect" of the two variables you are interacting. It is often (almost always) advisable to include these main effects. You can do so using Clyde's code, or the following would also accomplish this:
              Code:
              regress outcome_variable i.urban_rural c.X i.urban_rural#c.X
              Note that I included both of the variables you are interacting in the model. You can avoid doing this if you use two hashtags (##) when you construct the interaction. Now you can run the margins commands that Clyde suggested to get the predicted outcome for graphing and for getting the marginal effect of x.
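
              A quick way to check that the two spellings fit the same model, sketched on Stata's shipped auto dataset as hypothetical stand-in data (foreign for urban_rural, mpg for X; the stored names long_form and short_form are made up for this illustration):

              ```stata
              * Stand-in data only, not the variables from this thread
              sysuse auto, clear

              * Main effects written out, interaction added with a single #
              regress price i.foreign c.mpg i.foreign#c.mpg
              estimates store long_form

              * Shorthand: ## expands to both main effects plus the interaction
              regress price i.foreign##c.mpg
              estimates store short_form

              * Side-by-side coefficients and standard errors
              estimates table long_form short_form, b(%9.4f) se
              ```

              The two columns of the table should agree row by row, since both commands specify the same set of regressors.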


              • #8
                Thank you, Erik.

                I already tried that. As a result, I only got the interaction of urban (=1) and X displayed in the regression table, and it is not significant. Before, I didn't include the main effects (only one hash) and got both interactions, rural (=0) and urban (=1), where the interaction of rural and X was significant. But now I am wondering whether one could assume that there is no interaction effect between urban_rural and X at all.


                • #9
                  With your previous model based only on i.urban_rural#c.X, the coefficients of 0.urban_rural#c.X and 1.urban_rural#c.X are the separate marginal effects of X in rural and urban observations. That both are "significant" means that the data and model are not compatible with either of those marginal effects being zero. But nothing in that model tells you about how different they are.

                  In the model using ##, the coefficient of 1.urban_rural#c.X means something different from what it meant in the other model. It is the estimate of the difference between the urban and rural marginal effects of X. The fact that it is not "significant" does not mean that there is no effect at all. A non-significant result never means that in any regression or other statistical context. That is a common misunderstanding of the meaning of statistical significance. What it does mean is that there being no difference between the urban and rural marginal effects is compatible with the data and the model. But it does not mean that there is no difference, only that that possibility is not ruled out. This is the correct interpretation of a "non-significant" finding in any statistical analysis. The frequency with which this misunderstanding arises is one of the reasons that the leadership of the American Statistical Association recommended that the entire concept of statistical significance be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr.
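
                  Written out, the two parameterizations might look as follows. This is only a sketch: the symbols are assumptions introduced here, with U = 1 for urban and U = 0 for rural, and any other covariates are left out.

                  ```latex
                  % Model using # only: no main effect of location, one common intercept
                  \begin{align*}
                  y &= \alpha + \gamma_0 (1 - U) X + \gamma_1 U X + \varepsilon \\
                  \frac{\partial y}{\partial X} &=
                    \begin{cases} \gamma_0 & \text{rural } (U = 0) \\ \gamma_1 & \text{urban } (U = 1) \end{cases}
                  \end{align*}

                  % Model using ##: main effects plus interaction
                  \begin{align*}
                  y &= \beta_0 + \beta_1 U + \beta_2 X + \beta_3 U X + \varepsilon \\
                  \frac{\partial y}{\partial X} &=
                    \begin{cases} \beta_2 & \text{rural } (U = 0) \\ \beta_2 + \beta_3 & \text{urban } (U = 1) \end{cases}
                  \end{align*}
                  ```

                  Here \beta_3 is the coefficient Stata labels 1.urban_rural#c.X in the ## model, so it is the difference between the two slopes rather than a slope in its own right. Because the first model forces a common intercept, \gamma_0 need not equal \beta_2.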

                  A better approach to this than misinterpreting p-values is to ignore p-values and focus on the confidence intervals. From what you have told me, I can infer that 0 lies inside the confidence interval of the interaction coefficient. But what else does? Does it cover a wide range of positive and negative territory, which would mean that the data are really quite uninformative about the urban-rural difference (even though they may be fairly precise about each of the marginal effects separately)? Or is the confidence interval pretty narrow around 0? If it is so narrow that, even if the correct value of the difference were at one of the limits of the confidence interval, the urban-rural difference would be too small to matter for any real-world purpose, then you would be correct in concluding that, while there may or may not be a difference, it is negligible. If the confidence interval, though containing 0, lies almost entirely on one side and extends to territory where such a difference might be of real-world importance, it might be worth commenting that although the data are compatible with a contrary conclusion, the imprecision of the data leaves one in a position where an effect of interesting size cannot be ruled out, although neither can it be said to be supported. In short, the study would be inconclusive, but perhaps promising and grounds for further research with better data.


                  • #10
                    Thank you for the detailed explanation. I will have a look at the articles.

                    That both are "significant" means that the data and model are not compatible with either of those marginal effects being zero. But nothing in that model tells you about how different they are.
                    In the previous model only 0.urban_rural#c.X was significant, 1.urban_rural#c.X was not.

                    With your previous model based only on i.urban_rural#c.X, the coefficients of 0.urban_rural#c.X and 1.urban_rural#c.X are the separate marginal effects of X in rural and urban observations.
                    I think I don't understand that point. In the model using #, the coefficient of 0.urban_rural#c.X was significant and 0.123. If this is the separate marginal effect of X in rural observations, doesn't that mean that X has an impact on y only in rural but not in urban locations?

                    And if we suppose that in the model using ## the coefficient of 1.urban_rural#c.X were significant and 0.123, would that mean that the effect of X on y in urban areas is the main effect of X minus 0.123?


                    • #11
                      In the model using #, the coefficient of 0.urban_rural#c.X was significant and 0.123. If this is the separate marginal effect of X in rural observations, doesn't that mean that X has an impact on y only in rural but not in urban locations?
                      I won't repeat everything I said in #9. But I'll just re-emphasize that this is a completely false interpretation of statistical significance. In the model that used only #, 0.urban_rural#c.X is the marginal effect of X in rural observations. Full stop. It tells you nothing about what happens in urban locations. For that you would look at the coefficient of 1.urban_rural#c.X. And if it is "not significant" it says only that the absence of any effect of X in urban locations is consistent with the data, but it does not mean that there is no effect, only that no effect is a possibility that is compatible with the data. To conclude that there is no effect (of any meaningful size) you would have to look at the confidence interval around that coefficient. If both ends of that confidence interval are small enough that even if they were the correct values of the effect it would be negligible for practical purposes, then you could draw that conclusion. But if the confidence interval is wide enough that it extends into territory that includes meaningfully large effects, then all you can say is that the data and model are inconclusive with respect to the marginal effect of X in urban areas.

                      And if we suppose that in the model using ## the coefficient of 1.urban_rural#c.X were significant and 0.123, would that mean that the effect of X on y in urban areas is the main effect of X minus 0.123?
                      The marginal effect of X on y in urban areas would be the coefficient of the "main" effect of X plus 0.123. But rather than doing the calculation by hand (or with -lincom-), it is easiest to get the separate marginal effects from -margins urban_rural, dydx(X)-.
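
                      Both routes can be sketched side by side, again assuming Stata's shipped auto dataset as hypothetical stand-in data (foreign for urban_rural, mpg for X):

                      ```stata
                      * Stand-in data only, not the variables from this thread
                      sysuse auto, clear
                      regress price i.foreign##c.mpg

                      * By hand: slope of mpg where foreign == 1 is the "main"
                      * coefficient of mpg plus the interaction coefficient
                      lincom mpg + 1.foreign#c.mpg

                      * Same quantity for both groups at once, with standard errors and CIs
                      margins foreign, dydx(mpg)
                      ```

                      The -lincom- estimate should match the row for foreign == 1 in the -margins- output; the row for foreign == 0 is simply the coefficient of mpg.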
