Average marginal effects of continuous variables by levels of a dichotomous variable

Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#1


29 Dec 2022, 15:33

Hi,

I run this four-way interaction model in Stata 14:

xtreg Y c.X1##c.X2##c.X3##X4 Controls i.country i.year,r

Then, I want to get the average marginal effects (AMEs) of continuous X1, X2 and X3 when dichotomous X4 equals 0 and 1 respectively. My options are:

(i) margins X4, dydx(X1 X2 X3)

(ii) margins if X4==0, dydx(X1 X2 X3) & margins if X4==1, dydx(X1 X2 X3)

Is there any reason I'd prefer (i) to (ii)?

Since (i) treats all observations as if X4==0 and then as if X4==1, is (ii) more accurate or preferable because it uses only the observations that actually have X4==0 and X4==1, respectively?

Best,

Lütfi
Tags: average marginal effect, interaction, margins, panel data regression
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

29 Dec 2022, 15:45

Approach (i) uses the entire estimation sample to calculate the average marginal effects. Approach (ii) restricts each calculation to the subset of the estimation sample with the corresponding values of X4.

These are two different things, and while both might be described as average marginal effects, you need to figure out which is appropriate to your research goals. I think the most important consideration here is that approach (i) fully adjusts for the joint distribution of all variables other than X4, whereas (ii) only partially adjusts for them. The adjustment in (ii) is only partial because the joint distributions of the variables may be different after you condition on X4.

So both are things that might be reasonably done under different circumstances. Which you should do in your situation depends on the extent to which you wish the result to be adjusted for the other variables.
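
As a rough illustration of the two calculations, here is a minimal sketch on Stata's shipped auto dataset. This is a stand-in only: price, mpg, weight and foreign are assumed to play the roles of Y, X1, X2 and X4, and -regress- replaces -xtreg- just to keep the example self-contained.

```stata
* Stand-in data: price ~ Y, mpg ~ X1, weight ~ X2, foreign ~ X4 (illustrative only)
sysuse auto, clear
regress price c.mpg##c.weight##i.foreign, vce(robust)

* (i): average dy/dx over ALL observations, with foreign set to 0 and then to 1 for everyone
margins foreign, dydx(mpg)

* (ii): average dy/dx only over the observations that actually have foreign==0 / foreign==1
margins, dydx(mpg) over(foreign)
```

In (i) both rows are averaged over the same covariate distribution (the whole estimation sample). In (ii) each row is averaged over that group's own covariate distribution.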
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#3

29 Dec 2022, 18:06

Thank you for your reply.

X4 is the firm size, 1 for very big ones, and 0 for others. I assume the effect of X1 on Y varies across levels of X2 and X3 differently for very big firms and smaller ones.

You may skip the following.

-----------------------------------------QUOTE--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I want to visualize that varying effect, but marginsplot cannot help because there are four covariates. So I am planning to make two coloured contour graphs using

twoway contour _mar _at2 _at3

where _mar holds the average marginal effects of X1 on Y (shown as coloured contours), and _at2 and _at3 are the values of X2 and X3 at which _mar is calculated (on the y and x axes respectively). I would draw two contour graphs separately, one for X4=0 and one for X4=1, for which my options are (as far as I can foresee):

(a) margins X4, dydx(X1) at (X2=(min(0.1)max) X3=(min(0.1)max) X4=0)
    [min/max means observed min/max of X2 and X3 only when X4=0]
margins X4, dydx(X1) at (X2=(min(0.1)max) X3=(min(0.1)max) X4=1)
    [min/max means observed min/max of X2 and X3 only when X4=1]

(b) margins if X4==0, dydx(X1) at (X2=(min(0.1)max) X3=(min(0.1)max))
    [min/max means observed min/max of X2 and X3 only when X4=0]
margins if X4==1, dydx(X1) at (X2=(min(0.1)max) X3=(min(0.1)max))
    [min/max means observed min/max of X2 and X3 only when X4=1]

(c) margins if X4==0, dydx(X1) at (X2=(min(0.1)max) X3=(min(0.1)max) X4=0)
    [min/max means observed min/max of X2 and X3 only when X4=0]
margins if X4==1, dydx(X1) at (X2=(min(0.1)max) X3=(min(0.1)max) X4=1)
    [min/max means observed min/max of X2 and X3 only when X4=1]

But of course, I first need to choose between (i) and (ii) from my first inquiry before predicting the average marginal effects via (a), (b), or (c).

I'm afraid I haven't been able to yet.

------------------------------------------------------------------UNQUOTE-----------------------------------------------------------------------------------------------------------------------------------------------------

Is there a trade-off between (i) and (ii) such that the former fully adjusts for the joint distribution of all variables (but treats both sizes of firms as if they were the same) whereas the latter partially adjusts (but treats only big firms as big firms)?

Best,

Lütfi
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#4

29 Dec 2022, 18:38

Is there a trade-off between (i) and (ii) such that the former fully adjusts for the joint distribution of all variables (but treats both sizes of firms as if they were the same) whereas the latter partially adjusts (but treats only big firms as big firms)?

Well, I suppose if there is a really bright line that separates "very big ones" from "others," so that the difference between them is, in effect, qualitative, not just the arbitrary imposition of a dichotomy on a continuous variable (size), then I would lean towards (ii) because you could argue that out-of-sample very big firms would never live in the joint distribution of the other variables that other firms live in, and vice versa. But if size is really a continuum, and you have just picked a place to draw a line, but there are cases near that boundary on one or both sides, then that argument would not apply and I'd lean towards (i).

Actually, if size is really a continuum, I might make the case that you shouldn't have created a dichotomy in the first place and should have used continuous size in the analysis. That is normally what I do with continuous variables. In this particular context there are drawbacks to that approach: I suspect that the size distribution of all firms extends over several orders of magnitude, so there is a very good chance that the relationship of size to outcomes is appreciably non-linear and then one gets involved in transformations, perhaps even complicated things like splines. And on top of that you are looking for its moderating effect on a triple interaction. OMG! Could you dream up anything more complicated than that? So I might actually use a dichotomy (or maybe a trichotomy, or maybe even 5 groups, to get a little more fidelity and less information loss) just to evade those complications.

To be completely honest, even with the simplification of a dichotomy, this model is very complicated and I think you face a substantial challenge in trying to present this to others, or even to truly understand it yourself. As you've noted, the usual margins plots will not be very helpful here. I suppose contour plots buy you an extra dimension although, frankly, contour plots just don't work for me. I'd probably do a series of "small repeat" plots in a 2-dimensional grid indexed by two of the variables, and ordinary line plots of the other two variables for the repeats. But those don't work for everybody either.

Well, I digress. Enough!
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#5

29 Dec 2022, 19:37

Thanks again.
Actually, the nominal firm size is separately one of the control variables in the model and it's significant at the 1% level.
X4, on the other hand, equals 1 only for the biggest firms of each country (the international dataset includes 160 firms from 25 countries, with 1446 firm-year observations unequally distributed among countries). There is a reason for this: in the banking industry, where all the firms operate, the biggest firm is potentially the first to be bailed out by the national regulator, which is directly relevant to Y. The model also uses country and year fixed effects.
Can I justify my preference for (ii), for instance, by investigating whether the statistics (which ones?) for big firms are significantly different than others (at the country level)? It would probably only then be argued that very big firms would never live in the joint distribution of the other variables that other firms live in.
Hint: AMEs of X1 are negative and significant at 1% level when X4=0 for both (i) and (ii). AMEs of X1 are positive and significant at 10% and 1% levels when X4=1 for (i) and (ii) respectively. If the model excludes X4 (all else being equal) then running (ii) yields a negative and at 1% level significant X1 for X4=0 firms (thus nothing changes for X4=0 firms) but an insignificant and negative X1 for X4=1 firms.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#6

30 Dec 2022, 08:56

Can I justify my preference for (ii), for instance, by investigating whether the statistics (which ones?) for big firms are significantly different than others (at the country level)? It would probably only then be argued that very big firms would never live in the joint distribution of the other variables that other firms live in.
Hint: AMEs of X1 are negative and significant at 1% level when X4=0 for both (i) and (ii). AMEs of X1 are positive and significant at 10% and 1% levels when X4=1 for (i) and (ii) respectively. If the model excludes X4 (all else being equal) then running (ii) yields a negative and at 1% level significant X1 for X4=0 firms (thus nothing changes for X4=0 firms) but an insignificant and negative X1 for X4=1 firms.

I wouldn't rely on this. Statistical significance, in particular, has nothing to do with this issue. The difference between the means of two distributions can be highly statistically significant, although the distributions greatly overlap. This would be a total misapplication of statistical significance.

I do now agree that the dichotomization of size was appropriate in this case because you expect it to relate strongly to a qualitative difference in the way the firms are treated by the government. As for whether they also live in a different world with regard to the covariates, however, that requires looking directly at those distributions and the extent to which they overlap. Unless they are completely separated, which is unlikely to be the case, it becomes a judgment call. Not a judgment call that I can make, as this is nowhere close to my area of expertise. One that you can make, perhaps with input from others in your field.
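
One possible way to look directly at those distributions is sketched below on Stata's shipped auto dataset. This is a stand-in only: weight is assumed to play the role of one covariate and foreign the role of X4. It compares group-wise quantiles and overlays the two kernel densities; it is meant to be run from a do-file because of the /// line continuations.

```stata
* Stand-in data: weight ~ a covariate, foreign ~ X4 (illustrative only)
sysuse auto, clear

* Range and quartiles of the covariate within each group
tabstat weight, by(foreign) statistics(n min p25 p50 p75 max)

* Overlaid kernel densities: how much of the range do the two groups share?
twoway (kdensity weight if foreign == 0) ///
       (kdensity weight if foreign == 1), ///
       legend(order(1 "foreign == 0" 2 "foreign == 1")) ///
       xtitle("weight")
```

Neither display yields a test or a cut-off. They only show how far the two groups share the same covariate range, which is the input to the judgment call.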
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#7

30 Dec 2022, 12:21

Thank you.

Actually, I obtained predictive margins for X4 and tested them with

margins X4
test 0.X4 = 1.X4
Prob > chi2 = 0.1328

which suggests we cannot find evidence that the predictive margins differ for very big firms. Is that a strong indicator that the distributions overlap to a large extent, and thus that I should prefer (i) to (ii)?

xtreg Y c.X1##c.X2##c.X3##X4 Controls i.country i.year,r
(i) margins X4, dydx(X1 X2 X3)
(ii) margins if X4==0, dydx(X1 X2 X3) & margins if X4==1, dydx(X1 X2 X3)

What worries me most, as I summarized in the previous inquiry, is not the choice between (i) and (ii), which changes the aforementioned results very little when it comes to how the average marginal effects (AMEs) of X1 on Y differ for very big firms and others (they differ largely across the two sizes of firms in both i and ii), but rather that the exclusion of X4 from the model (all else being equal) for very big firms (X4=1) causes drastic change (positive and significant AMEs in both i and ii turn negative and insignificant) whereas nothing changes for others (X4=0) as they remain negative and significant at 1% level when I run (ii) after the model.

And my last inquiry on the topic:

No matter if I prefer (i) or (ii), why did you find the interpretation of AMEs by coloured contours graph too complicated? (I don't say otherwise just got curious why so) Isn't it pretty straightforward:

xtreg Y c.X1##c.X2##c.X3##X4 Controls i.country i.year,r
margins X4, dydx(X1) at (X2=(min(0.1)max) X3=(min(0.1)max) X4=(0 1))
twoway contour _mar _at2 _at3 if _pvalue<0.05 & X4==0
twoway contour _mar _at2 _at3 if _pvalue<0.05 & X4==1

Then the interpretation of the first graph, for X4=0, is something like this (based on the actual graph): as seen in the graph with X2 on the y axis and X3 on the x axis, where the AMEs of X1 on Y are shown as coloured contours, the AMEs of X1 on Y are significantly conditional on X2 and X3, such that for larger X3 and smaller X2 this effect is more negative, etc. The same goes for the second graph.

Many thanks again for your insightful comments and a happy new year.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#8

30 Dec 2022, 13:48

margins X4
test 0.X4 = 1.X4
Prob > chi2 = 0.1328

The -test- here does not contrast the predictive margins at X4 = 0 with X4 = 1. It tests whether the regression coefficient for 1.X4 equals the regression coefficient for 0.X4. Those are different things. To use -test- to contrast margins, you have to add the -post- option to the -margins- command (and this erases the regression's results in e(), replacing them with the -margins- results). Better ways to do this sort of thing are -margins r.X4- or -margins X4, pwcompare-.
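
The options just mentioned could look like the sketch below, run on Stata's shipped auto dataset as a stand-in (assumption: price, mpg, weight and foreign play the roles of Y, X1, X2 and X4, and -regress- replaces -xtreg- to keep it self-contained). The name "full" for the stored estimates is arbitrary.

```stata
* Stand-in data and model (illustrative only)
sysuse auto, clear
regress price c.mpg##c.weight##i.foreign, vce(robust)
estimates store full            // keep the regression results so they can be restored

* Contrast of the predictive margins, leaving e() untouched
margins r.foreign
margins foreign, pwcompare(effects)

* Same contrast via -test-: needs -post-, which replaces e() with the margins results
margins foreign, post
test 0.foreign = 1.foreign
estimates restore full          // bring the regression results back
```

Storing and restoring the estimates is only needed for the -post- route. The first two -margins- calls leave the regression results in place.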

Added: That said, you really need to look at the distributions as a whole. Even when the means of two distributions are the same, they might not overlap. For example, a discrete distribution that is a single spike at 0.5 has exactly the same mean as a discrete distribution with two equal spikes at 0 and 1. But they do not overlap.

but rather the exclusion of X4 from the model (all else being equal) for very big firms (X4=1) causes drastic change (positive and significant AMEs in both i and ii turns negative and insignificant) whereas nothing changes for others (X4=0) as they remain negative and significant at 1% level when I run (ii) after the model.

I don't quite understand what you are saying here, nor exactly what you have done. If you post the code and outputs, I think it would be clearer.

No matter if I prefer (i) or (ii), why did you find the interpretation of AMEs by coloured contours graph too complicated? (I don't say otherwise just got curious why so)

I don't know if I can explain it. I think it has to do with the fact that, in general, I'm not strong on visual representation of things. I just have a lot of difficulty looking at a contour plot and understanding what it's trying to tell me. I know other people find them easy to understand. I just don't. There are other kinds of graphs that I also don't grasp without great effort (or at all). Some cognitive limitation I have.

Last edited by Clyde Schechter; 30 Dec 2022, 13:53.
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#9

30 Dec 2022, 15:31

1-) contrasting the predictive margins

xtreg Y c.X1##c.X2##c.X3##X4 Controls i.country i.year,r
margins X4, dydx(X1) pwcompare (effects)

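             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
X1           |
          X4 |
     1 vs 0  |      0.056      0.017     3.27   0.001      0.023      0.090
------------------------------------------------------------------------------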

2-) Average marginal effects (AMEs) of X1 on Y conditional on X4 in two different approaches (i and ii)

xtreg Y c.X1##c.X2##c.X3##X4 Controls i.country i.year,r

(i) margins X4, dydx(X1)

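             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
X1           |
          X4 |
          0  |     -0.029      0.008    -3.72   0.000     -0.044     -0.014
          1  |      0.027      0.016     1.69   0.091     -0.004      0.059
------------------------------------------------------------------------------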

(ii) margins if X4==0, dydx(X1) at (X4=(0 1))

1._at : X4 = 0
2._at : X4 = 1

             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
X1           |
         _at |
          1  |     -0.031      0.008    -4.07   0.000     -0.046     -0.016
          2  |      0.022      0.015     1.41   0.159     -0.009      0.052
------------------------------------------------------------------------------

margins if X4==1, dydx(X1 X2 X3) at (X4=(0 1))

1._at : X4 = 0
2._at : X4 = 1

             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
X1           |
         _at |
          1  |     -0.018      0.010    -1.78   0.075     -0.037      0.002
          2  |      0.061      0.024     2.56   0.010      0.014      0.108
------------------------------------------------------------------------------

Approaches (i) and (ii) don't make a great difference for AMEs of X1 on Y for both sizes of firms except that it is larger in magnitude (0.061 vs 0.027) and more significant (z=2.56 vs 1.69) for very big firms (X4=1) in approach (ii)

3-) AMEs above change drastically only for big firms (X4=1) when X4 is excluded from the model (all else being equal)

xtreg Y c.X1##c.X2##c.X3 Controls i.country i.year,r

margins if X4==0, dydx(X1)

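             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |     -0.021      0.007    -3.13   0.002     -0.035     -0.008
------------------------------------------------------------------------------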

margins if X4==1, dydx(X1)

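             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |     -0.004      0.009    -0.44   0.657     -0.021      0.013
------------------------------------------------------------------------------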

Positive and significant AMEs of X1 for the biggest firms (X4=1) in both approaches (i and ii of item 2 above) turned negative (-0.004) and insignificant (z=-0.44) in item 3 whereas nothing changed for other firms (X4=0) as they remain negative (-0.021) and significant at 1% level.

Questions remain the same, I just tried to clarify them above. Thank you!

Last edited by Lutfi Ozturker; 30 Dec 2022, 15:45.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#10

30 Dec 2022, 15:42

OK, much clearer thanks.

The results in 1) and 2) i) are straightforward.

(ii) margins if X4==0, dydx(X1) at (X4=(0 1))

This makes no sense. -if X4 == 0- contradicts -at(X4 = (0 1))-. To be honest, I don't know what Stata is calculating here.

As for 3) these are also straightforward. But you have to remember that the model in 3), because it does not include X4, is a different model from the earlier calculations. Consequently X1, X2, and X3 do not mean the same things in 3) as they do in the earlier models. So it actually doesn't make sense to compare the results of 3) to those of the earlier models.

Lutfi Ozturker

Join Date: Apr 2017
Posts: 40

#11

30 Dec 2022, 16:41

Please ignore my previous post and sorry for not making it clear. I hereby give it a better try to make it readable.

I am using xtreg in Stata 14.1.

1-) contrasting the predictive margins

Code:

xtreg Y c.X1##c.X2##c.X3##X4 Controls i.country i.year,r

Code:

margins X4, dydx(X1) pwcompare (effects)

HTML Code:

                    dy/dx          Std. Err.       z         P>z       [95% Conf. Interval]
X1
         X4
        1 vs 0      0.056            0.017        3.27      0.001        0.023   0.090

2-) Average marginal effects (AMEs) of X1 on Y conditional on X4 in two different approaches (i and ii)

Code:

xtreg Y c.X1##c.X2##c.X3##X4 Controls i.country i.year,r

(i)

Code:

margins X4, dydx(X1)

HTML Code:

                    dy/dx          Std. Err.       z         P>z       [95% Conf. Interval]
X1
         X4
          0        -0.029          0.008        -3.72       0.000         -0.044   -0.014
          1         0.027          0.016         1.69       0.091         -0.004    0.059

(ii)

Code:

margins if X4==0, dydx(X1) at (X4=(0 1))

HTML Code:

1._at : X4 = 0
2._at : X4 = 1

                    dy/dx          Std. Err.       z         P>z       [95% Conf. Interval]
X1
_at |
1 |                -0.031          0.008        -4.07        0.000         -0.046   -0.016
2 |                 0.022          0.015         1.41        0.159         -0.009    0.052

Code:

margins if X4==1, dydx(X1) at (X4=(0 1))

HTML Code:

1._at : X4 = 0
2._at : X4 = 1

                    dy/dx          Std. Err.       z         P>z       [95% Conf. Interval]
X1
_at |
1 |                -0.018          0.010        -1.78        0.075         -0.037    0.002
2 |                 0.061          0.024         2.56        0.010          0.014    0.108

Approaches (i) and (ii) don't make a great difference for AMEs of X1 on Y for both sizes of firms except that it is larger in magnitude (0.061 vs 0.027) and more significant (z=2.56 vs 1.69) for very big firms (X4=1) in approach (ii)

3-) AMEs above change drastically only for big firms (X4=1) when X4 is excluded from the model (all else being equal)

Code:

xtreg Y c.X1##c.X2##c.X3 Controls i.country i.year,r

Code:

margins if X4==0, dydx(X1)

HTML Code:

                    dy/dx          Std. Err.       z         P>z       [95% Conf. Interval]
X1                 -0.021            0.007       -3.13      0.002        -0.035   -0.008

Code:

margins if X4==1, dydx(X1)

HTML Code:

                    dy/dx          Std. Err.       z         P>z       [95% Conf. Interval]
X1                 -0.004            0.009       -0.44      0.657        -0.021    0.013

Positive and significant AMEs of X1 for the biggest firms (X4=1) in both approaches (i and ii of item 2 above) turned negative (-0.004) and insignificant (z=-0.44) in item 3 whereas nothing changed for other firms (X4=0) as they remain negative (-0.021) and significant at 1% level.

Questions remain the same, I just tried to clarify them above. Thank you!


Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#12

30 Dec 2022, 16:57

This makes no sense. -if X4 == 0- contradicts -at(X4 = (0 1))-. To be honest, I don't know what Stata is calculating here.

I guess the answer is on page 17 of https://www.stata.com/manuals/rmargins.pdf

It decomposes AME of approach (i) where all firms are treated as 0 into those two discrete amounts from which AME comes;

a. treating small firms as small firms, and
b. treating small firms as big firms

separately.

Code:

margins if X4==0, dydx(X1)

would only give the required result in a.
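
This reading can be checked numerically. The sketch below uses Stata's shipped auto dataset as a stand-in (assumption: price, mpg, weight and foreign play the roles of Y, X1, X2 and X4, with -regress- instead of -xtreg-). The matrix names both and asis are arbitrary.

```stata
* Stand-in data and model (illustrative only)
sysuse auto, clear
regress price c.mpg##c.weight##i.foreign, vce(robust)

* Subsample foreign==0, with foreign set to 0 and then counterfactually to 1
margins if foreign == 0, dydx(mpg) at(foreign=(0 1))
matrix both = r(b)

* Subsample foreign==0, with foreign left at its observed value (0)
margins if foreign == 0, dydx(mpg)
matrix asis = r(b)

* If the reading is right, the first -at- row reproduces the as-observed result
display both[1,1] - asis[1,1]
```

Within the foreign==0 subsample, setting foreign to 0 changes nothing, so the difference displayed should be zero up to rounding.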
Final conclusion: Preference over approach (i) and (ii) requires conceptual judgment and cannot be decided solely based on this discussion. Correct? I'd still like to know your preference, if you had to choose one. Reporting both would be absurd in a paper, right? Many thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#13

30 Dec 2022, 17:01

2 ii) still makes no sense to me. And I really don't know what those results mean, if anything. If you want separate, partially adjusted AMEs for X1 conditional on X4 you can get that with

Code:

margins, dydx(X1) over(X4)

or, equivalently,

Code:

margins if X4 == 1, dydx(X1)
margins if X4 == 0, dydx(X1)

The results from 3 are based on a different underlying model from the others, and comparison across those models is not meaningful. The model that includes X4 has a level effect from X4 and also includes regression adjustment for confounding by X4. The model that doesn't include X4 lacks those things (not to mention lacking the interactions with X4). The extent to which results change when you include or exclude variables is a matter of the degree to which the excluded variables (X4 and all its interactions with X1, X2, and X3) are confounders (or colliders) of the relationship between the dependent and independent variables. What kind of conclusion other than that are you trying to draw from this finding in your data?

Added: Crossed with #12.

Preference over approach (i) and (ii) requires conceptual judgment and cannot be decided solely based on this discussion. Correct?

Correct. It is not a statistical issue. Once you understand what each set of statistics means, the choice between them depends on the substance being investigated, and what we understand about the underlying real-world data generating model. I have no basis for having a preference between the two because (a) I don't know what the variables Y, X1, X2, and X3 are, and (b) even if I did, as we are in the realm of finance or economics, I lack the knowledge to appreciate what would be reasonable to believe about the relationships among these variables. (I'm an epidemiologist.) So I lack everything needed to know which of these models makes more sense as a reflection of real life.

Last edited by Clyde Schechter; 30 Dec 2022, 17:07.
Lutfi Ozturker

Join Date: Apr 2017

Posts: 40
#14

30 Dec 2022, 19:34

Very well.

Let's give it a final try, if you agree, about under which circumstances one would prefer (ii) to (i) for our baseline model:

Code:

xtreg Y c.X1##c.X2##c.X3##X4 Controls,r

For the sake of the argument, say Y is a health measure, X1 is the number of cigarettes smoked per day, X2 is age, X3 is the number of hours spent for exercise per week and X4 is gender.

I guess you'd definitely prefer (i) to (ii) without hesitation though I cannot explain skillfully in a statistical manner why.

Please feel free to suggest a set-up in which, contrary to that above, you'd definitely prefer (ii) to (i). It could be anything on common ground or even from your field, epidemiology. Such a real-life example would help me digest this otherwise difficult material.

Thanks again!
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#15

30 Dec 2022, 20:09

Keeping everything else the same, suppose X4 were an indicator distinguishing, say, people living in a modern industrialized country from people living in an impoverished, technologically lagging country. In this case, the other variables would take on values so different between the two groups that it would make sense to do separate adjustments to the predictive margins, and your (ii) would be preferred.
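
Why the covariate distributions matter can be written out for the linear model in #1. The coefficient labels below are notation introduced here for illustration, not something from the thread; k is the value (0 or 1) at which X4 is evaluated, N is the size of the estimation sample, and N_k is the number of observations with X4 = k.

```latex
% Marginal effect of X_1 implied by c.X1##c.X2##c.X3##X4
\frac{\partial Y}{\partial X_1}
  = \beta_1 + \beta_{12} X_2 + \beta_{13} X_3 + \beta_{123} X_2 X_3
  + X_4 \left( \beta_{14} + \beta_{124} X_2 + \beta_{134} X_3 + \beta_{1234} X_2 X_3 \right)

% (i): average over all N observations, with X_4 set to k for everyone
\mathrm{AME}^{(i)}_k
  = \frac{1}{N} \sum_{j=1}^{N}
    \left. \frac{\partial Y}{\partial X_1} \right|_{X_{2j},\, X_{3j},\, X_4 = k}

% (ii): average only over the N_k observations that actually have X_4 = k
\mathrm{AME}^{(ii)}_k
  = \frac{1}{N_k} \sum_{j:\, X_{4j} = k}
    \left. \frac{\partial Y}{\partial X_1} \right|_{X_{2j},\, X_{3j},\, X_4 = k}
```

Because the marginal effect is linear in X2, X3 and X2*X3, each average depends only on the means of those three quantities. The two approaches therefore give the same answer when those means are the same in the whole sample and in the X4 = k subsample, and they diverge as the groups' covariate distributions move apart.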
