Correct use of year-dummy in margins command

Guest
#1

06 Dec 2019, 11:14

I am using panel data with two waves (2013 and 2017) and a dummy indicator for the second wave, y2017. I am fitting an RE-logit model with a binary dependent variable for political participation.

My question:
If I want to estimate the predicted probabilities of participation for females and males in the first year (y2017=0), which of these commands should I use?

Code:

margins female, at(y2017=0)

Code:

margins female if y2017 == 0

From my understanding, the first command uses all my observations, whether they are from 2013 or 2017, treats them as if the year were 2013, and then predicts participation for females and males?
And the second one uses only the observations from 2013. I'm guessing I should use the first?

I used the following command to estimate the average changes in predicted probabilities between the two years:

Code:

margins female, dydx(y2017)

If I want to analyse the gender gap in each year and also the change in the gender gap, how would I incorporate that?
Is this command correct?

Code:

margins, dydx(female) at(y2017=(0 1))

or

Code:

margins female, over(y2017)

?

I did read up about Margins here: http://www.stata-journal.com/article...article=st0260 However, I am still not sure how to incorporate my year dummy correctly.

Last edited by sladmin; 16 Dec 2019, 09:22. Reason: anonymize original poster
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

06 Dec 2019, 11:29

As for your first question, the answer is that it depends on what you want to estimate. The two commands answer different questions, and when speaking casually both questions sound the same, even though they are not.

-margins female, at(y2017 = 0)- will give you the expected outcomes in year 2013 for males and females, adjusted to the joint distribution in both years of all variables in the regression other than female and y2017. In particular, the results will be adjusted for any differences between the distributions of the other variables during years 2013 and 2017.

-margins female if y2017 == 0- will give you the expected outcomes for males and females in 2013, adjusted to the joint distribution in year 2013 only, of all other variables. In particular, if the distributions of other variables differ between 2013 and 2017, only the year 2013 distributions will be taken into account for adjustment.

Moreover, in the (unlikely) event that all the other model variables have identical joint distribution in 2013 and 2017, then the estimated expected outcomes will be the same, but the standard errors will differ because the first is based on a larger sample than the second.

In most contexts, the -margins female, at(y2017 = 0)- version is what people are interested in, but not always. So you need to think about the specific purpose of doing the analysis and how you plan to use the results to see which type of adjustment is appropriate.
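
The two commands can be compared side by side. What follows is a minimal sketch on simulated data: the covariate age, the coefficient values and the xtlogit specification are assumptions made up for illustration, because the actual model was not posted. Only female, y2017 and interested are names taken from this thread.

```stata
* Simulated two-wave panel (hypothetical data, only so the commands run)
clear
set seed 2019
set obs 1000
generate id = _n
generate byte female = runiform() < 0.5
generate age = 20 + floor(50*runiform())
generate u = rnormal()                       // person-level random intercept
expand 2
bysort id: generate byte y2017 = _n - 1      // 0 = 2013 wave, 1 = 2017 wave
replace age = age + 4 if y2017 == 1          // covariate distribution shifts between waves
generate byte interested = runiform() < invlogit(-0.5 - 0.8*female + 0.2*y2017 + 0.01*age + u)

xtset id y2017
xtlogit interested i.female##i.y2017 c.age, re   // assumed specification

* All observations from both waves, with y2017 set to 0 for every one of them
margins female, at(y2017 = 0)

* Only the 2013 observations, so age keeps its 2013 distribution
margins female if y2017 == 0
```

Because age is shifted upward in the simulated second wave, the two sets of predictions should differ slightly: the first averages over the ages observed in both waves, the second over the 2013 ages only.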

-margins, dydx(female) at(y2017 = (0 1))- will get you the gender gap in each year. A cleaner coding of the same results would be -margins y2017, dydx(female)-. If you then want to focus on the change in the gender gap between those years, you would use -margins y2017, dydx(female) pwcompare-.
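
A minimal sketch of the gender-gap commands on simulated data. The covariate age, the coefficients and the xtlogit specification are hypothetical; the one real requirement is that female and y2017 enter the model as factor variables (i. prefix), otherwise -margins- cannot treat them as categorical.

```stata
* Simulated two-wave panel (hypothetical data, only so the commands run)
clear
set seed 4711
set obs 1000
generate id = _n
generate byte female = runiform() < 0.5
generate age = 20 + floor(50*runiform())
generate u = rnormal()                       // person-level random intercept
expand 2
bysort id: generate byte y2017 = _n - 1      // 0 = 2013 wave, 1 = 2017 wave
generate byte interested = runiform() < invlogit(-0.5 - 0.8*female + 0.2*y2017 + 0.01*age + u)

xtset id y2017
xtlogit interested i.female##i.y2017 c.age, re   // assumed specification

* Gender gap (average marginal effect of female) with y2017 fixed at 0 and at 1
margins y2017, dydx(female)

* Change in the gender gap between the two waves, with its own standard error
margins y2017, dydx(female) pwcompare
```

By default the pairwise table reports the contrast with a confidence interval; adding the effects suboption, as in -pwcompare(effects)-, should also display a test statistic and p-value.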

Last edited by Clyde Schechter; 06 Dec 2019, 11:32.

Guest
#3

06 Dec 2019, 11:52

Great! Thank you. margins female, at(y2017 = 0) is definitely what I was looking for, then.

My supervisor actually just sent me some code that he ran on my data when I asked him about testing the gender gaps in both years:

Code:

. margins female, over(y2017) coefleg post

Predictive margins                              Number of obs     =     21,444
Model VCE    : OIM

Expression   : Pr(interested=1), predict(pr)
over         : y2017

------------------------------------------------------------------------------
             |     Margin  Legend
-------------+----------------------------------------------------------------
y2017#female |
   2013#Male |   .4937853  _b[0bn.y2017#0bn.female]
 2013#Female |   .3101679  _b[0bn.y2017#1.female]
   2017#Male |   .5131496  _b[1.y2017#0bn.female]
 2017#Female |   .3445576  _b[1.y2017#1.female]
------------------------------------------------------------------------------

Code:

. lincom _b[0.y2017#0.female] - _b[0.y2017#1.female] // gender gap in 2013

 ( 1)  0bn.y2017#0bn.female - 0bn.y2017#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .1836174    .009021    20.35   0.000     .1659366    .2012982
------------------------------------------------------------------------------

. lincom _b[1.y2017#0.female] - _b[1.y2017#1.female] // gender gap in 2017

 ( 1)  1.y2017#0bn.female - 1.y2017#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .168592   .0079064    21.32   0.000     .1530957    .1840883
------------------------------------------------------------------------------

Code:

lincom _b[0.y2017#0.female] - _b[0.y2017#1.female] ///
>  - _b[1.y2017#0.female] + _b[1.y2017#1.female] // difference in gender gap

 ( 1)  0bn.y2017#0bn.female - 0bn.y2017#1.female - 1.y2017#0bn.female + 1.y2017#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .0150254   .0097488     1.54   0.123    -.0040819    .0341326
------------------------------------------------------------------------------

What different assumptions does the over() option make here ?

Last edited by sladmin; 16 Dec 2019, 09:23. Reason: anonymize original poster

Guest
#4

06 Dec 2019, 11:56

I remember from one of your answers that over() estimates conditional rather than adjusted predictions, but how does one usually choose between the two from a "newbie" point of view?
I want to compare my dependent variables before and after and across different groups, but to be honest, with my undergraduate knowledge I find it hard to wrap my head around which margins option would be appropriate.
I'm guessing my professor had some reason for choosing the over() option to illustrate the example, which means I'm probably better off using it in my paper as well, but of course I would like to understand the practical reason for choosing over() here.
Clyde Schechter
#5

06 Dec 2019, 12:42

Using -over- is like using two margins commands, one with -if y2017 == 0- and the other with -if y2017 == 1-. It gives results for each year that are calculated only on a single year's observations and that do not adjust for differences between years in the distributions of other variables.

I would not assume that your professor had a good reason for choosing the -over()- option. Professors make mistakes; I'm a professor and I should know because I make plenty of mistakes. The -over()- option in -margins- is widely misunderstood. That's because, like the distinction you asked about in #1, it is somewhat subtle and confusing. Most of the time when people use -over()- they are doing so in error. Usually people do want the results fully adjusted, and that means using -margins female#y2017- in this instance.
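
The difference can be seen by running the variants side by side. This is a minimal sketch on simulated data: the covariate age, the coefficients and the xtlogit specification are hypothetical, since the actual model was not posted.

```stata
* Simulated two-wave panel (hypothetical data, only so the commands run)
clear
set seed 1234
set obs 1000
generate id = _n
generate byte female = runiform() < 0.5
generate age = 20 + floor(50*runiform())
generate u = rnormal()                       // person-level random intercept
expand 2
bysort id: generate byte y2017 = _n - 1      // 0 = 2013 wave, 1 = 2017 wave
replace age = age + 4 if y2017 == 1          // covariate distribution shifts between waves
generate byte interested = runiform() < invlogit(-0.5 - 0.8*female + 0.2*y2017 + 0.01*age + u)

xtset id y2017
xtlogit interested i.female##i.y2017 c.age, re   // assumed specification

* over(): each year's margins are computed on that year's observations only
margins female, over(y2017)

* The same subsets, requested one year at a time
margins female if y2017 == 0
margins female if y2017 == 1

* Fully adjusted: every observation is used for every female-by-year cell
margins female#y2017
```

The predicted probabilities from the -over()- call should match those from the two -if- calls, while the -female#y2017- results will generally differ from both whenever the other covariates are distributed differently in the two waves.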

Then again, you shouldn't assume your professor made a mistake either. Go talk to him/her and ask why -over()- was used and see if it was correctly aligned with the research questions or whether it was a mistake.
Guest
#6

06 Dec 2019, 13:22

OK, great. I actually think he only used over() because this is the option I used in my e-mail to him.
To make it less tedious, I will probably use the pwcompare option.

What matters most to me is consistency in the assumptions I make when using margins, for instance the same assumptions about joint distributions when estimating (1) predicted probabilities in both years, (2) gender gaps, east-west gaps, etc. in both years, and (3) the change in probabilities.

If I use the following sequence of code, do you think it looks fine?

Code:

* PREDICTED PROBABILITIES FOR CAT. VAR. IN BOTH WAVES
margins hhinc_group west female party_pref unemployed worried, at(y2017=(0 1))

Code:

* ADJUSTED DIFFERENCES BETWEEN WAVES FOR CATEGORICAL VARIABLES
margins hhinc_group west female party_pref unemployed worried, dydx(y2017)

-> Is it kind of "information overload" if I include margins for both years as well as the change in a publication-style table? I thought of instead just presenting margins for the first year along with the change.

Code:

* MARGINSPLOT TO DISPLAY CHANGE IN THE EFFECT OF AGE
margins, at(age=(20(5)80) y2017=(0 1))
marginsplot

Code:

* EVALUATING THE GENDER-GAP & CHANGE IN GENDER-GAP
margins y2017, dydx(female) dydx(female) pwcompare

One interpretational question regarding time-dummy interactions in the RE-logit output: what can I actually say about them when evaluating the output? Take y2017#female, for instance. Initially I assumed I could use it to answer whether the gender gap has changed significantly (by looking at the β of the interaction term). However, I now know that to evaluate gender gaps etc. I should use margins, because I need to control for the values of the other variables. That being said, what can I actually say about this term in terms of interpretation?

And again, thank you for your great help!
I have never worked with panel data, let alone logistic panel models, so there is definitely some confusion here and there.
Guest
#7

06 Dec 2019, 15:12

Edit: I noticed my error in the very last command. It should be

Code:

margins y2017, dydx(female) pwcompare
Clyde Schechter
#8

06 Dec 2019, 16:29

If I use the following sequence of code, do you think it looks fine?

Yes.

Is it kind of "information overload" if I include margins for both years as well as the change in a publication-style table? I thought of instead just presenting margins for the first year along with the change.

That depends on who your audience is and what they will be interested in. All else being equal, I tend to prefer minimalist tables that are uncluttered and easy on the eye. But if the 2013 and 2017 predicted values are important in their own right and the audience will also want to know about the difference between them, then you need to put all three in there. It is unkind, at best, to make the reader do the subtraction. Worse, even if the reader does the subtraction on predicted values, the reader cannot calculate the standard errors (or confidence intervals) around the changes. So I think you need to ask yourself what your audience wants to get out of your presentation, and then give them that. It's all about the audience (reader)!

One interpretational question regarding time-dummy interactions in the RE-logit output: what can I actually say about them when evaluating the output? Take y2017#female, for instance. Initially I assumed I could use it to answer whether the gender gap has changed significantly (by looking at the β of the interaction term). However, I now know that to evaluate gender gaps etc. I should use margins, because I need to control for the values of the other variables. That being said, what can I actually say about this term in terms of interpretation?

I see that you are using a logistic model. That makes this question bite, because in a linear model there would be no difference in these approaches. But in a logistic model they are different, usually only slightly so, but occasionally very different. Unfortunately, the question of whether to look at differences in adjusted predicted outcomes or to look at the regression interaction coefficient is a controversial issue. For myself, I come down strongly on the side of differences in adjusted predicted outcomes because they are easy to understand and they are in the metric that a decision maker or policy analyst would find usable. But there are those who are inclined to take the logistic model as being the primary thing (rather than just a tool for making predictions) and would rely instead on the interaction coefficients. I disagree strongly with that, but am too tired right now to rant about why.