Plotting marginal effects from a binary logistic regression

Daniel Devine

Join Date: Dec 2017

Posts: 21
#1

Plotting marginal effects from a binary logistic regression

07 Dec 2017, 10:08

Hello all,

I'm wrestling with a problem of how to plot marginal effects from a binary logistic regression, that includes an interaction. I'm sure I'm making some mistakes at the end of a long day, but here is my attempt so far and why I'm facing a problem. I am trying to do similar to the graphs posted below from this article, but the code is different enough that I can't simply replicate the author's.

Both variables in the interaction are binary.

Code:

logit dv iv1 iv2 iv3 i.iv4 i.iv5 i.iv4##i.v5 margins, dydx(*) predict(pr) post coefplot, drop(_cons) xline(0)

The problem is this does not give me the interaction effect when graphed, as is my goal and as the author does in the graphs posted.

I can do it in two steps:

Code:

logit dv iv1 iv2 iv3 i.iv4 i.iv5 i.iv4##i.v5 margins, dydx(*) predict(pr) post estimates store m1 logit dv iv1 iv2 iv3 i.iv4 i.iv5 i.iv4##i.v5 margins, dydx(iv4) at(iv5=(0 1)) post estimates store m2 coefplot (m1) (m2), drop(_cons) xline(0)

But this then gives me an ugly graph (i.e, different colours for the different models), and I'm not sure whether it is statistically sound.

Ideally I would like to have similar to the below, where you can see the interaction effects in the first four rows.

Thanks in advance.
Tags: None

1 like
Bruce Weaver

Join Date: May 2014

Posts: 1144
#2

07 Dec 2017, 12:12

Daniel, here is your logit command:

Code:

logit dv iv1 iv2 iv3 i.iv4 i.iv5 i.iv4##i.v5

Is that i.v5 supposed to be i.iv5, by any chance? If so, you can simplify your command to:

Code:

logit dv iv1 iv2 iv3 i.iv4##i.iv5

Or even to:

Code:

logit dv iv1 iv2 iv3 iv4##iv5

The i. prefixes are not really needed, because variables joined by # or ## are treated as categorical by default. (For more info, type help fvvarlist, then scroll down to Basic Examples.)

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

07 Dec 2017, 15:22

Originally posted by Daniel Devine View Post

...

Code:

logit dv iv1 iv2 iv3 i.iv4 i.iv5 i.iv4##i.v5 margins, dydx(*) predict(pr) post estimates store m1 logit dv iv1 iv2 iv3 i.iv4 i.iv5 i.iv4##i.v5 margins, dydx(iv4) at(iv5=(0 1)) post estimates store m2 coefplot (m1) (m2), drop(_cons) xline(0)

Your syntax for margins corresponds with what the author wrote. The code above told coefplot you want m1 and m2 on the same subgraph (which is Ben Jann's terminology). However, you are referring to two different models, which are stored in two different sets of e(b) and e(V) matrices, so coefplot automatically interprets that as putting those two estimates on different plots (again, Jann's term) in the subgraph.

I went and downloaded the author's do file. How he got around it was that he 1) ran three separate margins commands, 2) for each one, he grabbed e(b), e(V), calcualated se from e(V), grabbed the correct degrees of freedom, 3) and then went and constructed his own matrices with just the coefficients, and the upper and lower bounds for a 95% CI, and 4) he then went and vertically concatenated the three matrices, and had coefplot plot him that concatenated matrix.

Neat, huh!

I know the code looks a bit gnarly, but you can actually replicate it yourself if you roughly know what you are doing. Grab his do file, then go to the line that says " * Reestimation of Model 5 "

Just keep reading downwards until just before his coefplot command. You should be able to replicate that by changing the appropriate variable names.

I do not know of an easy way to get margins to spit out the plot you need in one run. It could be done if you could tell margins to do this:

Code:

margins, dydx(*) at(iv5=(0 1 asobserved)) post

That would tell margins to run dydx for all variables, at iv5 = 0, then = 1, then as observed. Then, you'd tell coefplot to drop the coefficients you didn't need (e.g. most coefficients pertaining iv5 at 0 or 1). You can't do this, I just tested it. So, I think you are stuck with trying to replicate the code, or putting up with different colors (you do want to highlight the interaction, after all!), or else putting the two sets of plots on different subgraphs.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Daniel Devine

Join Date: Dec 2017

Posts: 21
#4

07 Dec 2017, 15:47

Thanks - I really appreciate you going through that effort. it's reassuring I wasn't barking up completely the wrong tree as well. I did have a go at replicating his code, but there were multiple issues I ran into.

The first is that his standard errors are computed differently (as far as I can tell) since he uses imputed values, shown (I think) in this part of the code:

Code:

se = diagonal(cholesky(diag(V)))

The second is that even when I think I had it figured out, the bit of the code below gave me the following error

Code:

Matrix_AMEs_M5_3 = b, b :- invttail(df,0.025):*se, b :+ invttail(df,0.025):*se invttail(): 3200 conformability error <istmt>: - function returned error

I had a hunch that this was due to his method of imputation (which I'm not using - I'm happy to sit with missing values!), but I'm not sure.

Getting around that would be great, if you have any ideas. And indeed - extremely neat from the author (if you're reading!).

Last edited by Daniel Devine; 07 Dec 2017, 16:12.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

07 Dec 2017, 16:48

I am not very familiar with matrix algebra, but I believe conformability error means that you're using two matrices, and their ranks (i.e. number of rows or columns) don't correspond. This led me to remember that in multiple imputation, every beta has its own degrees of freedom.(Or someting like that.)

In regular margins, you definitely use only one value of df for the entire margins computation. The documentation says that a standard normal distribution is used to compute degrees of freedom. You could just substitute 1.96 in place of the expression df (which, in the mata code, refers to the matrix called df).

I would also confirm what I said with a real statistician if I were you.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment

Dimitriy V. Masterov

Join Date: Mar 2014
Posts: 609

07 Dec 2017, 16:52

There is probably a less-hand-coded way of doing this, but I could not get the ordering of the coefficients to work.

Some comments. Note the use of a standard dataset in the solution. Doing that yourself in the question maximizes the chance that you get a response since tweaking someone's code is much easier than writing everything from scratch. Think of this as lowering the barrier to entry for a potential answerer.

You also do not need to re-estimate the logit each time, which can be very time-consuming. Just store and re-store the model each time you need it.

The key to this solution is to rename the m2 and m3 AMEs so that Stata does not confuse them with the overall AMEs. Then you order them by hand.

Code:

#delimit;
/* (1) Fake Data */
sysuse auto, clear;
gen high_mpg   = mpg > 18;
gen high_price = price > 7000;
replace weight = weight/1000;
replace trunk = trunk/10;

/* (2) Estimates the logit and the 3 sets of AMEs */
qui logit foreign i.high_mpg##i.high_price c.(weight trunk);
estimates store logit;
margins, dydx(*) post;
estimates store m1;
estimates restore logit;
margins high_mpg, dydx(high_price) post;
estimates store m2;
estimates restore logit;
margins high_price, dydx(high_mpg) post;
estimates store m3;

/* Plot after renaming the m2 and m3 AMEs */
coefplot
(m1, mcolor(navy) ciopts(color(navy)))
(m2, rename(0.high_mpg = "price_eff_low_mpg" 1.high_mpg = "price_eff_high_mpg") mcolor(navy) ciopts(color(navy)))
(m3, rename(0.high_price = "mpg_eff_low_price" 1.high_price = "mpg_eff_high_price") mcolor(navy) ciopts(color(navy)))
, xline(0, lcolor(gray))
coeflabel(
    1.high_mpg = "High MPG Overall" 
    mpg_eff_low_price = "High MPG at Low Price" 
    mpg_eff_high_price = "High MPG at High Price"
    1.high_price = "High Price Overall"
    price_eff_low_mpg = "High Price at Low MPG" 
    price_eff_high_mpg = "High Price at High MPG"
)
order(mpg_eff_low_price mpg_eff_high_price price_eff_low_mpg price_eff_high_mpg 1.high_mpg 1.high_price)
offset(0)
grid(within)
legend(off) plotregion(fcolor(white)) graphregion(fcolor(white));

This yields:

Click image for larger version

Name: Screen Shot 2017-12-07 at 3.39.28 PM.png
Views: 1
Size: 54.5 KB
ID: 1421508

Last edited by Dimitriy V. Masterov; 07 Dec 2017, 17:12.

Comment

Daniel Devine

Join Date: Dec 2017

Posts: 21
#7

08 Dec 2017, 02:22

Both,

You're both heroes as far as I am concerned! From what I can tell, both solutions are equivalent. I went with Dimitriy's solution as it was simpler, however, I also ran through the replication code and it worked as well. I had some problems with plotting, but I think that's my fault and easily solvable with a bit of time.

Thank you both so much.
Comment

Weiwen Ng

Join Date: Jun 2015
Posts: 1241

08 Dec 2017, 09:45

Originally posted by Dimitriy V. Masterov View Post

...

Code:

#delimit;
/* (1) Fake Data */
sysuse auto, clear;
gen high_mpg = mpg > 18;
gen high_price = price > 7000;
replace weight = weight/1000;
replace trunk = trunk/10;

/* (2) Estimates the logit and the 3 sets of AMEs */
qui logit foreign i.high_mpg##i.high_price c.(weight trunk);
estimates store logit;
margins, dydx(*) post;
estimates store m1;
estimates restore logit;
margins high_mpg, dydx(high_price) post;
estimates store m2;
estimates restore logit;
margins high_price, dydx(high_mpg) post;
estimates store m3;

/* Plot after renaming the m2 and m3 AMEs */
coefplot
(m1, mcolor(navy) ciopts(color(navy)))
(m2, rename(0.high_mpg = "price_eff_low_mpg" 1.high_mpg = "price_eff_high_mpg") mcolor(navy) ciopts(color(navy)))
(m3, rename(0.high_price = "mpg_eff_low_price" 1.high_price = "mpg_eff_high_price") mcolor(navy) ciopts(color(navy)))
, xline(0, lcolor(gray))
coeflabel(
1.high_mpg = "High MPG Overall"
mpg_eff_low_price = "High MPG at Low Price"
mpg_eff_high_price = "High MPG at High Price"
1.high_price = "High Price Overall"
price_eff_low_mpg = "High Price at Low MPG"
price_eff_high_mpg = "High Price at High MPG"
)
order(mpg_eff_low_price mpg_eff_high_price price_eff_low_mpg price_eff_high_mpg 1.high_mpg 1.high_price)
offset(0)
grid(within)
legend(off) plotregion(fcolor(white)) graphregion(fcolor(white));

Very elegant! I think the offset option was what did it, and I wasn't aware of that option - the syntax is quite complex!

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

Announcement