Comparing coefficients between different xtlogit models with the same dependent variable

Rocio Aguilar

Join Date: Jan 2016

Posts: 53
#1

21 Jul 2017, 07:00

Dear Stata users,
I would like to ask how I can compare a coefficient across two different xtlogit models with the same dependent variable.
Model 1:
. xtlogit a006 a016 a020_nac i.a018 i.a019 i.a023 i.a021_imput a041_l2_imput a002_new a024_l2 i.anyo if a030_4==1, nolog i( ident)
. estimate store model11
Model 2:
. xtlogit a006 a016 a020_nac i.a018 i.a019 i.a023 i.a021_imput a041_l2_imput a002_new a024_l2 i.anyo if a030_4==2, nolog i( ident)
. estimate store model12
I would like to compare the coefficient of a016 in model 1 with the coefficient of a016 in model 2.

In this case I need to run two separate models instead of including a factor variable i.a030 in a single model.
If I run:
. lincom [model11]a016 - [model12]a016
the error message is: “equation [model11] not found”.
If I first run, before lincom:
. suest model11 model12
the error message is: “unable to generate scores for model model11; suest requires that predict allow the score option”, r(322).
I would be very grateful if you could give me some advice.
Thanks in advance,
Rocio Aguilar
Tags: lincom, suest, xtlogit
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

21 Jul 2017, 08:55

In this case I need to run two different models instead of introducing a factor variable i.a030 in the model.

Why?
Rocio Aguilar

Join Date: Jan 2016

Posts: 53
#3

21 Jul 2017, 09:11

Dear Clyde, the reason is that in our paper we split the sample into 4 groups. We report the regressions separately as 4 models (one for each sub-sample). We would like to report the comparison between the coefficients of the main variables while keeping the 4 models. I thought that using lincom after xtlogit would be easier...
Thanks for your interest!
Rocio
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

21 Jul 2017, 09:24

Well, I just wouldn't do it that way. I don't know of any Stata command that will get you the contrast across models you are looking for here. As you've already discovered, -suest- doesn't work with -xtlogit-.

I would have done the original modeling as a single model with interaction terms:

Code:

xtlogit a006 (c.a016 c.a020_nac i.a018 i.a019 i.a023 i.a021_imput c.a041_l2_imput c.a002_new c.a024_l2 i.anyo)##i.a030_4, i(ident)

Then any contrasts of coefficients across values of a030_4 would be readable directly from the -xtlogit- output for the corresponding interaction term.

Note: The code above assumes that all of the predictor variables that were not prefixed with i. in your original command are continuous. If that is not correct, remove the c. prefix from any that are discrete.
2 likes
Richard Williams

Join Date: Apr 2014

Posts: 4987
#5

21 Jul 2017, 10:22

I agree with Clyde. Note too that you could still run each group separately and use the interactions approach to test for significance of differences. Or, slightly klutzy, run the model 4 times, each time changing the reference group, and report the non-interaction coefficients for each group.
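The "change the reference group each time" idea could be sketched roughly as follows. This is an illustration rather than code from the thread: it assumes the poster's dataset is in memory, that the predictors without an i. prefix are continuous, and that a030_4 takes the values 1 to 4.

```stata
* Sketch only: refit the fully interacted model with each group as base level.
* With base level g, the non-interaction coefficient on a016 is the a016
* slope for group g, so -lincom a016- simply replays it with its CI.
forvalues g = 1/4 {
    quietly xtlogit a006 (c.a016 c.a020_nac i.a018 i.a019 i.a023 i.a021_imput ///
        c.a041_l2_imput c.a002_new c.a024_l2 i.anyo)##ib`g'.a030_4, i(ident)
    display as text "a016 coefficient with a030_4 == `g' as the base level:"
    lincom a016
}
```

All four fits are the same model under different parameterizations, so the log likelihood should not change from one pass to the next; only the labeling of the coefficients does.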

Having said that, note that comparisons of coefficients across groups in logit models can be problematic:

http://www3.nd.edu/~rwilliam/xsoc73994/RW_ESRA2013.pdf

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
2 likes
Rocio Aguilar

Join Date: Jan 2016

Posts: 53
#6

22 Jul 2017, 08:28

Thanks a lot Clyde and Richard,
In fact, my original idea was to run the model with the interaction i.a030_4##c.a016. However, due to the theoretical approach, we decided to run the model four times (a030 = 1 to 4) as Richard said.

At this point, it's clear that comparing coefficients across groups is only possible with the interaction in the model, followed by lincom (or test).

Please, let me ask you another quick question through an example:

The results of xtlogit regression with separate models are:
When a030_4==3, coeff (a016)=0.250 (p=0.000) and OR (a016)=1.284 (p=0.000)
When a030_4==4, coeff (a016)=0.482 (p=0.000) and OR (a016)=1.619 (p=0.000)

Including the interaction i.a030_4##c.a016 in a single model:

The interaction is significant for the groups a030_4==3 (KIS -0.409 p=0.000) and a030_4==4 (NoKIS -0.860 p=0.000).

The lincom results for 3.a030_4#c.a016 - 4.a030_4#c.a016 are:

So my question is: if in the separate models the coefficient for group 4 (0.482) is bigger than the one for group 3 (0.250), why is the lincom test significant and with a positive coefficient? Does it mean that when a single model (with the interaction) is considered, the coefficient of group 3 is bigger than that of group 4?

I would be very glad if you could clarify my doubt.

Thank you in advance,
Rocio Aguilar
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#7

22 Jul 2017, 10:39

Well, adding only the one interaction to the model is not equivalent to running the four separate models, so you can't really compare the output from this with the original four outputs. They could produce very different and surprising results, as you have seen. The code I recommended in #4 interacts a030_4 with everything else in the model. That is closer to an exact equivalent of four models. (It is still not exactly the same, but is closer and probably won't produce any shocking results.)

The reason they are different is that when you use only a single interaction term, you are implicitly constraining the effects of all the non-interacted variables to be the same in all groups of a030_4. That constraint will in turn affect the estimates for the interacted variables. Your four separate models evidently impose no such constraint at all. If you interact all the variables with a030_4, then you are building a model in which all of the variables can have different effects according to the value of a030_4, which is similar to what running four separate models does.

If you review your original four-separate-models output, I think you will see that there are at least some other variables whose coefficients differ appreciably across models. By forcing those coefficients to the same value in your single-interaction model, you are distorting the results.
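The implicit constraint can be written out explicitly. The notation here is not from the thread and is only a sketch: $x$ stands for a016, $z$ for the remaining predictors, $g$ for the level of a030_4, and $u_i$ for the panel-level random intercept that -xtlogit- fits by default.

```latex
\begin{align*}
\text{separate model for group } g:\quad
  & \operatorname{logit}\Pr(y_{it}=1) = \alpha_g + \beta_g x_{it} + \gamma_g^{\top} z_{it} + u_i \\
\text{single interaction only}:\quad
  & \operatorname{logit}\Pr(y_{it}=1) = \alpha_g + \beta_g x_{it} + \gamma^{\top} z_{it} + u_i \\
\text{all terms interacted}:\quad
  & \operatorname{logit}\Pr(y_{it}=1) = \alpha_g + \beta_g x_{it} + \gamma_g^{\top} z_{it} + u_i
\end{align*}
```

The single-interaction model forces one common $\gamma$ on every group, which is the constraint described above. The all-terms-interacted model has the same linear predictor as the separate models. One plausible reason it is still not exactly identical is that a pooled fit estimates a single variance for $u_i$, whereas each separate model estimates its own.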
2 likes
Rocio Aguilar

Join Date: Jan 2016

Posts: 53
#8

22 Jul 2017, 11:09

Thank you very much Clyde! I was thinking about the reference group in the interaction i.a030_4##c.a016. The output reports coefficients for groups 2, 3 and 4 (and group 1 would be the reference group). I wondered whether we should take this fact into account for the interpretation (I am not sure).
Would the following code be OK? Is the interaction with the time dummy variable (i.anyo) necessary?

Code:

. xtlogit a006 c.a020_nac#i.a030_4 i.a018#i.a030_4 i.a019#i.a030_4 i.a023#i.a030_4 c.a041_l2_imput#i.a030_4 c.a024_l2#i.a030_4 c.a016#i.a030_4 i.a021_imput#i.a030_4 c.a002_new#i.a030_4 i.anyo, nolog i( ident)

Thanks again, your comments are very useful!!!!
Rocio Aguilar
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#9

22 Jul 2017, 11:49

No, that code is yet a different (and mis-specified) model: it contains only interaction terms without the constituent effects. Either you have to separately specify those individual effects or, easier in my view, replace all those #'s by ##'s. The difference between # and ## is crucial!

Also, you can simplify things by "factoring out" the i.a030_4 along the lines I showed in #4 (although doing so is optional and just for convenience.)
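The difference between # and ## is easy to check on a dataset shipped with Stata. The sketch below is not from the thread; it uses -regress- on the auto data purely to illustrate the notation (a simplifying assumption, since the same expansion applies inside -xtlogit-): a##b is shorthand for a b a#b.

```stata
sysuse auto, clear

* ## expands to both main effects plus the interaction
regress price c.mpg##i.foreign
estimates store full

* The same model written out term by term
regress price c.mpg i.foreign c.mpg#i.foreign
estimates store longhand

* # alone keeps only the interaction terms: a separate mpg slope per level
* of foreign, but no i.foreign main effect (a different model)
regress price c.mpg#i.foreign
estimates store hash_only

* Side-by-side comparison: full and longhand should match, hash_only should not
estimates table full longhand hash_only, b(%9.3f)
```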
1 like
Rocio Aguilar

Join Date: Jan 2016

Posts: 53
#10

22 Jul 2017, 12:08

Dear Clyde, thanks a lot for your patience!!!
Smarter:

Code:

. xtlogit a006 (c.a020_nac i.a018 c.a016 i.a021_imput c.a002_new i.a019 i.a023 c.a041_l2_imput c.a024_l2 i.anyo)##i.a030_4, nolog i(ident)

I am wondering whether reporting that the probability of DEPVAR is bigger in group 4 than in group 3, based on the OR values (1.619 versus 1.284), is perhaps too risky?

The results of xtlogit regression with separate models are:
When a030_4==3, coeff (a016)=0.250 (p=0.000) and OR (a016)=1.284 (p=0.000)
When a030_4==4, coeff (a016)=0.482 (p=0.000) and OR (a016)=1.619 (p=0.000)

Thank you for your kind assistance,
Best,
Rocio
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#11

22 Jul 2017, 12:35

Well, after running the correct model (with all terms interacted) here's where your -lincom- approach comes in.

Code:

lincom _b[3.a030_4#c.a016] - _b[4.a030_4#c.a016]

That calculates the difference between those coefficients and will give you a confidence interval as well as a test of the null hypothesis that the difference is zero.

This is what you yourself proposed in #6. That was, and remains, the correct approach to this goal. The reason I put you on a detour then is that the model itself was incorrect. But now that you have the correct model, this is the step that gets you that specific answer.

Just comparing the coefficients you got from the separate models gives you only a comparison of the estimated coefficients with no quantification of the precision. Clearly 0.482 is larger than 0.250. But how precise are those estimates? Maybe the difference between them is well within the "margin of error" of one or both of those estimates. Using separate models there is no way to know. It may well be that one or both of these estimates is so imprecise that no conclusion can be drawn about which population parameter is actually the larger one. That's why you have to use the all-terms-interacted model and then run -lincom- as suggested here.
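It may help to spell out what that -lincom- expression estimates. The symbols below are introduced only for illustration: with group 1 as the base level, $b$ is the coefficient on a016 and $d_g$ is the coefficient on the interaction term for group $g$ (so $d_1 = 0$).

```latex
\begin{align*}
\beta_g &= b + d_g
  && \text{(a016 coefficient in group } g\text{)} \\
\beta_3 - \beta_4 &= (b + d_3) - (b + d_4) = d_3 - d_4
  && \text{(the quantity -lincom- reports)} \\
\operatorname{Var}(\hat d_3 - \hat d_4) &= \operatorname{Var}(\hat d_3) + \operatorname{Var}(\hat d_4)
  - 2\operatorname{Cov}(\hat d_3, \hat d_4)
\end{align*}
```

The contrast as written is group 3 minus group 4, so a negative estimate means the a016 coefficient is larger in group 4. As a check on the sign convention, if the figures quoted in #6 for the single-interaction model are the interaction coefficients, then $d_3 - d_4 = -0.409 - (-0.860) = 0.451$, which would explain the positive -lincom- result obtained from that (mis-specified) model.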
2 likes
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#12

22 Jul 2017, 12:45

I'm puzzled why you are looking to revert to the separate models here. Here's a simple demonstration of how the all-terms-interacted model is equivalent to running separate models. I've simplified the problem by using -logit- instead of -xtlogit-, and instead of a multi-level variable, I'm using a two-level variable. But the principle is the same, and you can see with your own eyes that the all-terms-interacted model produces the same results.

Code:

sysuse auto, clear

// TURN REP78 INTO A 4/5 DICHOTOMOUS VARIABLE
// (MISSING VALUES COUNT AS > 3 IN STATA, SO EXCLUDE THEM EXPLICITLY)
keep if rep78 > 3 & !missing(rep78)

// LOGISTIC MODEL OF FOREIGN ON PRICE AND MPG
// ESTIMATED SEPARATELY FOR REP78 == 4 AND == 5
logit foreign price mpg if rep78 == 4
logit foreign price mpg if rep78 == 5

// ALL VARIABLES INTERACTED VERSION
logit foreign c.(price mpg)##i.rep78

// CALCULATE (REPLAY) ESTIMATED EFFECTS FOR REP78 == 4
lincom price
lincom mpg

// CALCULATE ESTIMATED EFFECTS FOR REP78 == 5
lincom price + 5.rep78#c.price
lincom mpg + 5.rep78#c.mpg

This model is small enough and simple enough that you can easily see exactly what is going on. I hope that seeing this will convince you that running the all-items-interacted model serves your purposes and you will finally let go of the four-separate-models approach which, although otherwise equivalent, is incapable of answering your specific research question.
2 likes
Rocio Aguilar

Join Date: Jan 2016

Posts: 53
#13

22 Jul 2017, 12:48

Certainly, you have given me the answer I was looking for! Thanks again!
And one last quick question: the group I am interested in comparing with the rest is a030_4==3. Do you think that I should recode a030_4 and make group 3 the first group? Or is that not necessary?
Thanks a lot!!!
Best,
Rocio
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#14

22 Jul 2017, 13:36

If group 3 is the group you want to compare the others with, it definitely makes sense to use it as the reference category. You don't need to recode the variable itself to do that, however. Leave the variable alone, and in the -xtlogit- command, replace i.a030_4 by ib3.a030_4. Stata will then use 3 as the reference level for that variable and all your comparisons to it will be quite straightforward and simple. Do read -help fvvarlist- to learn more about factor variable notation.
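Put together with the model from #4, that might look like the sketch below. It is an illustration under assumptions, not output-checked code: it presumes the poster's dataset is in memory and that the predictors without an i. prefix are continuous.

```stata
* Sketch only: all-terms-interacted model with group 3 as the reference level
xtlogit a006 (c.a016 c.a020_nac i.a018 i.a019 i.a023 i.a021_imput ///
    c.a041_l2_imput c.a002_new c.a024_l2 i.anyo)##ib3.a030_4, nolog i(ident)

* a016 effect in group 3 (the base level), as a coefficient and as an odds ratio
lincom a016
lincom a016, or

* Group 4 minus group 3 is now a single interaction coefficient
lincom 4.a030_4#c.a016

* a016 effect within group 4 itself, as an odds ratio
lincom a016 + 4.a030_4#c.a016, or

* Joint test that the a016 coefficient is the same in all four groups
testparm i.a030_4#c.a016
```

With group 3 as the base, each interaction coefficient on a016 is already "that group minus group 3", so the comparisons of interest can be read from the -xtlogit- table without further arithmetic.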
2 likes
Rocio Aguilar

Join Date: Jan 2016

Posts: 53
#15

23 Jul 2017, 03:52

Definitely thanks thanks and thanks!!!
