How to compare two binary logistic regression models obtained from a categorical and continuous measure of the same (predictor) variable?

Allyson Lamont

Join Date: Jun 2021

Posts: 4
#1

23 Aug 2021, 12:54

Hello all,

This is my first post on Statalist (so you know I must really be stuck!) and it is regarding whether a certain type of analysis is possible.

I am looking to examine whether the (binary logistic) regression output obtained from a continuous operationalization of a predictor differs from the output obtained from a categorical operationalization of that same predictor. I have regressed the same (binary) outcome on each operationalization separately using logistic regression; I have also saved the estimates for both, and combined them using the -suest- command. However, when I run -test- following the -suest- command (i.e., test [continuous_iv = categorical_iv]), I receive this notice from Stata: "variables differ between equations. r(111)". Even though the continuous and categorical measures are derived from the exact same data (i.e., from the exact same responses from the same respondents), Stata does not recognize them as being equivalent and is unable to compare them.

The continuous measure is an 11-point scale from 0 to 10, and the categorical measure is simply a categorization of that same scale into 3 levels (a score of 0-1 is coded as 1; 2-5 is coded as 2; and 6-10 is coded as 3). The continuous measure produces only one OR when the outcome is regressed on it, while the categorical measure produces two ORs (because 1 of the 3 levels is omitted from the regression model as the reference category), and I assume this is why Stata is unable to compare the two model outputs.
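
For readers following along, the setup described above could look roughly like the sketch below. The variable names y, x, and xcat are hypothetical (none are given in the thread), and the recode assumes the lowest band (0-1) is meant to be level 1.

```stata
* Hypothetical names: y = binary outcome, x = 0-10 scale, xcat = 3-level version
recode x (0/1 = 1) (2/5 = 2) (6/10 = 3), generate(xcat)

* Model 1: continuous operationalization (one slope, hence one OR)
logit y c.x
estimates store m_cont

* Model 2: categorical operationalization (two indicator coefficients,
* 2.xcat and 3.xcat, each relative to the omitted level 1)
logit y i.xcat
estimates store m_cat

* Combine the two sets of estimates
suest m_cont m_cat

* m_cont holds a coefficient on x, while m_cat holds coefficients on
* 2.xcat and 3.xcat. The equations do not share a variable, so asking
* -test- to equate whole equations has nothing to match up.
```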

Does anyone know of any workarounds to this using the -test- command, or of any similar commands that may allow for me to examine my research questions?

Also, if sample data would help to understand my analyses/problem, I can certainly create some!

Thank you for your time!
Tags: categorical, continuous, regression, suest, test
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

23 Aug 2021, 15:04

The comparison you have in mind is not possible in this simple way.

These are two different models, and the estimated parameters are not comparable.

You can compare likelihoods, or McFadden pseudo R-squareds, and this would be a comparison of model fit.

You cannot compare the estimated parameters, which is what you are trying to do, because they measure different things: one is a slope per unit of the scale, the others are contrasts against a reference category.
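
A minimal sketch of such a fit comparison in Stata, assuming hypothetical variable names y, x, and xcat and that both models are estimated on the same observations:

```stata
* Continuous version
logit y c.x
estimates store m_cont
display "Log likelihood (continuous):       " e(ll)
display "McFadden pseudo-R2 (continuous):   " e(r2_p)

* Categorical version
logit y i.xcat
estimates store m_cat
display "Log likelihood (categorical):      " e(ll)
display "McFadden pseudo-R2 (categorical):  " e(r2_p)

* Side-by-side table of N, log likelihoods, df, AIC, and BIC
estimates stats m_cont m_cat
```

Because the two models are not nested in each other, this is a descriptive comparison of fit rather than a formal test of one against the other.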
Richard Williams

Join Date: Apr 2014

Posts: 5008
#3

23 Aug 2021, 17:30

BIC tests are sometimes used to compare alternative models.

I also think you could have both versions of X in the model. Using X alone requires X to have a linear effect on the log odds. Xcat would allow for some nonlinearity in the effects.

So, if both X and Xcat are in the model, and Xcat does not improve model fit, i.e. its effects are insignificant, that says X alone should be used.

If, on the other hand, the effects of X are not significant, then Xcat is enough.

Of course, both may be significant. But why is this an either/or situation? It is not unusual to, say, have both X and X^2 in a model. I'm curious as to why you want to do this in the first place.
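
A rough Stata sketch of the combined-model idea, again with hypothetical variable names y, x, and xcat. It works only if x varies within at least one level of xcat, which it does for the 0-1 / 2-5 / 6-10 banding described in #1.

```stata
* Baseline: X alone (linear effect on the log odds)
logit y c.x
estimates store m_x

* Both versions of the predictor in one model
logit y c.x i.xcat
estimates store m_both

* Wald test: do the Xcat indicators add anything beyond linear X?
testparm i.xcat

* Wald test: does linear X add anything beyond Xcat?
test x

* Likelihood-ratio version of the first test (m_x is nested in m_both)
lrtest m_x m_both
```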

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Allyson Lamont

Join Date: Jun 2021

Posts: 4
#4

23 Aug 2021, 17:44

In reply to Joro Kolev (#2):

Thank you for your prompt response. I figured that it likely wasn't possible to do a direct OR comparison. Comparing the likelihoods is probably a good way to go. Thank you again!
Allyson Lamont

Join Date: Jun 2021

Posts: 4
#5

23 Aug 2021, 17:49

In reply to Richard Williams (#3):

Thank you for the quick response!
I'm not very familiar with BIC tests, but I'll definitely look into them! And it actually never even crossed my mind to include both X and Xcat in the model, which is a significant oversight on my part. This would likely be the easiest solution to my problem.
Thank you again!!
Richard Williams

Join Date: Apr 2014

Posts: 5008
#6

24 Aug 2021, 08:44

I don't know how or why you came up with xcat, but unless you have to use it there are many ways to model the effect of X:
Just use X alone. Simple and parsimonious. Assumes X has a linear effect on the log odds of Y = 1 occurring.

Use i.X. Lets every value of X have its own effect. Maximum flexibility but not too parsimonious. The pattern of coefficients may suggest other, more parsimonious things you could do, e.g. spline functions.

Use both X and X^2 and possibly higher powers. Very common when you suspect there is a curvilinear relationship.

Use log of X.

Use spline functions. This might be an alternative to what you had in mind when you created xcat. Allows the effect of X to vary across the range of X. For example, if X = years of education, you might believe that the effect of education is different for years of grade school education than it is for years of college education.

Interact X with other variables, e.g. maybe you suspect that the effect of years of education is different for men than it is for women.

Of course, it may be that you had great reasons for coding xcat the way you did and that it works just fine. But, unless you are absolutely wedded to using xcat, there are other (probably more common) approaches you can consider.
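
The options listed above could be sketched in Stata roughly as follows. The names y, x, and female are hypothetical; the knot at 5 and the +1 inside the log are arbitrary illustrative choices (the +1 only because a 0-10 scale includes 0, where the log is undefined).

```stata
* 1. X alone: linear effect on the log odds
logit y c.x

* 2. i.X: a separate coefficient for each observed value of X
logit y i.x

* 3. X and X^2: curvilinear relationship via factor-variable notation
logit y c.x##c.x

* 4. Log of X (shifted by 1 as an assumption, since x can be 0)
generate ln_x1 = ln(x + 1)
logit y ln_x1

* 5. Linear spline: the slope of X may differ below and above the knot
mkspline x_lo 5 x_hi = x
logit y x_lo x_hi

* 6. Interaction: the effect of X may differ by group
logit y c.x##i.female
```

Storing each fit with -estimates store- and then running -estimates stats- on the stored names is one way to line up AIC and BIC across these specifications.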

Allyson Lamont

Join Date: Jun 2021

Posts: 4
#7

24 Aug 2021, 08:54

In reply to Richard Williams (#6):

I should have mentioned it in my original post, but Xcat is currently the standard measure/operationalization that is used when examining the variable of interest. The goal of the current research project is to determine whether there actually is a justification for using Xcat in my field of research, or whether it would make more sense (both from an analytical as well as a practical standpoint) to leave it simply as X (continuous). So, Xcat in my case is actually the original variable (obtained via survey data), and X is the "new"/comparison variable.

With that said, I do like your suggestions about how to treat X. I had looked at X^2 in previous research and will definitely be including it in this project. I also was previously unaware of the spline functions -- I will certainly have to look into those.

Thank you so much for your time and for your assistance!
Richard Williams

Join Date: Apr 2014

Posts: 5008
#8

24 Aug 2021, 13:32

I wondered if you might be doing something like that. People are often asked to give answers within ranges. For example, no one ever reports that they make $91,824 a year, but they may say their income is between $90,000 - $100,000.

In your case, I don't know why people don't just measure X. You could always collapse it to xcat if you wanted to. But it may be that people are unsure of their value on X, but are pretty sure in saying they are somewhere between 6 and 10.

Also, you have to consider the cognitive difficulty of a question. I can tell you how many cups of coffee I drink a day, but I have no idea how many pounds of coffee I drink a year. Maybe people like xcat because answering X is too hard.

You may be able to come up with a new and improved measure. But, if it is not a big improvement, it may not be worth making the change. You may want to compare your work with previous work, and that can be harder to do if you aren't using the same measures.

Hard to discuss in the abstract! You don't have to mindlessly do what others have done. But then again, maybe you should, if they have been doing things in a good way.
