binary logistic regression

ahmed farhan

Join Date: Jun 2023

Posts: 34
#1


06 Aug 2023, 15:21

hi all

quick query regarding binary logistic regression.
after running this command in stata, what coefficient/odds ratios are presented within the table for a categorical variable that is being adjusted for in the model in the situation that the "i." prefix is not used for this variable?
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#2

06 Aug 2023, 15:32

You are more likely to get a helpful answer from somebody if you show the actual command you are concerned about. Also explain what this categorical variable for which the i. prefix has not been used is, and what values it can take.
ahmed farhan

Join Date: Jun 2023

Posts: 34
#3

06 Aug 2023, 15:46

sure, here is the command

logistic blood_product_transfusion ib2.patient_anticoagulant_med time type_femur operation_preformed other_associated_injuries reverse_anticog_med antiplatelet_medication pre_op_hb_level tranexamic_acid cci_points, base

here is the output without using i. prefix for any categories

many of these adjusted variables are categorical, if for example, i put an i. prefix in front of type_femur variable then i get the following:

there is now an odds ratio for each category for type_femur. so my query is what odds ratio did stata present when an i. prefix was not used?
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#4

06 Aug 2023, 18:41

Well, the news is not good. Had the categorical variables involved been dichotomous, I would have been able to tell you that the odds ratio you got is the same as you would have gotten with i.

But this is not the case here. The femur-type variable has four levels. The odds ratio that you got by including it without i. is what the odds ratio would be if these four femur types were actually "equally spaced." By equally spaced I mean that the odds ratio for Distal:Proximal would be exactly the square of the odds ratio for Diaphyseal:Proximal, and that for 4:Proximal the cube of the odds ratio for Diaphyseal:Proximal. You can see from the results that you got when you did include i.type_femur that this is not even close to true. In fact, those odds ratios go up and then down, so you couldn't even do a monotone transformation of type_femur that would meet this criterion.

So the odds ratio you see without i. is the odds ratio that you would have gotten under a condition of the world that is demonstrably very false. It has no validity as a description of anything happening in reality. And the news is worse than that. To the extent that type_femur is not independent of the other variables, this error in estimation for type_femur also "infects" all the other estimates. (Though I see in these outputs that the differences in the other odds ratios between those models are pretty small--from which I infer that type_femur is almost independent of the other variables.) In short, the entire model is just plain invalid and you should just discard it and not interpret its findings.
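To spell out the "equally spaced" point, here is a small sketch of the two models. It assumes type_femur is coded 1 to 4 with Proximal = 1 as the reference; the symbols x, beta and gamma are introduced here for illustration and do not appear in the output.

```latex
\begin{align*}
\text{without i.:}\quad
  & \operatorname{logit} P(Y=1) = \beta_0 + \beta\,x + \dots \\
  & \mathrm{OR}(x=k \text{ vs } x=1) = e^{\beta(k-1)} = \left(e^{\beta}\right)^{k-1},
    \qquad k = 2, 3, 4 \\[4pt]
\text{with i.:}\quad
  & \operatorname{logit} P(Y=1) = \beta_0 + \gamma_2\,1[x=2] + \gamma_3\,1[x=3]
    + \gamma_4\,1[x=4] + \dots \\
  & \mathrm{OR}(x=k \text{ vs } x=1) = e^{\gamma_k}
\end{align*}
```

Without i. there is a single coefficient, so the three odds ratios are forced to be OR, OR squared and OR cubed. With i. each level gets its own coefficient and the odds ratios are free to differ in any direction.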
ahmed farhan

Join Date: Jun 2023

Posts: 34
#5

07 Aug 2023, 03:25

thanks for your reply.
I may have misunderstood but have you assumed those levels are ordinal rather than nominal?
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#6

07 Aug 2023, 08:50

"have you assumed those levels are ordinal rather than nominal"

No. What I'm saying is that by omitting the i., you have caused Stata to treat this as not just an ordinal variable but an interval-level variable (or, for practical purposes, as an ordinal variable that is equally spaced.) The fact that these are, in fact, nominal, makes this analysis completely invalid.
ahmed farhan

Join Date: Jun 2023

Posts: 34
#7

07 Aug 2023, 13:05

ok, thanks Clyde.
so i understand that it is invalid if i omit the i.
however are you saying the model is also invalid if i use the i. ?
if so then should i remove the type_femur variable to correct this?

Last edited by ahmed farhan; 07 Aug 2023, 13:08.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#8

07 Aug 2023, 13:13

"however are you saying the model is also invalid if i use the i. ?
if so then should i remove the type_femur variable to correct this?"

No, I'm not saying that. Include i.type_femur in the model. Similarly, any other categorical variables should be entered with an i. prefix.

But any model with a categorical variable that has more than 2 levels entered without the i. prefix will be invalid.
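A minimal sketch of the corrected command, using the variable names from #3. Only type_femur has been changed here, because the thread does not say which of the other covariates are categorical with more than two levels; each of those would need the i. prefix as well. The testparm line is an optional extra, not something from the original command.

```stata
* Same model as in #3, but with type_femur entered as a factor variable.
* (/// continues a command across lines in a do-file.)
logistic blood_product_transfusion ib2.patient_anticoagulant_med time ///
    i.type_femur operation_preformed other_associated_injuries ///
    reverse_anticog_med antiplatelet_medication pre_op_hb_level ///
    tranexamic_acid cci_points, base

* Joint Wald test that all type_femur odds ratios equal 1
testparm i.type_femur
```

Running -codebook- or -tabulate- on each covariate first is a quick way to see how many distinct values it takes and so whether it needs the prefix.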
ahmed farhan

Join Date: Jun 2023

Posts: 34
#9

07 Aug 2023, 13:23

ok, thanks.

may i please also get your advice on inclusion of a continuous variable for this multivariable logistic regression model, say age for example.
in a similar fashion to what we have discussed regarding categorical variables, is it just as important to include the c. prefix in that hypothetical scenario? or can these levels be considered "equally spaced" intervals by default?
ahmed farhan

Join Date: Jun 2023

Posts: 34
#10

07 Aug 2023, 13:35

or should i be using the i. prefix for age too?
here is what happens if i do:
which is better practice and more accurate?
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#11

07 Aug 2023, 14:12

If you want a variable to be treated as continuous, you do not need the c. prefix: by default Stata treats variables with no prefix as continuous, except if they appear in an interaction term. Variables that appear unprefixed in an interaction term are, by default, treated as categorical.
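A short sketch of the difference, assuming a variable named age exists in the data (it is hypothetical here, as in the question) and using a deliberately reduced covariate list to keep the example short:

```stata
* No prefix, no interaction: age enters as a continuous covariate
logistic blood_product_transfusion age i.type_femur

* Inside an interaction, mark the continuous variable explicitly with c.
* (## gives both main effects and the interaction)
logistic blood_product_transfusion c.age##i.type_femur

* By contrast, age##type_femur with no prefixes would treat age
* as categorical, one level per distinct age.
```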

Age can be treated either way--it's a modeling issue. If you think that the relationship between the log odds of blood transfusion and age is (more or less) linear, then treating it as continuous is the way to go. The resulting odds ratio will represent the multiplicative increment in odds of blood transfusion per year of age.

If, however, you question the linearity of the relationship, then treating it as discrete is one option. But for a variable like age that has so many levels, the resulting output is hard to work with, with each of the odds ratios representing the odds ratio of the given age vs the base category, which appears to be 60 in your case. Also, the number of people at any specific one-year age may be rather small, especially at extreme ages, so those coefficients become unreliable. So, if you are confronted with a non-linear relationship and a large number of categories to the variable, one can do things like break the variable into groups (say, 5 or 10 year age groups), or use spline variables (-help mkspline-) to capture the nonlinearity while retaining the essential continuity.
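A rough sketch of those two options, again assuming a hypothetical variable named age and a reduced covariate list. The new variable names (age_grp, age_rcs), the 10-year width and the choice of 4 knots are illustrative assumptions, not recommendations for these data.

```stata
* Option 1: 10-year age groups, labelled by the start of each decade
generate age_grp = 10 * floor(age / 10)
logistic blood_product_transfusion i.age_grp i.type_femur

* Option 2: restricted cubic spline with 4 knots.
* mkspline creates age_rcs1, age_rcs2 and age_rcs3.
mkspline age_rcs = age, cubic nknots(4)
logistic blood_product_transfusion age_rcs1 age_rcs2 age_rcs3 i.type_femur

* Joint test of the nonlinear spline terms; a clearly non-significant
* result is consistent with entering age as a single linear term
testparm age_rcs2 age_rcs3
```

With the spline version the individual coefficients are not directly interpretable, so results are usually presented as predicted probabilities over a range of ages (see -help margins-).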
ahmed farhan

Join Date: Jun 2023

Posts: 34
#12

07 Aug 2023, 15:04

thank you for all your insightful responses and tutelage Clyde, sincerely appreciated