Regression with i.variable vs just variable

Sara Hansen

Join Date: Apr 2022

Posts: 30
#1


20 Apr 2023, 06:42

Why do analyses give different results depending on whether you use i.variable or omit the i. prefix?

Example:

Code:

regress score i.city i.sex i.smoking

gives the following result:
1.city, p=0.628
2.city, p=0.013
3.city, p<0.0001
sex, p=0.015
smoking, p=0.003

Code:

regress score city sex smoking

gives the following result:
city, p<0.0001
sex, p=0.072
smoking, p=0.007

My understanding: when not using the prefix i. you get an "overall" p-value. However, if that were the case, I don't understand why the p-values differ as much as they do: sex is statistically significant (p<0.05) only when using the prefix i., and with the prefix two out of three city terms are significant, whereas without it the single city p-value is very low (p<0.0001).

Follow-up question: should you always specify i. before categorical variables? If not, when should you?
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

20 Apr 2023, 07:32

Always specify "i." before categorical variables.

Apparently your "city" variable has 4 possible values. In your second model you are treating it as quantitative and assuming that the city code is linearly related to your outcome variable, which may not be correct.
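To make the difference concrete, here is a minimal do-file sketch on simulated data. Everything in it is an assumption for illustration: the data are generated, not taken from the original post, the city codes 0-3 are guessed from the 1.city/2.city/3.city output, and the effect sizes are arbitrary.

```stata
* Simulated stand-in data (hypothetical; not the poster's dataset)
clear
set seed 2023
set obs 400
generate city    = floor(4 * runiform())   // codes 0-3, purely nominal labels
generate sex     = runiform() < 0.5        // 0/1 indicator
generate smoking = runiform() < 0.3        // 0/1 indicator

* Build a score whose city effects are deliberately not linear in the code
generate score = 50 + 5*(city == 1) - 4*(city == 2) + 1*(city == 3) ///
               + 2*sex - 3*smoking + rnormal(0, 5)

* city entered as a number: a single slope, so each one-unit step in the
* city code is assumed to shift score by the same amount
regress score city sex smoking

* city entered as a factor: three indicator terms (1.city 2.city 3.city),
* each compared with the base level (city 0)
regress score i.city i.sex i.smoking
```

Because sex and smoking are coded 0/1 in this sketch, i.sex and sex fit the same term. Any change in their p-values between the two regressions therefore comes from how city is modeled, not from the prefix on sex or smoking.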

Your other p-values change because each is computed after adjusting for the other variables in the model, and what those other variables are differs between the two models.

I don't know where you got "My understanding: when not using the prefix i. you get an "overall" p-value" but, to the extent that I understand it, it is not correct. If you want an overall p-value for a multi-category variable, you must use a post-estimation command such as -test- or -testparm-; see their help files.
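A minimal do-file sketch of the overall test follows. It again runs on simulated data (hypothetical values, with city codes 0-3 assumed from the 1.city/2.city/3.city output), so only the commands carry over to real data, not the numbers.

```stata
* Simulated stand-in data (hypothetical; not the poster's dataset)
clear
set seed 2023
set obs 400
generate city    = floor(4 * runiform())   // codes 0-3
generate sex     = runiform() < 0.5        // 0/1 indicator
generate smoking = runiform() < 0.3        // 0/1 indicator
generate score   = 50 + 5*(city == 1) - 4*(city == 2) + 2*sex ///
                 - 3*smoking + rnormal(0, 5)

regress score i.city i.sex i.smoking

* Joint Wald test that all city indicator coefficients are zero:
* one overall p-value for city, on 3 numerator degrees of freedom
testparm i.city

* The same joint test with the coefficients listed explicitly
test 1.city 2.city 3.city
```

The individual p-values on 1.city, 2.city and 3.city each compare one city with the base level, so none of them is an overall test of city; the joint test is.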
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#3

20 Apr 2023, 08:00

Sara:
as an aside to Rich's helpful guidance, please share what you typed and (also) what Stata gave you back via CODE delimiters (as per the FAQ).
Your regression may have other issues that the way you posted does not allow us to identify.

Kind regards,
Carlo
(Stata 19.0)