
  • Regression with i.variable vs just variable


    Why do analyses give different results depending on whether you use i.variable or omit the i. prefix?

    Example:
    Code:
    regress score i.city i.sex i.smoking
    gives the following result:
    1.city, p=0.628
    2.city, p=0.013
    3.city, p<0.0001
    sex, p=0.015
    smoking, p=0.003


    Code:
    regress score city sex smoking
    gives the following result:
    city, p<0.0001
    sex, p=0.072
    smoking, p=0.007


    My understanding: when not using the prefix i. you get an "overall" p-value. However, if that were the case, I don't understand why the p-values differ as much as they do: sex is statistically significant (p<0.05) only when using the prefix i., and two of the three city indicators are significant when using the prefix, whereas the single city p-value is very low (p<0.0001) when not using the prefix.


    Follow-up question: should you always specify i. before categorical variables, or when should you do it?


  • #2
    always specify "i." before categorical variables

    apparently your "city" variable has 4 possible values (three indicators plus a base category); in your second model you are treating it as quantitative and assuming that the numeric city code is linearly related to your outcome variable, which may not be correct
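
    a minimal sketch of the difference, using the variable names from the original post; the c. prefix is written out only to make explicit what Stata assumes when no prefix is given, and the data themselves are not shown in the thread, so no output is reproduced here:

    ```stata
    * list the distinct city codes to confirm how many categories there are
    tabulate city

    * no prefix is the same as c.city: one slope per one-unit step in the city code
    regress score c.city i.sex i.smoking

    * i.city: one coefficient per non-base city, each compared with the base category
    regress score i.city i.sex i.smoking
    ```

    in the first model the single city coefficient only makes sense if the numeric codes are on a meaningful, evenly spaced scale; in the second model each city p-value tests that city against the base category only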

    your other p-values change because each one is computed after adjusting for the other variables in the model, and those other variables differ between the two models

    I don't know where you got "My understanding: when not using the prefix i. you get an "overall" p-value" but, to the extent that I understand it, it is not correct; if you want an overall p-value for a multi-category variable, you must run a post-estimation test such as -test- or -testparm- after the regression; see their help files
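
    a sketch of how that might look with the model from the original post; the level numbers in the -test- line are taken from the posted output (1.city, 2.city, 3.city) and are an assumption about how city is coded:

    ```stata
    * fit the model with city as a categorical variable
    regress score i.city i.sex i.smoking

    * joint test that all city coefficients are zero: one overall p-value for city
    testparm i.city

    * the same joint test with the non-base levels named explicitly
    test 1.city 2.city 3.city
    ```

    -testparm- is usually the more convenient of the two because it does not require knowing which levels are in the model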



    • #3
      Sara:
      as an aside to Rich's helpful guidance, please share exactly what you typed and what Stata gave you back, using CODE delimiters (as per the FAQ).
      Your regression may also have other issues that cannot be identified from the way you posted.
      Kind regards,
      Carlo
      (Stata 19.0)

