  • Using i.prefix for categorical variables

    I have a question regarding using the i. prefix for categorical variables.

    Here is my model:

        ologit china_virus CN_Index i.factor_reltrad church_attendance prayer bible_authority if complete_case [pweight=WEIGHT]

    The DV measures levels of agreement with Trump's use of the term "China Virus."

    Here I used the i. prefix for reltrad because its categories are different religious groups (e.g., mainline, evangelical, Catholic, etc.) and I want to compare each category to the baseline.

    However, church attendance (coded from 1 to 8, where 1 = never attends and 8 = attends weekly), prayer (coded 1 to 6, where 1 = never and 6 = several times a day), and bible authority (coded 1 to 4, where 1 = The Bible is an ancient book of history and legends and 4 = The Bible means exactly what it says) are also ordinal variables. For these I am not interested in the differences between categories, but in whether the variable overall is a significant predictor of the DV (and the direction of its effect). In that case, should I still use the i. prefix for these variables?

  • #2
    Ordinal variables are difficult. There is no ideal way to use them in regression analysis. If you treat them as merely categorical, then you are discarding the information about their ordering, and if there is some monotone relationship between that variable and the outcome, it is more difficult to recognize: you have to notice that the coefficients of the levels, viewed in the variable's order, are monotone increasing or monotone decreasing. On the other hand, if you treat them as continuous variables, you are implicitly constraining your model so that the difference in outcome between X = 2 and X = 0 is twice the difference between X = 2 and X = 1. Similarly, you are constraining your model so that the outcome level at X = 3 is midway between those at X = 2 and X = 4, etc. Now, sometimes that is even true, or close enough for practical purposes. But often it is not, and in that case mis-specifying the X:outcome relationship as linear produces misleading results, not only for the ordinal variable itself but sometimes, through spillover effects, for the estimates of the other variables as well.
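
    The constraint can be written out explicitly. This is only a sketch: it assumes X enters the model with a single coefficient, and the symbols η (the linear predictor, which in an ologit model is on the log-odds scale) and β are introduced here for illustration.

    ```latex
    \begin{align*}
    \eta(x) &= \beta x + (\text{terms for the other predictors}) \\
    \eta(2) - \eta(0) &= 2\beta = 2\,\bigl[\eta(2) - \eta(1)\bigr] \\
    \eta(3) &= \tfrac{1}{2}\bigl[\eta(2) + \eta(4)\bigr]
    \end{align*}
    ```

    Treating X as categorical instead gives each level its own coefficient, so neither equality is imposed.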

    In my own work I usually start out treating them as categorical and then examine the coefficients of all the levels. If those coefficients show a linear progression with the values of X, then I switch over to continuous. If I find a monotone but non-linear relationship, then I have to make a judgment call: either I am more interested in the simplicity of a single number whose sign shows the direction of the relationship, and not terribly worried about the mis-specification of the model (in which case I switch to continuous), or correct specification of the relationship is more important here (in which case I stay with categorical). I make that call knowing that, either way, I am trading off two desirable attributes of a solution which cannot be simultaneously achieved.
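
    A minimal sketch of that workflow in Stata, using the variable names from #1. The testparm step is an addition here, one possible way to ask whether a variable "overall" matters while it is still entered as categorical; it is not the only option.

    ```stata
    * Step 1: enter the ordinal predictors as categorical (i. prefix)
    ologit china_virus CN_Index i.factor_reltrad i.church_attendance ///
        i.prayer i.bible_authority if complete_case [pweight=WEIGHT]

    * Joint Wald test that all level coefficients of a variable are zero
    testparm i.church_attendance
    testparm i.prayer
    testparm i.bible_authority

    * Inspect the level coefficients in the output above: do they rise or
    * fall steadily, and in roughly equal steps, across the levels?

    * Step 2: if the progression looks roughly linear, refit with the
    * ordinal predictors treated as continuous (c. prefix)
    ologit china_virus CN_Index i.factor_reltrad c.church_attendance ///
        c.prayer c.bible_authority if complete_case [pweight=WEIGHT]
    ```

    In the second model each ordinal variable gets a single coefficient, whose sign gives the direction of the relationship.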

    • #3
      Thank you so much. This is very helpful. I appreciate your prompt and thorough response.
