
  • Problem with the base with factor variables

    I'm stuck with something. I have two categorical independent variables, IV1 (with values 1, 2, and 3) and IV2 (a dichotomous variable with values 0 and 1), and a continuous dependent variable (DV1). I'm running the following command:
    Code:
    regress DV1 ib(1).IV1#ib(0).IV2, allbase
    In the results, the base variable is, as I asked, the value 1 of IV1 crossed with the value 0 of IV2. But if I then estimate the following:
    Code:
    gregress DV1  ib(1).IV1#ib(0).IV2 i.IV1, allbase
    Now the base categories are all the interaction cells that involve the value zero.
    Why, when inserting IV1 alone into the second equation, do the base variables become three?

  • #2
    Why, when inserting IV1 alone into the second equation, do the base variables become three?
    You are modeling an interaction between a 3-level variable and a 2-level variable. The total number of degrees of freedom for that interaction is (3*2)-1 = 5 (six cells, minus one absorbed by the constant). There are different ways of representing that interaction, and your two regression commands (I assume -gregress- is a typo?) are two of them. A third is the representation ib1.iv1#ib0.iv2 i.iv1 i.iv2.

    But no matter how you represent the interaction, there are still only 5 degrees of freedom available. So when you add i.iv1 to your model, Stata creates two non-base indicators to represent it. Those two degrees of freedom have to come out of the ib1.iv1#ib0.iv2 terms, so Stata turns two of them into base categories. Because you stipulated that 0 be the base category for iv2, it selected iv1#iv2 categories where iv2 = 0; there were two of them not already used as base.

    The point is that there are only five degrees of freedom in a 3*2 interaction, and if you represent it with more than five terms, some of them have to be omitted as base categories. No matter what combination of variables you use to represent this interaction, you will end up with five, and only five, non-base categories.
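
    One way to check the degrees-of-freedom count for yourself is a minimal sketch along the following lines. The data here are simulated and purely hypothetical (the seed, sample size, and data-generating lines are assumptions, not taken from the original post); only the three model specifications come from the thread. Each -assert- should pass silently if the model degrees of freedom equal 5 in every parameterization.

```stata
* Hypothetical simulated data: 3-level iv1, 2-level iv2, continuous dv1
clear
set seed 12345
set obs 600
generate iv1 = 1 + mod(_n, 3)                // values 1, 2, 3
generate iv2 = mod(floor((_n - 1)/3), 2)     // values 0, 1; all six cells occur
generate dv1 = iv1 + 2*iv2 + rnormal()

* Parameterization 1: interaction cells only
regress dv1 ib1.iv1#ib0.iv2, allbaselevels
assert e(df_m) == 5

* Parameterization 2: interaction cells plus i.iv1
regress dv1 ib1.iv1#ib0.iv2 i.iv1, allbaselevels
assert e(df_m) == 5

* Parameterization 3: both main effects plus the interaction
regress dv1 ib1.iv1#ib0.iv2 i.iv1 i.iv2, allbaselevels
assert e(df_m) == 5
```

    The individual coefficients differ across the three outputs because they are contrasts against different base categories, but the fitted values should be the same, which you could confirm by running -predict- after each model and comparing the results.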



    • #3
      Dear Clyde Schechter,
      Thank you very much for your response. Indeed, there was a typographical error. In both cases, the command is "regress." I apologize for that. I understand, therefore, that the regression is correct and that there would be no robustness issues. In any case, I reiterate my gratitude.
