  • Changing the reference category for a variable that is part of an interaction term

    Hi everyone,

    I am having an issue with changing the base/reference category of a variable that is itself part of an interaction term (I searched other questions in the forum but found nothing; if I missed something, please let me know). The two very simple lines of code below illustrate the problem:

    Code:
    sysuse auto, clear
    reg price ib4.rep78#foreign
    Those two lines generate this output:

    [Image: fig1.png (regression output)]

    As the output shows, the base reference category for the variable "rep78" is not "4". Is there a way to do what I describe, while using only rep78#foreign rather than rep78##foreign?

    Thank you in advance for your time.

  • #2
    As the output shows, the base reference category for the variable "rep78" is not "4".
    Look again: the reference category is 4. It's just that you have an incorrect expectation of what that should look like.

    There are two different ways to code an interaction in Stata. One is i.x##i.y, which expands to i.x i.y i.x#i.y; the other is just i.x#i.y. Ultimately these are just two different ways to parameterize the same thing. Let's suppose x has m different levels and y has n different levels. When you use i.x##i.y, the i.x and i.y are expanded in the usual way, with one reference category omitted for each. So you get m-1 x variables and n-1 y variables. You also get the products of all of those, which is (m-1)*(n-1) x#y variables. That gives you a total of (m-1) + (n-1) + (m-1)*(n-1) = m*n - 1 variables for the whole thing.

    So, when you use i.x#i.y alone to represent the interaction, you should anticipate a total of m*n-1 terms to represent it. This makes sense because there are m*n x#y combinations, and one of them has to be omitted as the reference category for the interaction. But only one of them, not two. You are, I imagine, expecting to see no 4.rep78#?.foreign terms in the model, but that would mean omitting two of the interaction levels, whereas there is only one to spare in this representation. So Stata eliminated the 4.rep78#Domestic term, because 4 is your selected reference category for rep78 and Domestic is the reference category for foreign. Look at the output carefully: 4.rep78#Domestic is not there. But it cannot also drop 4.rep78#Foreign, because then there would not be enough terms remaining to fully represent the interaction. Notice that all of the other levels of rep78 appear twice in the output, once with Foreign and once with Domestic. But because you selected 4 to be the reference category for rep78, there is only one term for the 4 level of rep78: the one paired with the reference category of foreign is omitted.
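
    One way to check this for yourself is sketched below, assuming the same auto data as in #1. The allbaselevels display option asks regress to list base levels in the coefficient table, so the omitted cell should show up explicitly. Fitting the ## version alongside lets you compare the two parameterizations, which should report the same R-squared since they describe the same model.

    Code:
    sysuse auto, clear
    * interaction-only form; allbaselevels lists the omitted base cell
    regress price ib4.rep78#i.foreign, allbaselevels
    * full factorial form of the same model, for comparison
    regress price ib4.rep78##i.foreign, allbaselevels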
