  • Interpretation of a categorical by categorical interaction term

    Dear Statalist,

    I am interested in the interpretation of the interaction term of two dummy/indicator variables.

    Please find an illustrative example below:

    I download a wage dataset from "campus.lakeforest.edu/lemke/econ330/stata/lab5/wages.dta"

    I generate ln(wage) as dependent variable, an indicator for being black vs. non-black from the categorical race variable, and recode the female string variable as indicator variable.

    Code:
    gen lnwage = ln(wage)
    gen byte black = 0
    replace black = 1 if (race == 1)
    gen byte female = 0
    replace female = 1 if (sex == "F")
    Then I run the following regression model:

    Code:
    reg lnwage i.female##i.black, r

    . reg lnwage i.female##i.black, r

    Linear regression                               Number of obs     =        704
                                                    F(3, 700)         =       3.26
                                                    Prob > F          =     0.0210
                                                    R-squared         =     0.0130
                                                    Root MSE          =     1.0749

    ------------------------------------------------------------------------------
                 |                Robust
          lnwage |      Coef.  Std. Err.      t     P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        1.female |  -.2639805   .0883198    -2.99   0.003     -.437384   -.0905769
         1.black |   -.204688   .1329487    -1.54   0.124     -.465714     .056338
                 |
    female#black |
             1 1 |   .4608161   .2452047     1.88   0.061    -.0206087    .9422409
                 |
           _cons |   9.880011   .0578324   170.84   0.000     9.766465    9.993557
    ------------------------------------------------------------------------------



    Question 1: What is the respective control group for a black female?
    Question 2: How do I interpret the interaction term (black female) correctly?
    Question 3: How does this interpretation differ from a classic difference-in-differences interaction term?


    Now, I change the regression by keeping the race variable with its three levels (black, hispanic, white):

    Code:
    reg lnwage i.female##i.race, r

    . reg lnwage i.female##i.race, r

    Linear regression                               Number of obs     =        704
                                                    F(5, 698)         =       2.69
                                                    Prob > F          =     0.0204
                                                    R-squared         =     0.0182
                                                    Root MSE          =     1.0736

    ------------------------------------------------------------------------------
                 |                Robust
          lnwage |      Coef.  Std. Err.      t     P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        1.female |   .1968356    .229074     0.86   0.390    -.2529209    .6465922
                 |
            race |
               2 |  -.0781757   .1776793    -0.44   0.660    -.4270256    .2706742
               3 |    .245223   .1355059     1.81   0.071    -.0208249     .511271
                 |
     female#race |
             1 2 |  -.1330989   .2964309    -0.45   0.654     -.715102    .4489042
             1 3 |  -.5099186    .249273    -2.05   0.041    -.9993334   -.0205038
                 |
           _cons |   9.675323   .1198826    80.71   0.000     9.439949    9.910697
    ------------------------------------------------------------------------------

    Question 4: How do I interpret the two interaction effects?

    Any help or reference is highly appreciated.

  • #2
    Question 1. There is no "control group" of any kind here. I think what you mean are reference categories. The reference category for female is male, and for black it is non-black. If you want, you can think of non-black male as being an "overall" reference category, but it would be more exact to say that a non-black male belongs to both reference categories.

    Question 2. The black#female interaction term's coefficient is an estimate of how much greater the effect on lnwage of being black is when you are female instead of male. Equivalently, it is an estimate of how much greater the effect of being female is when you are black instead of non-black.

    Question 3. It doesn't. They are exactly the same thing.

    Question 4. So this is a different model. Now the reference category for race is black. The other categories 2 and 3 are white and hispanic (though I don't know which is which from what you've shown.) So let's say, for example, that 2 = white. Then the 1 2 term for female#race estimates how much greater the effect of being female on lnwage is when you are white instead of black. The 1 3 term estimates how much greater the effect of being female on lnwage is when you are hispanic instead of black.

    All of that said, talking in these terms is, at best, non-intuitive. Your regression results will be more understandable if you follow up with -margins-.

    Code:
    margins female#race // PREDICTED VALUES OF lnwage IN EACH COMBINATION OF RACE & SEX
    
    margins race, dydx(female) // MARGINAL EFFECT ON lnwage OF BEING FEMALE IN EACH RACE
    These latter statistics are easier for most of us to think about and discuss than the raw regression output.
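
A minimal sketch of how the raw coefficients and the -margins- results relate, assuming the variables from #1 are in memory and that race is coded 1, 2, 3 with 1 as the base level (as the output in #1 suggests). -lincom- rebuilds the effect of being female within each race from the coefficients; the gap between any two of those effects is the corresponding female#race interaction coefficient.

```stata
* Refit the three-level model from #1 (vce(robust) is the long form of the r option)
regress lnwage i.female##i.race, vce(robust)

* Effect of being female within each race, built from the raw coefficients
lincom 1.female                      // race 1 (base level)
lincom 1.female + 1.female#2.race    // race 2
lincom 1.female + 1.female#3.race    // race 3

* The same three effects as reported by -margins-
margins race, dydx(female)
```

The three -lincom- estimates should agree with the three rows of the -margins- table, which is one way to confirm that the interaction terms are differences between effects rather than effects in their own right.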



    • #3
      Thank you very much for your specific answer and clarification. Highly appreciated!



      • #4
        Dear Clyde,

        Since the interaction term 1 2 is negative, does it not mean that the effect of being female on lnwage is smaller, not greater, when being white instead of black?

        Kind regards



        • #5
          I don't understand what you are saying. 1.2 is not a negative number. Do you mean that the actual coefficient is -1.2? And, if so, what is the entire context and model for your question? The 1.2 was a made-up number to illustrate a point; it is not part of the actual output of any of the models shown in this thread so far.



          • #6
            The coefficient of the interaction term 1 2 (female#race) is -0.133 (in the last table above, just before Christian asked Question 4). Does that mean that the effect of being female on lnwage is less positive when being white than when being black, or is that incorrect?
            Last edited by Mike Overwater; 17 Jun 2019, 14:06.



            • #7
              Oh, I see. I misread 1 2 as the number 1.2 and couldn't find it anywhere in the outputs!

              The data set in question does not explain which value of race (1, 2, or 3) corresponds to white, black, or anything else. What we can say is that whatever race 2 is, the effect on lnwage of being female in that race is lower, by 0.133, than the effect of being female in whatever race 1 is (race 1 being the reference, or omitted, category of race in the model).
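
One way to check the coding directly, sketched under the assumption that the wages data from #1 are in memory. Whether race carries a value label in that file is not known from this thread; if it does not, only the dataset's own documentation can settle which number is which. Note that #1 built the black indicator from race == 1, which suggests race 1 is black.

```stata
* Show the distinct values of race and any value label attached to it
codebook race

* Cross-tabulate against the indicator built in #1 (black = 1 if race == 1)
tabulate race black, missing
```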



              • #8
                Ah yes, exactly! Haha, my bad for not being clearer initially. Thank you very much. I carried over the 2 = white assumption from your first post, but yes, his results do not directly show which number corresponds to which race.



                • #9
                  Hi Clyde, would you be able to check my interpretation of the coefficients on these interaction variables? My basic regression structure is
                  Code:
                  xtreg fund_flow event dummy, fe
                  where “event” and “dummy” are binary indicator/dummy variables.
                  If I run it with a binary operator to specify factorial interactions, it should look like
                  Code:
                  xtreg fund_flow event##dummy, fe
                  the output is
                  [Image: interaction.PNG, a screenshot of the regression output; not reproduced here]

                  Am I correct in interpreting
                  (1) the -.0113236 as the difference in fund_flow for the event = 1 compared to the event = 0,
                  (2) the -.0233344 as the difference in fund_flow for the dummy = 1 compared to the dummy = 0, and
                  (3) the .0091886 as the difference in fund_flow for the event = 1 (compared to the event = 0) and the dummy = 1 (compared to the dummy = 0)
                  Thank you very much, Dan
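
A hedged sketch of how these three readings could be checked, following the -margins- advice in #2. It assumes the panel has already been declared with -xtset-; the variable names are the ones given in #9. In a model with an interaction, the coefficient on event is the effect of event when dummy = 0, the coefficient on dummy is the effect of dummy when event = 0, and the interaction coefficient is the difference between the effect of event when dummy = 1 and when dummy = 0.

```stata
* Assumes the data are already -xtset-; variable names are from post #9
xtreg fund_flow i.event##i.dummy, fe

* Effect of event when dummy = 0 (this is the 1.event coefficient itself)
lincom 1.event

* Effect of event when dummy = 1
lincom 1.event + 1.event#1.dummy

* Both effects of event in one table
margins dummy, dydx(event)
```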



                  • #10
                    Dear Prof. Schechter:

                    In response to Question 4, you stated that the 1 2 term for female#race (coefficient of -.1330989) estimates how much greater the effect of being female on lnwage is when you are white instead of black. I am just confused by the phrase "how much greater the effect of being..." and am wondering if this can instead be stated as The lnwage decreases on-average by 0.133 for female white workers?

                    Also you suggested that, instead of interpreting the raw coefficients, one can interpret the marginal effects calculated as:

                    Code:
                     
                    margins female#race // PREDICTED VALUES OF lnwage IN EACH COMBINATION OF RACE & SEX

                    margins race, dydx(female) // MARGINAL EFFECT ON lnwage OF BEING FEMALE IN EACH RACE
                    In my case, the differences between the marginal effects on lnwage of being female in each race are identical to the raw coefficients Stata reports for the interaction terms. Does this indicate that the raw coefficients can indeed be interpreted as the increase/decrease in the response outcome for a specific combination of the two variables (such as an individual being female and white)?

                    Many thanks!



                    • #11
                      and am wondering if this can instead be stated as The lnwage decreases on-average by 0.133 for female white workers?
                      No, that would be entirely wrong. The interaction coefficients do not represent the effects in specific groups. Rather they are differences in differences. Moreover, there is nothing in that model about anything increasing or decreasing.

                      The meaning of the -0.133 regression coefficient for white females is this. The mean lnwage for black males (the reference group) is the constant term, 9.68. For a black female it would be 9.68 + .197 (the female coefficient). For a white female, you might think it would then be 9.68 + .197 - .078 (the white coefficient). But you would be wrong: this is where the interaction coefficient comes in. The expected value for the combination of white and female is not just the sum of the white and female terms: you also have to add in the interaction coefficient. So the correct expected value of lnwage for white females would be 9.68 + .197 - .078 - .133. That is the meaning of the interaction term: it is the correction that needs to be applied to a simple model of adding up separate effects.
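
The arithmetic above can be reproduced in Stata. This is a sketch that assumes the female##race model from #1 is the most recent estimation result and, as in the discussion above, that race 2 is white. The -display- line uses the rounded coefficients quoted in this post and comes to roughly 9.67; -lincom- computes the same sum from the stored coefficients and adds a standard error and confidence interval.

```stata
* Expected lnwage for a white (race 2) female, from the rounded coefficients in #11
display 9.68 + .197 - .078 - .133

* Same quantity from the fitted model, with a standard error and confidence interval
lincom _cons + 1.female + 2.race + 1.female#2.race
```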
                      Last edited by Clyde Schechter; 24 Sep 2021, 09:48.



                      • #12
                        Hi Dr. Clyde:

                        Thanks for your response. In response to a similar question (link below), I found your response extremely useful (response # 2).

                        https://www.statalist.org/forums/for...action-results

                        Following the discussion there, wouldn't the effect of white females be -0.0781757 + -0.1330989, the last number being the 1.female#2.race interaction coefficient? Many thanks.



                        • #13
                          An effect is the difference between outcomes in two groups. We tend to name an effect after one of the groups and omit the other. But that laxity in language is what is confusing you.

                          In the model we are discussing here, the base category (female = 0 and race = 1) is black males. If you add up -0.0781757 + -0.1330989 you are adding the coefficients of white race and white race#female. That isn't the effect of being a white female relative to the base category. It's a mismatch because you have neglected to include the "main" effect of being female. (What that sum does estimate is the difference between white and black among females only.)

                          Doing this sort of thing is confusing. That's why I recommend people not attempt to do it. The -margins- command will give you the expected values of the outcome in all combinations of race and sex, and it will give you the marginal effects of each sex in each race, and of each race in each sex. It never combines things inappropriately or mistakenly omits a term. Use it!
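
A short sketch of the -margins- calls described above, assuming the variables from #1 are in memory and that race 1 is the base level. The last -lincom- line is included only for comparison: it sums all three terms needed to compare race-2 females with the base category (race-1 males).

```stata
regress lnwage i.female##i.race, vce(robust)

margins female#race              // expected lnwage in each sex-by-race cell
margins race, dydx(female)       // effect of being female within each race
margins female, dydx(race)       // effect of each race (vs. race 1) within each sex

* Race-2 females versus the base category, using all three relevant terms
lincom 1.female + 2.race + 1.female#2.race
```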

