
  • A question of interaction terms with indicator variables (dummy variables)

    Hi everyone,

    I'm running a simple regression with earnings as my variable of interest. I want to emphasize that the coefficients are different when earnings are positive and negative. The model is like this:

    Y=a_1+ a_2NI+ a_3NI×Neg+a_4Neg+CONTROLs (1)
    NI represents net income, Neg is an indicator variable that =1 if NI<0 and 0 otherwise

    Interestingly, I saw a paper addressing a similar question that ran a regression like the following:

    Y=b_1+ b_2Positive_NI+ b_3Negative_NI+b_4Neg+CONTROLs (2)
    Positive_NI=NI if NI>0, and 0 otherwise
    Negative_NI=NI if NI<0, and 0 otherwise
    Neg is still the indicator variable of loss.

    Even though I understand that I can answer my question with Eq. (1), I'm very curious whether Eq. (2) is statistically correct (Eq. (2) helps emphasize my hypothesis that b_2 is negative but b_3 is positive).

    Many thanks,
    Yiting


  • #2
    The two models are completely equivalent. The two models are just algebraic linear transforms of each other, different ways of parameterizing the same model. Use whichever is more convenient for you. All conclusions you draw will be identical.
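
    To make the equivalence concrete, here is a short derivation using only the definitions given in #1 (Positive_NI, Negative_NI and Neg as defined there). It is a sketch of the reparameterization, not something taken from the cited paper:

    ```latex
    \begin{aligned}
    NI &= \text{Positive\_NI} + \text{Negative\_NI}, \qquad NI \times Neg = \text{Negative\_NI} \\
    Y &= a_1 + a_2\,NI + a_3\,NI \times Neg + a_4\,Neg + \text{CONTROLs} \\
      &= a_1 + a_2\,\text{Positive\_NI} + (a_2 + a_3)\,\text{Negative\_NI} + a_4\,Neg + \text{CONTROLs} \\
    \Rightarrow\ & b_1 = a_1,\quad b_2 = a_2,\quad b_3 = a_2 + a_3,\quad b_4 = a_4
    \end{aligned}
    ```

    In particular, testing a_3 = 0 in Eq. (1) is the same test as b_2 = b_3 in Eq. (2).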



    • #3
      Hi Clyde,

      Many thanks! I just confirmed that the conclusions are identical!

      Best,
      Yiting



      • #4
        Originally posted by Clyde Schechter:
        The two models are completely equivalent. The two models are just algebraic linear transforms of each other, different ways of parameterizing the same model. Use whichever is more convenient for you. All conclusions you draw will be identical.
        I have a follow-up question: what if I have an additional indicator variable (IND) to interact?
        Thus I think the first equation becomes:
        Y=a_1+ a_2NI+ a_3NI×Neg+a_4NI×Neg×IND+a_5Neg+a_6Neg×IND+ a_7NI×IND +a_8IND+CONTROLs (1)

        Then how about the second one?
        Should it be:
        Y=b_1+ b_2Positive_NI+ b_3Negative_NI+b_4Neg+ b_5IND +b_6Positive_NI×IND+ b_7Negative_NI×IND+CONTROLs (2)

        or

        Y=b_1+ b_2Positive_NI+ b_3Negative_NI+b_4Neg+ b_5IND +b_6Positive_NI×IND+ b_7Negative_NI×IND+b_8Neg×IND+ b_9NI×IND+CONTROLs (3)
        ?

        The question is whether I still need to include the interactions between IND and Neg and between IND and NI.

        Thanks again!

        Best,
        Yiting



        • #5
          Yes, you still need to include the two way interactions, so equation (3) would be the correct model to correspond to #1.

          Even apart from your particular problem, the general rule is that wherever you have a#b#c you also need a, b, c, a#b, b#c, and a#c. There are exceptions, but when you're building models, you should start by presuming you need all of those terms, and only eliminate them if there is a compelling justification for doing so.
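
          As an illustration of that rule, Stata's ## operator generates the full set of lower-order terms for you. The sketch below assumes variables named Y, NI, Neg and IND as in this thread, uses -regress- as a stand-in for whatever estimation command applies, and leaves out the controls:

          ```stata
          * Full factorial: ## adds all main effects and two-way terms automatically
          regress Y c.NI##i.Neg##i.IND

          * The same model with every term written out by hand
          regress Y c.NI i.Neg i.IND           ///
              c.NI#i.Neg c.NI#i.IND i.Neg#i.IND ///
              c.NI#i.Neg#i.IND
          ```

          Both commands should fit the same model; the first form is harder to get wrong.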



          • #6
            Hi Clyde,

            Got it! Many thanks!

            best,
            Yiting



            • #7
              Originally posted by Yiting Cao in #4

              Hi everyone,

              I think my Eq. (3) is wrong. The correct version is
              Y=b_1+ b_2Positive_NI+ b_3Negative_NI+b_4Neg+ b_5IND +b_6Positive_NI×IND+ b_7Negative_NI×IND+b_8Neg×IND+CONTROLs (4)

              b_9NI×IND shouldn't be included, since NI is collinear with Positive_NI and Negative_NI.

              Positive_NI=NI if NI>0, and 0 otherwise
              Negative_NI=NI if NI<0, and 0 otherwise
              Neg is still the indicator variable of loss.
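
              The collinearity can be written out directly from these definitions (a sketch using only the variables defined in this thread):

              ```latex
              \begin{aligned}
              NI &= \text{Positive\_NI} + \text{Negative\_NI} \\
              NI \times IND &= \text{Positive\_NI} \times IND + \text{Negative\_NI} \times IND
              \end{aligned}
              ```

              So the b_9 term in Eq. (3) is an exact sum of the b_6 and b_7 regressors, and its coefficient cannot be estimated separately.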

              Clyde, please let me know if you think I'm wrong.

              Best,
              Yiting



              • #8
                Yes, it looks like you are right.

                You know, you don't need to do it this way. Basically you have 3 variables, Positive_NI, Negative_NI, and Neg, and then you have a variable IND, that you want to include along with its interactions with all of the others. So instead of risking making mistakes, you can let Stata do this for you automatically, by using factor-variable notation:

                Code:
                regression_command Y i.IND##(c.Positive_NI c.Negative_NI i.Neg) // AND OTHER COVARIATES, OPTIONS AS APPROPRIATE
                Stata will generate all the appropriate combinations you need. If any of them turn out to be collinear, Stata will automatically omit something during estimation.
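
                For completeness, here is one way the pieces might be built before running that command. This is a minimal sketch, assuming NI, IND and Y already exist under those names; -regress- is only a placeholder for the actual estimation command, and the controls are omitted:

                ```stata
                * Stata treats missing as larger than any number, so guard against it
                generate byte   Neg         = (NI < 0)          if !missing(NI)
                generate double Positive_NI = cond(NI > 0, NI, 0) if !missing(NI)
                generate double Negative_NI = cond(NI < 0, NI, 0) if !missing(NI)

                * IND main effect plus its interaction with each of the three variables
                regress Y i.IND##(c.Positive_NI c.Negative_NI i.Neg)
                ```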



                • #9
                  Many thanks, Clyde! I just want to make sure that no one will be confused by my posts.

                  Actually, in my real test I have two indicator variables besides Positive_NI, Negative_NI, and Neg (i.e., IND_treat and IND_post), which makes the case more complicated. It's more like a 3-way interaction (actually 4-way).



                  • #10
                    If you are going up to three and four way interactions, I think the case for using factor-variable notation to get Stata to generate all the terms is even more compelling. Also, by using factor-variable notation, you will be able to use the -margins- command after estimation, which will greatly simplify figuring out predicted values in various combinations of your variables, and marginal effects.
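
                    A rough sketch of what that could look like, again assuming the variable names used in this thread and -regress- as a placeholder estimator:

                    ```stata
                    regress Y i.IND##(c.Positive_NI c.Negative_NI i.Neg)

                    * Slope of Y on each piece of NI, reported separately for each level of IND
                    margins IND, dydx(Positive_NI Negative_NI)
                    ```

                    One caution: -margins- does not know that Positive_NI, Negative_NI and Neg are all derived from NI, so predicted values at combinations that cannot occur in the data (for example Neg = 1 with Positive_NI > 0) need to be interpreted with care.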



                    • #11
                      Hi Clyde,

                      Thanks for your comments! I used factor-variable notation. However, I can't find a way to compare coefficients when using factor-variable notation (e.g. test i.IND#posini=i.IND#negni).

                      Please let me know if I am missing anything!

                      Best,
                      Yiting



                      • #12
                        Use the -coeflegend- option (abbreviated -coefl-) when running (or replaying) your estimation command. It shows the names of the parameters, which makes it easier to set up the tests you want.

                        -testparm- can also be handy: for example, if i.relig is in your estimation command, then -testparm i.relig- tests whether all the associated coefficients jointly differ from zero.
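
                        Applied to the model discussed earlier in this thread, that might look like the sketch below. The variable names and the use of -regress- are assumptions, and the coefficient names shown presume IND is coded 0/1 with 0 as the base level; check the -coeflegend- output for the names in your own model:

                        ```stata
                        * Show the _b[] name of every coefficient
                        regress Y i.IND##(c.Positive_NI c.Negative_NI i.Neg), coeflegend

                        * Compare the two interaction coefficients using those names
                        test _b[1.IND#c.Positive_NI] = _b[1.IND#c.Negative_NI]

                        * Joint test that both interaction coefficients are zero
                        testparm i.IND#c.Positive_NI i.IND#c.Negative_NI
                        ```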
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology
                        StataNow Version: 19.5 MP (2 processor)

                        EMAIL: [email protected]
                        WWW: https://www3.nd.edu/~rwilliam



                        • #13
                          Got it!! Many thanks Richard!!

                          Best,
                          Yiting

