
  • Differences between interaction and subgroup analysis

    Hi,
    I want to compare the effect of one variable (var1) on the dependent variable across two subgroups (e.g. female and male).
    One way is to run the same regression model separately for each group and test the difference between the two groups using -suest-.

    Another way is to run the regression model including the interaction term (var1*sex).

    Which one should we use, or both? Thanks very much for your input.
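    In code, the two approaches would look something like this (a minimal sketch, assuming sex is coded 0/1):
    Code:
    * approach 1: separate regressions per group, compared with -suest-
    regress depvar var1 if sex == 0
    estimates store m0
    regress depvar var1 if sex == 1
    estimates store m1
    suest m0 m1
    test [m0_mean]var1 = [m1_mean]var1

    * approach 2: one model with an interaction term
    regress depvar c.var1##i.sex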

    Best regards,
    wanhaiyou

  • #2
    wanhaiyou:
    I would prefer interaction:
    Code:
    regress depvar c.var1##i.sex // assuming that -var1- is continuous
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Originally posted by Carlo Lazzaro:
      wanhaiyou:
      I would prefer interaction:
      Code:
      regress depvar c.var1##i.sex // assuming that -var1- is continuous
      Dear Carlo, thanks very much for your help. What are the main reasons you prefer the interaction approach?
      Are there any differences between interaction and subgroup analysis?

      Thanks very much for clarifying.

      Best regards,
      wanhaiyou



      • #4
        wanhaiyou:
        It's probably a matter of personal taste: I prefer interactions because -regress- allows using the -robust- and -cluster- options for standard errors.
        If you plan to use -suest-, note that "Estimation should take place without the vce(robust) or vce(cluster clustvar)..."
        (please see page 2239 of the Stata 13.1 .pdf manual, -suest- entry).
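        A minimal sketch of the contrast (clustvar here is a hypothetical cluster identifier):
        Code:
        * interaction approach: the vce() option goes on the estimation itself
        regress depvar c.var1##i.sex, vce(cluster clustvar)

        * -suest- approach: estimate each model without vce() options;
        * -suest- supplies the robust VCE itself
        regress depvar var1 if sex == 0
        estimates store m0
        regress depvar var1 if sex == 1
        estimates store m1
        suest m0 m1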
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          When we are talking about linear regression, the only difference is that the two separate regressions estimate separate residual variances across groups, while with interactions you have one residual variance. The point estimates will be the same, and if you do the inference correctly the conclusions will also be the same. This difference no longer applies when you do logistic regression, as no residual variance is estimated (this is actually a big problem, but that is a different story, as that problem applies equally to both ways of estimating the interaction effect).

          The real advantage of interaction effects is that they are more flexible when you have multiple explanatory variables. By estimating separate regressions you force an interaction between gender and all other explanatory variables, while with interaction effects you can choose which effects may change over gender and which remain constant. This is often very desirable, as interaction effects tend to eat large amounts of statistical power. On a more mechanical level, with interaction effects (if you used the factor variable notation) you have all the power of margins, contrast, and marginsplot at your disposal. If you like your effects in the form of "the effect of x for men and the effect of x for women" (as you would get with separate regressions) rather than "the effect of x for men and how much this effect differs for women" (the default for interaction effects), then you can look at http://maartenbuis.nl/publications/ref_cat.html or just use contrast. In short: with logistic regression you can do more with interactions than with separate regressions.
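          A sketch of that flexibility (y, x1, x2, and x3 are hypothetical variables; sex coded 0/1):
          Code:
          * separate regressions implicitly interact sex with every covariate:
          regress y i.sex##(c.x1 c.x2 c.x3)

          * explicit interactions let only the effect of x1 vary with sex,
          * holding the effects of x2 and x3 constant across groups:
          regress y c.x1##i.sex c.x2 c.x3

          * effects in the form "effect of x1 for men, effect of x1 for women":
          margins, dydx(x1) over(sex)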
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            Originally posted by Carlo Lazzaro:
            wanhaiyou:
            It's probably a matter of personal taste: I prefer interactions because -regress- allows using the -robust- and -cluster- options for standard errors.
            If you plan to use -suest- (please see page 2239 of the Stata 13.1 .pdf manual, -suest- entry).
            Dear Carlo, thanks very much for your help.
            As seen from the Stata User's Guide, the -suest- command can also report robust standard errors. In the Stata 12 User's Guide, we can see the following sentences:
            Typical applications of suest are tests for intramodel and cross-model hypotheses using test or testnl, for example, a generalized Hausman specification test. lincom and nlcom may be used after suest to estimate linear combinations and nonlinear functions of coefficients.
            Code:
            suest may also be used to adjust a standard VCE for clustering or survey design effects.
            .......
            Code:
            The estimators should be estimated without vce(robust) or vce(cluster clustvar) options. suest returns the robust VCE, allows the vce(cluster clustvar) option, and automatically works with results from the svy prefix command (only for vce(linearized)).
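            Following that passage, a minimal sketch of clustering at the -suest- stage (id is a hypothetical cluster variable):
            Code:
            * estimate each group without vce() options
            regress depvar var1 if sex == 0
            estimates store m0
            regress depvar var1 if sex == 1
            estimates store m1

            * -suest- returns a robust VCE and accepts vce(cluster)
            suest m0 m1, vce(cluster id)
            test [m0_mean]var1 = [m1_mean]var1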
            Best regards,
            wanhaiyou



            • #7
              wanhaiyou:
              I agree with you.
              I simply focused on the last block of your message when I replied to your post #1 (probably because I was thinking about -regress-).
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Originally posted by Maarten Buis:
                When we are talking about linear regression, the only difference is that the two separate regressions estimate separate residual variances across groups, while with interactions you have one residual variance. The point estimates will be the same, and if you do the inference correctly the conclusions will also be the same. This difference no longer applies when you do logistic regression, as no residual variance is estimated (this is actually a big problem, but that is a different story, as that problem applies equally to both ways of estimating the interaction effect).
                The real advantage of interaction effects is that they are more flexible when you have multiple explanatory variables. By estimating separate regressions you force an interaction between gender and all other explanatory variables, while with interaction effects you can choose which effects may change over gender and which remain constant. This is often very desirable, as interaction effects tend to eat large amounts of statistical power. On a more mechanical level, with interaction effects (if you used the factor variable notation) you have all the power of margins, contrast, and marginsplot at your disposal. If you like your effects in the form of "the effect of x for men and the effect of x for women" (as you would get with separate regressions) rather than "the effect of x for men and how much this effect differs for women" (the default for interaction effects), then you can look at http://maartenbuis.nl/publications/ref_cat.html or just use contrast. In short: with logistic regression you can do more with interactions than with separate regressions.

                Dear Maarten, thanks very much for your help and for your clarification.

                Best regards,
                wanhaiyou



                • #9
                  Originally posted by Maarten Buis:
                  When we are talking about linear regression, the only difference is that the two separate regressions estimate separate residual variances across groups, while with interactions you have one residual variance. The point estimates will be the same, and if you do the inference correctly the conclusions will also be the same. This difference no longer applies when you do logistic regression, as no residual variance is estimated (this is actually a big problem, but that is a different story, as that problem applies equally to both ways of estimating the interaction effect).

                  The real advantage of interaction effects is that they are more flexible when you have multiple explanatory variables. By estimating separate regressions you force an interaction between gender and all other explanatory variables, while with interaction effects you can choose which effects may change over gender and which remain constant. This is often very desirable, as interaction effects tend to eat large amounts of statistical power. On a more mechanical level, with interaction effects (if you used the factor variable notation) you have all the power of margins, contrast, and marginsplot at your disposal. If you like your effects in the form of "the effect of x for men and the effect of x for women" (as you would get with separate regressions) rather than "the effect of x for men and how much this effect differs for women" (the default for interaction effects), then you can look at http://maartenbuis.nl/publications/ref_cat.html or just use contrast. In short: with logistic regression you can do more with interactions than with separate regressions.
                  Dear Maarten, I have another question. You say that "The point estimates will be the same, and if you do the inference correctly the conclusions will also be the same."
                  What does that mean? Could you give me an example? For example:

                  Code:
                  webuse nlswork, clear
                  drop if race == 3
                  reg ln_wage age if race == 1
                  est store re1
                  reg ln_wage age if race == 2
                  est store re2
                  suest re1 re2
                  test [re1_mean]age = [re2_mean]age

                  tab race, gen(dum)
                  gen inter = dum1*age
                  reg ln_wage age inter
                  Thanks very much for your help.

                  Best regards,
                  wanhaiyou



                  • #10
                    Originally posted by Carlo Lazzaro:
                    wanhaiyou:
                    I agree with you.
                    I simply focused on the last block of your message when I replied to your post #1 (probably because I was thinking about -regress-).
                    Hi Carlo, thank you for your help. I see.

                    Best regards,
                    wanhaiyou



                    • #11
                      In your example you omitted the main effect of race. Moreover, you did not use the factor variable notation. The latter is not wrong, but it does prevent you from correctly using post-estimation commands like margins, contrast, and marginsplot. Here is a corrected example:

                      Code:
                      webuse nlswork, clear
                      gen byte black = race == 2 if race < 3
                      
                      // separate analysis
                      reg ln_wage age if black == 0
                      est store re1
                      reg ln_wage age if black == 1
                      est store re2
                      suest re1 re2
                      test [re1_mean]age = [re2_mean]age
                      
                      // interaction effect, default specification
                      reg ln_wage c.age##i.black, vce(robust)
                      // the test behind the interaction effect corresponds to the test above
                      
                      // look at the separate effects of age for whites and blacks
                      margins, dydx(age) over(black)
                      
                      // interaction effect, alternative specification
                      // does the same thing as margins or separate analysis
                      reg ln_wage ibn.black c.age#i.black, vce(robust) hascons
                      test 0.black#c.age = 1.black#c.age
                      As you can see, all three give the same result in this case. As I mentioned above (#5), this is only true if you include an interaction effect with all other covariates. This is why I think of separate analysis as a less flexible analysis strategy compared to interaction effects.
                      Last edited by Maarten Buis; 10 Jun 2015, 01:37.
                      ---------------------------------
                      Maarten L. Buis
                      University of Konstanz
                      Department of history and sociology
                      box 40
                      78457 Konstanz
                      Germany
                      http://www.maartenbuis.nl
                      ---------------------------------



                      • #12
                        wanhaiyou:
                        As far as the part of your code with the interaction is concerned, I would write it a bit differently, using -fvvarlist-:
                        Code:
                        reg ln_wage c.age##i.race
                        Results are different from the ones obtained with your code, in that creating the interaction by hand makes Stata lose track of the variables included in the interaction.
                        Conversely, if you rely on -fvvarlist- for creating interactions and higher-order terms, useful commands like -margins- and -marginsplot- (as Maarten pointed out) can be called easily, as in the following toy example:
                        Code:
                        reg ln_wage c.age##i.race
                        margins race, dydx(age)
                        marginsplot, xdimension(race)
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)



                        • #13
                          Originally posted by Maarten Buis:
                          In your example you omitted the main effect of race. Moreover, you did not use the factor variable notation. The latter is not wrong, but it does prevent you from correctly using post-estimation commands like margins, contrast, and marginsplot. Here is a corrected example:

                          Code:
                          webuse nlswork, clear
                          gen byte black = race == 2 if race < 3
                          
                          // separate analysis
                          reg ln_wage age if black == 0
                          est store re1
                          reg ln_wage age if black == 1
                          est store re2
                          suest re1 re2
                          test [re1_mean]age = [re2_mean]age
                          
                          // interaction effect, default specification
                          reg ln_wage c.age##i.black, vce(robust)
                          // the test behind the interaction effect corresponds to the test above
                          
                          // look at the separate effects of age for whites and blacks
                          margins, dydx(age) over(black)
                          
                          // interaction effect, alternative specification
                          // does the same thing as margins or separate analysis
                          reg ln_wage ibn.black c.age#i.black, vce(robust) hascons
                          test 0.black#c.age = 1.black#c.age
                          As you can see, all three give the same result in this case. As I mentioned above (#5), this is only true if you include an interaction effect with all other covariates. This is why I think of separate analysis as a less flexible analysis strategy compared to interaction effects.
                          Perfect answer! Thanks very much for your help, Maarten.

                          Best regards,
                          wanhaiyou



                          • #14
                            Originally posted by Carlo Lazzaro:
                            wanhaiyou:
                            As far as the part of your code with the interaction is concerned, I would write it a bit differently, using -fvvarlist-:
                            Code:
                            reg ln_wage c.age##i.race
                            Results are different from the ones obtained with your code, in that creating the interaction by hand makes Stata lose track of the variables included in the interaction.
                            Conversely, if you rely on -fvvarlist- for creating interactions and higher-order terms, useful commands like -margins- and -marginsplot- (as Maarten pointed out) can be called easily, as in the following toy example:
                            Code:
                            reg ln_wage c.age##i.race
                            margins race, dydx(age)
                            marginsplot, xdimension(race)
                            Dear Carlo, thanks very much for your good suggestions. I see now.
                            Thanks again for your kindness.

                            Best regards,
                            wanhaiyou



                            • #15
                              Originally posted by Maarten Buis:
                              In your example you omitted the main effect of race. Moreover, you did not use the factor variable notation. The latter is not wrong, but it does prevent you from correctly using post-estimation commands like margins, contrast, and marginsplot. Here is a corrected example:
                              Code:
                              webuse nlswork, clear
                              gen byte black = race == 2 if race < 3

                              // separate analysis
                              reg ln_wage age if black == 0
                              est store re1
                              reg ln_wage age if black == 1
                              est store re2
                              suest re1 re2
                              test [re1_mean]age = [re2_mean]age

                              // interaction effect, default specification
                              reg ln_wage c.age##i.black, vce(robust)
                              // the test behind the interaction effect corresponds to the test above

                              // look at the separate effects of age for whites and blacks
                              margins, dydx(age) over(black)

                              // interaction effect, alternative specification
                              // does the same thing as margins or separate analysis
                              reg ln_wage ibn.black c.age#i.black, vce(robust) hascons
                              test 0.black#c.age = 1.black#c.age
                              As you can see, all three give the same result in this case. As I mentioned above (#5), this is only true if you include an interaction effect with all other covariates. This is why I think of separate analysis as a less flexible analysis strategy compared to interaction effects.
                              Hi Maarten,
                              Following your code above, I ran the two methods to test the difference between the two groups.
                              However, I don't know why the results are different. The following is the code I used:
                              Code:
                              sysuse auto, clear
                              gen index = 1 in 1/30
                              replace index = 0 in 31/L
                              reg price mpg if index == 0
                              est store re1
                              reg price mpg if index == 1
                              est store re2
                              suest re1 re2
                              test [re1_mean]mpg = [re2_mean]mpg

                              reg price c.mpg##i.index, vce(robust)
                              test 0.index#c.mpg = 1.index#c.mpg
                              And here are the results:
                              Code:
                              . test [re1_mean]mpg = [re2_mean]mpg

                               ( 1)  [re1_mean]mpg - [re2_mean]mpg = 0

                                          chi2(  1) =    8.52
                                        Prob > chi2 =    0.0035
                              Code:
                              . test 0.index#c.mpg = 1.index#c.mpg

                               ( 1)  0b.index#co.mpg - 1.index#c.mpg = 0

                                      F(  1,    70) =    8.17
                                           Prob > F =    0.0056
                              I cannot find any difference between your code and mine.
                              In addition, if there are multiple independent variables, could you please give me an example of how to employ the interaction method?
                              Thanks very much.
                              Best regards,
                              wanhaiyou


