
  • Differences between interaction and subgroup analysis

    Hi,
    I want to compare the effect of one variable (var1) on the dependent variable across two subgroups (e.g. female and male).
    One way is to run the same regression model separately for each group and test the difference between the two groups using -suest-.

    Another way is to run the regression model including the interaction term (var1*sex).

    Which one should we use, or both? Thanks very much for your input.
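    In code, the two approaches would look something like this (a minimal sketch, assuming sex is coded 0/1):
    Code:
    * approach 1: separate regressions per group, compared with -suest-
    regress depvar var1 if sex == 0
    estimates store m0
    regress depvar var1 if sex == 1
    estimates store m1
    suest m0 m1
    test [m0_mean]var1 = [m1_mean]var1

    * approach 2: one model with an interaction term
    regress depvar c.var1##i.sex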

    Best regards,
    wanhaiyou

  • #2
    wanhaiyou:
    I would prefer interaction:
    Code:
    regress depvar c.var1##i.sex // assuming that -var1- is continuous
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Originally posted by Carlo Lazzaro:
      wanhaiyou:
      I would prefer interaction:
      Code:
      regress depvar c.var1##i.sex // assuming that -var1- is continuous
      Dear Carlo, thanks very much for your help. What are the main reasons you prefer the interaction approach?
      Are there any differences between interaction and subgroup analysis?

      Thanks very much for clarifying.

      Best regards,
      wanhaiyou



      • #4
        wanhaiyou:
        It's probably a matter of personal taste: I prefer interactions because -regress- allows using the -robust- and -cluster- options for standard errors.
        If you plan to use -suest-, note that "Estimation should take place without the vce(robust) or vce(cluster clustvar)..."
        (please see page 2239 of the Stata 13.1 .pdf manual, -suest- entry).
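        A minimal sketch of the contrast (clustvar here is a hypothetical cluster identifier):
        Code:
        * interaction approach: the vce() option goes on the estimation itself
        regress depvar c.var1##i.sex, vce(cluster clustvar)

        * -suest- approach: estimate each model without vce() options;
        * -suest- supplies the robust VCE itself
        regress depvar var1 if sex == 0
        estimates store m0
        regress depvar var1 if sex == 1
        estimates store m1
        suest m0 m1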
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          When we are talking about linear regression, the only difference is that the two separate regressions estimate separate residual variances across groups, while with interactions you have one residual variance. The point estimates will be the same, and if you do the inference correctly the conclusions will also be the same. This difference no longer applies when you do logistic regression, as no residual variance is estimated (this is actually a big problem, but that is a different story, as that problem applies equally to both ways of estimating the interaction effect).

          The real advantage of interaction effects is that they are more flexible when you have multiple explanatory variables. By estimating separate regressions you force an interaction between gender and all other explanatory variables, while with interaction effects you can choose which effects may change over gender and which remain constant. This is often very desirable, as interaction effects tend to eat large amounts of statistical power. On a more mechanical level, with interaction effects (if you used the factor variable notation) you have all the power of margins, contrast, and marginsplot at your disposal. If you like your effects in the form of "the effect of x for men and the effect of x for women" (as you would get with separate regressions) rather than "the effect of x for men and how much this effect differs for women" (the default for interaction effects), then you can look at http://maartenbuis.nl/publications/ref_cat.html or just use contrast. In short: with logistic regression you can do more with interactions than with separate regressions.
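          A sketch of that flexibility (y, x1, x2, and x3 are hypothetical variables; sex coded 0/1):
          Code:
          * separate regressions implicitly interact sex with every covariate:
          regress y i.sex##(c.x1 c.x2 c.x3)

          * explicit interactions let only the effect of x1 vary with sex,
          * holding the effects of x2 and x3 constant across groups:
          regress y c.x1##i.sex c.x2 c.x3

          * effects in the form "effect of x1 for men, effect of x1 for women":
          margins, dydx(x1) over(sex)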
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            Originally posted by Carlo Lazzaro:
            wanhaiyou:
            It's probably a matter of personal taste: I prefer interactions because -regress- allows using the -robust- and -cluster- options for standard errors.
            If you plan to use -suest- (please see page 2239 of the Stata 13.1 .pdf manual, -suest- entry).
            Dear Carlo, thanks very much for your help.
            As seen from the Stata User's Guide, the -suest- command can also report robust standard errors. In the Stata 12 User's Guide, we can see the following sentences:
            Typical applications of suest are tests for intramodel and cross-model hypotheses using test or testnl, for example, a generalized Hausman specification test. lincom and nlcom may be used after suest to estimate linear combinations and nonlinear functions of coefficients.
            Code:
            suest may also be used to adjust a standard VCE for clustering or survey design effects.
            .......
            Code:
            The estimators should be estimated without vce(robust) or vce(cluster clustvar) options. suest returns the robust VCE, allows the vce(cluster clustvar) option, and automatically works with results from the svy prefix command (only for vce(linearized)).
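            Following that passage, a minimal sketch of clustering at the -suest- stage (id is a hypothetical cluster variable):
            Code:
            * estimate each group without vce() options
            regress depvar var1 if sex == 0
            estimates store m0
            regress depvar var1 if sex == 1
            estimates store m1

            * -suest- returns a robust VCE and accepts vce(cluster)
            suest m0 m1, vce(cluster id)
            test [m0_mean]var1 = [m1_mean]var1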
            Best regards,
            wanhaiyou



            • #7
              wanhaiyou:
              I agree with you.
              I simply focused on the last block of your message when I replied to your post #1 (probably because I was thinking about -regress-).
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Originally posted by Maarten Buis:
                When we are talking about linear regression, the only difference is that the two separate regressions estimate separate residual variances across groups, while with interactions you have one residual variance. The point estimates will be the same, and if you do the inference correctly the conclusions will also be the same. This difference no longer applies when you do logistic regression, as no residual variance is estimated (this is actually a big problem, but that is a different story, as that problem applies equally to both ways of estimating the interaction effect).
                The real advantage of interaction effects is that they are more flexible when you have multiple explanatory variables. By estimating separate regressions you force an interaction between gender and all other explanatory variables, while with interaction effects you can choose which effects may change over gender and which remain constant. This is often very desirable, as interaction effects tend to eat large amounts of statistical power. On a more mechanical level, with interaction effects (if you used the factor variable notation) you have all the power of margins, contrast, and marginsplot at your disposal. If you like your effects in the form of "the effect of x for men and the effect of x for women" (as you would get with separate regressions) rather than "the effect of x for men and how much this effect differs for women" (the default for interaction effects), then you can look at http://maartenbuis.nl/publications/ref_cat.html or just use contrast. In short: with logistic regression you can do more with interactions than with separate regressions.

                Dear Maarten, thanks very much for your help and for your clarification.

                Best regards,
                wanhaiyou



                • #9
                  Originally posted by Maarten Buis:
                  When we are talking about linear regression, the only difference is that the two separate regressions estimate separate residual variances across groups, while with interactions you have one residual variance. The point estimates will be the same, and if you do the inference correctly the conclusions will also be the same. This difference no longer applies when you do logistic regression, as no residual variance is estimated (this is actually a big problem, but that is a different story, as that problem applies equally to both ways of estimating the interaction effect).

                  The real advantage of interaction effects is that they are more flexible when you have multiple explanatory variables. By estimating separate regressions you force an interaction between gender and all other explanatory variables, while with interaction effects you can choose which effects may change over gender and which remain constant. This is often very desirable, as interaction effects tend to eat large amounts of statistical power. On a more mechanical level, with interaction effects (if you used the factor variable notation) you have all the power of margins, contrast, and marginsplot at your disposal. If you like your effects in the form of "the effect of x for men and the effect of x for women" (as you would get with separate regressions) rather than "the effect of x for men and how much this effect differs for women" (the default for interaction effects), then you can look at http://maartenbuis.nl/publications/ref_cat.html or just use contrast. In short: with logistic regression you can do more with interactions than with separate regressions.
                  Dear Maarten, I have another question. You say that "The point estimates will be the same, and if you do the inference correctly the conclusions will also be the same."
                  What does that mean? Could you give me an example? For example:

                  Code:
                  webuse nlswork, clear
                  drop if race == 3
                  reg ln_wage age if race == 1
                  est store re1
                  reg ln_wage age if race == 2
                  est store re2
                  suest re1 re2
                  test [re1_mean]age = [re2_mean]age

                  tab race, gen(dum)
                  gen inter = dum1*age
                  reg ln_wage age inter
                  Thanks very much for your help.

                  Best regards,
                  wanhaiyou



                  • #10
                    Originally posted by Carlo Lazzaro:
                    wanhaiyou:
                    I agree with you.
                    I simply focused on the last block of your message when I replied to your post #1 (probably because I was thinking about -regress-).
                    Hi Carlo, thank you for your help. I see.

                    Best regards,
                    wanhaiyou



                    • #11
                      In your example you omitted the main effect of race. Moreover, you did not use the factor variable notation. The latter is not wrong, but it does prevent you from correctly using post-estimation commands like margins, contrast, and marginsplot. Here is a corrected example:

                      Code:
                      webuse nlswork, clear
                      gen byte black = race == 2 if race < 3
                      
                      // separate analysis
                      reg ln_wage age if black == 0
                      est store re1
                      reg ln_wage age if black == 1
                      est store re2
                      suest re1 re2
                      test [re1_mean]age = [re2_mean]age
                      
                      // interaction effect, default specification
                      reg ln_wage c.age##i.black, vce(robust)
                      // the test behind the interaction effect corresponds to the test above
                      
                      // look at the separate effects of age for whites and blacks
                      margins, dydx(age) over(black)
                      
                      // interaction effect, alternative specification
                      // does the same thing as margins or separate analysis
                      reg ln_wage ibn.black c.age#i.black, vce(robust) hascons
                      test 0.black#c.age = 1.black#c.age
                      As you can see, all three give the same result in this case. As I mentioned above (#5), this is only true if you include an interaction effect with all other covariates. This is why I think of separate analysis as a less flexible analysis strategy compared to interaction effects.
                      Last edited by Maarten Buis; 10 Jun 2015, 01:37.
                      ---------------------------------
                      Maarten L. Buis
                      University of Konstanz
                      Department of history and sociology
                      box 40
                      78457 Konstanz
                      Germany
                      http://www.maartenbuis.nl
                      ---------------------------------



                      • #12
                        wanhaiyou:
                        As far as the part of your code with the interaction is concerned, I would write it a bit differently, using -fvvarlist-:
                        Code:
                        reg ln_wage c.age##i.race
                        Results are different from the ones obtained with your code, in that creating the interaction by hand makes Stata lose track of the variables included in the interaction.
                        Conversely, if you rely on -fvvarlist- for creating interactions and higher-order terms, useful commands like -margins- and -marginsplot- (as Maarten pointed out) can be called easily, as in the following toy example:
                        Code:
                        reg ln_wage c.age##i.race
                        margins race, dydx(age)
                        marginsplot, xdimension(race)
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)



                        • #13
                          Originally posted by Maarten Buis:
                          In your example you omitted the main effect of race. Moreover, you did not use the factor variable notation. The latter is not wrong, but it does prevent you from correctly using post-estimation commands like margins, contrast, and marginsplot. Here is a corrected example:

                          Code:
                          webuse nlswork, clear
                          gen byte black = race == 2 if race < 3
                          
                          // separate analysis
                          reg ln_wage age if black == 0
                          est store re1
                          reg ln_wage age if black == 1
                          est store re2
                          suest re1 re2
                          test [re1_mean]age = [re2_mean]age
                          
                          // interaction effect, default specification
                          reg ln_wage c.age##i.black, vce(robust)
                          // the test behind the interaction effect corresponds to the test above
                          
                          // look at the separate effects of age for whites and blacks
                          margins, dydx(age) over(black)
                          
                          // interaction effect, alternative specification
                          // does the same thing as margins or separate analysis
                          reg ln_wage ibn.black c.age#i.black, vce(robust) hascons
                          test 0.black#c.age = 1.black#c.age
                          As you can see, all three give the same result in this case. As I mentioned above (#5), this is only true if you include an interaction effect with all other covariates. This is why I think of separate analysis as a less flexible analysis strategy compared to interaction effects.
                          Perfect answer! Thanks very much for your help, Maarten.

                          Best regards,
                          wanhaiyou



                          • #14
                            Originally posted by Carlo Lazzaro:
                            wanhaiyou:
                            As far as the part of your code with the interaction is concerned, I would write it a bit differently, using -fvvarlist-:
                            Code:
                            reg ln_wage c.age##i.race
                            Results are different from the ones obtained with your code, in that creating the interaction by hand makes Stata lose track of the variables included in the interaction.
                            Conversely, if you rely on -fvvarlist- for creating interactions and higher-order terms, useful commands like -margins- and -marginsplot- (as Maarten pointed out) can be called easily, as in the following toy example:
                            Code:
                            reg ln_wage c.age##i.race
                            margins race, dydx(age)
                            marginsplot, xdimension(race)
                            Dear Carlo, thanks very much for your good suggestions. I see now.
                            Thanks again for your kindness.

                            Best regards,
                            wanhaiyou



                            • #15
                              Originally posted by Maarten Buis:
                              In your example you omitted the main effect of race. Moreover, you did not use the factor variable notation. The latter is not wrong, but it does prevent you from correctly using post-estimation commands like margins, contrast, and marginsplot. Here is a corrected example:
                              Code:
                              webuse nlswork, clear
                              gen byte black = race == 2 if race < 3

                              // separate analysis
                              reg ln_wage age if black == 0
                              est store re1
                              reg ln_wage age if black == 1
                              est store re2
                              suest re1 re2
                              test [re1_mean]age = [re2_mean]age

                              // interaction effect, default specification
                              reg ln_wage c.age##i.black, vce(robust)
                              // the test behind the interaction effect corresponds to the test above

                              // look at the separate effects of age for whites and blacks
                              margins, dydx(age) over(black)

                              // interaction effect, alternative specification
                              // does the same thing as margins or separate analysis
                              reg ln_wage ibn.black c.age#i.black, vce(robust) hascons
                              test 0.black#c.age = 1.black#c.age
                              As you can see, all three give the same result in this case. As I mentioned above (#5), this is only true if you include an interaction effect with all other covariates. This is why I think of separate analysis as a less flexible analysis strategy compared to interaction effects.
                              Hi Maarten,
                              Following your code above, I ran the two methods to test the difference between the two groups.
                              However, I don't know why the results are different. The following is the code I used:
                              Code:
                              sysuse auto, clear
                              gen index = 1 in 1/30
                              replace index = 0 in 31/L
                              reg price mpg if index == 0
                              est store re1
                              reg price mpg if index == 1
                              est store re2
                              suest re1 re2
                              test [re1_mean]mpg = [re2_mean]mpg

                              reg price c.mpg##i.index, vce(robust)
                              test 0.index#c.mpg = 1.index#c.mpg
                              And here are the results:
                              Code:
                              . test [re1_mean]mpg = [re2_mean]mpg

                               ( 1)  [re1_mean]mpg - [re2_mean]mpg = 0

                                          chi2(  1) =    8.52
                                        Prob > chi2 =    0.0035
                              Code:
                              . test 0.index#c.mpg = 1.index#c.mpg

                               ( 1)  0b.index#co.mpg - 1.index#c.mpg = 0

                                      F(  1,    70) =    8.17
                                           Prob > F =    0.0056
                              I cannot find any difference between your code and mine.
                              In addition, if there are multiple independent variables, could you please give me an example of how to employ the interaction method?
                              Thanks very much.
                              Best regards,
                              wanhaiyou


