
  • Compare regression coefficients across different subsamples

    Dear Statalisters,

    I am new to this forum and I am looking for help: after hours of searching the internet, I am still confused about the question I will now describe.

    I am running a regression to understand the impact of a specific characteristic of the investor, represented by the dummy variable INDIP, on the invested company's performance (DIPV), controlling for other factors. One of these factors is, say, another dummy variable, CONTR.

    I want to understand whether the impact of INDIP on DIPV differs between the two subgroups identified by CONTR=0 and CONTR=1.

    I therefore run two identical regressions on the two subsamples in order to compare the coefficients obtained for the INDIP variable (both positive and significant).

    My questions are:
    1) Is this the proper way to compare the impact of a dummy variable across two independent subgroups?

    2) What is the proper statistical test to evaluate whether the difference between the two coefficients is significantly different from zero?

    I found that a Z test constructed as follows could be a solution:
    Z = (b1 - b2) / sqrt(SEb1^2 + SEb2^2),
    where b1 and b2 are the two coefficients and SEb1 and SEb2 are their respective standard errors.

    - I run the regressions with robust standard errors; is it correct to use those standard errors in this formula (see the sketch below)?
    - Theoretically speaking, should it be a t test instead, since the population variance is unknown? If so, how can I run such a test in Stata?
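
    For concreteness, here is a sketch of how I would compute this statistic in Stata, using the variable names above (only an illustration of the formula; the regressions would also include the other control variables):

    Code:
    regress DIPV INDIP if CONTR==0, vce(robust)   // subsample with CONTR==0
    scalar b1  = _b[INDIP]
    scalar se1 = _se[INDIP]
    regress DIPV INDIP if CONTR==1, vce(robust)   // subsample with CONTR==1
    scalar b2  = _b[INDIP]
    scalar se2 = _se[INDIP]
    scalar z = (b1 - b2) / sqrt(se1^2 + se2^2)
    display "Z = " z "   two-sided p-value = " 2*normal(-abs(z))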

    Thank you for the support and I'm sorry for my little comprehension on this matter.

  • #2
    Hi Francesco. The first example on this UCLA webpage should be helpful. As it shows, the coefficient for the interaction between a dichotomous explanatory variable and a continuous explanatory variable shows the difference between the two slopes.
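
    A quick sketch of that idea using the auto dataset (my own toy example, not the one on the UCLA page):

    Code:
    sysuse auto, clear
    regress mpg i.foreign##c.weight
    * _b[1.foreign#c.weight] is the difference between the weight slopes
    * for foreign and domestic cars, as the two separate regressions confirm
    regress mpg weight if foreign==0
    regress mpg weight if foreign==1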

    HTH.
    --
    Bruce Weaver
    Email: [email protected]
    Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
    Version: Stata/MP 18.0 (Windows)



    • #3
      Hello Bruce, thank you for your help.

      I now understand that using an interaction term could be useful. Regarding this approach:
      - Can I use it even if both variables are dummies?
      - What are the pros and cons of this approach compared to the "regressions on subsamples" approach? Is the latter theoretically wrong?



      • #4
        Francesco:
        1) yes, you can interact categorical variables (see -help fvvarlist- for more details);
        2) -fvvarlist- notation has the enormous benefit of tight integration with -margins- and -marginsplot- (see the related help files for more details, and the sketch below).
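
        For instance, something along these lines (a sketch using the thread's variable names; it assumes INDIP and CONTR are 0/1 dummies and A, B, C are continuous controls):

        Code:
        regress DIPV i.INDIP##i.CONTR c.A c.B c.C, vce(robust)
        * _b[1.INDIP#1.CONTR] is the difference between the effect of INDIP
        * when CONTR==1 and its effect when CONTR==0
        margins CONTR, dydx(INDIP)   // effect of INDIP within each CONTR group
        marginsplot                  // plot the two effects with confidence intervals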
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          Thank you Carlo,

          regarding the first approach: does it make no sense to compare the coefficients for the INDIP variable from the regressions run on the two subsamples (CONTR=0 and CONTR=1)?



          • #6
            Francesco, when both variables are dichotomous, the coefficient for the interaction term is equal to what some folks call the difference in differences. Try the following example to see what I mean.

            Code:
            clear *
            sysuse auto
            tab rep78
            keep if inrange(rep78,3,4) // make rep78 dichotomous
            tab rep78
            
            regress mpg foreign##rep78
            local int = _b[1.foreign#4.rep78]
            regress mpg foreign if rep78==3, noheader
            local f3 = _b[foreign]
            regress mpg foreign if rep78==4, noheader
            local f4 = _b[foreign]
            display as text "Difference in differences = " as result `f4'-`f3'
            display as text "Coefficient for the interaction = " as result `int'
            Here is my output from the two -display- commands at the end.

            Code:
            . display as text "Difference in differences = " as result `f4'-`f3'
            Difference in differences = 2.1111111
            
            . display as text "Coefficient for the interaction = " as result `int'
            Coefficient for the interaction = 2.1111111
            --
            Bruce Weaver
            Email: [email protected]
            Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
            Version: Stata/MP 18.0 (Windows)



            • #7
              Bruce,
              to make sure I understood the example: the coefficient for the interaction term in the full sample should equal the difference between the coefficients obtained for the dummy of interest in the regressions on the two subsamples.
              However, this does not hold in my regressions, maybe because I also include other variables in the analysis.





              • #8
                "However, this does not hold in my regressions, maybe because I also include other variables in the analysis."
                Do you mean that both models include (for example) A, B and C in addition to the two interacting variables, or that one model includes A, B and C while the other includes X, Y and Z?
                --
                Bruce Weaver
                Email: [email protected]
                Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                Version: Stata/MP 18.0 (Windows)



                • #9
                  The models applied to the two subsamples are the same: they include the same "other" control variables, say A, B, and C, in addition to INDIP and CONTR.

                  Let me explain:

                  MODEL1 (reg DIPV INDIP A B C if CONTR==0, vce(robust)) -> INDIP coefficient = 5
                  MODEL2 (reg DIPV INDIP A B C if CONTR==1, vce(robust)) -> INDIP coefficient = 8
                  MODEL3 (reg DIPV INDIP A B C CONTR INDIP##CONTR, vce(robust)) -> INDIP#CONTR interaction coefficient = 2 (while, according to the example above, it should equal 3, the difference between the coefficients in MODEL1 and MODEL2)



                  • #10
                    It would help, I think, if you posted a reproducible example. See item 12 in the FAQ for details about using -dataex- and posting your exact Stata commands between CODE delimiters.

                    Meanwhile, you could try this:

                    Code:
                    regress DIPV c.INDIP##c.CONTR A B C, vce(robust)
                    generate byte c0 = e(sample) & CONTR==0
                    generate byte c1 = e(sample) & CONTR==1
                    regress DIPV INDIP A B C if c0, vce(robust)
                    regress DIPV INDIP A B C if c1, vce(robust)
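
                    Those two extra regressions rerun the subsample models on exactly the estimation sample of the interacted model, so any remaining discrepancy is not a sample issue. If the numbers still disagree, one likely reason is that the subsample regressions let the coefficients of A, B, and C (and the constant) differ across the CONTR groups, while the model above holds them fixed. A fully interacted pooled model reproduces the subsample point estimates (a sketch, assuming A, B, and C are continuous):

                    Code:
                    regress DIPV i.CONTR##(i.INDIP c.A c.B c.C), vce(robust)
                    * the CONTR#INDIP interaction coefficient now equals the difference
                    * between the INDIP coefficients from the two subsample regressions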
                    --
                    Bruce Weaver
                    Email: [email protected]
                    Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                    Version: Stata/MP 18.0 (Windows)



                    • #11
                      Thank you Bruce,
                      using your code I get exactly the same regression outputs, with a coefficient for the interaction that differs from the difference in differences.

                      Anyway, the most important questions now are (since I built my entire study on the regressions on the subsamples, and the deadline is approaching):
                      - Is it WRONG to use this method when I have the same variables in both regressions?
                      - Can I somehow justify this approach, compared to the one with the interaction term?
                      - Using this approach, is there a way to compare the coefficients and test whether a significant difference exists between them?

                      Thank you for your kind support, it is really appreciated!



                      • #12
                        Francesco:
                        although I prefer the interaction approach, Stata offers a way (a Chow-type test) to calculate what (I think) you're after via -suest-, as you can see from the following toy example:
                        Code:
                        . use "C:\Program Files (x86)\Stata15\ado\base\a\auto.dta"
                        (1978 Automobile Data)
                        
                        . regress price mpg foreign if foreign==0
                        note: foreign omitted because of collinearity
                        
                              Source |       SS           df       MS      Number of obs   =        52
                        -------------+----------------------------------   F(1, 50)        =     17.05
                               Model |   124392956         1   124392956   Prob > F        =    0.0001
                            Residual |   364801844        50  7296036.89   R-squared       =    0.2543
                        -------------+----------------------------------   Adj R-squared   =    0.2394
                               Total |   489194801        51  9592054.92   Root MSE        =    2701.1
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
                             foreign |          0  (omitted)
                               _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
                        ------------------------------------------------------------------------------
                        
                        . estimates store A
                        
                        . regress price mpg foreign if foreign==1
                        note: foreign omitted because of collinearity
                        
                              Source |       SS           df       MS      Number of obs   =        22
                        -------------+----------------------------------   F(1, 20)        =     13.25
                               Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
                            Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
                        -------------+----------------------------------   Adj R-squared   =    0.3685
                               Total |   144363213        21   6874438.7   Root MSE        =    2083.6
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
                             foreign |          0  (omitted)
                               _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
                        ------------------------------------------------------------------------------
                        
                        . estimates store B
                        
                        . suest A B
                        
                        Simultaneous results for A, B
                        
                                                                        Number of obs     =         74
                        
                        ------------------------------------------------------------------------------
                                     |               Robust
                                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        A_mean       |
                                 mpg |  -329.2551   80.16093    -4.11   0.000    -486.3676   -172.1425
                             foreign |          0  (omitted)
                               _cons |   12600.54   1755.108     7.18   0.000     9160.589    16040.49
                        -------------+----------------------------------------------------------------
                        A_lnvar      |
                               _cons |   15.80284   .2986031    52.92   0.000     15.21759    16.38809
                        -------------+----------------------------------------------------------------
                        B_mean       |
                                 mpg |  -250.3668   84.69387    -2.96   0.003    -416.3637   -84.36987
                             foreign |          0  (omitted)
                               _cons |   12586.95   2258.417     5.57   0.000     8160.534    17013.37
                        -------------+----------------------------------------------------------------
                        B_lnvar      |
                               _cons |   15.28371   .2310235    66.16   0.000     14.83091    15.73651
                        ------------------------------------------------------------------------------
                        
                        . test [A_mean = B_mean]
                        
                         ( 1)  [A_mean]mpg - [B_mean]mpg = 0
                         ( 2)  [A_mean]o.foreign - [B_mean]o.foreign = 0
                               Constraint 2 dropped
                        
                                   chi2(  1) =    0.46
                                 Prob > chi2 =    0.4987
                        
                        .
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)



                        • #13
                          Carlo,
                          thank you for your suggestion.
                          However, as I understand it, the Chow test concerns the equality of ALL the coefficients in the two regressions.
                          In my case, I am only interested in the difference between the two coefficients of the INDIP variable, disregarding the A, B, and C variables.



                          • #14
                            Francesco:
                            do you mean something along the following lines?
                            Code:
                            . test [A_mean]mpg - [B_mean]mpg = 0
                            
                             ( 1)  [A_mean]mpg - [B_mean]mpg = 0
                            
                                       chi2(  1) =    0.46
                                     Prob > chi2 =    0.4987
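
                            In terms of the variables in #9 this would look something like the following (a sketch; -suest- reports robust standard errors by default):

                            Code:
                            regress DIPV INDIP A B C if CONTR==0
                            estimates store A
                            regress DIPV INDIP A B C if CONTR==1
                            estimates store B
                            suest A B
                            test [A_mean]INDIP = [B_mean]INDIP   // Wald test of equal INDIP coefficients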
                            Kind regards,
                            Carlo
                            (Stata 18.0 SE)



                            • #15
                              If that test compares only the coefficients of mpg obtained in A and B, and excludes all the other variables (which are not present in this example), then YES. If the p-value is < 0.1, I can conclude that the coefficients are not equal, right?

