
  • Equality of regression coefficients when estimating regressions with same covariates and different outcomes

    Consider a setting where the right-hand side of the two equations is identical; what differs is the outcome on the left-hand side:

    Regression 1:
    Code:
    y = a + b*X + e
    Regression 2:
    Code:
    y_2 = a_2 + b_2*X + e_2

    The goal is to test the equality of the coefficients on X across the two regressions. Searching online, I found two options.

    Option 1 is to use -suest-, as explained, for instance, here: https://stats.idre.ucla.edu/stata/co...s-using-suest/
    Option 2 is to manually stack the data (one copy of the dataset per regression), create a dataset dummy variable, and estimate a model in which X is interacted with the dummy. This is explained here: https://www.stata.com/support/faqs/s...-coefficients/

    In my "real life" application I am using -reghdfe-, since I also need to absorb a large number of fixed effects. Because the package does not support -suest-, I am trying to implement Option 2.

    When I do not absorb fixed effects I have no problems: on top of running the test of interest, I can also retrieve the coefficients b and b_2 (and the two intercepts) from the stacked regression, and these are identical to those obtained when separately estimating the two original regressions.
    However, when I absorb fixed effects this no longer holds: the coefficients from the stacked regression differ from those obtained in the two separate regressions. I believe this is because de-meaning the two separate models is not the same as de-meaning the stacked one.

    Would it be a valid approach in this setting to first de-mean the two datasets separately, then stack the de-meaned datasets, and finally apply Option 2? My concern is that the standard errors would be computed incorrectly, since they would not account for the preliminary de-meaning step. Do you know of alternative approaches that reach the same goal?
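    As an aside, the logic of Option 2 can be checked numerically outside Stata. Below is a minimal numpy sketch with made-up data (all variable names are hypothetical, not from the thread): with no absorbed fixed effects, stacking the data and fully interacting the regressor with a dataset dummy reproduces the two separate OLS fits exactly.

```python
# Hypothetical illustration of Option 2 (stacking) without fixed effects.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
y1 = 1.0 + 0.5 * X + rng.normal(size=n)   # regression 1: y   = a   + b*X   + e
y2 = -0.5 + 2.0 * X + rng.normal(size=n)  # regression 2: y_2 = a_2 + b_2*X + e_2

def ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# separate fits: columns [constant, X]
b1 = ols(np.column_stack([np.ones(n), X]), y1)
b2 = ols(np.column_stack([np.ones(n), X]), y2)

# stacked fit: dataset dummy d (0 for reg 1, 1 for reg 2), fully interacted
y = np.concatenate([y1, y2])
Xs = np.concatenate([X, X])
d = np.concatenate([np.zeros(n), np.ones(n)])
Z = np.column_stack([np.ones(2 * n), d, Xs, d * Xs])
bs = ols(Z, y)

# the stacked coefficients recover both separate regressions exactly
assert np.allclose(bs[0], b1[0])           # a
assert np.allclose(bs[2], b1[1])           # b
assert np.allclose(bs[0] + bs[1], b2[0])   # a_2 = a + coef on d
assert np.allclose(bs[2] + bs[3], b2[1])   # b_2 = b + coef on d*X
```

    The point estimates match because the fully interacted stacked design is block-diagonal across the two dataset copies; the question above is precisely about why this breaks once fixed effects are absorbed.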
    Last edited by Stefano Lombardi; 19 Dec 2019, 06:52.

  • Stefano Lombardi
    replied
    Excellent, thanks!
    Stefano

  • Andrew Musau
    replied
    Thanks for the data example. Here is the approach.

    Code:
    sysuse auto, clear
    *RUN SEPARATE REGRESSIONS
    reghdfe price foreign, a(rep78)
    reghdfe gear_ratio foreign, a(rep78)
    *RESTRUCTURE DATA
    rename (price gear_ratio) var#, addnumber(1)
    reshape long var, i(make) j(group)
    lab def group 1 "price" 2 "gear_ratio"
    lab values group group
    *GENERATE GROUP INDICATORS
    gen gr1= 1.group
    gen gr2= 2.group
    *RUN JOINT REGRESSION WITH ROBUST STD. ERRORS
    *CROSS-SECTIONAL LINEAR MODEL, ROBUST SE = CLUSTER BY OBSERVATION
    gen obs=_n
    reghdfe var c.gr1#(c.foreign) c.gr2#(c.foreign), a(i.rep78#c.gr1 i.rep78#c.gr2) cluster(obs)
    test c.gr1#c.foreign= c.gr2#c.foreign
    Result:

    Code:
    .
    . *RUN SEPARATE REGRESSIONS
    
    .
    . reghdfe price foreign, a(rep78)
    (MWFE estimator converged in 1 iterations)
    
    HDFE Linear regression                            Number of obs   =         69
    Absorbing 1 HDFE group                            F(   1,     63) =       0.00
                                                      Prob > F        =     0.9711
                                                      R-squared       =     0.0145
                                                      Adj R-squared   =    -0.0637
                                                      Within R-sq.    =     0.0000
                                                      Root MSE        =  3003.7661
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |    36.7572   1010.484     0.04   0.971    -1982.533    2056.048
           _cons |   6134.857   474.7025    12.92   0.000     5186.239    7083.474
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
           rep78 |         5           0           5     |
    -----------------------------------------------------+
    
    .
    . reghdfe gear_ratio foreign, a(rep78)
    (MWFE estimator converged in 1 iterations)
    
    HDFE Linear regression                            Number of obs   =         69
    Absorbing 1 HDFE group                            F(   1,     63) =      47.05
                                                      Prob > F        =     0.0000
                                                      R-squared       =     0.5409
                                                      Adj R-squared   =     0.5045
                                                      Within R-sq.    =     0.4276
                                                      Root MSE        =     0.3257
    
    ------------------------------------------------------------------------------
      gear_ratio |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |   .7515638   .1095635     6.86   0.000     .5326186    .9705089
           _cons |   2.770539   .0514704    53.83   0.000     2.667683    2.873394
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
           rep78 |         5           0           5     |
    -----------------------------------------------------+
    
    . *RUN JOINT REGRESSION WITH ROBUST STD. ERRORS
    
    . reghdfe var c.gr1#(c.foreign) c.gr2#(c.foreign), a(i.rep78#c.gr1 i.rep78#c.gr2) cluster(obs)
    (warning: no intercepts terms in absorb(); regression lacks constant term)
    (MWFE estimator converged in 2 iterations)
    
    HDFE Linear regression                            Number of obs   =        138
    Absorbing 2 HDFE groups                           F(   2,    126) =      22.69
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.8214
                                                      Adj R-squared   =     0.8044
                                                      Within R-sq.    =     0.0000
    Number of clusters (obs)     =        138         Root MSE        =  2123.9834
    
                                         (Std. Err. adjusted for 138 clusters in obs)
    ---------------------------------------------------------------------------------
                    |               Robust
                var |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
    c.gr1#c.foreign |    36.7572   658.8838     0.06   0.956    -1267.154    1340.669
                    |
    c.gr2#c.foreign |   .7515638   .1115713     6.74   0.000     .5307675      .97236
    ---------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
     rep78#c.gr1 |         5           0           5    ?|
     rep78#c.gr2 |         5           0           5    ?|
    -----------------------------------------------------+
    ? = number of redundant parameters may be higher
    
    . test c.gr1#c.foreign= c.gr2#c.foreign
    
     ( 1)  c.gr1#c.foreign - c.gr2#c.foreign = 0
    
           F(  1,   126) =    0.00
                Prob > F =    0.9565
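    The key idea in the specification above can also be verified outside Stata. Here is a hedged numpy sketch with made-up data (not the auto dataset): absorbing one full set of fixed-effect dummies per dataset copy, the analogue of a(i.rep78#c.gr1 i.rep78#c.gr2), makes the stacked regression reproduce the separate fixed-effects regressions exactly.

```python
# Stacked regression with group-specific fixed effects (numpy sketch).
import numpy as np

rng = np.random.default_rng(1)
n, k = 300, 4                         # k fixed-effect categories
X = rng.normal(size=n)
fe = rng.integers(0, k, size=n)       # stands in for rep78
D = np.eye(k)[fe]                     # fixed-effect dummy matrix
y1 = D @ rng.normal(size=k) + 0.5 * X + rng.normal(size=n)
y2 = D @ rng.normal(size=k) + 2.0 * X + rng.normal(size=n)

def ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# separate FE regressions (the FE dummies absorb the constant)
b1 = ols(np.column_stack([X, D]), y1)[0]
b2 = ols(np.column_stack([X, D]), y2)[0]

# stacked regression: interact X AND the FE dummies with each group indicator
y = np.concatenate([y1, y2])
g1 = np.concatenate([np.ones(n), np.zeros(n)])
g2 = 1.0 - g1
Xs = np.concatenate([X, X])
Ds = np.vstack([D, D])
Z = np.column_stack([g1 * Xs, g2 * Xs, Ds * g1[:, None], Ds * g2[:, None]])
bs = ols(Z, y)

assert np.allclose(bs[0], b1)   # slope on X for dataset 1 matches
assert np.allclose(bs[1], b2)   # slope on X for dataset 2 matches
```

    With group-specific FE dummies, the stacked design is again block-diagonal across the two copies, which is why the point estimates coincide with the separate regressions.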
    Last edited by Andrew Musau; 19 Dec 2019, 14:43.


  • Stefano Lombardi
    replied
    Hi,

    thank you both for the replies.

    Here is a minimal working example of the "stacked approach" without fixed effects (a replication of the second link I posted).
    The variable of interest is foreign.
    I should mention that, of course, using -reghdfe- or manual de-meaning should not affect the results (I use the package because it simplifies things when absorbing fixed effects).

    Code:
    cls
    clear all
    sysuse auto, clear
    
    
    // Baseline estimates
    
    reghdfe price foreign, noabsorb
    reghdfe gear_ratio foreign, noabsorb
    
    
    // Generate stacked datasets
    
    loc iter = 1
    foreach y in price gear_ratio {
    
        preserve
            rename `y' y
            gen dataset = "`y'"
            keep dataset y foreign
            if (`iter' > 1) append using "dat_appended"
            qui save "dat_appended", replace
        restore    
            
        loc iter = `iter' + 1
    }
    
    use "dat_appended", clear
    encode dataset, generate(dataset_num)
    save "dat_appended", replace
    
    
    // Check regressions separately
    
    reghdfe y foreign if dataset_num == 1, noabsorb
    loc b1    = _b[foreign]  // store for inspection below
    loc cons1 = _b[_cons]
    reghdfe y foreign if dataset_num == 2, noabsorb
    loc b2    = _b[foreign]
    loc cons2 = _b[_cons]
    
    
    // Stacked regression and test if coef of foreign is different across reg1 and reg2
    
    * stacked regression
    reghdfe y i.dataset_num##(i.foreign), noabsorb
    
    * test foreign "across reg1 and reg2"
    test _b[2.dataset_num] = 0, notest
    test _b[2.dataset_num#1.foreign] = 0, accum  // the test of interest
    
    * Check that coefficients in stacked regression are same as those in separate ones
    di _n _b[_cons] _n `cons1'    
    assert float(_b[_cons])     == float(`cons1')                             // reg1, constant
    di _b[1.foreign] _n `b1'    
    assert float(_b[1.foreign]) == float(`b1')                                // reg1, foreign
    di _n _b[_cons] + _b[2.dataset_num] _n `cons2'    
    assert float(_b[_cons] + _b[2.dataset_num]) == float(`cons2')             // reg2, constant
    di _b[1.foreign] + _b[2.dataset_num#1.foreign] _n `b2'
    assert float(_b[1.foreign] + _b[2.dataset_num#1.foreign]) == float(`b2')  // reg2, foreign  
    
    
    cap erase "dat_appended.dta"
    
    *-------------------------------------------------------------------------------
    *-------------------------------------------------------------------------------
    The code shows that, when no fixed effects are added, we get exactly the same point estimates from the two "baseline" regressions and from the stacked regression.
    However, if you replace every noabsorb with, for instance, absorb(rep78), this no longer holds.

    Note that this extends easily to additional regressors on the RHS of both models, as long as the additional regressors are mean-centered before interacting them with foreign (I tried this).
    This means that one could additionally control for the FE dummies, but I do not want to do so, since I am not interested in estimating them or testing their equality across regressions.
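    For intuition, here is a hedged numpy sketch (made-up data, not the auto dataset) of the failure mode just described: absorbing a single shared set of FE dummies in the stacked regression forces the FE coefficients to be equal across the two datasets, so the slopes generally stop matching the separate FE regressions.

```python
# Why a single shared set of absorbed FEs breaks the equivalence (sketch).
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 5
X = rng.normal(size=n)
fe = rng.integers(0, k, size=n)
D = np.eye(k)[fe]                      # FE dummies (stand-in for rep78)
# the two outcomes have DIFFERENT fixed-effect coefficients
y1 = D @ rng.normal(scale=3, size=k) + 0.5 * X + rng.normal(size=n)
y2 = D @ rng.normal(scale=3, size=k) + 2.0 * X + rng.normal(size=n)

def ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# separate FE regression for dataset 1
b1 = ols(np.column_stack([X, D]), y1)[0]

# stacked regression with group-specific slopes but ONE shared set of FE dummies
y = np.concatenate([y1, y2])
g1 = np.concatenate([np.ones(n), np.zeros(n)])
g2 = 1.0 - g1
Xs = np.concatenate([X, X])
Ds = np.vstack([D, D])
Z = np.column_stack([g1 * Xs, g2 * Xs, g2, Ds])  # g2: dataset intercept shift
b_shared = ols(Z, y)[0]

# the false restriction on the FE coefficients contaminates the slope:
# it no longer equals the separate-regression estimate
assert abs(b_shared - b1) > 1e-8
```

    This mirrors what happens when all noabsorb options are replaced by a shared absorb(rep78) in the Stata code above.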

    Best wishes,
    S.
    Last edited by Stefano Lombardi; 19 Dec 2019, 10:30.


  • Jeff Wooldridge
    replied
    Without knowing the context, it's hard to know what to recommend. It seems peculiar to want to impose, but not test, that the fixed-effects coefficients are the same.

    With a lot of FEs and relatively few observations per group, there is no two-stage estimation error: it's just demeaning. I would think, though, that you'd want to cluster your standard errors, but I'd need to know more.
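    The "it's just demeaning" point is the Frisch-Waugh-Lovell result, which a small numpy sketch (hypothetical data) can illustrate: including the FE dummies directly and running simple OLS on group-demeaned variables give the same point estimate.

```python
# Frisch-Waugh-Lovell: FE dummies vs. within-group demeaning (numpy sketch).
import numpy as np

rng = np.random.default_rng(2)
n, k = 300, 6
X = rng.normal(size=n)
fe = rng.integers(0, k, size=n)
D = np.eye(k)[fe]                     # fixed-effect dummies
y = D @ rng.normal(size=k) + 0.7 * X + rng.normal(size=n)

# (a) include the FE dummies directly in the regression
b_dummies = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0][0]

# (b) demean y and X within FE groups, then OLS through the origin
def demean(v):
    group_means = np.bincount(fe, weights=v) / np.bincount(fe)
    return v - group_means[fe]

yd, Xd = demean(y), demean(X)
b_demeaned = (Xd @ yd) / (Xd @ Xd)

assert np.allclose(b_dummies, b_demeaned)
```

    The point estimates are identical; only the degrees-of-freedom bookkeeping for the standard errors needs care, which is what motivates clustering here.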


  • Andrew Musau
    replied
    "However, when I absorb fixed effects this result doesn't hold anymore: the coefficients from the stacked regression are different from those obtained from the two separate regressions."
    I am almost certain that this results from how you specify your interactions and the absorbed variables. If you post your dataset and the separate regression commands (or a reproducible example), I can illustrate how you should do this.
    Last edited by Andrew Musau; 19 Dec 2019, 09:02.
