
  • Equality of regression coefficients when estimating regressions with same covariates and different outcomes

    Consider a setting where the right-hand side of the two equations is identical; what differs is the outcome on the left-hand side:

    Regression 1:
    Code:
    y = a + b*X + e
    Regression 2:
    Code:
    y_2 = a_2 + b_2*X + e_2

    The goal is to test the equality of the coefficients on X across the two regressions. Searching online, I found two options.

    Option 1 is to use -suest-, as explained, for instance, here: https://stats.idre.ucla.edu/stata/co...s-using-suest/
    Option 2 is to manually stack the data (one copy of the dataset per regression), create a dataset dummy variable, and estimate a model in which X is interacted with the dummy. This is explained here: https://www.stata.com/support/faqs/s...-coefficients/

    In my "real life" application I am using -reghdfe-, since I also need to absorb a large number of fixed effects. Because the package does not support -suest-, I am trying to implement Option 2.

    When I do not absorb fixed effects I have no problems: on top of running the test of interest, I can also retrieve the coefficients b and b_2 (and the two intercepts) from the stacked regression, and these are identical to those obtained when separately estimating the two original regressions.
    However, when I absorb fixed effects this no longer holds: the coefficients from the stacked regression differ from those obtained in the two separate regressions. I believe this is because de-meaning the two separate models is not the same as de-meaning the stacked one.

    Would it be a valid approach in this setting to first de-mean the two datasets separately, then stack the de-meaned datasets, and finally apply Option 2? My concern is that the standard errors would be computed incorrectly, since they would not account for the preliminary de-meaning step. Do you know of alternative approaches that reach the same goal?
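    As an aside, the logic of Option 2 can be checked numerically outside Stata. Below is a minimal numpy sketch with made-up data (all variable names are hypothetical, not from the thread): with no absorbed fixed effects, stacking the data and fully interacting the regressor with a dataset dummy reproduces the two separate OLS fits exactly.

```python
# Hypothetical illustration of Option 2 (stacking) without fixed effects.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
y1 = 1.0 + 0.5 * X + rng.normal(size=n)   # regression 1: y   = a   + b*X   + e
y2 = -0.5 + 2.0 * X + rng.normal(size=n)  # regression 2: y_2 = a_2 + b_2*X + e_2

def ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# separate fits: columns [constant, X]
b1 = ols(np.column_stack([np.ones(n), X]), y1)
b2 = ols(np.column_stack([np.ones(n), X]), y2)

# stacked fit: dataset dummy d (0 for reg 1, 1 for reg 2), fully interacted
y = np.concatenate([y1, y2])
Xs = np.concatenate([X, X])
d = np.concatenate([np.zeros(n), np.ones(n)])
Z = np.column_stack([np.ones(2 * n), d, Xs, d * Xs])
bs = ols(Z, y)

# the stacked coefficients recover both separate regressions exactly
assert np.allclose(bs[0], b1[0])           # a
assert np.allclose(bs[2], b1[1])           # b
assert np.allclose(bs[0] + bs[1], b2[0])   # a_2 = a + coef on d
assert np.allclose(bs[2] + bs[3], b2[1])   # b_2 = b + coef on d*X
```

    The point estimates match because the fully interacted stacked design is block-diagonal across the two dataset copies; the question above is precisely about why this breaks once fixed effects are absorbed.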
    Last edited by Stefano Lombardi; 19 Dec 2019, 06:52.

  • Stefano Lombardi
    replied
    Excellent, thanks!
    Stefano

  • Andrew Musau
    replied
    Thanks for the data example. Here is the approach.

    Code:
    sysuse auto, clear
    *RUN SEPARATE REGRESSIONS
    reghdfe price foreign, a(rep78)
    reghdfe gear_ratio foreign, a(rep78)
    *RESTRUCTURE DATA
    rename (price gear_ratio) var#, addnumber(1)
    reshape long var, i(make) j(group)
    lab def group 1 "price" 2 "gear_ratio"
    lab values group group
    *GENERATE GROUP INDICATORS
    gen gr1= 1.group
    gen gr2= 2.group
    *RUN JOINT REGRESSION WITH ROBUST STD. ERRORS
    *CROSS-SECTIONAL LINEAR MODEL, ROBUST SE = CLUSTER BY OBSERVATION
    gen obs=_n
    reghdfe var c.gr1#(c.foreign) c.gr2#(c.foreign), a(i.rep78#c.gr1 i.rep78#c.gr2) cluster(obs)
    test c.gr1#c.foreign= c.gr2#c.foreign
    Result:

    Code:
    .
    . *RUN SEPARATE REGRESSIONS
    
    .
    . reghdfe price foreign, a(rep78)
    (MWFE estimator converged in 1 iterations)
    
    HDFE Linear regression                            Number of obs   =         69
    Absorbing 1 HDFE group                            F(   1,     63) =       0.00
                                                      Prob > F        =     0.9711
                                                      R-squared       =     0.0145
                                                      Adj R-squared   =    -0.0637
                                                      Within R-sq.    =     0.0000
                                                      Root MSE        =  3003.7661
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |    36.7572   1010.484     0.04   0.971    -1982.533    2056.048
           _cons |   6134.857   474.7025    12.92   0.000     5186.239    7083.474
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
           rep78 |         5           0           5     |
    -----------------------------------------------------+
    
    .
    . reghdfe gear_ratio foreign, a(rep78)
    (MWFE estimator converged in 1 iterations)
    
    HDFE Linear regression                            Number of obs   =         69
    Absorbing 1 HDFE group                            F(   1,     63) =      47.05
                                                      Prob > F        =     0.0000
                                                      R-squared       =     0.5409
                                                      Adj R-squared   =     0.5045
                                                      Within R-sq.    =     0.4276
                                                      Root MSE        =     0.3257
    
    ------------------------------------------------------------------------------
      gear_ratio |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |   .7515638   .1095635     6.86   0.000     .5326186    .9705089
           _cons |   2.770539   .0514704    53.83   0.000     2.667683    2.873394
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
           rep78 |         5           0           5     |
    -----------------------------------------------------+
    
    . *RUN JOINT REGRESSION WITH ROBUST STD. ERRORS
    
    . reghdfe var c.gr1#(c.foreign) c.gr2#(c.foreign), a(i.rep78#c.gr1 i.rep78#c.gr2) cluster(obs)
    (warning: no intercepts terms in absorb(); regression lacks constant term)
    (MWFE estimator converged in 2 iterations)
    
    HDFE Linear regression                            Number of obs   =        138
    Absorbing 2 HDFE groups                           F(   2,    126) =      22.69
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.8214
                                                      Adj R-squared   =     0.8044
                                                      Within R-sq.    =     0.0000
    Number of clusters (obs)     =        138         Root MSE        =  2123.9834
    
                                         (Std. Err. adjusted for 138 clusters in obs)
    ---------------------------------------------------------------------------------
                    |               Robust
                var |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
    c.gr1#c.foreign |    36.7572   658.8838     0.06   0.956    -1267.154    1340.669
                    |
    c.gr2#c.foreign |   .7515638   .1115713     6.74   0.000     .5307675      .97236
    ---------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
     rep78#c.gr1 |         5           0           5    ?|
     rep78#c.gr2 |         5           0           5    ?|
    -----------------------------------------------------+
    ? = number of redundant parameters may be higher
    
    . test c.gr1#c.foreign= c.gr2#c.foreign
    
     ( 1)  c.gr1#c.foreign - c.gr2#c.foreign = 0
    
           F(  1,   126) =    0.00
                Prob > F =    0.9565
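    The key idea in the specification above can also be verified outside Stata. Here is a hedged numpy sketch with made-up data (not the auto dataset): absorbing one full set of fixed-effect dummies per dataset copy, the analogue of a(i.rep78#c.gr1 i.rep78#c.gr2), makes the stacked regression reproduce the separate fixed-effects regressions exactly.

```python
# Stacked regression with group-specific fixed effects (numpy sketch).
import numpy as np

rng = np.random.default_rng(1)
n, k = 300, 4                         # k fixed-effect categories
X = rng.normal(size=n)
fe = rng.integers(0, k, size=n)       # stands in for rep78
D = np.eye(k)[fe]                     # fixed-effect dummy matrix
y1 = D @ rng.normal(size=k) + 0.5 * X + rng.normal(size=n)
y2 = D @ rng.normal(size=k) + 2.0 * X + rng.normal(size=n)

def ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# separate FE regressions (the FE dummies absorb the constant)
b1 = ols(np.column_stack([X, D]), y1)[0]
b2 = ols(np.column_stack([X, D]), y2)[0]

# stacked regression: interact X AND the FE dummies with each group indicator
y = np.concatenate([y1, y2])
g1 = np.concatenate([np.ones(n), np.zeros(n)])
g2 = 1.0 - g1
Xs = np.concatenate([X, X])
Ds = np.vstack([D, D])
Z = np.column_stack([g1 * Xs, g2 * Xs, Ds * g1[:, None], Ds * g2[:, None]])
bs = ols(Z, y)

assert np.allclose(bs[0], b1)   # slope on X for dataset 1 matches
assert np.allclose(bs[1], b2)   # slope on X for dataset 2 matches
```

    With group-specific FE dummies, the stacked design is again block-diagonal across the two copies, which is why the point estimates coincide with the separate regressions.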
    Last edited by Andrew Musau; 19 Dec 2019, 14:43.


  • Stefano Lombardi
    replied
    Hi,

    thank you both for the replies.

    Here is a minimal working example of the "stacked approach" without fixed effects (a replication of the second link I posted).
    The variable of interest is foreign.
    I should mention that, of course, using -reghdfe- or manual de-meaning should not affect the results (I use the package because it simplifies things when absorbing fixed effects).

    Code:
    cls
    clear all
    sysuse auto, clear
    
    
    // Baseline estimates
    
    reghdfe price foreign, noabsorb
    reghdfe gear_ratio foreign, noabsorb
    
    
    // Generate stacked datasets
    
    loc iter = 1
    foreach y in price gear_ratio {
    
        preserve
            rename `y' y
            gen dataset = "`y'"
            keep dataset y foreign
            if (`iter' > 1) append using "dat_appended"
            qui save "dat_appended", replace
        restore    
            
        loc iter = `iter' + 1
    }
    
    use "dat_appended", clear
    encode dataset, generate(dataset_num)
    save "dat_appended", replace
    
    
    // Check regressions separately
    
    reghdfe y foreign if dataset_num == 1, noabsorb
    loc b1    = _b[foreign]  // store for inspection below
    loc cons1 = _b[_cons]
    reghdfe y foreign if dataset_num == 2, noabsorb
    loc b2    = _b[foreign]
    loc cons2 = _b[_cons]
    
    
    // Stacked regression and test if coef of foreign is different across reg1 and reg2
    
    * stacked regression
    reghdfe y i.dataset_num##(i.foreign), noabsorb
    
    * test foreign "across reg1 and reg2"
    test _b[2.dataset_num] = 0, notest
    test _b[2.dataset_num#1.foreign] = 0, accum  // the test of interest
    
    * Check that coefficients in stacked regression are same as those in separate ones
    di _n _b[_cons] _n `cons1'    
    assert float(_b[_cons])     == float(`cons1')                             // reg1, constant
    di _b[1.foreign] _n `b1'    
    assert float(_b[1.foreign]) == float(`b1')                                // reg1, foreign
    di _n _b[_cons] + _b[2.dataset_num] _n `cons2'    
    assert float(_b[_cons] + _b[2.dataset_num]) == float(`cons2')             // reg2, constant
    di _b[1.foreign] + _b[2.dataset_num#1.foreign] _n `b2'
    assert float(_b[1.foreign] + _b[2.dataset_num#1.foreign]) == float(`b2')  // reg2, foreign  
    
    
    cap erase "dat_appended.dta"
    
    *-------------------------------------------------------------------------------
    *-------------------------------------------------------------------------------
    The code shows that, when no fixed effects are added, we get exactly the same point estimates from the two "baseline" regressions and from the stacked regression.
    However, if you replace every noabsorb with, for instance, absorb(rep78), this no longer holds.

    Note that this extends easily to additional regressors on the RHS of both models, as long as the additional regressors are mean-centered before interacting them with foreign (I tried this).
    This means that one could additionally control for the FE dummies, but I do not want to do so, since I am not interested in estimating them or testing their equality across regressions.
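    For intuition, here is a hedged numpy sketch (made-up data, not the auto dataset) of the failure mode just described: absorbing a single shared set of FE dummies in the stacked regression forces the FE coefficients to be equal across the two datasets, so the slopes generally stop matching the separate FE regressions.

```python
# Why a single shared set of absorbed FEs breaks the equivalence (sketch).
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 5
X = rng.normal(size=n)
fe = rng.integers(0, k, size=n)
D = np.eye(k)[fe]                      # FE dummies (stand-in for rep78)
# the two outcomes have DIFFERENT fixed-effect coefficients
y1 = D @ rng.normal(scale=3, size=k) + 0.5 * X + rng.normal(size=n)
y2 = D @ rng.normal(scale=3, size=k) + 2.0 * X + rng.normal(size=n)

def ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# separate FE regression for dataset 1
b1 = ols(np.column_stack([X, D]), y1)[0]

# stacked regression with group-specific slopes but ONE shared set of FE dummies
y = np.concatenate([y1, y2])
g1 = np.concatenate([np.ones(n), np.zeros(n)])
g2 = 1.0 - g1
Xs = np.concatenate([X, X])
Ds = np.vstack([D, D])
Z = np.column_stack([g1 * Xs, g2 * Xs, g2, Ds])  # g2: dataset intercept shift
b_shared = ols(Z, y)[0]

# the false restriction on the FE coefficients contaminates the slope:
# it no longer equals the separate-regression estimate
assert abs(b_shared - b1) > 1e-8
```

    This mirrors what happens when all noabsorb options are replaced by a shared absorb(rep78) in the Stata code above.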

    Best wishes,
    S.
    Last edited by Stefano Lombardi; 19 Dec 2019, 10:30.


  • Jeff Wooldridge
    replied
    Without knowing the context, it's hard to know what to recommend. It seems peculiar to want to impose, but not test, that the fixed-effects coefficients are the same.

    With a lot of FEs and relatively few observations per group, there is no two-stage estimation error: it's just demeaning. I would think, though, that you'd want to cluster your standard errors, but I'd need to know more.
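    The "it's just demeaning" point is the Frisch-Waugh-Lovell result, which a small numpy sketch (hypothetical data) can illustrate: including the FE dummies directly and running simple OLS on group-demeaned variables give the same point estimate.

```python
# Frisch-Waugh-Lovell: FE dummies vs. within-group demeaning (numpy sketch).
import numpy as np

rng = np.random.default_rng(2)
n, k = 300, 6
X = rng.normal(size=n)
fe = rng.integers(0, k, size=n)
D = np.eye(k)[fe]                     # fixed-effect dummies
y = D @ rng.normal(size=k) + 0.7 * X + rng.normal(size=n)

# (a) include the FE dummies directly in the regression
b_dummies = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0][0]

# (b) demean y and X within FE groups, then OLS through the origin
def demean(v):
    group_means = np.bincount(fe, weights=v) / np.bincount(fe)
    return v - group_means[fe]

yd, Xd = demean(y), demean(X)
b_demeaned = (Xd @ yd) / (Xd @ Xd)

assert np.allclose(b_dummies, b_demeaned)
```

    The point estimates are identical; only the degrees-of-freedom bookkeeping for the standard errors needs care, which is what motivates clustering here.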


  • Andrew Musau
    replied
    "However, when I absorb fixed effects this result doesn't hold anymore: the coefficients from the stacked regression are different from those obtained from the two separate regressions."
    I am almost certain that this results from how you specify your interactions and the absorbed variables. If you post your dataset and the separate regression commands (or a reproducible example), I can illustrate how you should do this.
    Last edited by Andrew Musau; 19 Dec 2019, 09:02.
