  • Equality of regression coefficients when estimating regressions with same covariates and different outcomes

    Consider a setting where the RHS of the two equations is identical and only the LHS outcome differs:

    Regression 1:
    Code:
    \[ y = a + bX + e \]
    Regression 2:
    Code:
    \[ y_2 = a_2 + b_2 X + e_2 \]

    The goal is to test the equality of the coefficients on X in the two regressions. Searching online, I found two options.

    Option 1 is to use suest, as for instance explained here: https://stats.idre.ucla.edu/stata/co...s-using-suest/
    Option 2 is to manually stack the data (one dataset copy per regression), create a dataset dummy variable, and estimate a model where X is interacted with the dummy. This is explained here: https://www.stata.com/support/faqs/s...-coefficients/

    In my "real life" application, I am using -reghdfe- since I also need to absorb a large number of fixed effects. Since the package does not support suest, I am trying to implement option 2.

    When I do not "absorb" fixed effects I have no problems, in the sense that on top of running the test of interest I am also able to retrieve the coefficients b and b_2 (and the two intercepts) from the stacked regression. These are identical to those obtained when separately estimating the two original regressions.
    However, when I absorb fixed effects this result doesn't hold anymore: the coefficients from the stacked regression are different from those obtained from the two separate regressions. I believe this is due to the fact that de-meaning the two separate models is different than de-meaning the stacked one.
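
A sketch of why the two de-meaning steps can differ. The notation beyond the two equations above is assumed for illustration: g(i) is the absorbed group of observation i, \mu and \mu_2 are the group effects in each regression, D is the dataset dummy, and the superscript s marks the stacked outcome and error.

```latex
% Separate regressions, each with its own absorbed group effects
\[ y_i = a + b X_i + \mu_{g(i)} + e_i \]
\[ y_{2,i} = a_2 + b_2 X_i + \mu_{2,g(i)} + e_{2,i} \]

% Stacked regression with dataset dummy D_i, absorbing the group g(i) only
\[ y^{s}_i = a + (a_2 - a) D_i + b X_i + (b_2 - b) D_i X_i + \mu_{g(i)} + e^{s}_i \]

% The stacked model has a single set of group effects, so it imposes
\[ \mu_{2,g} = \mu_{g} \quad \text{for all } g \]
```

The separate regressions do not impose this restriction. Absorbing the group within each dataset copy (the interaction of D and g) removes it, and the stacked point estimates should then coincide with the separate ones.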

    Is it correct in this setting to first de-mean the two datasets separately, then stack the de-meaned datasets, and finally apply "Option 2"? My concern is that the standard errors will be miscalculated, since they have to account for the preliminary estimation step. Do you know of alternative approaches that reach the same goal?
    Last edited by Stefano Lombardi; 19 Dec 2019, 06:52.

  • #2
    However, when I absorb fixed effects this result doesn't hold anymore: the coefficients from the stacked regression are different from those obtained from the two separate regressions.
    I am almost certain that this results from how you specify your interactions and the absorbed variables. If you post your dataset and the separate regression commands (or a reproducible example), I can illustrate how you should do this.
    Last edited by Andrew Musau; 19 Dec 2019, 09:02.

    • #3
      Without knowing the context it’s hard to know what to recommend. It seems peculiar to want to impose but not test whether the fixed effects coefficients are the same.

      With a lot of FEs and relatively few observations per group, there is no two-stage estimation error. It’s just demeaning. I would think, though, that you’d want to cluster your standard errors, but I’d need to know more.
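
The "it's just demeaning" point can be checked with a minimal sketch. The auto data and rep78 as the absorbed variable are illustrative choices, and the m_ and dm_ variable names are made up for this example.

```stata
* Sketch: absorbing one set of fixed effects vs. manual within-group de-meaning
sysuse auto, clear
keep if !missing(rep78)            // same sample that absorb(rep78) would use

foreach v in price foreign {
    egen double m_`v' = mean(`v'), by(rep78)   // group mean within rep78
    gen double dm_`v' = `v' - m_`v'            // de-meaned variable
}

regress dm_price dm_foreign        // OLS on the de-meaned data
scalar b_manual = _b[dm_foreign]

reghdfe price foreign, absorb(rep78)
scalar b_absorb = _b[foreign]

display b_manual _n b_absorb       // the two slopes should agree
```

The point estimates should match. The standard errors from the manual version will generally differ, because regress does not count the absorbed group means in its degrees of freedom.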

      • #4
        Hi,

        thank you both for the replies.

        Here's a minimal working example for the "stacked approach", without using FEs (a replication of the second link I posted).
        The variable we're interested in is foreign.
        I should mention that, of course, using reghdfe or manual de-meaning shouldn't affect the results (I use the package since it simplifies things when absorbing FEs).

        Code:
        cls
        clear all
        sysuse auto, clear
        
        
        // Baseline estimates
        
        reghdfe price foreign, noabsorb
        reghdfe gear_ratio foreign, noabsorb
        
        
        // Generate stacked datasets
        
        loc iter = 1
        foreach y in price gear_ratio {
        
            preserve
                rename `y' y
                gen dataset = "`y'"
                keep dataset y foreign
                if (`iter' > 1) append using "dat_appended"
                qui save "dat_appended", replace
            restore    
                
            loc iter = `iter' + 1
        }
        
        use "dat_appended", clear
        encode dataset, generate(dataset_num)
        save "dat_appended", replace
        
        
        // Check regressions separately
        
        reghdfe y foreign if dataset_num == 1, noabsorb
        loc b1    = _b[foreign]  // store for inspection below
        loc cons1 = _b[_cons]
        reghdfe y foreign if dataset_num == 2, noabsorb
        loc b2    = _b[foreign]
        loc cons2 = _b[_cons]
        
        
        // Stacked regression and test if coef of foreign is different across reg1 and reg2
        
        * stacked regression
        reghdfe y i.dataset_num##(i.foreign), noabsorb
        
        * test foreign "across reg1 and reg2"
        test _b[2.dataset_num] = 0, notest
        test _b[2.dataset_num#1.foreign] = 0, accum  // the test of interest
        
        * Check that coefficients in stacked regression are same as those in separate ones
        di _n _b[_cons] _n `cons1'    
        assert float(_b[_cons])     == float(`cons1')                             // reg1, constant
        di _b[1.foreign] _n `b1'    
        assert float(_b[1.foreign]) == float(`b1')                                // reg1, foreign
        di _n _b[_cons] + _b[2.dataset_num] _n `cons2'    
        assert float(_b[_cons] + _b[2.dataset_num]) == float(`cons2')             // reg2, constant
        di _b[1.foreign] + _b[2.dataset_num#1.foreign] _n `b2'
        assert float(_b[1.foreign] + _b[2.dataset_num#1.foreign]) == float(`b2')  // reg2, foreign  
        
        
        cap erase "dat_appended.dta"
        
        *-------------------------------------------------------------------------------
        *-------------------------------------------------------------------------------
        The code shows that when not adding FEs we get exactly the same point estimates in the two "baseline" regressions and from the stacked regression.
        However, if you try to replace all noabsorb with, for instance, absorb(rep78), this doesn't hold anymore.

        Note that this extends easily to additional regressors on the RHS of both models, as long as the additional regressors are mean-centered before interacting them with foreign (I tried this).
        This means that one could additionally control for the FE dummies, but I don't want to do so, since I am not interested in estimating them/testing their equality across regressions.
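
One possible way to handle the absorb(rep78) case without estimating the FE dummies is to absorb rep78 separately within each dataset copy. This is a sketch on the auto data, not a tested recipe for the real application. The names y1, y2, b1 and b2 are made up for this example, and standard errors are not addressed here.

```stata
* Sketch: stacked regression with dataset-specific absorbed effects
sysuse auto, clear
keep if !missing(rep78)

* separate regressions, slopes stored for comparison
reghdfe price foreign, absorb(rep78)
scalar b1 = _b[foreign]
reghdfe gear_ratio foreign, absorb(rep78)
scalar b2 = _b[foreign]

* stack: one copy of the data per outcome
rename (price gear_ratio) (y1 y2)
reshape long y, i(make) j(dataset_num)

* absorb rep78 within each dataset copy instead of across the stack
reghdfe y i.dataset_num##i.foreign, absorb(i.dataset_num#i.rep78)

display _b[1.foreign] _n b1
display _b[1.foreign] + _b[2.dataset_num#1.foreign] _n b2

* equality of the foreign coefficient across the two outcomes
test 2.dataset_num#1.foreign
```

The dataset main effect is collinear with the absorbed interaction, so reghdfe is expected to drop it. The foreign coefficients should still match the separate regressions.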

        Best wishes,
        S.
        Last edited by Stefano Lombardi; 19 Dec 2019, 10:30.

        • #5
          Thanks for the data example. Here is the approach.

          Code:
          sysuse auto, clear
          *RUN SEPARATE REGRESSIONS
          reghdfe price foreign, a(rep78)
          reghdfe gear_ratio foreign, a(rep78)
          *RESTRUCTURE DATA
          rename (price gear_ratio) var#, addnumber(1)
          reshape long var, i(make) j(group)
          lab def group 1 "price" 2 "gear_ratio"
          lab values group group
          *GENERATE GROUP INDICATORS
          gen gr1= 1.group
          gen gr2= 2.group
          *RUN JOINT REGRESSION WITH ROBUST STD. ERRORS
          *CROSS-SECTIONAL LINEAR MODEL, ROBUST SE = CLUSTER BY OBSERVATION
          gen obs=_n
          reghdfe var c.gr1#(c.foreign) c.gr2#(c.foreign), a(i.rep78#c.gr1 i.rep78#c.gr2) cluster(obs)
          test c.gr1#c.foreign= c.gr2#c.foreign
          Results:

          Code:
          .
          . *RUN SEPARATE REGRESSIONS
          
          .
          . reghdfe price foreign, a(rep78)
          (MWFE estimator converged in 1 iterations)
          
          HDFE Linear regression                            Number of obs   =         69
          Absorbing 1 HDFE group                            F(   1,     63) =       0.00
                                                            Prob > F        =     0.9711
                                                            R-squared       =     0.0145
                                                            Adj R-squared   =    -0.0637
                                                            Within R-sq.    =     0.0000
                                                            Root MSE        =  3003.7661
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
               foreign |    36.7572   1010.484     0.04   0.971    -1982.533    2056.048
                 _cons |   6134.857   474.7025    12.92   0.000     5186.239    7083.474
          ------------------------------------------------------------------------------
          
          Absorbed degrees of freedom:
          -----------------------------------------------------+
           Absorbed FE | Categories  - Redundant  = Num. Coefs |
          -------------+---------------------------------------|
                 rep78 |         5           0           5     |
          -----------------------------------------------------+
          
          .
          . reghdfe gear_ratio foreign, a(rep78)
          (MWFE estimator converged in 1 iterations)
          
          HDFE Linear regression                            Number of obs   =         69
          Absorbing 1 HDFE group                            F(   1,     63) =      47.05
                                                            Prob > F        =     0.0000
                                                            R-squared       =     0.5409
                                                            Adj R-squared   =     0.5045
                                                            Within R-sq.    =     0.4276
                                                            Root MSE        =     0.3257
          
          ------------------------------------------------------------------------------
            gear_ratio |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
               foreign |   .7515638   .1095635     6.86   0.000     .5326186    .9705089
                 _cons |   2.770539   .0514704    53.83   0.000     2.667683    2.873394
          ------------------------------------------------------------------------------
          
          Absorbed degrees of freedom:
          -----------------------------------------------------+
           Absorbed FE | Categories  - Redundant  = Num. Coefs |
          -------------+---------------------------------------|
                 rep78 |         5           0           5     |
          -----------------------------------------------------+
          
          . *RUN JOINT REGRESSION WITH ROBUST STD. ERRORS
          
          . reghdfe var c.gr1#(c.foreign) c.gr2#(c.foreign), a(i.rep78#c.gr1 i.rep78#c.gr2) cluster(obs)
          (warning: no intercepts terms in absorb(); regression lacks constant term)
          (MWFE estimator converged in 2 iterations)
          
          HDFE Linear regression                            Number of obs   =        138
          Absorbing 2 HDFE groups                           F(   2,    126) =      22.69
          Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                            R-squared       =     0.8214
                                                            Adj R-squared   =     0.8044
                                                            Within R-sq.    =     0.0000
          Number of clusters (obs)     =        138         Root MSE        =  2123.9834
          
                                               (Std. Err. adjusted for 138 clusters in obs)
          ---------------------------------------------------------------------------------
                          |               Robust
                      var |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ----------------+----------------------------------------------------------------
          c.gr1#c.foreign |    36.7572   658.8838     0.06   0.956    -1267.154    1340.669
                          |
          c.gr2#c.foreign |   .7515638   .1115713     6.74   0.000     .5307675      .97236
          ---------------------------------------------------------------------------------
          
          Absorbed degrees of freedom:
          -----------------------------------------------------+
           Absorbed FE | Categories  - Redundant  = Num. Coefs |
          -------------+---------------------------------------|
           rep78#c.gr1 |         5           0           5    ?|
           rep78#c.gr2 |         5           0           5    ?|
          -----------------------------------------------------+
          ? = number of redundant parameters may be higher
          
          . test c.gr1#c.foreign= c.gr2#c.foreign
          
           ( 1)  c.gr1#c.foreign - c.gr2#c.foreign = 0
          
                 F(  1,   126) =    0.00
                      Prob > F =    0.9565
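
A variation that may be worth comparing: each car contributes two rows to the stacked data, and the errors of those two rows may be correlated. Clustering on the car rather than on the stacked row allows for that correlation, which, as I understand it, is closer in spirit to what suest does. The identifier car is a made-up name for this sketch.

```stata
* Sketch: same joint regression, clustered on the car instead of the stacked row
sysuse auto, clear
rename (price gear_ratio) (var1 var2)
reshape long var, i(make) j(group)

gen gr1 = (group == 1)       // indicator for the price rows
gen gr2 = (group == 2)       // indicator for the gear_ratio rows
egen car = group(make)       // one id per car, shared by its two stacked rows

reghdfe var c.gr1#c.foreign c.gr2#c.foreign, a(i.rep78#c.gr1 i.rep78#c.gr2) cluster(car)
test c.gr1#c.foreign = c.gr2#c.foreign
```

The point estimates should be unchanged from the regression above. Only the standard errors and the test statistic are affected by the choice of cluster variable.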
          Last edited by Andrew Musau; 19 Dec 2019, 14:43.

          • #6
            Excellent, thanks!
            Stefano
