  • Testing equality of coefficients across same model with different samples and two-way clustered errors

    Hi,

    I'm looking for something equivalent to -suest- that allows for two-way clustering.

    In particular, I want to test the equality of a regression coefficient across two different samples of the same model. That in itself is not a problem; the problem is that I want to use two-way clustered standard errors, and I have not found a solution for this.

    Suppose I have one outcome y and one independent variable x, and I want to cluster on two variables: c1 and c2. In a perfect world, I would run the following code:

    Code:
    reg y x if sample1 == 1
    estimate store e1
    reg y x if sample2 == 1
    estimate store e2
    suest e1 e2, r cluster(c1 c2)
    test [e1_mean]x = [e2_mean]x
    The problem is -suest- doesn't allow for two-way clustering, so I get the error message "cluster(): too many variables specified".

    Does anyone have any suggestions of how I might proceed? This feels like a straightforward problem but I cannot figure out a solution. Thanks in advance.

    Andrew (using Stata 15 on a Macbook Pro OS 10.14.6 Mojave)
    Last edited by Andrew Chan; 14 Nov 2019, 11:17.

  • #2
    Depending on how the samples are defined, you can use an estimator that allows double clustering and specify an interaction model. reghdfe from SSC is one such estimator. To replicate the results of regress, create a constant variable and absorb it. Here is an example:

    Code:
    sysuse auto, clear
    regress price mpg weight
    gen avar=1
    reghdfe price mpg weight, a(avar)

    Res.:

    Code:
    . regress price mpg weight
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(2, 71)        =     14.74
           Model |   186321280         2  93160639.9   Prob > F        =    0.0000
        Residual |   448744116        71  6320339.67   R-squared       =    0.2934
    -------------+----------------------------------   Adj R-squared   =    0.2735
           Total |   635065396        73  8699525.97   Root MSE        =      2514
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -49.51222   86.15604    -0.57   0.567    -221.3025     122.278
          weight |   1.746559   .6413538     2.72   0.008      .467736    3.025382
           _cons |   1946.069    3597.05     0.54   0.590    -5226.245    9118.382
    ------------------------------------------------------------------------------
    
    
    . reghdfe price mpg weight, a(avar)
    (MWFE estimator converged in 1 iterations)
    
    HDFE Linear regression                            Number of obs   =         74
    Absorbing 1 HDFE group                            F(   2,     71) =      14.74
                                                      Prob > F        =     0.0000
                                                      R-squared       =     0.2934
                                                      Adj R-squared   =     0.2735
                                                      Within R-sq.    =     0.2934
                                                      Root MSE        =  2514.0286
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -49.51222   86.15604    -0.57   0.567    -221.3025     122.278
          weight |   1.746559   .6413538     2.72   0.008      .467736    3.025382
           _cons |   1946.069    3597.05     0.54   0.590    -5226.245    9118.382
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
            avar |         1           0           1     |
    -----------------------------------------------------+

    • #3
      Hi Andrew,

      I appreciate the response, but your suggestion does not solve my problem. I previously tried something similar to your example; the problem is that -suest- returns the error message "re-estimate without the cluster() option, and specify the cluster() option with suest."

      Below is a minimal working example using the same dataset as your example. Note that creating a constant for -reghdfe- is not necessary, since the -noabsorb- option can be used in its place. The two-way clustering chosen here does not make substantive sense, but it helps to make my point.

      Code:
      // suest fails because of two-way clustering with reghdfe
      
      sysuse auto, clear
      sum weight, detail
      g heavy = weight > r(p50)
      reghdfe mpg weight, noabsorb resid cluster(foreign trunk)
      eststo e1
      reghdfe mpg weight if heavy == 1, noabsorb resid cluster(foreign trunk)
      eststo e2
      suest e1 e2
      test [e1_mean]weight = [e2_mean]weight
      
      
      // Works, however can only cluster on just foreign (or trunk) but never both
      
      sysuse auto, clear
      sum weight, detail
      g heavy = weight > r(p50)
      reg mpg weight
      eststo e1
      reg mpg weight if heavy == 1
      eststo e2
      suest e1 e2, cluster(foreign)
      test [e1_mean]weight = [e2_mean]weight
      The problem is that -suest- does not allow two-way clustering, and -suest- cannot handle stored estimates that were estimated with two-way clustered errors.
      Last edited by Andrew Chan; 18 Nov 2019, 07:52.

      • #4
        I make no mention of suest in #2.
        1. In a regression, you can test differences in coefficients by including interactions.
        2. reghdfe allows multi-way clustering.

        So do it all in reghdfe.
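
        As a minimal sketch of points 1 and 2, assuming the two samples are disjoint (here a hypothetical split into heavy == 0 and heavy == 1 in the auto data, which is not the overlapping pair of samples used in #3) and that reghdfe is installed from SSC, a single interacted regression could look like this. The clustering variables are purely illustrative.

        Code:
        * Illustrative only: assumes disjoint samples defined by heavy
        sysuse auto, clear
        summarize weight, detail
        generate heavy = weight > r(p50)
        * One regression over both samples; slope and intercept vary by heavy
        reghdfe mpg i.heavy##c.weight, noabsorb cluster(foreign trunk)
        * The interaction coefficient is the difference in the weight slopes
        test 1.heavy#c.weight

        With only two clusters in foreign, expect warnings about the variance matrix; the point is the syntax, not the inference.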

        • #5
          Thanks for clarifying your answer. I think the issue is the way I described my problem initially, which was not entirely accurate. Sorry about that. Here's one last try. Suppose my data looked like this:
          y x1 x2 c1 c2
          0.5 30 30 1 3
          0.23 150 150 2 3
          0.76 250 . 1 3
          0.33 20 . 2 4
          0.05 5 . 1 4
          0.5 100 . 2 4
          0.25 50 50 1 4
          0.8 400 400 2 4
          Here x2 equals x1 for a subset of the data and is missing otherwise. What I want to do is run the following code:

          Code:
          reghdfe y x1, noa cluster(c1 c2)
          eststo e1
          reghdfe y x2, noa cluster(c1 c2)
          eststo e2
          suest e1 e2
          test [e1_mean]x1 = [e2_mean]x2
          This gives me the error message "re-estimate without the cluster() option, and specify the cluster() option with suest."

          I don't think an interaction makes sense in this case, does it? Sorry for my misunderstanding.

          • #6
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float y int(x1 x2) byte(c1 c2)
             .5  30  30 1 3
            .23 150 150 2 3
            .76 250   . 1 3
            .33  20   . 2 4
            .05   5   . 1 4
             .5 100   . 2 4
            .25  50  50 1 4
             .8 400 400 2 4
            end
            *INITIAL REGRESSIONS
            reghdfe y x1, noa cluster(c1 c2)
            reghdfe y x2, noa cluster(c1 c2)
            *RESHAPE DATA
            gen id=_n
            reshape long x, i(id) j(group)
            *GENERATE CONSTANT
            gen cons=1
            reghdfe y i.group#(c.x c.cons), noa nocons cluster(c1 c2)
            *TEST EQUALITY OF COEFFICIENTS
            test 1.group#c.x =2.group#c.x
            *OR DIRECTLY - THE COEFFICIENT OF THE INTERACTION IS THE DIFFERENCE
            reghdfe y i.group##(c.x), noa cluster(c1 c2)
            Res.:

            Code:
            . *INITIAL REGRESSIONS
            
            .
            . reghdfe y x1, noa cluster(c1 c2)
            (MWFE estimator converged in 1 iterations)
            Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Mille
            > r applied.
            
            HDFE Linear regression                            Number of obs   =          8
            Absorbing 1 HDFE group                            F(   1,      1) =      34.97
            Statistics robust to heteroskedasticity           Prob > F        =     0.1066
                                                              R-squared       =     0.6206
                                                              Adj R-squared   =     0.5574
            Number of clusters (c1)      =          2         Within R-sq.    =     0.6206
            Number of clusters (c2)      =          2         Root MSE        =     0.1746
            
                                              (Std. Err. adjusted for 2 clusters in c1 c2)
            ------------------------------------------------------------------------------
                         |               Robust
                       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                      x1 |   .0015027   .0002541     5.91   0.107    -.0017259    .0047312
                   _cons |   .2387264   .0074597    32.00   0.020     .1439422    .3335106
            ------------------------------------------------------------------------------
            
            .
            . reghdfe y x2, noa cluster(c1 c2)
            (MWFE estimator converged in 1 iterations)
            Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Mille
            > r applied.
            
            HDFE Linear regression                            Number of obs   =          4
            Absorbing 1 HDFE group                            F(   1,      1) =       3.41
            Statistics robust to heteroskedasticity           Prob > F        =     0.3161
                                                              R-squared       =     0.5589
                                                              Adj R-squared   =     0.3383
            Number of clusters (c1)      =          2         Within R-sq.    =     0.5589
            Number of clusters (c2)      =          2         Root MSE        =     0.2169
            
                                              (Std. Err. adjusted for 2 clusters in c1 c2)
            ------------------------------------------------------------------------------
                         |               Robust
                       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                      x2 |   .0011728   .0006355     1.85   0.316    -.0069014     .009247
                   _cons |   .2602884   .0587436     4.43   0.141    -.4861201    1.006697
            ------------------------------------------------------------------------------
            
            
            .
            . reghdfe y i.group#(c.x c.cons), noa nocons cluster(c1 c2)
            (MWFE estimator converged in 1 iterations)
            Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Mille
            > r applied.
            warning: missing F statistic; dropped variables due to collinearity or too few clusters
            note: 2.group#c.cons omitted because of collinearity
            
            HDFE Linear regression                            Number of obs   =         12
            Absorbing 1 HDFE group                            F(   3,      1) =          .
            Statistics robust to heteroskedasticity           Prob > F        =          .
                                                              R-squared       =     0.6022
                                                              Adj R-squared   =     0.4530
            Number of clusters (c1)      =          2         Within R-sq.    =     0.6022
            Number of clusters (c2)      =          2         Root MSE        =     0.1861
            
                                              (Std. Err. adjusted for 2 clusters in c1 c2)
            ------------------------------------------------------------------------------
                         |               Robust
                       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
               group#c.x |
                      1  |   .0015027   .0002605     5.77   0.109    -.0018067    .0048121
                      2  |   .0011728   .0005643     2.08   0.286    -.0059979    .0083434
                         |
            group#c.cons |
                      1  |   -.021562   .1382686    -0.16   0.902    -1.778431    1.735307
                      2  |          0  (omitted)
            ------------------------------------------------------------------------------
            
            .
            . *TEST EQUALITY OF COEFFICIENTS
            
            .
            . test 1.group#c.x =2.group#c.x
            
             ( 1)  1b.group#c.x - 2.group#c.x = 0
            
                   F(  1,     1) =    0.18
                        Prob > F =    0.7473
            
            .
            . *OR DIRECTLY - THE COEFFICIENT OF THE INTERACTION IS THE DIFFERENCE
            
            . reghdfe y i.group##(c.x), noa cluster(c1 c2)
            (MWFE estimator converged in 1 iterations)
            Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Mille
            > r applied.
            warning: missing F statistic; dropped variables due to collinearity or too few clusters
            
            HDFE Linear regression                            Number of obs   =         12
            Absorbing 1 HDFE group                            F(   3,      1) =          .
            Statistics robust to heteroskedasticity           Prob > F        =          .
                                                              R-squared       =     0.6022
                                                              Adj R-squared   =     0.4530
            Number of clusters (c1)      =          2         Within R-sq.    =     0.6022
            Number of clusters (c2)      =          2         Root MSE        =     0.1861
            
                                              (Std. Err. adjusted for 2 clusters in c1 c2)
            ------------------------------------------------------------------------------
                         |               Robust
                       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                 2.group |    .021562   .1380912     0.16   0.901    -1.733053    1.776177
                       x |   .0015027   .0002752     5.46   0.115    -.0019939    .0049992
                         |
               group#c.x |
                      2  |  -.0003299   .0007694    -0.43   0.742    -.0101061    .0094463
                         |
                   _cons |   .2387264   .0087814    27.19   0.023     .1271481    .3503047
            ------------------------------------------------------------------------------
            Last edited by Andrew Musau; 18 Nov 2019, 12:53.

            • #7
              Thank you Andrew!
