Replicating cluster standard errors by subsample in a fully-interacted OLS model

Junida Mulla

Join Date: Nov 2023

Posts: 4
#1

13 Nov 2023, 04:56

Hello,
I am doing a subsample analysis of the effect of X on Y, splitting observations by whether they fall in a particular subsample (defined by the dummy variable S). I am clustering the standard errors by group (variable gr). These are the models I am running:

Subsample a: reg Y X if S==0, cluster(gr)
Subsample b: reg Y X if S==1, cluster(gr)

Because I want to test whether the effect of X on Y differs across the subgroups, I am running the fully interacted model, aiming to replicate the estimates from Subsamples a and b:

Fully interacted model: reg Y c.X#i.S c._cons#i.S, nocons cluster(gr), where _cons=1.

While this approach replicates my coefficient estimates, it does not replicate my standard errors. I think the problem is with how the standard errors are clustered. I tried an alternative approach, manually creating clusters at the group-by-subsample level, but it still did not replicate my s.e. across subsamples:

egen double_clus_group=group(gr S)

reg Y c.X#i.S c._cons#i.S, nocons cluster(double_clus_group)

Any help is appreciated.

Last edited by Junida Mulla; 13 Nov 2023, 05:11.
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#2

13 Nov 2023, 09:31

You cannot get the same standard errors because the sample sizes differ across the regressions, and the standard error is a function of the sample size. suest implements the correct approach as far as cross-model hypothesis testing is concerned; the advice is to estimate the individual regressions without clustering and then to cluster using suest. See

Code:

help suest

for a discussion.
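The sample-size dependence can be made concrete. What follows is a minimal sketch on simulated data, assuming that regress with vce(cluster) scales its sandwich estimator by (N-1)/(N-k) * G/(G-1), with N observations, k coefficients and G clusters, as described in the methods and formulas of the Stata manuals. The data-generating process, the seed and the scalar names are made up for illustration; Y, X, S and gr follow the thread, and ibn.S ibn.S#c.X with noconstant is just one way of writing the fully interacted model.

```stata
clear
set seed 2023
set obs 2000
* Simulated data: 100 clusters of 20 observations, nested within S
generate gr = ceil(_n/20)
generate S  = mod(gr, 2)
generate X  = rnormal()
generate Y  = 1 + 0.5*X + 0.3*S*X + rnormal()

* Subsample a: k = 2 coefficients (slope and constant)
regress Y X if S == 0, vce(cluster gr)
scalar se_a = _se[X]
scalar q_a  = (e(N) - 1)/(e(N) - 2) * e(N_clust)/(e(N_clust) - 1)

* Fully interacted model: k = 4 coefficients, N and G counted over the pooled sample
regress Y ibn.S ibn.S#c.X, noconstant vce(cluster gr)
scalar se_p = _se[0.S#c.X]
scalar q_p  = (e(N) - 1)/(e(N) - 4) * e(N_clust)/(e(N_clust) - 1)

* Rescale the pooled s.e. by the ratio of the two correction factors
display "subsample s.e.:       " se_a
display "pooled s.e.:          " se_p
display "rescaled pooled s.e.: " se_p*sqrt(q_a/q_p)
```

If the assumed correction factor is the one in use, the first and third displayed numbers should agree up to rounding, which would mean the gap between the subsample and pooled standard errors comes only from N, k and G being counted over different samples.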
Junida Mulla

Join Date: Nov 2023

Posts: 4
#3

13 Nov 2023, 11:33

Thank you so much, Andrew. One question: I am also using the cluster() option in the equivalent IV regressions (using ivreg2). There the fully interacted model gives exactly the same s.e. as the subsample regressions. Is that because the cluster() option in ivreg2 produces standard errors that are both heteroskedasticity-robust and cluster-robust (I am assuming the formula would not depend on the number of observations)? More importantly, is it OK that I use the fully interacted model in the 2SLS regressions to test for a difference in the coefficient of X across subsamples, or is there a way equivalent to suest that I can use for conducting the same test after instrumented regressions?
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#4

13 Nov 2023, 12:45

If you read through the documentation, suest and an interacted model are equivalent, except for models that include ancillary parameters, e.g., the cut-points in an ordered logit model. In such cases, suest should be preferred, as the subsample regressions will estimate the ancillary parameters separately whereas an interacted model will estimate them jointly and therefore constrain them in some way.
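The two routes can be sketched side by side. This is only an illustration on simulated data (the data-generating process, the seed and the stored-estimate names a and b are made up; Y, X, S and gr follow the thread). It relies on suest labelling the coefficient equation of a stored regress fit as name_mean. The point estimates coincide across the two routes, but the test statistics need not be numerically identical, because suest reports a chi-squared statistic while regress reports an F statistic with its own finite-sample adjustments.

```stata
clear
set seed 2023
set obs 2000
* Simulated data: 100 clusters of 20 observations, nested within S
generate gr = ceil(_n/20)
generate S  = mod(gr, 2)
generate X  = rnormal()
generate Y  = 1 + 0.5*X + 0.3*S*X + rnormal()

* Route 1: fit each subsample without clustering, then cluster in suest
regress Y X if S == 0
estimates store a
regress Y X if S == 1
estimates store b
suest a b, vce(cluster gr)
test [a_mean]X = [b_mean]X

* Route 2: fully interacted model with clustered standard errors
regress Y ibn.S ibn.S#c.X, noconstant vce(cluster gr)
test 0.S#c.X = 1.S#c.X
```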

More importantly, is it OK that I use the fully interacted model in the 2SLS regressions to test for difference in coefficient of X across subsamples

Yes, I illustrate exactly that here: https://www.statalist.org/forums/for...ferent-samples
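A minimal sketch of the interacted 2SLS test, on simulated data. It uses the official ivregress 2sls command rather than ivreg2, purely so that it runs without installing anything; the instrument Z, the confounder v, the hand-made interaction variables and the seed are all hypothetical. The interactions are generated by hand on the assumption that the endogenous variable list does not accept factor-variable notation.

```stata
clear
set seed 2023
set obs 2000
* Simulated data: 100 clusters of 20 observations, nested within S
generate gr = ceil(_n/20)
generate S  = mod(gr, 2)
* Hypothetical instrument Z and unobserved confounder v
generate Z  = rnormal()
generate v  = rnormal()
generate X  = 0.8*Z + v + rnormal()
generate Y  = 1 + 0.5*X + 0.3*S*X + v + rnormal()

* Subsample-specific intercepts, regressors and instruments
generate S0 = (S == 0)
generate S1 = (S == 1)
generate X0 = X*S0
generate X1 = X*S1
generate Z0 = Z*S0
generate Z1 = Z*S1

* Fully interacted 2SLS with clustered standard errors
ivregress 2sls Y S0 S1 (X0 X1 = Z0 Z1), noconstant vce(cluster gr)

* Test for equal effects of X across the two subsamples
test X0 = X1
```

As for why the ivreg2 standard errors matched exactly: one plausible reason is that ivreg2, unless its small option is specified, reports large-sample statistics without a degrees-of-freedom scaling, so nothing in the formula depends on the pooled N or number of clusters. That is an assumption worth checking against help ivreg2.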