Conditioning on group with largest treatment effect

Laurie Molina

Join Date: Jul 2014

Posts: 16
#1

16 Aug 2024, 20:56

Hi everyone,

I have data with N = 10,000 observations that belong to 50 different groups. For each group, I estimate the treatment effect of T (randomly assigned) on a primary outcome, so I end up with 50 different coefficients.
Then I go back to my sample and split it in two: one half with observations from groups whose estimated treatment effect is above the median, and another half with observations from groups whose estimated treatment effect is below the median.

Finally, I calculate treatment effects on various other outcomes for the two halves of the sample. Effectively, I am calculating treatment effects on various outcomes for individuals in groups with a high versus a low treatment effect on the primary outcome.

I worry this may create a sort of overfitting bias and affect the estimated treatment effects on the other outcomes, since I would be using information based on post-treatment outcomes to split my sample. But I do not know how to formalize this concern. Further, there may not be any issue in the first place! Can anyone here comment on this potential issue and/or provide references to papers or books describing it?

Thanks in advance!
George Ford

Join Date: Aug 2014

Posts: 3138
#2

17 Aug 2024, 13:02

Interesting question. I haven't run across this sort of thing, but others are likely more familiar with the literature. It's basically an argument for heterogeneous treatment effects.

I suspect the treatment effects may be biased. If the effects are correlated, then there's an endogeneity issue. And, you are creating control groups based on treatment A, which leads to questions about parallel paths.

Another option might be to get all the treatment effects for the full sample, and then analyze those by the split.

Or, maybe you need to estimate all the treatment effects jointly, and then analyze by the split.
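
The concern about correlated effects can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the original question: all variable names and parameter values are made up, the true treatment effects are the same in every group (0.6 on y1, 0.3 on y2), and the two outcomes share an individual-level error component. The split is then driven purely by estimation noise in the group-specific effects on y1.

```stata
* Illustrative simulation; every name and parameter value is an assumption
clear
set seed 2024

* 50 groups of 200 cases, randomized treatment
quietly set obs 50
generate gid = _n
quietly expand 200
generate byte trt = runiform() < .5

* Homogeneous true effects: 0.6 on y1 and 0.3 on y2 in every group.
* The two outcomes share an individual-level error component e.
generate double e  = rnormal()
generate double y1 = 0.6*trt + e
generate double y2 = 0.3*trt + 0.5*e + rnormal()

* Group-specific estimate for y1: treated-minus-control difference in means
bysort gid: egen double m1_t = mean(cond(trt == 1, y1, .))
bysort gid: egen double m1_c = mean(cond(trt == 0, y1, .))
generate double b1 = m1_t - m1_c

* Median split on the estimated effects (groups are equal-sized, so the
* observation-level median of b1 equals the group-level median)
quietly summarize b1, detail
generate byte high = b1 > r(p50)

* Effect on y2 by half; 1.trt#1.high is the apparent "heterogeneity"
regress y2 i.trt##i.high, vce(cluster gid)
```

Because the true effect on y2 is identical across groups in this setup, any nonzero coefficient on 1.trt#1.high comes from selecting on noise: groups whose y1 estimate happened to come out high also tend to have a high y2 estimate when the outcome errors are positively correlated, so the interaction is positive in expectation.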
Erik Ruzek

Join Date: Oct 2017

Posts: 423
#3

19 Aug 2024, 11:13

I am afraid that I do not fully understand your approach with the median split and then looking at other outcomes based on their median split on the first outcome. What do you hope to get by doing this? Is there something in the literature or in theory that is informing the approach?

As George Ford suggested, I would instead estimate all the treatment effects jointly using either gsem or mixed, given the multilevel nature of the data. What this does is account for and model the following:
Shared outcome variance due to measuring different outcomes on the same individual

The continuous nature of the group-level treatment effect on each outcome

The common part of the treatment effect that is due to group

Uncertainty about both the group treatment effect and the group effect on the intercept, which is preserved rather than discarded

Below I coded up a simulation of what I think your data look like, along with the syntax for the two estimation approaches. They yield the same results but differ in the number of unique parameter estimates you get in the random effects. This is because gsem is truly multivariate, whereas we are tricking mixed into handling the multivariate data, so you get something like average random effects across the two outcomes. Inspiration for the simulation comes from this post and code by Joseph Coveney.
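
Written out, the data-generating process in the simulation code below is roughly the following, for outcome k, case i, and group j. The symbols are notation introduced here for illustration; the numerical values are read directly from the code.

```latex
\[
\begin{aligned}
y_{kij} &= \beta_k T_{ij} + u_j + v_{kj} T_{ij} + \varepsilon_{kij}, \qquad k \in \{0, 1\},\\
\beta_0 &= 0.6, \quad \beta_1 = 0.3, \quad T_{ij} \sim \mathrm{Bernoulli}(0.5), \quad u_j \sim N(0, 1),\\
(v_{0j}, v_{1j}) &\sim N\!\left(\mathbf{0},\ \begin{pmatrix} 0.4^2 & 0.5(0.4)(0.2) \\ 0.5(0.4)(0.2) & 0.2^2 \end{pmatrix}\right),\\
(\varepsilon_{0ij}, \varepsilon_{1ij}) &\sim N\!\left(\mathbf{0},\ \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}\right).
\end{aligned}
\]
```

Here u_j is the group intercept shared by both outcomes (gid_u), v_{kj} are the correlated group-specific treatment-effect deviations (gid_u0 and gid_u1), and the residuals are correlated within a case.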

Code:

version 16.1

clear *

set seed 346201

* Create groups and group-level random intercept and slopes
quietly set obs 50
generate gid = _n

* group intercept that is constant across outcomes
generate double gid_u = rnormal(0, 1)

* correlated treatment effect slopes
matrix sd = (0.4, 0.2)
drawnorm gid_u0 gid_u1, double corr(1 0.5 \ 0.5 1) sds(sd)

* Cases within groups
quietly expand 200
generate cid = _n

* correlated outcomes
drawnorm out0 out1, double corr(1 0.5 \ 0.5 1)

* treatment assignment
gen trtmt = runiform()<.5 // equal probability of being treatment

* Add treatment effect (unique for each outcome) and group random effects
quietly {
    replace out0 = out0 + .6*trtmt + trtmt*gid_u0 + gid_u
    replace out1 = out1 + .3*trtmt + trtmt*gid_u1 + gid_u
}

* gsem
gsem (out? <- i.trtmt M[gid] 1.trtmt#M1[gid]), ///
    covstructure(e._En, unstructured) ///
    nocnsreport nodvheader // nolog

* For mixed, Reshape so each case has two rows
quietly reshape long out , i(cid) j(subj)

mixed out i.subj##i.trtmt || gid: trtmt, cov(un) || cid: , ///
    noconstant residuals(unstructured, t(subj)) stddev ///
    nogroup nolrtest

exit