  • Conditioning on group with largest treatment effect

    Hi everyone,


    I have data with N = 10,000 observations belonging to 50 different groups. For each group, I estimate the treatment effect of T (randomly assigned), so I end up with 50 different coefficients.
    Then I go back to my sample and split it into two halves: one with observations from groups whose treatment effect is above the median, and one with observations from groups whose treatment effect is below the median.

    Finally, I estimate treatment effects on various other outcomes for the two halves of the sample. Effectively, I am estimating treatment effects on various outcomes for individuals in groups with a high versus a low treatment effect on a primary outcome.
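For concreteness, here is a minimal Stata sketch of the three steps described above, run on simulated data. The data-generating process, the seed, and all variable names (gid, T, y1, y2, tau1, tau2, te1, high) are assumptions made for illustration, not the poster's data. Here tau2 is drawn independently of tau1, so any systematic gap in the y2 effect between the two halves can only come from the correlated individual-level errors e1 and e2. A single draw is noisy; repeating the simulation over many seeds (for example with simulate) would be one way to gauge how large that gap is on average.

```stata
* Sketch of the median-split procedure on simulated data.
* All variable names here are hypothetical.
version 16.1
clear
set seed 12345

* 50 groups with true group-level effects on two outcomes;
* tau2 is drawn independently of tau1 (no real link between the effects)
set obs 50
generate gid = _n
generate double tau1 = rnormal(0.5, 0.3)
generate double tau2 = rnormal(0.2, 0.3)

* 200 cases per group, randomized treatment, correlated individual errors
expand 200
generate byte T = runiform() < .5
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm e1 e2, double corr(C)
generate double y1 = tau1*T + e1
generate double y2 = tau2*T + e2

* Step 1: group-specific treatment effect on the primary outcome
generate double te1 = .
forvalues g = 1/50 {
    quietly regress y1 i.T if gid == `g'
    quietly replace te1 = _b[1.T] if gid == `g'
}

* Step 2: median split across groups (one value per group)
egen gtag = tag(gid)
quietly summarize te1 if gtag, detail
generate byte high = te1 > r(p50)

* Step 3: effect on the secondary outcome by half;
* the 1.T#1.high coefficient is the difference between the halves
regress y2 i.T##i.high, vce(cluster gid)
```

Using the interaction rather than two separate regressions gives the difference between halves and its standard error directly; clustering on gid reflects that the split is made at the group level.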


    I worry that this may create a sort of overfitting bias and distort the treatment effects on the other outcomes, since I would be using information based on post-treatment outcomes to split my sample. But I do not know how to formalize this concern. Further, there may not be any issue in the first place! Can anyone here comment on this potential issue and/or point me to papers or books that describe it?


    Thanks in advance!

  • #2
    Interesting question. I haven't run across this sort of thing, but others are likely more familiar with the literature. It's basically an argument for heterogeneous treatment effects.

    I suspect the treatment effects may be biased. If the effects are correlated, then there's an endogeneity issue. Also, you are creating control groups based on treatment A, which raises questions about parallel paths.
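One way to make the correlation concern concrete, as a sketch under assumptions that are not stated in the thread: write each group's estimated effect on outcome k as the true effect plus a mean-zero estimation error, assume the errors are independent of the true effects, and assume the true effect on outcome 1 and the two errors are jointly normal. Let sigma_12 be the covariance of the two estimation errors (nonzero when both outcomes come from the same individuals and their residuals are correlated), and let m be the median used for the split.

```latex
\begin{aligned}
&\hat\tau_{kg} = \tau_{kg} + e_{kg}, \quad k \in \{1,2\}, \qquad
 \sigma_{12} = \operatorname{Cov}(e_{1g}, e_{2g}), \qquad
 \mu_1 = E[\hat\tau_{1g}] \\[4pt]
&E[\hat\tau_{2g} \mid \hat\tau_{1g} > m]
 = E[\tau_{2g} \mid \hat\tau_{1g} > m] + E[e_{2g} \mid \hat\tau_{1g} > m] \\[4pt]
&E[e_{2g} \mid \hat\tau_{1g} > m]
 = \frac{\sigma_{12}}{\operatorname{Var}(\hat\tau_{1g})}\,
   E[\hat\tau_{1g} - \mu_1 \mid \hat\tau_{1g} > m]
 = \sigma_{12}\sqrt{\frac{2}{\pi \operatorname{Var}(\hat\tau_{1g})}}
 \quad \text{if } m = \mu_1
\end{aligned}
```

The first term is the heterogeneity the split is meant to reveal; the second is selection on noise, which is nonzero whenever sigma_12 is nonzero, even if the effect on outcome 2 is unrelated to the effect on outcome 1. In the below-median half the sign flips, so the gap between the halves is roughly twice this term. The half-sample estimate behaves approximately like an average of the group-level estimates in that half, so it would inherit the same term.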

    Another option might be to get all the treatment effects for the full sample, and then analyze those by the split.

    Or, maybe you need to estimate all the treatment effects jointly, and then analyze by the split.


    • #3
      I am afraid that I do not fully understand your approach of splitting the sample at the median of the group treatment effects on the first outcome and then looking at the other outcomes within each half. What do you hope to learn by doing this? Is there something in the literature or in theory that is informing the approach?

      As George Ford suggested, I would instead estimate all the treatment effects jointly using either gsem or mixed, given the multilevel nature of the data. This accounts for and models the following:
      • Shared outcome variance due to measuring different outcomes on the same individual
      • The continuous nature of the group-level treatment effect on each outcome
      • The common part of the treatment effect that is due to group
      • Uncertainty about both the group treatment effect and the group effect on the intercept
      Below I have coded up a simulation of what I think your data look like, along with the syntax for the two estimation approaches. They yield the same results but differ in the number of unique parameter estimates you get for the random effects. This is because gsem is truly multivariate, whereas we are tricking mixed into handling the multivariate data, so you get something like average random effects for each of the two outcomes. Inspiration for the simulation comes from this post and code by Joseph Coveney.
      Code:
      version 16.1
      
      clear *
      
      set seed 346201
      
      * Create groups and group-level random intercept and slopes    
      quietly set obs 50
      generate gid = _n
      * group intercept that is constant across outcomes
      generate double gid_u = rnormal(0, 1)
      * correlated treatment effect slopes 
      matrix sd = (0.4, 0.2)
      drawnorm gid_u0 gid_u1, double corr(1 0.5 \ 0.5 1) sds(sd)
      
      * Cases within groups
      quietly expand 200
      generate cid = _n
      * correlated outcomes
      drawnorm out0 out1, double corr(1 0.5 \ 0.5 1)
      * treatment assignment
      gen trtmt = runiform()<.5    // equal probability of being treatment
      
      * Add treatment effect (unique for each outcome) and group random effects 
      quietly {
          replace out0 = out0 + .6*trtmt + trtmt*gid_u0 + gid_u
          replace out1 = out1 + .3*trtmt + trtmt*gid_u1 + gid_u
      }
      
      * gsem
      gsem (out? <- i.trtmt M[gid] 1.trtmt#M1[gid]), ///
          covstructure(e._En, unstructured) ///
          nocnsreport nodvheader // nolog
      
      * For mixed, Reshape so each case has two rows
      quietly reshape long out , i(cid) j(subj)
      mixed out i.subj##i.trtmt || gid: trtmt, cov(un) || cid: , ///
          noconstant residuals(unstructured, t(subj)) stddev ///
          nogroup nolrtest 
      
      exit
