Hi everyone,
I have data with N = 10,000 observations that belong to 50 different groups. For each group, I estimate the treatment effect of T (randomly assigned), so I end up with 50 different coefficients.
Then I go back to my sample and split it into two halves: one with observations from groups whose treatment effect is above the median, and one with observations from groups whose treatment effect is below the median.
Finally, I estimate treatment effects on various other outcomes in each half of the sample. Effectively, I am estimating treatment effects on various outcomes for individuals with a high versus a low treatment effect on a primary outcome.
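To make the procedure concrete, here is a minimal sketch of the three steps on hypothetical simulated data. It assumes the treatment effect is a simple treated-minus-control difference in means (no covariates), one primary outcome and a single "other" outcome; the variable names and effect sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data mirroring the setup: N = 10,000 observations in 50 groups.
N, G = 10_000, 50
group = rng.integers(0, G, size=N)            # group membership
treat = rng.integers(0, 2, size=N)            # randomly assigned treatment T
y_primary = 1.0 * treat + rng.normal(size=N)  # primary outcome (assumed effect 1.0)
y_other = 0.5 * treat + rng.normal(size=N)    # one "other" outcome (assumed effect 0.5)

def diff_in_means(y, t):
    """Treatment effect as the treated-minus-control difference in means."""
    return y[t == 1].mean() - y[t == 0].mean()

# Step 1: one treatment-effect estimate per group on the primary outcome.
group_effects = np.array([
    diff_in_means(y_primary[group == g], treat[group == g]) for g in range(G)
])

# Step 2: groups whose estimated effect is above the median form the "high" half.
high_groups = np.flatnonzero(group_effects > np.median(group_effects))
is_high = np.isin(group, high_groups)

# Step 3: effect on the other outcome within each half of the sample.
effect_high = diff_in_means(y_other[is_high], treat[is_high])
effect_low = diff_in_means(y_other[~is_high], treat[~is_high])
print(f"high half: {effect_high:.3f}, low half: {effect_low:.3f}")
```

With 50 groups and continuous estimates, the strict `>` comparison against the median puts exactly 25 groups in the high half.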
I worry this may create a sort of overfitting bias in the treatment effects on the other outcomes, since I would be splitting my sample using information based on post-treatment outcomes. But I do not know how to formalize this concern. Further, there may not be any issue in the first place! Can anyone comment on this potential issue and/or point me to papers or books that describe it?
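One way to formalize the concern is a small Monte Carlo under the null of no true heterogeneity. The sketch below assumes (hypothetically) that every group has the same true effect, and that the other outcome's noise is correlated with the primary outcome's noise at level `rho`. Groups then land in the "high" half partly because of favorable sampling noise, and with `rho > 0` that noise carries over into the other outcome, producing a spurious high-minus-low gap. For comparison, `split=True` applies plain sample splitting: groups are classified on a random half of the observations and effects are estimated on the other half, so the classification noise is independent of the estimates. This is an illustrative check, not a general proof.

```python
import numpy as np

def diff_in_means(y, t):
    """Treatment effect as the treated-minus-control difference in means."""
    return y[t == 1].mean() - y[t == 0].mean()

def one_draw(rng, N=10_000, G=50, rho=0.5, split=False):
    """Simulate one dataset with NO true heterogeneity and return the
    high-minus-low gap in the estimated effect on the other outcome."""
    group = rng.integers(0, G, size=N)
    treat = rng.integers(0, 2, size=N)
    # Correlated noise: the other outcome shares shocks with the primary outcome.
    e1 = rng.normal(size=N)
    e2 = rho * e1 + np.sqrt(1 - rho**2) * rng.normal(size=N)
    y_primary = 1.0 * treat + e1  # same true effect in every group
    y_other = 0.5 * treat + e2    # same true effect in every group

    # split=True: classify on fold 0, estimate on fold 1; otherwise use all data for both.
    fold = rng.integers(0, 2, size=N) if split else np.zeros(N, dtype=int)
    classify = fold == 0
    estimate = (fold == 1) if split else np.ones(N, dtype=bool)

    effects = np.array([
        diff_in_means(y_primary[classify & (group == g)], treat[classify & (group == g)])
        for g in range(G)
    ])
    high = np.isin(group, np.flatnonzero(effects > np.median(effects)))
    hi, lo = estimate & high, estimate & ~high
    return diff_in_means(y_other[hi], treat[hi]) - diff_in_means(y_other[lo], treat[lo])

rng = np.random.default_rng(1)
naive = [one_draw(rng) for _ in range(200)]
split = [one_draw(rng, split=True) for _ in range(200)]
print(f"mean gap, same data:  {np.mean(naive):.3f}")  # typically well above 0 when rho > 0
print(f"mean gap, split data: {np.mean(split):.3f}")  # typically close to 0
```

Setting `rho=0` in the naive version should make the mean gap shrink toward zero, which suggests the bias on the other outcomes depends on how strongly their noise co-moves with the primary outcome's noise. The estimated effect on the primary outcome itself would be biased apart between the halves regardless of `rho`.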
Thanks in advance!