I run a study (RCT) with subjects in 100 groups of heterogeneous sizes ranging from 2 to 20 individuals. I have data at the group level and at the individual level.
My main parameter of interest is a treatment effect on a group-level outcome, so my main regressions are run at the group level and I want to keep it that way. However, I also have some supplementary results, used for interpretation, that are based on individual-level treatment effect regressions.
The issue is that, in the individual-level regressions, subjects from larger groups mechanically make up a larger share of the sample and "dominate" the estimation. It therefore seems appropriate to weight the observations so that each group carries the same total weight, which would make the results from the group-level and individual-level regressions comparable.
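To spell out the reasoning, here is a rough sketch of what inverse-group-size weighting does to the point estimate. The notation is my own shorthand and not part of the data: n_g is the size of group g, T_g is the group-level treatment indicator, y_{ig} is the individual outcome, and \bar{y}_g is its within-group mean. It assumes treatment is the only regressor (or, more generally, that every regressor is constant within a group).

```latex
\sum_{g} \frac{1}{n_g} \sum_{i=1}^{n_g} \left( y_{ig} - \alpha - \beta T_g \right)^2
  = \sum_{g} \left( \bar{y}_g - \alpha - \beta T_g \right)^2
  + \sum_{g} \frac{1}{n_g} \sum_{i=1}^{n_g} \left( y_{ig} - \bar{y}_g \right)^2
```

The last term does not depend on \alpha or \beta, so under these assumptions the weighted individual-level coefficients should coincide with those from an unweighted regression of the group means \bar{y}_g on T_g. This only concerns point estimates, not standard errors, and it says nothing about a group-level outcome that is not itself the mean of the individual outcome.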
My idea is to run, for the group-level data:
Code:
reg y_group treatment
and, for the individual-level data:
Code:
* weight = 1 / group size, so the weights within each group sum to 1
bys group_id: gen inverse_groupsize = 1/_N
reg y_indiv treatment [pweight=inverse_groupsize], cluster(group_id)
1) Are pweights the right way to do this in Stata?
2) Is there anything else I need to worry about?