Hi all,
We're working on an analysis that uses matching with replacement. Each treatment individual appears in the data once, and each comparison individual is potentially matched to multiple treatment individuals with the variable _weight indicating the number of treatment individuals to whom they were matched.
My understanding of Stata's weights was that the fweight option is tailor-made to account for the fact that the error term is perfectly correlated among repeated matches of the same comparison student. This interpretation seems consistent with what the psmatch2 folks suggest too, according to https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm
I therefore suggested we estimate the effect using something like
reg y x [fweight=_weight]
My colleague, however, caught that we get different results if we also cluster on an individual ID variable, as in
reg y x [fweight=_weight], cluster(ID)
I'm now having a crisis of confidence! Could it be that fweight is merely a convenience function that accounts for observationally identical individuals, without accounting for the correlated error among them? And is the additional cluster option therefore needed when duplicate observations are duplicate people?
Curiously, the fweight with a cluster option leads to the same results as a pweight regression without clusters. Is it therefore valid to use pweights to account for duplicate individuals?
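For anyone who wants to reproduce the comparison, here is a minimal sketch of the specifications side by side. It assumes the variable names used above (y, x, _weight, ID) and that _weight is a positive integer for every matched observation. The fourth model, which physically duplicates rows with expand before clustering, is not something we have run; it is only an illustrative check on what fweight plus cluster is doing.

```stata
* Sketch only: y, x, _weight and ID are the names from the post above.

* (1) Frequency weights, conventional standard errors
regress y x [fweight=_weight]
estimates store fw

* (2) Frequency weights, clustering on the individual ID
regress y x [fweight=_weight], vce(cluster ID)
estimates store fw_cl

* (3) Probability weights, no cluster option
*     (pweights imply robust standard errors in regress)
regress y x [pweight=_weight]
estimates store pw

* (4) Hypothetical check: duplicate each row _weight times by hand,
*     then cluster on ID so the copies of one person share a cluster
preserve
keep if !missing(_weight) & _weight > 0
expand _weight
regress y x, vce(cluster ID)
estimates store expanded
restore

* Coefficients and standard errors for all four models in one table
estimates table fw fw_cl pw expanded, b(%9.4f) se(%9.4f)
```

The point estimates should agree across all four models, so any differences to look at are in the standard errors.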
Thanks much,
Paul