Dear all,
I have a question about using the reghdfe command with pweight in Stata 16. I run an interaction analysis and am interested in the effect of VAR1 conditional on VAR2. I use weights, as this makes sense for my research question. That works fine.
gen VAR1_VAR2 = VAR1*VAR2
reghdfe ///
ln_gross_investment_total ///
VAR1 VAR2 ///
VAR1_VAR2 ///
[pweight = WEIGHT], cluster(CLUSTER) a(FE) keepsingleton
Then I split my sample according to a dummy variable SPLIT into high (=1) and low (=0) values of that variable:
reghdfe ///
ln_gross_investment_total ///
VAR1 VAR2 ///
VAR1_VAR2 ///
if SPLIT==1 [pweight = WEIGHT], cluster(CLUSTER) a(FE) keepsingleton
reghdfe ///
ln_gross_investment_total ///
VAR1 VAR2 ///
VAR1_VAR2 ///
if SPLIT==0 [pweight = WEIGHT], cluster(CLUSTER) a(FE) keepsingleton
Again, this works fine and I get reasonable results. Now the problem begins: when I then estimate on the whole sample again, using an interaction with SPLIT to test the significance of the difference, I get a different SPLIT_VAR1_VAR2 estimate than when I compute the difference manually.
gen SPLIT_VAR1 = SPLIT*VAR1
gen SPLIT_VAR2 = SPLIT*VAR2
gen SPLIT_VAR1_VAR2 = SPLIT*VAR1_VAR2
reghdfe ///
ln_gross_investment_total ///
VAR1 VAR2 ///
VAR1_VAR2 ///
SPLIT_VAR1 ///
SPLIT_VAR2 ///
SPLIT_VAR1_VAR2 ///
[pweight = WEIGHT], cluster(CLUSTER) a(FE) keepsingleton
Naturally, SPLIT_VAR1_VAR2 should equal VAR1_VAR2 (if SPLIT==1) - VAR1_VAR2 (if SPLIT==0).
When I exclude the weighting from my analysis, this holds true. When I use pweight, this is no longer the case. Could it be that pweight somehow biases my results in some direction? I thought that pweight would also work well in subsamples.
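For reference, the manual difference described above could be computed along the following lines. This is only a sketch: it assumes the variables generated earlier in this post already exist, and the scalar names b_high, b_low and b_inter are illustrative and not part of the original commands.

```stata
* Coefficient on VAR1_VAR2 in the high subsample (SPLIT==1)
reghdfe ln_gross_investment_total VAR1 VAR2 VAR1_VAR2 ///
    if SPLIT==1 [pweight = WEIGHT], cluster(CLUSTER) a(FE) keepsingleton
scalar b_high = _b[VAR1_VAR2]

* Coefficient on VAR1_VAR2 in the low subsample (SPLIT==0)
reghdfe ln_gross_investment_total VAR1 VAR2 VAR1_VAR2 ///
    if SPLIT==0 [pweight = WEIGHT], cluster(CLUSTER) a(FE) keepsingleton
scalar b_low = _b[VAR1_VAR2]

* Coefficient on the triple interaction in the pooled model
reghdfe ln_gross_investment_total VAR1 VAR2 VAR1_VAR2 ///
    SPLIT_VAR1 SPLIT_VAR2 SPLIT_VAR1_VAR2 ///
    [pweight = WEIGHT], cluster(CLUSTER) a(FE) keepsingleton
scalar b_inter = _b[SPLIT_VAR1_VAR2]

* Compare the manual difference with the pooled interaction estimate
display "Manual difference:  " b_high - b_low
display "Pooled interaction: " b_inter
```

Running the same three regressions without [pweight = WEIGHT] gives the unweighted comparison.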
Thanks for your answers. I hope I have not forgotten anything. Let me know if you need further information.
Best,
Robert
