Hello,
I am using survey data from Statistics Canada (N = 25113), which come with p-weights. According to the user documentation, if a regression is run on a subset of the data, one needs to ensure that the mean of the p-weights is 1. In other words, new_weight = wts_m / [mean of wts_m]. I was curious what effect this had on the overall regression model, so I compared one simple regression model using the original weights (wts_m) against the same model using the rescaled weights (pw_c).
As the output below shows, the two models are identical (except for the sum of the weights). My basic question is this: why would Stats Canada "insist" on rescaling the p-weights if it makes no difference to the model? Or is there a difference that isn't being displayed?
Code:
. regress distress dhhgage [pw=wts_m]
(sum of wgt is   2.8121e+07)

Linear regression                                      Number of obs =   24927
                                                       F(  1, 24925) =  273.97
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0219
                                                       Root MSE      =   5.357

------------------------------------------------------------------------------
             |               Robust
    distress |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dhhgage |  -.2206369     .01333   -16.55   0.000    -.2467644   -.1945095
       _cons |   6.728471   .1171459    57.44   0.000     6.498858    6.958084
------------------------------------------------------------------------------

. regress distress dhhgage [pw=pw_c]
(sum of wgt is   2.5394e+04)

Linear regression                                      Number of obs =   24927
                                                       F(  1, 24925) =  273.97
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0219
                                                       Root MSE      =   5.357

------------------------------------------------------------------------------
             |               Robust
    distress |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dhhgage |  -.2206369     .01333   -16.55   0.000    -.2467644   -.1945095
       _cons |   6.728471   .1171459    57.44   0.000     6.498858    6.958084
------------------------------------------------------------------------------
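
For reference, the rescaling rule new_weight = wts_m / [mean of wts_m] could be carried out along the following lines. This is only a sketch and not necessarily how pw_c was created: the variable name pw_c2 is made up for illustration, and taking the mean over the estimation sample (via e(sample)) is an assumption about what the documentation means by a "subset".

```stata
* Sketch only: pw_c2 is a hypothetical name, chosen so that pw_c is not overwritten.
* Assumption: the mean is taken over the observations the regression actually uses.

* Fit the model once so that e(sample) flags the estimation sample
quietly regress distress dhhgage [pw=wts_m]

* Mean of the original weights within that sample (stored in r(mean))
summarize wts_m if e(sample), meanonly

* Rescaled weight: original weight divided by its mean
generate double pw_c2 = wts_m / r(mean) if e(sample)

* Check: the mean of pw_c2 should be 1 (up to rounding)
summarize pw_c2

* Refit with the rescaled weight for comparison with the output above
regress distress dhhgage [pw=pw_c2]
```

With the mean taken over the estimation sample, the reported sum of weights in the last regression should equal the number of observations used.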
Cheers,
David.