Dear Statalists,
I do not understand why a weighted linear regression, svy: reg, keeps observations with a missing outcome value in its estimation sample.
In the example below, the sample size used in the model is 48974, which includes 10421 respondents whose outcome variable value is missing.
Code:
. svyset psu [pw=pw_xw], strata(strata) singleunit(scaled)

      pweight: pw_xw
          VCE: linearized
  Single unit: scaled
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

. svy: reg v1 v2
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =     1,769                  Number of obs     =     48,974
Number of PSUs     =     7,699                  Population size   = 38,107.278
                                                Design df         =      5,930
                                                F(   1,   5930)   =     161.60
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0043

------------------------------------------------------------------------------
             |             Linearized
          v1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          v2 |  -.0985989   .0077561   -12.71   0.000    -.1138037   -.0833941
       _cons |   3.633923   .0065863   551.74   0.000     3.621011    3.646835
------------------------------------------------------------------------------
Note: 7 strata omitted because they contain no population members.

. count if v1==. & e(sample)==1
  10,421
Code:
. svyset [pw=pw_xw]

      pweight: pw_xw
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

. svy: reg v1 v2
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs     =     49,053
Number of PSUs     =    49,053                  Population size   = 38,107.278
                                                Design df         =     49,052
                                                F(   1,  49052)   =     132.80
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0043

------------------------------------------------------------------------------
             |             Linearized
          v1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          v2 |  -.0985989   .0085559   -11.52   0.000    -.1153686   -.0818292
       _cons |   3.633923   .0057285   634.36   0.000     3.622695    3.645151
------------------------------------------------------------------------------

. count if v1==. & e(sample)==1
  10,498
Earlier posts suggest that the difference between using the svy prefix and putting [pw=...] at the end of the command is related to subpopulations, but I do not think the subpopulation issue is involved here.
I am using Stata 15 MP on 64-bit Windows.
Many thanks.
Regards,
Min