Dear Statalists,
I do not understand why a weighted linear regression run with svy: reg counts respondents with a missing outcome in its estimation sample.
In the example below, the sample size used in the model is 48,974, which includes 10,421 respondents whose outcome variable value is missing.
Code:
. svyset psu [pw=pw_xw], strata(strata) singleunit(scaled)

      pweight: pw_xw
          VCE: linearized
  Single unit: scaled
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

. svy: reg v1 v2
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =     1,769                  Number of obs     =     48,974
Number of PSUs     =     7,699                  Population size   = 38,107.278
                                                Design df         =      5,930
                                                F(   1,   5930)   =     161.60
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0043

------------------------------------------------------------------------------
             |             Linearized
          v1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          v2 |  -.0985989   .0077561   -12.71   0.000    -.1138037   -.0833941
       _cons |   3.633923   .0065863   551.74   0.000     3.621011    3.646835
------------------------------------------------------------------------------
Note: 7 strata omitted because they contain no population members.

. count if v1==. & e(sample)==1
  10,421
Code:
. svyset [pw=pw_xw]

      pweight: pw_xw
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

. svy: reg v1 v2
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs     =     49,053
Number of PSUs     =    49,053                  Population size   = 38,107.278
                                                Design df         =     49,052
                                                F(   1,  49052)   =     132.80
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0043

------------------------------------------------------------------------------
             |             Linearized
          v1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          v2 |  -.0985989   .0085559   -11.52   0.000    -.1153686   -.0818292
       _cons |   3.633923   .0057285   634.36   0.000     3.622695    3.645151
------------------------------------------------------------------------------

. count if v1==. & e(sample)==1
  10,498
Earlier posts suggest that the difference between using the svy prefix and putting [pw=...] at the end of the command is related to subpopulation estimation. But I do not think the subpopulation issue is involved here.
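For reference, the comparison those posts describe could be set up roughly as follows. This is only a sketch using the variables shown above; in_svy and in_pw are hypothetical helper variables introduced for illustration, and no particular result is implied.

```stata
* Sketch only: in_svy and in_pw are hypothetical helper variables,
* not part of the original data.

* Estimation sample under the svy prefix
svyset psu [pw=pw_xw], strata(strata) singleunit(scaled)
svy: regress v1 v2
generate byte in_svy = e(sample)

* Estimation sample with the pweight given directly to regress
regress v1 v2 [pw=pw_xw]
generate byte in_pw = e(sample)

* Cross-tabulate membership in the two estimation samples
tabulate in_svy in_pw

* Count missing outcomes inside each estimation sample
count if missing(v1) & in_svy
count if missing(v1) & in_pw
```

Any rows where in_svy and in_pw disagree would be the observations that the two approaches treat differently.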
I am using Stata 15 MP (64-bit) on Windows.
Many thanks.
Regards,
Min
