svy: mean ..., over() - need formula for standard deviation, not given in the manual

Nona Karalashvili

#1

01 May 2015, 05:08

Hi All,

I am using the following command: svy: mean height, over(size). The standard deviations for each level of the size variable are different from the standard deviations calculated separately for each size, and I do not understand why. I have read the manual, but it does not give the formulas for this particular case. Would you be so kind as to let me know what formula is used to calculate standard deviations when the over() option is used?

Many thanks
Nona

Steve Samuels

01 May 2015, 19:46

Please show us the commands and results that you are comparing (FAQ 12). Note that svy: mean does not display estimated standard deviations; it displays standard errors. Using over() is equivalent to invoking the subpop() option for each category. This usually produces standard errors that differ from, and are usually larger than, those obtained with an if qualifier. In the following example, however, the standard error is smaller.

Code:

. sysuse auto, clear

. svyset rep78


. svy: mean mpg if foreign==1

Number of strata =       1        Number of obs   =         21
Number of PSUs   =       3        Population size =         21
                                  Design df       =          2

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         mpg |   25.28571    .680034      22.35976    28.21166
--------------------------------------------------------------

. svy, subpop(if foreign==1): mean mpg

Number of strata =       1        Number of obs   =         69
Number of PSUs   =       5        Population size =         69
                                  Subpop. no. obs =         21
                                  Subpop. size    =         21
                                  Design df       =          4

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         mpg |   25.28571   .6207832      23.56214    27.00928
--------------------------------------------------------------

. svy: mean mpg, over(foreign)

Number of strata =       1        Number of obs   =         69
Number of PSUs   =       5        Population size =         69
                                  Design df       =          4

     Domestic: foreign = Domestic
      Foreign: foreign = Foreign

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
mpg          |
    Domestic |   19.54167   .7186234      17.54645    21.53689
     Foreign |   25.28571   .6207832      23.56214    27.00928
--------------------------------------------------------------
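A quick arithmetic check, assuming the textbook linearized variance for a domain mean with one stratum and PSUs treated as sampled with replacement (an assumption about this design, not a quotation from the manual): the PSU-level scores for the rep78 groups that contain foreign cars are the same in both runs, and the two PSUs with no foreign cars contribute a score of zero under subpop(). Only the factor n/(n-1) changes, with n = 3 PSUs under if and n = 5 under subpop(), so the standard errors should differ by

$$
\frac{\widehat{\mathrm{SE}}_{\mathrm{subpop}}}{\widehat{\mathrm{SE}}_{\mathrm{if}}}
= \sqrt{\frac{5/4}{3/2}} = \sqrt{\frac{5}{6}} \approx 0.912871,
\qquad
0.680034 \times 0.912871 \approx 0.620783
$$

which agrees with the reported .6207832. The two extra PSUs also raise the design df from 2 to 4.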

Steve Samuels
Statistical Consulting

Stata 14.2

Nona Karalashvili

03 May 2015, 01:21

Please see below. I was also under the impression that over() was equivalent to invoking the subpop() option for each category (especially after reading the manual, whose subpop() terminology indirectly suggests this equivalence), but as you will see below, over() gives a different standard error than subpop(). For brevity I am not showing subpop() for every category, but all of them differ from what over() gives.

I just want to understand what each of these is, and reading the exact formulas that Stata uses would be the best way for me to do that.

Thank you!

Code:

. svy: mean perf1 if size == 1
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     177
Number of PSUs   =     177          Population size  =     177
                                    Design df        =     176

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       perf1 |   4.592206   4.008757     -3.319214    12.50363
--------------------------------------------------------------

. svy, subpop(if size==1): mean perf1
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     365
Number of PSUs   =     365          Population size  =     365
                                    Subpop. no. obs  =     177
                                    Subpop. size     =     177
                                    Design df        =     364

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       perf1 |   4.592206   4.002904     -3.279516    12.46393
--------------------------------------------------------------

. svy: mean perf1, over(size)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     269
Number of PSUs   =     269          Population size  =     269
                                    Design df        =     268

    _subpop_1: size = small(<20)
    _subpop_2: size = medium(20-99)
    _subpop_3: size = large(100 and over)

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
perf1        |
   _subpop_1 |   4.592206   4.004868     -3.292799    12.47721
   _subpop_2 |    19.4163   5.607306       8.37633    30.45628
   _subpop_3 |   34.32737   12.72553      9.272651     59.3821
--------------------------------------------------------------
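A quick arithmetic check, assuming the textbook linearized variance for a domain mean with one stratum and PSUs treated as sampled with replacement (here each observation is its own PSU, since Number of obs equals Number of PSUs). Under that assumption the scores of the size==1 observations are identical in all three runs, and observations outside the subpopulation contribute zero; only the factor n/(n-1) changes, with n = 177 (if), n = 365 (subpop()) and n = 269 (over()). Rescaling the if result reproduces the other two:

$$
4.008757 \times \sqrt{\frac{365/364}{177/176}} \approx 4.002904,
\qquad
4.008757 \times \sqrt{\frac{269/268}{177/176}} \approx 4.004868
$$

On this reading, subpop() and over() apply the same formula, and the numbers differ only because the two estimation samples contain different numbers of PSUs (365 versus 269).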

Nona Karalashvili

#4

03 May 2015, 01:41

I just noticed that when I drop the observations for which perf1 == ., then

Code:

svy, subpop(if size==1): mean perf1

does produce the same output as

Code:

svy: mean perf1, over(size)

This might mean that the formulas for subpop() do apply.
Steve Samuels

#5

03 May 2015, 13:32

The subpop() formulas are shown in the "Subpopulation estimation" entry of the Survey Data Reference Manual and in any good sampling text. The manual entry states:

The svy: mean, svy: proportion, svy: ratio, and svy: total commands also have the over() option to perform estimation for multiple subpopulations.
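For reference, here is a sketch of the textbook Taylor-linearized variance of a subpopulation (domain) mean, written for the simplest case in this thread: one stratum, n PSUs treated as sampled with replacement, no finite population correction, sampling weights w_i, and a domain indicator d_i (1 if observation i is in the subpopulation, 0 otherwise). This is the standard form found in sampling texts, not a transcription of the manual, and the symbols are chosen here for illustration:

$$
\hat{N}_D = \sum_{j=1}^{n}\sum_{i \in j} w_i d_i,
\qquad
\hat{\bar{y}}_D = \frac{1}{\hat{N}_D}\sum_{j=1}^{n}\sum_{i \in j} w_i d_i y_i
$$

$$
z_j = \frac{1}{\hat{N}_D}\sum_{i \in j} w_i d_i \left(y_i - \hat{\bar{y}}_D\right),
\qquad
\bar{z} = \frac{1}{n}\sum_{j=1}^{n} z_j
$$

$$
\widehat{V}\left(\hat{\bar{y}}_D\right) = \frac{n}{n-1}\sum_{j=1}^{n}\left(z_j - \bar{z}\right)^2
$$

PSUs that contain no subpopulation members have z_j = 0 but still count in n and in the design df (n - 1 with one stratum). Restricting the sample with an if qualifier removes them, which changes both the factor n/(n-1) and the degrees of freedom.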
