  • svy: mean .., over () - need formula for standard deviation, not given in the manual

    Hi All,

    I am using the following command: svy: mean height, over(size). The standard deviations reported for each level of the size variable are different from the standard deviations calculated separately for each size, and I do not understand why. I read the manual, but formulas are not given for this particular case. Would you be so kind as to let me know what formula is used to calculate standard deviations when using the over() option?

    Many thanks
    Nona

  • #2
    Please show us the commands and results that you are comparing (FAQ 12). Note that svy: mean does not display estimated standard deviations; it displays standard errors. Using over() is equivalent to invoking the subpop() option for each category. This will usually produce standard errors that differ from (and are usually larger than) those obtained by restricting the sample with an if qualifier. In the following example, however, the subpop() standard error is smaller.

    Code:
    sysuse auto, clear
    
    . svyset rep78
    
    
    . svy: mean mpg if foreign==1
    
    Number of strata =       1        Number of obs   =         21
    Number of PSUs   =       3        Population size =         21
                                      Design df       =          2
    
    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
             mpg |   25.28571    .680034      22.35976    28.21166
    --------------------------------------------------------------
    
    . svy, subpop(if foreign==1): mean mpg
    
    Number of strata =       1        Number of obs   =         69
    Number of PSUs   =       5        Population size =         69
                                      Subpop. no. obs =         21
                                      Subpop. size    =         21
                                      Design df       =          4
    
    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
             mpg |   25.28571   .6207832      23.56214    27.00928
    --------------------------------------------------------------
    
    . svy: mean mpg, over(foreign)
    
    Number of strata =       1        Number of obs   =         69
    Number of PSUs   =       5        Population size =         69
                                      Design df       =          4
    
         Domestic: foreign = Domestic
          Foreign: foreign = Foreign
    
    --------------------------------------------------------------
                 |             Linearized
            Over |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    mpg          |
        Domestic |   19.54167   .7186234      17.54645    21.53689
         Foreign |   25.28571   .6207832      23.56214    27.00928
    --------------------------------------------------------------
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Please see below. I was also under the impression that over() is equivalent to invoking the subpop() option for each category, especially since the manual discusses subpop() in terms that indirectly suggest equivalence. As you will see below, however, over() gives a different standard error than subpop(). (For brevity I show subpop() for only one category; the results for the other categories also differ from what over() gives.)

      I just want to understand what each of these is, and reading the exact formulas that Stata uses would be the best way for me to understand.

      Thank you!

      Code:
      . svy: mean perf1 if size == 1
      (running mean on estimation sample)
      
      Survey: Mean estimation
      
      Number of strata =       1          Number of obs    =     177
      Number of PSUs   =     177          Population size  =     177
                                          Design df        =     176
      
      --------------------------------------------------------------
                   |             Linearized
                   |       Mean   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
             perf1 |   4.592206   4.008757     -3.319214    12.50363
      --------------------------------------------------------------
      
      . svy, subpop(if size==1): mean perf1
      (running mean on estimation sample)
      
      Survey: Mean estimation
      
      Number of strata =       1          Number of obs    =     365
      Number of PSUs   =     365          Population size  =     365
                                          Subpop. no. obs  =     177
                                          Subpop. size     =     177
                                          Design df        =     364
      
      --------------------------------------------------------------
                   |             Linearized
                   |       Mean   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
             perf1 |   4.592206   4.002904     -3.279516    12.46393
      --------------------------------------------------------------
      
      . svy: mean perf1, over(size)
      (running mean on estimation sample)
      
      Survey: Mean estimation
      
      Number of strata =       1          Number of obs    =     269
      Number of PSUs   =     269          Population size  =     269
                                          Design df        =     268
      
          _subpop_1: size = small(<20)
          _subpop_2: size = medium(20-99)
          _subpop_3: size = large(100 and over)
      
      --------------------------------------------------------------
                   |             Linearized
              Over |       Mean   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
      perf1        |
         _subpop_1 |   4.592206   4.004868     -3.292799    12.47721
         _subpop_2 |    19.4163   5.607306       8.37633    30.45628
         _subpop_3 |   34.32737   12.72553      9.272651     59.3821
      --------------------------------------------------------------



      • #4
        I just noticed that when I drop the observations for which perf1 is missing ("perf1 == ."), then
        Code:
         
         svy, subpop(if size==1): mean perf1
        does produce the same output as
        Code:
         
         svy: mean perf1, over(size)
        This might mean that the formulas for subpop() do apply.
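
A note on the numbers above: the three runs in #3 report different estimation samples (177 PSUs with if, 365 with subpop(), 269 with over()), which suggests that over() dropped every observation with missing perf1 while subpop() kept those outside the subpopulation. Assuming the usual linearized variance for a single stratum with no finite population correction, the variance carries a factor n/(n-1), where n is the number of PSUs in the estimation sample, so the standard errors should differ only through that factor. The Python sketch below checks this arithmetic against the reported output; the function name rescale_se is hypothetical, and the script is only a calculator, not Stata's implementation.

```python
import math


def rescale_se(se, n_from, n_to):
    """Rescale a linearized SE from a run with n_from PSUs to one with n_to PSUs.

    Assumptions (not taken from the thread): single stratum, no finite
    population correction, and identical subpopulation observations in both
    runs, so that only the n/(n-1) factor in the variance changes.
    """
    factor_from = n_from / (n_from - 1)
    factor_to = n_to / (n_to - 1)
    return se * math.sqrt(factor_to / factor_from)


# Post #3: start from the subpop() standard error (365 PSUs).
se_subpop = 4.002904
se_over = rescale_se(se_subpop, 365, 269)  # over(size) reported 269 PSUs
se_if = rescale_se(se_subpop, 365, 177)    # "if size == 1" reported 177 PSUs

# Post #2: subpop() and over() reported 5 PSUs, "if foreign==1" reported 3.
se_auto_if = rescale_se(0.6207832, 5, 3)

# Compare with the standard errors printed in the thread.
assert math.isclose(se_over, 4.004868, abs_tol=1e-5)
assert math.isclose(se_if, 4.008757, abs_tol=1e-5)
assert math.isclose(se_auto_if, 0.680034, abs_tol=1e-5)

print(f"over(size):      {se_over:.6f}")
print(f"if size == 1:    {se_if:.6f}")
print(f"if foreign == 1: {se_auto_if:.6f}")
```

If the assertions pass, the differences between if, subpop(), and over() in both examples are accounted for by the PSU count alone, which fits the observation that dropping the missing perf1 values makes subpop() and over() agree.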



        • #5
          The subpop() formulas are shown in the "Subpopulation estimation" chapter of the Survey Data Reference Manual, or in any good sampling text. The manual entry states:
          The svy: mean, svy: proportion, svy: ratio, and svy: total commands also have the over() option to perform
          estimation for multiple subpopulations.
          Steve Samuels
          Statistical Consulting

          Stata 14.2
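
For readers who want the formula in one place, here is a sketch of the standard linearized estimator for a subpopulation mean in the simplest setting that matches the examples in this thread: one stratum, n sampled PSUs, and no finite population correction. The notation is mine rather than the manual's: w_{ij} is the weight and y_{ij} the value of observation j in PSU i, and \delta_{ij} is 1 if the observation belongs to subpopulation S and 0 otherwise.

```latex
\[
\hat{\bar{Y}}_S
  = \frac{\sum_{i=1}^{n}\sum_{j} w_{ij}\,\delta_{ij}\,y_{ij}}
         {\sum_{i=1}^{n}\sum_{j} w_{ij}\,\delta_{ij}},
\qquad
\hat{W}_S = \sum_{i=1}^{n}\sum_{j} w_{ij}\,\delta_{ij}
\]
\[
z_i = \frac{1}{\hat{W}_S}\sum_{j} w_{ij}\,\delta_{ij}
      \left(y_{ij} - \hat{\bar{Y}}_S\right),
\qquad
\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i = 0
\]
\[
\widehat{V}\!\left(\hat{\bar{Y}}_S\right)
  = \frac{n}{n-1}\sum_{i=1}^{n}\left(z_i - \bar{z}\right)^2
  = \frac{n}{n-1}\sum_{i=1}^{n} z_i^{2}
\]
```

The sum runs over all n PSUs in the estimation sample, including those with no subpopulation members (for which z_i = 0). Restricting the data with if reduces n, whereas subpop() and over() keep it, and that is why the standard errors can differ even though the point estimate is the same.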
