  • svy: different SEs and confidence intervals when using subpop() or over()

    Hi,

    Here is the svyset for my survey data:
    svyset psu [pw=wt_final], strata(strata1) fpc(fpc1) || school_id, strata(strata2) fpc(fpc2) || id, strata(strata3) fpc(fpc3) singleunit(centered)

    I use svy: mean with the subpop() option to get the mean for grades 4 and 6 in region A. I then ran svy: mean again, this time with over(region grade), to compare the results. I get the same point estimates but slightly different standard errors and confidence intervals. Could someone explain why? I really appreciate your input.

    Thanks!
    Jennifer

    . svy, subpop(if grade==4 & region=="REGION A"): mean adj_orf
    (running mean on estimation sample)

    Survey: Mean estimation

    Number of strata =       1          Number of obs    =    1574
    Number of PSUs   =      10          Population size  =  182028
                                        Subpop. no. obs  =     503
                                        Subpop. size     = 63187.1
                                        Design df        =       9

    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
         adj_orf |   4.637633   1.727553      .7296358     8.54563
    --------------------------------------------------------------
    Note: 2 strata omitted because they contain no subpopulation
    members.
    Note: Strata with single sampling unit centered at overall
    mean.

    . svy, subpop(if grade==6 & region==1): mean adj_orf
    (running mean on estimation sample)

    Survey: Mean estimation

    Number of strata =       1          Number of obs    =    1577
    Number of PSUs   =      10          Population size  =  182253
                                        Subpop. no. obs  =     517
                                        Subpop. size     = 47116.9
                                        Design df        =       9

    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
         adj_orf |   24.52088   4.222985      14.96783    34.07394
    --------------------------------------------------------------
    Note: 2 strata omitted because they contain no subpopulation
    members.
    Note: Strata with single sampling unit centered at overall
    mean.


    . svy: mean adj_orf, over(region grade)
    (running mean on estimation sample)

    Survey: Mean estimation

    Number of strata =       3          Number of obs    =    3428
    Number of PSUs   =      28          Population size  =  461175
                                        Design df        =      25

    Over: region grade
    _subpop_1: REGION A 4
    _subpop_2: REGION A 6
    _subpop_3: REGION B 4
    _subpop_4: REGION B 6
    _subpop_5: REGION C 4
    _subpop_6: REGION C 6

    --------------------------------------------------------------
                 |             Linearized
            Over |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    adj_orf      |
       _subpop_1 |   4.637633   1.727522      1.079735    8.195532
       _subpop_2 |   24.52088   4.226485      15.81627    33.22549
       _subpop_3 |   10.21037   1.837862      6.425223    13.99552
       _subpop_4 |   35.90704   5.374154      24.83876    46.97532
       _subpop_5 |   10.59444    2.29463      5.868564    15.32032
       _subpop_6 |   33.77606   4.203415      25.11897    42.43316
    --------------------------------------------------------------
    Note: Strata with single sampling unit centered at overall
    mean.

    .

  • #2
    In the analysis using subpop() there was a footnote following the estimation table.
    The footnote was

    Code:
    Note: 2 strata omitted because they contain no subpopulation
    members.
    This means that the observations in those strata were dropped from the estimation sample
    and were not counted when computing the standard errors.

    In the analysis using over(), those strata were not omitted because they overlapped
    with one of the other subpopulations identified by the over() variables.
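
    The effect on the confidence intervals can be checked by hand. The lines below are a minimal sketch that uses only the figures printed in the output above and assumes the usual design degrees of freedom for a linearized survey estimate (number of PSUs minus number of strata). They rebuild the REGION A, grade 4 interval for both runs:

    ```stata
    * Design df = number of PSUs minus number of strata
    * (subpop() run first, then over() run)
    display 10 - 1
    display 28 - 3

    * Two-sided 95% critical values of Student's t for each design df
    display invttail(9, 0.025)
    display invttail(25, 0.025)

    * subpop() run: lower and upper limits from its mean and standard error
    display 4.637633 - invttail(9, 0.025)*1.727553
    display 4.637633 + invttail(9, 0.025)*1.727553

    * over() run: lower and upper limits from its mean and standard error
    display 4.637633 - invttail(25, 0.025)*1.727522
    display 4.637633 + invttail(25, 0.025)*1.727522
    ```

    The two critical values are roughly 2.26 and 2.06, so most of the difference between the intervals comes from the design df (9 versus 25), not from the small change in the standard errors.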


    • #3
      Hi Jeff,
      Does the note "2 strata omitted because they contain no subpopulation members." raise any concern for you? Do you mean that the -over()- option is better in this case?
      Thanks
      Stata MP 13 User


      • #4
        Not really.

        For the subpop case, this just means that 2 strata were not relevant (i.e. did not provide any information)
        for the specified subpop.

        Using over() benefits those of us who want to compare parameter estimates between multiple subpopulations.
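
        As an illustration of such a comparison, here is a minimal sketch. It assumes the coefficient names match the _subpop_ labels shown in the over() output above; newer Stata releases label over() groups differently, so it is worth listing e(b) first to confirm the names. The particular contrasts (grade 4 across regions) are only an example and were not part of the original question.

        ```stata
        * Fit the over() model so the estimates are in memory
        svy: mean adj_orf, over(region grade)

        * List the coefficient vector to confirm the exact names to use below
        matrix list e(b)

        * Grade 4 difference: REGION A (_subpop_1) minus REGION B (_subpop_3)
        lincom [adj_orf]_subpop_1 - [adj_orf]_subpop_3

        * Joint test that the three grade 4 means are equal
        test ([adj_orf]_subpop_1 = [adj_orf]_subpop_3) ([adj_orf]_subpop_3 = [adj_orf]_subpop_5)
        ```

        Because all six means come from one estimation, lincom and test can use the covariances between them, which is not possible when each subpopulation is estimated in a separate subpop() run.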
