
  • Different CIs with specified survey design using svy: mean and svy: tab


    Dear Statalisters,
    My first post is a question about getting different confidence intervals under a specified survey design when using different commands. My original question was how I could test the proportion of an illness in different subgroups against the proportion in the whole population.
    Since the proportion of a dummy variable coded with zeros and ones is the same as its mean, I thought I could use the same approach I use to test subgroup means against the population mean.
    In other words, I wanted to do this:

    Code:
    use https://www.stata-press.com/data/r17/nhefs
    svyset psu2 [pw=swgt2], strata(strata2)
    
    svy: mean rural
    svy: mean rural, over(region)
    
    svy: reg rural i.region
    
    contrast gw.region, effects mcompare(bonferroni)
    I know that for dummy variables one would normally use something like logit, but I thought that in this case a simple linear regression would do.
    Everything looked fine to me; however, when I compared the confidence intervals with those from other commands that report proportions, I noticed differences.

    Code:
    svy: mean rural
    svy: tab rural, ci
    quietly: svy: reg rural i.region
    margins
    This is the output of these commands. As you can see, the estimated proportion in the population is always the same, but the confidence intervals differ slightly from each other.

    Code:
    . svy: mean rural
    (running mean on estimation sample)
    
    Survey: Mean estimation
    
    Number of strata =  35           Number of obs   =      14,407
    Number of PSUs   = 105           Population size = 212,619,074
                                     Design df       =          70
    
    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   std. err.     [95% conf. interval]
    -------------+------------------------------------------------
           rural |   .3181015   .0185216      .2811612    .3550417
    --------------------------------------------------------------
    
    . svy: tab rural, ci
    (running tabulate on estimation sample)
    
    Number of strata =  35                           Number of obs   =      14,407
    Number of PSUs   = 105                           Population size = 212,619,074
                                                     Design df       =          70
    
    ----------------------------------------------
    rural     |
    residence | proportion          lb          ub
    ----------+-----------------------------------
            0 |      .6819       .6439       .7176
            1 |      .3181       .2824       .3561
              |
        Total |          1                        
    ----------------------------------------------
    Key: proportion = Cell proportion
                 lb = Lower 95% confidence bound for cell proportion
                 ub = Upper 95% confidence bound for cell proportion
    
    . quietly: svy: reg rural i.region
    
    . margins
    
    Predictive margins
    
    Number of strata =  35                           Number of obs   =      14,407
    Number of PSUs   = 105                           Population size = 212,619,074
    Model VCE: Linearized                            Design df       =          70
    
    Expression: Linear prediction, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _cons |   .3181015   .0188586    16.87   0.000     .2804892    .3557137
    ------------------------------------------------------------------------------

    So now I have two questions:

    1. Why do the confidence intervals differ? In R it is common practice to calculate a proportion as the mean of a dummy variable. Is one method the "right" one?
    2. Given that these methods produce different confidence intervals, can I still use the approach above to test the subgroup proportions against the population proportion? (But maybe I should open a separate thread for this.)

    Thank you very much for your help!
    Stephan
    Last edited by Stephan Junker; 10 Sep 2021, 04:04.

  • #2
    Is anything more needed to make it possible to engage with this problem, or have I phrased my question badly?
    Please give me a hint if more information or rephrasing is needed.
    Thank you
    Stephan



    • #3
      Hello Stephan

      There are several methods of calculating a confidence interval for a proportion. Your intervals differ because "mean" and "tab" use different methods. "mean" calculates the standard error of the mean and gives the CI as estimate +/- t*se. "tab" (like "proportion") uses a transformation: it finds a CI for the logit of the proportion and then transforms the bounds back to the proportion scale. The advantage of the second approach is that the resulting interval can never include negative values or values greater than one, so I think it's generally preferred.
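      The back-transformation is easy to check by hand from the numbers in the output above (mean .3181015, linearized SE .0185216, design df 70). Here is a quick sketch outside Stata, in Python; the t critical value 1.99444 for 70 df is hard-coded as an approximation, and the delta-method SE of the logit is se/(p*(1-p)):

```python
import math

# Numbers taken from the svy: mean output above
p, se, t = 0.3181015, 0.0185216, 1.99444  # t ~ 97.5th percentile of t(70)

# Wald interval, as used by svy: mean -- p +/- t*se
wald = (p - t * se, p + t * se)

# Logit-transformed interval, as used by svy: tab --
# build a CI for logit(p) with delta-method SE se/(p*(1-p)),
# then back-transform both bounds with the inverse logit
logit_p = math.log(p / (1 - p))
se_logit = se / (p * (1 - p))
invlogit = lambda x: 1 / (1 + math.exp(-x))
logit_ci = (invlogit(logit_p - t * se_logit), invlogit(logit_p + t * se_logit))

print(wald)      # approx (0.2812, 0.3550), matching svy: mean
print(logit_ci)  # approx (0.2824, 0.3561), matching svy: tab
```

      Both intervals are reproduced to the printed precision, which confirms that the only difference between the two commands is the scale on which the interval is built.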

      Nevertheless, on your second question, I think the answer is that it doesn't matter which one you use. Provided your sample sizes aren't small, there shouldn't be much difference between the two intervals, so they should give very similar results. You say you want to use the interval for testing: if you intend to check whether a particular value lies in a 95% CI, the two methods will usually agree. They can differ in borderline cases where the p-value is close to 5%, but I don't think you would call a result interesting at p = 4.9% and uninteresting at p = 5.1%, so the exact value isn't important.
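      To illustrate the small-sample caveat: with a rare outcome the Wald interval can dip below zero, while the logit-based interval stays inside (0, 1) by construction. A hypothetical sketch (p = 0.05, n = 40, simple-random-sampling SE, z = 1.96 instead of a t quantile for simplicity):

```python
import math

p, n, z = 0.05, 40, 1.96           # hypothetical small sample, rare outcome
se = math.sqrt(p * (1 - p) / n)    # SRS standard error of a proportion

# Wald interval: nothing stops it from escaping [0, 1]
wald = (p - z * se, p + z * se)

# Logit-transformed interval: bounded inside (0, 1) by construction
logit_p = math.log(p / (1 - p))
se_logit = se / (p * (1 - p))
invlogit = lambda x: 1 / (1 + math.exp(-x))
logit_ci = (invlogit(logit_p - z * se_logit), invlogit(logit_p + z * se_logit))

print(wald)      # lower bound is negative
print(logit_ci)  # both bounds stay strictly between 0 and 1
```

      With your sample of 14,407 observations and a proportion near one third, this boundary effect is negligible, which is why the two intervals above are so close.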

      Hope that makes sense.
