
  • Different CIs with specified survey design using svy: mean and svy: tab


    Dear Statalisters,
    My first post is a question about getting different confidence intervals under a specified survey design when using different commands. My original question was how I could test the proportion of an illness in different subgroups against the proportion in the whole population.
    Since the proportion of a dummy variable coded with zeros and ones is the same as its mean, I thought I could use the same approach I use to test subgroup means against the population mean.
    In other words, I wanted to do this:

    Code:
    use https://www.stata-press.com/data/r17/nhefs
    svyset psu2 [pw=swgt2], strata(strata2)
    
    svy: mean rural
    svy: mean rural, over(region)
    
    svy: reg rural i.region
    
    contrast gw.region, effects mcompare(bonferroni)
    I know that for dummy variables one would normally use something like logit, but I thought that in this case a simple linear regression would do.
    Everything looked fine to me; however, when I compared the confidence intervals with those from other commands that report proportions, I noticed differences.

    Code:
    svy: mean rural
    svy: tab rural, ci
    quietly: svy: reg rural i.region
    margins
    This is the output of these commands. As you can see, the estimated proportion in the population is always the same, but the confidence intervals differ slightly from each other.

    Code:
    . svy: mean rural
    (running mean on estimation sample)
    
    Survey: Mean estimation
    
    Number of strata =  35           Number of obs   =      14,407
    Number of PSUs   = 105           Population size = 212,619,074
                                     Design df       =          70
    
    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   std. err.     [95% conf. interval]
    -------------+------------------------------------------------
           rural |   .3181015   .0185216      .2811612    .3550417
    --------------------------------------------------------------
    
    . svy: tab rural, ci
    (running tabulate on estimation sample)
    
    Number of strata =  35                           Number of obs   =      14,407
    Number of PSUs   = 105                           Population size = 212,619,074
                                                     Design df       =          70
    
    ----------------------------------------------
    rural     |
    residence | proportion          lb          ub
    ----------+-----------------------------------
            0 |      .6819       .6439       .7176
            1 |      .3181       .2824       .3561
              |
        Total |          1                        
    ----------------------------------------------
    Key: proportion = Cell proportion
                 lb = Lower 95% confidence bound for cell proportion
                 ub = Upper 95% confidence bound for cell proportion
    
    . quietly: svy: reg rural i.region
    
    . margins
    
    Predictive margins
    
    Number of strata =  35                           Number of obs   =      14,407
    Number of PSUs   = 105                           Population size = 212,619,074
    Model VCE: Linearized                            Design df       =          70
    
    Expression: Linear prediction, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _cons |   .3181015   .0188586    16.87   0.000     .2804892    .3557137
    ------------------------------------------------------------------------------

    So now I have two questions:

    1. Why do the confidence intervals differ? In R it is common practice to calculate a proportion as the mean of a dummy variable. Is one method the "right" one?
    2. Given that these methods produce different confidence intervals, can I still use the approach above to test the subgroup proportions against the population proportion? (But maybe I should open a separate thread for this.)

    Thank you very much for your help!
    Stephan
    Last edited by Stephan Junker; 10 Sep 2021, 04:04.

  • #2
    Is anything more needed to make it possible to engage with this problem, or have I phrased my question badly?
    Please give me a hint if more information or rephrasing is needed.
    Thank you
    Stephan



    • #3
      Hello Stephan

      There are several methods of calculating a confidence interval for a proportion. Your intervals differ because "mean" and "tab" use different methods. "mean" calculates the standard error of the mean and gives the CI as estimate +/- t*se. "tab" (like "proportion") uses a transformation: it finds a CI for the logit of the proportion and then transforms the bounds back to the proportion scale. The advantage of the second approach is that the resulting interval can never include negative values or values greater than one, so I think it's generally preferred.
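      The back-transformation is easy to check by hand from the numbers in the output above (mean .3181015, linearized SE .0185216, design df 70). Here is a quick sketch outside Stata, in Python; the t critical value 1.99444 for 70 df is hard-coded as an approximation, and the delta-method SE of the logit is se/(p*(1-p)):

```python
import math

# Numbers taken from the svy: mean output above
p, se, t = 0.3181015, 0.0185216, 1.99444  # t ~ 97.5th percentile of t(70)

# Wald interval, as used by svy: mean -- p +/- t*se
wald = (p - t * se, p + t * se)

# Logit-transformed interval, as used by svy: tab --
# build a CI for logit(p) with delta-method SE se/(p*(1-p)),
# then back-transform both bounds with the inverse logit
logit_p = math.log(p / (1 - p))
se_logit = se / (p * (1 - p))
invlogit = lambda x: 1 / (1 + math.exp(-x))
logit_ci = (invlogit(logit_p - t * se_logit), invlogit(logit_p + t * se_logit))

print(wald)      # approx (0.2812, 0.3550), matching svy: mean
print(logit_ci)  # approx (0.2824, 0.3561), matching svy: tab
```

      Both intervals are reproduced to the printed precision, which confirms that the only difference between the two commands is the scale on which the interval is built.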

      Nevertheless, on your second question, I think the answer is that it doesn't matter which one you use. Provided your sample sizes aren't small, there shouldn't be much difference between the two intervals, so they should give very similar results. You say you want to use the interval for testing: if you intend to check whether a particular value lies in a 95% CI, the two methods will usually agree. They can differ in borderline cases where the p-value is close to 5%, but I don't think you would call a result interesting at p = 4.9% and uninteresting at p = 5.1%, so the exact value isn't important.
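      To illustrate the small-sample caveat: with a rare outcome the Wald interval can dip below zero, while the logit-based interval stays inside (0, 1) by construction. A hypothetical sketch (p = 0.05, n = 40, simple-random-sampling SE, z = 1.96 instead of a t quantile for simplicity):

```python
import math

p, n, z = 0.05, 40, 1.96           # hypothetical small sample, rare outcome
se = math.sqrt(p * (1 - p) / n)    # SRS standard error of a proportion

# Wald interval: nothing stops it from escaping [0, 1]
wald = (p - z * se, p + z * se)

# Logit-transformed interval: bounded inside (0, 1) by construction
logit_p = math.log(p / (1 - p))
se_logit = se / (p * (1 - p))
invlogit = lambda x: 1 / (1 + math.exp(-x))
logit_ci = (invlogit(logit_p - z * se_logit), invlogit(logit_p + z * se_logit))

print(wald)      # lower bound is negative
print(logit_ci)  # both bounds stay strictly between 0 and 1
```

      With your sample of 14,407 observations and a proportion near one third, this boundary effect is negligible, which is why the two intervals above are so close.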

      Hope that makes sense.
