  • Lopsided confidence intervals with svy commands and how to export table estimates with confidence intervals

    Hello. A few related questions from one project. I am using Stata/MP 15.1.

    I want to produce many tables from a voter registration and voting survey. For instance, the code below gives the voting rate (weighted and averaged over several years) for Wisconsin citizens who are either part-time or full-time high school students. The results are shown for mid-term election cycles (pres==0) and presidential cycles (pres==1).
    Code:
    table pres [pw=weight] if stateabb=="WI" & ptfths==1, c(mean voted) f(%5.3fc)
    pres    mean(voted)
       0          0.316
       1          0.604
    I had hoped to use the -xtable- command (with syntax similar to the example above) because it would let me run many such tables and export them quickly to MS Excel files, with one sheet for each of many subpopulations. However, I want the tables to include confidence intervals because the samples for some subpopulations are small (n = 100 for the Wisconsin student example above). The method I know for getting confidence intervals for proportions is the -svy tabulate- command (see below), but I do not know how to export tidy tables from it. First question: What other ways are there to produce estimates and CIs, ideally in a single table? Subquestion: What is the best way to export results from any of these methods? For instance, -putdocx- doesn't seem to work with -svy tabulate-. The exported file does not have to be Excel.
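
    One possible route, sketched under assumptions: -svy: mean- leaves its estimates, standard errors, and confidence limits in r(table), which -putexcel- can write to a sheet. The file name and sheet name below are hypothetical, and the intervals from -mean- are the usual symmetric ones, so they need not match -svy tabulate- exactly.

    ```stata
    * Sketch only: file and sheet names are made up for illustration
    svyset _n [pweight=weight]
    svy, subpop(if stateabb=="WI" & ptfths==1): mean voted, over(pres)

    * r(table) has one column per estimate; transpose so each estimate is a row
    matrix T = r(table)'

    * Write the matrix, with row and column names, to one sheet of a workbook
    putexcel set "voting_rates.xlsx", sheet("WI_students") replace
    putexcel A1 = matrix(T), names
    ```

    For further subpopulations, -putexcel set- with the modify option and a different sheet() name should add sheets to the same workbook.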

    The second question relates to using svy tabulate to get the CIs. To set the weights, I used
    Code:
    svyset _n [pweight=weight]
    I then issued this command:
    Code:
    svy, subpop(if stateabb=="WI" & ptfths==1): tabulate pres voted, ci row
    Here is the output:
    (running tabulate on estimation sample)

    Number of strata =       1        Number of obs   =        815,414
    Number of PSUs   = 815,414        Population size =  1,842,740,528
                                      Subpop. no. obs =            100
                                      Subpop. size    = 261,136.862793
                                      Design df       =        815,413

              |             voted
    Cycle     |  Did not vote          Voted
    ----------+-------------------------------
    Midterm   |         .6841          .3159
              | [.5462,.7957]  [.2043,.4538]
    President |         .3963          .6037
              | [.2601,.5507]  [.4493,.7399]
    Total     |         .5309          .4691
              | [.4274,.6318]  [.3682,.5726]

    Key: row proportion
         [95% confidence interval for row proportion]

    Pearson:
      Uncorrected   chi2(1)      = 6.75e+04
      Design-based  F(1, 815413) = 7.6817    P = 0.0056

    Second question: Why are the lower and upper CI bounds uneven around the estimated voting rate? For example, for voting in presidential election cycles, the distance to the lower bound is 60.37 - 44.93 = 15.44 and the distance to the upper bound is 73.99 - 60.37 = 13.62. I rarely analyze small-sample estimates or use the -svy- way of handling weights (I usually analyze with regression and use [pweight=weight]), so I have not encountered this before. Perhaps I am using the -subpop- option of the -svy- prefix incorrectly? If there is another strategy to get the estimates and CIs together, perhaps it won't produce this problem, but I still wonder what is going on here.

    I also noticed that if I just drop the other observations instead of using the -subpop- option, I get a different set of bounds. The point estimate is the same, but the bounds differ (and are still lopsided): 74.2 - 60.37 = 13.83 and 60.37 - 44.66 = 15.71. Third question: Why do these CIs differ from the output above, which used the -subpop- option?
    Code:
      preserve
      keep if stateabb=="WI" & ptfths==1
      svy: tabulate pres voted, ci row
      restore
    (running tabulate on estimation sample)

    Number of strata =   1        Number of obs   =        100
    Number of PSUs   = 100        Population size = 261,136.86
                                  Design df       =         99

              |             voted
    Cycle     |  Did not vote          Voted
    ----------+-------------------------------
    Midterm   |         .6841          .3159
              | [.5437,.7974]  [.2026,.4563]
    President |         .3963          .6037
              |   [.258,.5534]    [.4466,.742]
    Total     |         .5309          .4691
              | [.4256,.6335]  [.3665,.5744]

    Key: row proportion
         [95% confidence interval for row proportion]

    Pearson:
      Uncorrected   chi2(1)  = 8.2795
      Design-based  F(1, 99) = 7.6049    P = 0.0069

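
    One difference visible in the two outputs is the design degrees of freedom: 815,413 with -subpop()- versus 99 after dropping observations. A hedged sketch of how much that alone moves the t critical value used for a 95% interval (this is likely only part of the story, since the standard errors can also differ between the two approaches):

    ```stata
    * 97.5th percentile of Student's t for each design df reported above
    display "subpop():  t = " %6.4f invttail(815413, 0.025)
    display "keep if:   t = " %6.4f invttail(99, 0.025)
    ```

    The smaller df gives a slightly larger multiplier (about 1.98 versus 1.96), which is in the direction of the wider interval in the second output.
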
    Thanks. I keep thinking I'm missing something obvious, but I don't use -svy- often. Perhaps there's a good primer out there on svy (besides what's in the manuals)?

    -Doug

  • #2
    In answer to your second question, confidence intervals aren’t required to be symmetric. Your CI is symmetric on the logit scale, which svy tab uses to compute confidence intervals for proportions; the endpoints are then transformed back to the more familiar probability scale, which makes them asymmetric on that scale.
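
    The symmetry on the logit scale can be checked directly from the output in #1. A minimal sketch, using only the row proportion and bounds reported for "Voted" in presidential cycles under -subpop()-; the scalar names are just for illustration:

    ```stata
    * Values copied from the -subpop()- output in #1 (President row, Voted column)
    scalar p_hat = .6037
    scalar lb    = .4493
    scalar ub    = .7399

    * Distances on the probability scale: unequal
    display "prob scale:  lower = " %6.4f (p_hat - lb) "   upper = " %6.4f (ub - p_hat)

    * Distances on the logit (log-odds) scale: equal up to rounding
    display "logit scale: lower = " %6.4f (logit(p_hat) - logit(lb)) "   upper = " %6.4f (logit(ub) - logit(p_hat))
    ```

    Both logit-scale distances should come out at about 0.62, versus 0.1544 and 0.1362 on the probability scale.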



    • #3
      Originally posted by Leonardo Guizzetti
      In answer to your second question, confidence intervals aren’t required to be symmetric. Your CI will be symmetric on the logit scale, which is used for computing proportions by svy tab, then they are transformed to the more familiar probability scale which makes them asymmetric on that scale.
      Thanks. I didn't realize -svy tab- used logit for proportions.
