  • Trouble getting confidence interval/margin of error for "counts"

    I am using the 3-year 2011-2013 American Community Survey Public Use Microdata Sample (PUMS) data from IPUMS. We want to use Stata to calculate confidence intervals (and eventually margins of error) for counts within a specific variable (for instance, the number of men and the number of women in the Sex variable). We figured out how to get confidence intervals for the averages of the Sex variable using the ci command, but are stumped on how to get them for counts. Ideally, Stata would produce confidence intervals for each category (e.g., Male/Female) within a variable (e.g., Sex) in one step, but we know we can get at that using "keep" or "if".

    We did find a helpful spreadsheet from the Census Bureau that lists the margins of error for different variables in that dataset, so we have something to "check" against. Unfortunately, it doesn't give the margins of error for what we eventually want to get to (totals for detailed occupational categories and their wage/salary information).

    Thanks,
    Anne

  • #2
    Have a look at the -total- command. -total-ing a 0/1 variable gives you counts. Also, since it seems you are using survey data, it is helpful that -total- works with the -svy:- prefix, so the counts come with design-based standard errors and confidence intervals.
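A minimal sketch of this suggestion. It uses the ACS extract ss07ptx distributed by Stata Press (the same file used later in this thread) as a stand-in for an IPUMS extract, so the weight names are those of that file. The indicator age65plus is hypothetical and was chosen only because agep is known to be in the file. With IPUMS data you would instead build indicators such as (sex == 1) and (sex == 2).

```stata
// Stand-in data: the ACS person-level extract distributed by Stata Press
use http://www.stata-press.com/data/r14/ss07ptx, clear

// Declare the design: full-sample weight pwgtp and the 80 SDR replicate
// weights pwgtp1-pwgtp80 (matched by the wildcards pwgtp? and pwgtp??)
svyset [pw = pwgtp], sdrweight(pwgtp? pwgtp??) vce(sdr)

// Hypothetical 0/1 indicator; with IPUMS data one might instead use
//   generate byte male = (sex == 1)
generate byte age65plus = (agep >= 65) if !missing(agep)

// The weighted total of a 0/1 indicator is an estimated count, reported
// with a design-based standard error and confidence interval
svy: total age65plus
```

The point estimate is simply the sum of pwgtp over persons with age65plus == 1. The replicate weights contribute only the standard error and the interval.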



    • #3
      -ci- will give incorrect confidence intervals for the ACS because it ignores the survey design (the sampling weights and the replicate weights).
      Steve Samuels
      Statistical Consulting

      Stata 14.2
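To see the point on real data, one can compare a design-ignoring standard error with the design-based one. This sketch again uses the Stata Press extract ss07ptx as a stand-in. Unweighted -mean- uses the same simple-random-sample formula (sd/sqrt(n)) as -ci- does for a mean, so it ignores the design in the same way.

```stata
// Stand-in data and design (Stata Press ACS extract)
use http://www.stata-press.com/data/r14/ss07ptx, clear
svyset [pw = pwgtp], sdrweight(pwgtp? pwgtp??) vce(sdr)

// Unweighted, simple-random-sample standard error: the design is ignored
mean agep
scalar se_naive = _se[agep]

// Design-based: pwgtp for the estimate, SDR replicates for the standard error
svy: mean agep
scalar se_design = _se[agep]

display "naive SE = " se_naive "   design-based SE = " se_design
```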



      • #4
        Thanks, Steve! I've just been poring over the technical documentation. It seems there are highly tailored formulas that I doubt I could get Stata to automate. It looks like I will be calculating these statistics by hand. https://www.census.gov/programs-surv...tion.2013.html
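For reference, the replicate-weight formula in the Census Bureau's ACS PUMS accuracy documentation is short. Let X_0 be the estimate computed with the full-sample weight, and X_r the same estimate recomputed with the r-th of the 80 replicate weights. Then:

```latex
\mathrm{SE}(X_0) = \sqrt{\frac{4}{80}\sum_{r=1}^{80}\left(X_r - X_0\right)^2},
\qquad
\mathrm{MOE}_{90} = 1.645 \times \mathrm{SE}(X_0),
\qquad
\text{90\% CI} = X_0 \pm \mathrm{MOE}_{90}
```

Stata's svy sdr variance has the same form: 4/R times a sum of squared replicate deviations. When the mse option of svyset is specified, those deviations are measured from the full-sample estimate, as above, rather than from the mean of the replicate estimates. The calculation should therefore not need to be done by hand.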



        • #5
          There's no need to do anything by hand. There is an example of the svyset statement for the ACS in the manual entry for svy sdr. However, the names of the base and replicate weights in your IPUMS extract are probably "perwt" and "repwtp". To be sure, you'll just have to check the variable names in your data.

          See:
          https://usa.ipums.org/usa-action/var.../group?id=tech
          and
          http://answers.popdata.org/What-diff...a-q834466.aspx
          Last edited by Steve Samuels; 10 Mar 2016, 15:52.
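Here is a sketch of how this might look, with a Census-style margin of error added. The IPUMS names in the comment (perwt, repwtp1-repwtp80, sex coded 1 = male, 2 = female) are assumptions to check against your extract. The runnable part uses the Stata Press extract ss07ptx as a stand-in, and the indicator age65plus is hypothetical. The mse option centers the replicate deviations on the full-sample estimate, as the Census formula does. The multiplier 1.645 is the one the Census Bureau uses for its 90 percent margins of error.

```stata
// With an IPUMS extract the design would be declared as (assumed names):
//   svyset [pw = perwt], sdrweight(repwtp1-repwtp80) vce(sdr) mse
// Stand-in data and design:
use http://www.stata-press.com/data/r14/ss07ptx, clear
svyset [pw = pwgtp], sdrweight(pwgtp? pwgtp??) vce(sdr) mse

// Hypothetical indicator; with IPUMS data, e.g. generate byte female = (sex == 2)
generate byte age65plus = (agep >= 65) if !missing(agep)
svy: total age65plus

// Census-style 90 percent margin of error: 1.645 x standard error
scalar moe90 = 1.645 * _se[age65plus]
display "estimate = " _b[age65plus] "   MOE(90%) = " moe90
```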


          • #6
            In a previous post, you stated that you were using Stata 11, whereas SDR (successive difference replication) capability first appeared in Stata 12. Stata 11 can, however, analyze balanced repeated replication (BRR) replicate weights. If you run a BRR analysis on SDR replicate weights and compare it with the correct SDR results, you will find that the BRR standard errors are too small by a factor of 2. Therefore you can get correct results by multiplying the BRR standard errors by 2. The code below uses a small program to automate this conversion. (The program multiplies variances by 4.)
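The factor of 2 comes from the leading constants of the two variance estimators. Take R replicate estimates and their mean, which is the default centering for both methods in Stata, and assume BRR without a Fay adjustment:

```latex
\widehat{V}_{\mathrm{BRR}} = \frac{1}{R}\sum_{r=1}^{R}\left(\hat\theta_r - \bar\theta\right)^2,
\qquad
\widehat{V}_{\mathrm{SDR}} = \frac{4}{R}\sum_{r=1}^{R}\left(\hat\theta_r - \bar\theta\right)^2
= 4\,\widehat{V}_{\mathrm{BRR}},
\qquad
\mathrm{SE}_{\mathrm{SDR}} = 2\,\mathrm{SE}_{\mathrm{BRR}}
```

Multiplying e(V) by 4, as brr_to_sdr does, therefore doubles every standard error, whatever the number of replicates.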

            Code:
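            /* Write a conversion program "brr_to_sdr" */
            capture program drop _all

            // Multiplies the BRR variance matrix by 4 (standard errors by 2)
            // and reposts the results for display with -ereturn display-
            program define brr_to_sdr, eclass
                matrix b = e(b)
                matrix V = 4*e(V)
                ereturn post b V
            end

            use http://www.stata-press.com/data/r14/ss07ptx

            /* SDR analysis */
            // Note: pwgtp* also matches the full-sample weight pwgtp itself, so
            // 81 "replicates" are used (Replications = 81 in the output below);
            // the ACS design has 80 replicate weights, pwgtp1-pwgtp80
            svyset [pw = pwgtp] , sdrweight(pwgtp*) vce(sdr)
            svy: mean agep
            estimates store orig_sdr

            /* BRR analysis on the same weights */
            svyset [pw = pwgtp] , brrweight(pwgtp*) vce(brr)
            svy: mean agep

            /* Run conversion program */
            brr_to_sdr

            /* Display results and compare with SDR */
            ereturn display // BRR conversion
            estimates replay orig_sdr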
            Here are the results of the last two commands: the standard errors and CIs are identical.

            Code:
            . ereturn display  // BRR conversion
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                    agep |   34.24496   .0343891   995.81   0.000     34.17756    34.31236
            ------------------------------------------------------------------------------
            . estimates replay orig_sdr
            ----------------------------------------------------------------------------------------------------------------------------
            Model orig_sdr
            ----------------------------------------------------------------------------------------------------------------------------
            Survey: Mean estimation          Number of obs   =     230,817
                                             Population size =  23,904,380
                                             Replications    =          81
            
            --------------------------------------------------------------
                         |                 SDR
                         |       Mean   Std. Err.     [95% Conf. Interval]
            -------------+------------------------------------------------
                    agep |   34.24496   .0343891      34.17756    34.31236
            --------------------------------------------------------------
            Last edited by Steve Samuels; 11 Mar 2016, 11:39.
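The same check can be made numerically, and with exactly the 80 ACS replicate weights. This sketch selects pwgtp1-pwgtp80 with the wildcards pwgtp? and pwgtp??, which, unlike pwgtp*, exclude the full-sample weight pwgtp. It then confirms that the SDR standard error is twice the BRR one. It assumes Stata 12 or later, since it runs vce(sdr) directly.

```stata
use http://www.stata-press.com/data/r14/ss07ptx, clear

// SDR with the 80 replicate weights only
svyset [pw = pwgtp], sdrweight(pwgtp? pwgtp??) vce(sdr)
svy: mean agep
scalar se_sdr = _se[agep]

// BRR applied to the same 80 replicate weights
svyset [pw = pwgtp], brrweight(pwgtp? pwgtp??) vce(brr)
svy: mean agep
scalar se_brr = _se[agep]

// Ratio should be 2 up to rounding: SDR variance = 4 x BRR variance
display "SE ratio SDR/BRR = " se_sdr / se_brr
```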