
  • Problem on calculating Confidence Intervals for proportions over groups

    Dear all,
    I have a problem calculating confidence intervals for proportions over groups. The grouping variable in my data is 'nhom'.
    When I use the over() option, Stata gives results that differ from those I get with if nhom==0 or if nhom==1, as you can see from the output below.
    I wonder whether I did something wrong. Could you help me find my mistake and explain why the results differ?
    Code:
    . tab nhom
    
           Nhom |
        benh/tu |
           vong |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        Tu vong |        149        2.99        2.99
       Benh/tat |      4,838       97.01      100.00
    ------------+-----------------------------------
          Total |      4,987      100.00
    
    . proportion gioi, over(nhom) level(95)
    
    Proportion estimation               Number of obs    =    4987
    
              Nam: gioi = Nam
               Nu: gioi = Nu
    
        _subpop_1: nhom = Tu vong
        _subpop_2: nhom = Benh/tat
    
    --------------------------------------------------------------
            Over | Proportion   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    Nam          |
       _subpop_1 |    .557047   .0408314      .4762113    .6349701
       _subpop_2 |   .4772633   .0071818      .4632056    .4913571
    -------------+------------------------------------------------
    Nu           |
       _subpop_1 |    .442953   .0408314      .3650299    .5237887
       _subpop_2 |   .5227367   .0071818      .5086429    .5367944
    --------------------------------------------------------------
    
    . proportion gioi if nhom==1, level(95)
    
    Proportion estimation               Number of obs    =    4838
    
    --------------------------------------------------------------
                 | Proportion   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    gioi         |
             Nam |   .4772633   .0071818      .4632055    .4913572
              Nu |   .5227367   .0071818      .5086428    .5367945
    --------------------------------------------------------------
    
    . proportion gioi if nhom==0, level(95)
    
    Proportion estimation               Number of obs    =     149
    
    --------------------------------------------------------------
                 | Proportion   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    gioi         |
             Nam |    .557047   .0408314       .475564    .6355714
              Nu |    .442953   .0408314      .3644286     .524436
    --------------------------------------------------------------
    Thank you so much in advance!

  • #2
    Thong:
    You ended up with a different sample size for each command you ran, and that affects the standard errors and confidence interval bounds.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      First, thank you so much for your attention, Carlo. I do not understand what you mean. As I understood it, when we use the over() option, the confidence intervals are calculated using the sample size within each group. Therefore, Stata should give me the same results as when I use the condition if nhom==1 or if nhom==0 for each group.

      For example:
      Code:
      . proportion gioi if nhom==1, level(95)
      
      Proportion estimation               Number of obs    =    4838
      
      --------------------------------------------------------------
                   | Proportion   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
      gioi         |
               Nam |   .4772633   .0071818      .4632055    .4913572
                Nu |   .5227367   .0071818      .5086428    .5367945
      --------------------------------------------------------------
      
      . * n for this group is 4838 obs.
      . proportion gioi if nhom==0, level(95)
      
      Proportion estimation               Number of obs    =     149
      
      --------------------------------------------------------------
                   | Proportion   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
      gioi         |
               Nam |    .557047   .0408314       .475564    .6355714
                Nu |    .442953   .0408314      .3644286     .524436
      --------------------------------------------------------------
      
      . * n for this group is 149 obs.
      . proportion gioi, over(nhom) level(95)
      
      Proportion estimation               Number of obs    =    4987
      
                Nam: gioi = Nam
                 Nu: gioi = Nu
      
          _subpop_1: nhom = Tu vong
          _subpop_2: nhom = Benh/tat
      With the above command, the number of observations in each group is the same as before:
      n of _subpop_1=149 obs.
      n of _subpop_2=4838 obs.

      I am not sure about this; maybe I have misunderstood something. Could you explain in more detail?
      Thank you again Carlo.
      Last edited by Thong Nguyen; 30 Jun 2016, 08:21.



      • #4
        Dear Carlo,
        What you are saying is that when using
        Code:
        proportion gioi, over(nhom) level(95)
        the confidence intervals are calculated with a sample size of 4987, regardless of the number of observations in each group. As a result, Stata gives me different CIs from those I get using
        Code:
        proportion gioi if nhom==1, level(95)
        and 
        proportion gioi if nhom==0, level(95)
        Am I right?



        • #5
          Yes,
          this is what I suspect.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thank you very much for your help, Carlo.



            • #7
              Thanks for pointing this out, Thong. I was unaware of this behavior. Here's the reason for the difference you saw. proportion calls the command _svy_summarize. In that program, the over() option triggers the survey subpopulation treatment of the estimates. This is appropriate in survey analysis for subgroups whose sizes are not fixed by design. (See the entry on subpopulation estimation on p. 59 of the Survey Manual.) But it is not correct for your situation. In your output the standard errors are the same either way; it is the confidence limits that shift slightly, most visibly for the smaller group. To avoid triggering the subpopulation behavior, the general solution is to use subsetting "if" expressions in a loop.
              Code:
              forvalues i = 0/1 {
                  proportion gioi if nhom==`i'
              }
              So, the individual proportion statements that you ran are correct.
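              For anyone who wants to check where the two sets of limits come from, here is a minimal sketch. It is not the internals of proportion. It assumes the interval is built on the logit scale with a t critical value, and that the only thing that changes between the two approaches is the degrees of freedom: 148 (n-1 within nhom==0) versus 4986 (N-1 in the full sample). Those two assumptions are mine. The proportion and standard error are copied from the output earlier in the thread, where the standard error is identical under both approaches. The scalar names are made up for illustration.

              ```stata
              * Hand calculation of a 95% CI for the proportion of Nam in nhom==0.
              * Assumption: logit-transformed interval with a t critical value.
              * Values below are copied from the output posted in the thread.
              scalar p_hat  = .557047
              scalar se_hat = .0408314

              * df = 148 is n-1 within the group; df = 4986 is N-1 for the full sample
              foreach df in 148 4986 {
                  scalar tcrit  = invttail(`df', 0.025)
                  * half-width of the interval on the logit scale
                  scalar halfw  = tcrit * se_hat / (p_hat * (1 - p_hat))
                  * back-transform the endpoints to the proportion scale
                  scalar lb_hat = invlogit(logit(p_hat) - halfw)
                  scalar ub_hat = invlogit(logit(p_hat) + halfw)
                  display "df = `df':  [" %9.7f lb_hat ", " %9.7f ub_hat "]"
              }
              ```

              Run as a do-file, the two displayed intervals should agree to about six decimals with the limits posted for Nam in nhom==0: the first with the "if" result and the second with the over() result. If they do, the gap between the two approaches comes from the degrees of freedom used for the critical value, not from the standard error.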
              Last edited by Steve Samuels; 30 Jun 2016, 15:15.
              Steve Samuels
              Statistical Consulting

              Stata 14.2



              • #8
                Thank you so much for your attention and your codes, Steve.

