
  • subpop option with survey data

    Hi,

    I have complex survey data on the health outcomes of males and females in a population in two separate years. I am trying to estimate the change in the health outcome between the two years for the overall, female, and male populations. Strangely, when I perform this analysis using sample weights, the change for the total population is smaller than the change for the female population and also smaller than the change for the male population. This is unexpected and counterintuitive.

    My code is:

    svy, subpop(male): reg outcome year2
    svy, subpop(female): reg outcome year2
    svy, subpop(male): reg outcome year2

    Can anyone help? thanks!

  • #2
    First, I suspect your code is wrong, as the first and third lines are identical. Second, it would help to see your output so we can see what concerns you. Use code tags to make the output legible; see point 12 in the FAQ.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      OK, here are my code and Stata output:

      Code:
      svyset [pweight=w_fstuwt], brrweight(w_fstr1-w_fstr80) vce(brr) fay(0.5) mse
      
      svy brr: reg pv1r year2003
      
      gen male=0
      replace male=1 if female==0
      
      svy brr, subpop(female): reg pv1r year2003
      svy brr, subpop(male): reg pv1r year2003
      
      
      BRR replications (80)
      ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
      ..................................................    50
      ..............................
      
      Survey: Linear regression                       Number of obs      =     11150
                                                      Population size    = 73546.129
                                                      Replications       =        80
                                                      Design df          =        79
                                                      F(   1,     79)    =     27.48
                                                      Prob > F           =    0.0000
                                                      R-squared          =    0.0114
      
      ------------------------------------------------------------------------------
                   |                BRR *
           pv1r |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          year2003 |   23.33034   4.450639     5.24   0.000     14.47157    32.18912
             _cons |   411.6454   3.086953   133.35   0.000      405.501    417.7898
      ------------------------------------------------------------------------------
      
      BRR replications (80)
      ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
      ..................................................    50
      ..............................
      
      Survey: Linear regression                       Number of obs      =     11150
                                                      Population size    = 73546.129
                                                      Subpop. no. of obs =      5579
                                                      Subpop. size       = 38431.825
                                                      Replications       =        80
                                                      Design df          =        79
                                                      F(   1,     79)    =     28.12
                                                      Prob > F           =    0.0000
                                                      R-squared          =    0.0138
      
      ------------------------------------------------------------------------------
                   |                BRR *
           pv1r |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          year2003 |   24.00183   4.526299     5.30   0.000     14.99245     33.0112
             _cons |   428.8713    3.07544   139.45   0.000     422.7498    434.9928
      ------------------------------------------------------------------------------
      
      BRR replications (80)
      ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
      ..................................................    50
      ..............................
      
      Survey: Linear regression                       Number of obs      =     11150
                                                      Population size    = 73546.129
                                                      Subpop. no. of obs =      5571
                                                      Subpop. size       = 35114.303
                                                      Replications       =        80
                                                      Design df          =        79
                                                      F(   1,     79)    =     17.84
                                                      Prob > F           =    0.0001
                                                      R-squared          =    0.0111
      
      ------------------------------------------------------------------------------
                   |                BRR *
           pv1r |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          year2003 |   24.07191   5.699001     4.22   0.000     12.72833    35.41548
             _cons |   392.1133   3.775034   103.87   0.000     384.5993    399.6273
      ------------------------------------------------------------------------------
      As you can see, the regressions for males and females each point to a difference between years of around 24 points, yet the overall difference is only 23.3.



      • #4
        I suspect that the problem comes from the weighting structure, but I don't see how.



        • #5
          Shouldn't there be a dummy variable for gender in the pooled model?


          • #6
            I don't think that is the problem.



            • #7
              The phenomenon you observe is an example of Simpson's paradox: the gender-specific differences differ from the pooled difference. It will occur whenever the weighted sex ratio is not the same in the two years, because each pooled mean is a weighted average of the gender-specific means. In the example below, the difference between year 1 and year 2 is 2 for both genders, but 1.9 in the pooled data.

              Thanks for the listing with code delimiters, but in the future, please also show the commands and results as they appear in the log. Otherwise, it is difficult to know immediately which command goes with which results.

              Code:
              . input year gender vmean count
              
                        year     gender      vmean      count
                1. 1  1  3 45
                2. 1  2  5 55
                3. 2  1  5 50
                4. 2  2  7 50
                5. end
              .
              . tabulate  year gender  [fw = count]
              
                         |        gender
                    year |         1          2 |     Total
              -----------+----------------------+----------
                       1 |        45         55 |       100
                       2 |        50         50 |       100
              -----------+----------------------+----------
                   Total |        95        105 |       200
              .
              . table   year gender [fw = count], c(mean vmean)
              
              ----------------------
                        |   gender  
                   year |    1     2
              ----------+-----------
                      1 |    3     5
                      2 |    5     7
              ----------------------
              
              . table   year     [fw = count], c( mean vmean)  // pooled
              
              -----------------------
                   year | mean(vmean)
              ----------+------------
                      1 |         4.1
                      2 |           6
              -----------------------
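
              The pooled figures in the listing above can be checked by hand. The following is a sketch of that arithmetic, not part of the original Stata log. The symbols are introduced here for illustration only: $p_{gt}$ is the weighted share of gender $g$ in year $t$, $m_{gt}$ is the mean of vmean for gender $g$ in year $t$, and $\bar m_t$ is the pooled mean in year $t$.

              ```latex
              % Pooled mean in each year: a share-weighted average of the gender means
              \bar m_t = \sum_{g} p_{gt}\, m_{gt}
              \qquad
              \bar m_1 = 0.45(3) + 0.55(5) = 4.1,
              \qquad
              \bar m_2 = 0.50(5) + 0.50(7) = 6.0

              % Pooled difference = within-gender change + change in composition
              \bar m_2 - \bar m_1
                = \sum_{g} p_{g2}\,(m_{g2} - m_{g1})
                + \sum_{g} (p_{g2} - p_{g1})\, m_{g1}

              % Within-gender term: both genders change by 2
              \sum_{g} p_{g2}\,(m_{g2} - m_{g1}) = 0.50(2) + 0.50(2) = 2.0

              % Composition term: the gender with the higher mean loses 5 points of share
              \sum_{g} (p_{g2} - p_{g1})\, m_{g1} = (0.05)(3) + (-0.05)(5) = -0.1

              % Total
              \bar m_2 - \bar m_1 = 2.0 - 0.1 = 1.9
              ```

              The composition term is zero only when the weighted shares are the same in both years. When they differ, the pooled difference can fall outside the range of the gender-specific differences, as it does here.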
              Last edited by Steve Samuels; 21 Jul 2015, 17:58.
              Steve Samuels
              Statistical Consulting
              [email protected]

              Stata 14.2

