  • Unexpected results from svycal

    Hi all,

    I'm working through the book Spatial Microsimulation with R and converting the code examples to Stata as I go.

    The book provides a small toy dataset to demonstrate some concepts as below:
    Code:
    id    sex_num    age_num
    1    1    2
    2    1    2
    3    1    1
    4    2    2
    5    2    1
    One of the exercises is to use iterative proportional fitting to weight these data to a population that has the following characteristics:
    Age 1: 7
    Age 2: 4
    Sex 1: 3
    Sex 2: 8

    To do this in Stata I used the following code:
    Code:
    svycal rake i.age_num i.sex_num, generate(weight) totals( ///
    _cons = 11 ///
    1.age_num = 7 ///
    2.age_num = 4 ///
    1.sex_num = 3 ///
    2.sex_num = 8)
    This appears to produce correct results, but if I edit the code to scale the population up by a factor of 10, as below, I get unexpected results.
    Code:
    svycal rake i.age_num i.sex_num, generate(weight10) totals( ///
    _cons = 110 ///
    1.age_num = 70 ///
    2.age_num = 40 ///
    1.sex_num = 30 ///
    2.sex_num = 80)
    Running this produces the following dataset. I've tested these numbers using ipfraking from SSC, and that seems to produce the correct result. Am I missing something about svycal that is causing this?

    Code:
    id    sex_num    age_num    weight    weight10
    1    1    2    .72508278    6.517e-11
    2    1    2    .72508278    6.517e-11
    3    1    1    1.5498344    1.141e-10
    4    2    2    2.5498344    40
    5    2    1    5.4501656    70

  • #2
    Apologies for bumping such an old thread, but I'm curious as to whether this is a bug in svycal.



    • #3
      svycal rake can converge to bad adjusted values when the overall
      population size you specify is on a different scale from the sum of
      the design weights. This typically happens only when design weights
      are not specified.

      In your example, you do not specify weights, so a weight of 1 is
      assumed, which totals to 5 (the sample size) for your data.

      If the population size is in fact 110, then I would expect the design
      weights to be 22 per observation, assuming SRS.
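
      The value 22 is simply the population size divided by the sample size (110 / 5). As a small sketch, assuming the toy data are in memory and simple random sampling applies, the weight can be computed rather than typed in; N_pop and wt_srs are illustrative names, not part of svycal.

      ```stata
      * N_pop is the population total given as _cons in totals();
      * wt_srs is an illustrative variable name.
      local N_pop = 110
      generate double wt_srs = `N_pop' / _N
      display wt_srs[1]
      ```

      With five observations this displays 22, and the variable could be supplied as [pw=wt_srs] in place of the literal value.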

      If you rerun your example, but specify pweights as 22
      Code:
      svycal rake i.age_num i.sex_num [pw=22], ///
              generate(weight) ///
              totals( ///
                      _cons = 110 ///
                      1.age_num = 70 ///
                      2.age_num = 40 ///
                      1.sex_num = 30 ///
                      2.sex_num = 80 ///
              )
      you get the following
      Code:
      . list
      
           +------------------------------------+
           | id   sex_num   age_num      weight |
           |------------------------------------|
        1. |  1         1         2   7.2508278 |
        2. |  2         1         2   7.2508278 |
        3. |  3         1         1   15.498344 |
        4. |  4         2         2   25.498344 |
        5. |  5         2         1   54.501656 |
           +------------------------------------+
      Actually, any pweight value greater than 1 will yield the above
      adjusted weights.

      svycal regress is not so sensitive to such scale differences, but
      yields different adjusted values.
      Code:
      svycal regress i.age_num i.sex_num, ///
              generate(weight_reg) ///
              totals( ///
                      _cons = 110 ///
                      1.age_num = 70 ///
                      2.age_num = 40 ///
                      1.sex_num = 30 ///
                      2.sex_num = 80 ///
              )
      
      . list
      
           +------------------------------------------------+
           | id   sex_num   age_num    weight10   weight_~g |
           |------------------------------------------------|
        1. |  1         1         2   7.2508278   4.2857143 |
        2. |  2         1         2   7.2508278   4.2857143 |
        3. |  3         1         1   15.498344   21.428571 |
        4. |  4         2         2   25.498344   31.428571 |
        5. |  5         2         1   54.501656   48.571429 |
           +------------------------------------------------+
      When in doubt, you can check your weight calibration specification using
      svyset and svy: total.

      Here we check the calibration weights created by rake()
      Code:
      gen one = 1
      gen wt_design = 22
      svyset _n [pw=wt_design], ///
              rake(i.age_num i.sex_num, ///
                      totals( ///
                              _cons = 110 ///
                              1.age_num = 70 ///
                              2.age_num = 40 ///
                              1.sex_num = 30 ///
                              2.sex_num = 80 ///
                      ) ///
              )
      svy: total one i.age_num i.sex_num
      (running total on estimation sample)
      
      Survey: Total estimation
      
      Number of strata = 1                     Number of obs   =   5
      Number of PSUs   = 5                     Population size = 110
      Calibration: rake                        Design df       =   4
      
      --------------------------------------------------------------
                   |             Linearized
                   |      Total   std. err.     [95% conf. interval]
      -------------+------------------------------------------------
               one |        110          .             .           .
                   |
           age_num |
                1  |         70   1.57e-15            70          70
                2  |         40   1.72e-15            40          40
                   |
           sex_num |
                1  |         30   4.88e-15            30          30
                2  |         80   1.00e-14            80          80
      --------------------------------------------------------------
      Here we check the calibration weights created by regress()
      Code:
      svyset _n, ///
              regress(i.age_num i.sex_num, ///
                      totals( ///
                              _cons = 110 ///
                              1.age_num = 70 ///
                              2.age_num = 40 ///
                              1.sex_num = 30 ///
                              2.sex_num = 80 ///
                      ) ///
              )
      svy: total one i.age_num i.sex_num
      (running total on estimation sample)
      
      Survey: Total estimation
      
      Number of strata = 1                     Number of obs   =   5
      Number of PSUs   = 5                     Population size = 110
      Calibration: regress                     Design df       =   4
      
      --------------------------------------------------------------
                   |             Linearized
                   |      Total   std. err.     [95% conf. interval]
      -------------+------------------------------------------------
               one |        110          .             .           .
                   |
           age_num |
                1  |         70   6.15e-15            70          70
                2  |         40   1.88e-14            40          40
                   |
           sex_num |
                1  |         30   7.94e-15            30          30
                2  |         80   1.56e-14            80          80
      --------------------------------------------------------------
      The estimated totals should closely match the specified totals, and
      the linearized standard errors should be close to zero (i.e., missing
      or tiny, given finite-precision arithmetic).
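
      A quicker, rougher check is to sum the generated weight variable within each margin. This sketch assumes the calibrated weights are stored in weight10 (substitute whatever name you passed to generate()); the 1e-4 tolerance is an arbitrary choice.

      ```stata
      * weight10 stands for whichever variable generate() created.
      tabstat weight10, by(age_num) statistics(sum)
      tabstat weight10, by(sex_num) statistics(sum)

      * Stop with an error if a margin is off by more than a small tolerance.
      quietly summarize weight10 if age_num == 1
      assert reldif(r(sum), 70) < 1e-4
      quietly summarize weight10 if sex_num == 2
      assert reldif(r(sum), 80) < 1e-4
      ```

      Applied to the weight10 values in the original question, the age sums come out at about 70 and 40, but the sex sums are about 0 and 110 rather than 30 and 80, so the second assert would fail.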



      • #4
        Many thanks for your detailed response, this is very helpful.

        Adding design weights to the code has solved the issue in the majority of cases, but I've been experimenting and sometimes it will still produce strange results unless I set the design weight to a value higher than the population size divided by the sample size.

        Is there a method for calculating this value?

        I've included an example as an attachment as it's probably too long to post in code tags.