Unexpected results from svycal

Dominic Peel

Join Date: May 2016

Posts: 7
#1

28 Jun 2022, 18:08

Hi all,

I'm working through the book Spatial Microsimulation with R and converting the code examples to Stata as I go.

The book provides a small toy dataset to demonstrate some concepts as below:

Code:

id   sex_num   age_num
 1         1         2
 2         1         2
 3         1         1
 4         2         2
 5         2         1

One of the exercises is to use iterative proportional fitting to weight these data to a population that has the following characteristics:
Age 1: 7
Age 2: 4
Sex 1: 3
Sex 2: 8
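
For reference, iterative proportional fitting can be sketched by hand in Stata: start every weight at 1, then alternately rescale the weights so that the age totals and then the sex totals hit their targets. This is only an illustrative sketch and not a description of how svycal works internally; the variable names w and s and the fixed 50 passes are assumptions made for the example.

```stata
* Toy data from the book example
clear
input id sex_num age_num
1 1 2
2 1 2
3 1 1
4 2 2
5 2 1
end

* Start from equal weights (w is a hypothetical variable name)
generate double w = 1

* Alternate between the two margins; 50 passes is an arbitrary cap
forvalues pass = 1/50 {
    * Rescale so the weights sum to 7 for age 1 and 4 for age 2
    egen double s = total(w), by(age_num)
    quietly replace w = w * cond(age_num == 1, 7, 4) / s
    drop s

    * Rescale so the weights sum to 3 for sex 1 and 8 for sex 2
    egen double s = total(w), by(sex_num)
    quietly replace w = w * cond(sex_num == 1, 3, 8) / s
    drop s
}

list id sex_num age_num w
```

Both sets of targets sum to 11, which the procedure needs in order to satisfy the age and sex margins at the same time.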

To do this in Stata I used the following code:

Code:

svycal rake i.age_num i.sex_num, generate(weight) totals( ///
        _cons = 11 ///
        1.age_num = 7 ///
        2.age_num = 4 ///
        1.sex_num = 3 ///
        2.sex_num = 8)

This appears to produce correct results, but if I edit the code to increase the population as below I get unexpected results.

Code:

svycal rake i.age_num i.sex_num, generate(weight10) totals( ///
        _cons = 110 ///
        1.age_num = 70 ///
        2.age_num = 40 ///
        1.sex_num = 30 ///
        2.sex_num = 80)

Running this produces the following dataset. I've tested the same targets using ipfraking from SSC, and that seems to produce correct results. Am I missing something about svycal that is causing this?

Code:

id   sex_num   age_num      weight    weight10
 1         1         2   .72508278   6.517e-11
 2         1         2   .72508278   6.517e-11
 3         1         1   1.5498344   1.141e-10
 4         2         2   2.5498344          40
 5         2         1   5.4501656          70
Dominic Peel

Join Date: May 2016

Posts: 7
#2

24 Oct 2022, 21:20

Apologies for bumping such an old thread, but I'm curious as to whether this is a bug in svycal.

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 746

26 Oct 2022, 14:39

svycal rake can converge to bad adjusted values when the overall
population size you specify is on a different scale than the sum of
the design weights. This typically only happens when design weights
are not specified.

In your example, you do not specify weights, so a weight of 1 is
assumed, which totals to 5 (the sample size) for your data.

If the population size is in fact 110, then I would expect the design
weights to be 22 per observation, assuming SRS.

If you rerun your example, but specify pweights as 22

Code:

svycal rake i.age_num i.sex_num [pw=22], ///
        generate(weight) ///
        totals( ///
                _cons = 110 ///
                1.age_num = 70 ///
                2.age_num = 40 ///
                1.sex_num = 30 ///
                2.sex_num = 80 ///
        )

you get the following

Code:

. list

     +------------------------------------+
     | id   sex_num   age_num      weight |
     |------------------------------------|
  1. |  1         1         2   .72508278 |
  2. |  2         1         2   .72508278 |
  3. |  3         1         1   1.5498344 |
  4. |  4         2         2   2.5498344 |
  5. |  5         2         1   5.4501656 |
     +------------------------------------+

Actually, any pweight value greater than 1 will yield the above
adjusted weights.
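
The value 22 is just the population size divided by the sample size. A minimal sketch that derives it from the data instead of hard-coding it, assuming simple random sampling; wt_srs and weight10_srs are hypothetical variable names chosen for this example:

```stata
* Toy data from the first post
clear
input id sex_num age_num
1 1 2
2 1 2
3 1 1
4 2 2
5 2 1
end

* Assumption: simple random sampling, so each unit represents N/n units
local N = 110
generate double wt_srs = `N' / _N
display "design weight = " wt_srs[1]

* Same raking specification as above, with the derived design weight
svycal rake i.age_num i.sex_num [pw=wt_srs], ///
        generate(weight10_srs) ///
        totals( ///
                _cons = 110 ///
                1.age_num = 70 ///
                2.age_num = 40 ///
                1.sex_num = 30 ///
                2.sex_num = 80 ///
        )
```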

svycal regress is not so sensitive to such scale differences, but
yields different adjusted values.

Code:

svycal regress i.age_num i.sex_num, ///
        generate(weight_reg) ///
        totals( ///
                _cons = 110 ///
                1.age_num = 70 ///
                2.age_num = 40 ///
                1.sex_num = 30 ///
                2.sex_num = 80 ///
        )

. list

     +------------------------------------------------+
     | id   sex_num   age_num    weight10   weight_~g |
     |------------------------------------------------|
  1. |  1         1         2   7.2508278   4.2857143 |
  2. |  2         1         2   7.2508278   4.2857143 |
  3. |  3         1         1   15.498344   21.428571 |
  4. |  4         2         2   25.498344   31.428571 |
  5. |  5         2         1   54.501656   48.571429 |
     +------------------------------------------------+

When in doubt, you can check your weight calibration specification using
svyset and svy: total.

Here we check the calibration weights created by rake()

Code:

gen one = 1
gen wt_design = 22
svyset _n [pw=wt_design], ///
        rake(i.age_num i.sex_num, ///
                totals( ///
                        _cons = 110 ///
                        1.age_num = 70 ///
                        2.age_num = 40 ///
                        1.sex_num = 30 ///
                        2.sex_num = 80 ///
                ) ///
        )
svy: total one i.age_num i.sex_num
(running total on estimation sample)

Survey: Total estimation

Number of strata = 1                     Number of obs   =   5
Number of PSUs   = 5                     Population size = 110
Calibration: rake                        Design df       =   4

--------------------------------------------------------------
             |             Linearized
             |      Total   std. err.     [95% conf. interval]
-------------+------------------------------------------------
         one |        110          .             .           .
             |
     age_num |
          1  |         70   1.57e-15            70          70
          2  |         40   1.72e-15            40          40
             |
     sex_num |
          1  |         30   4.88e-15            30          30
          2  |         80   1.00e-14            80          80
--------------------------------------------------------------

Here we check the calibration weights created by regress()

Code:

svyset _n, ///
        regress(i.age_num i.sex_num, ///
                totals( ///
                        _cons = 110 ///
                        1.age_num = 70 ///
                        2.age_num = 40 ///
                        1.sex_num = 30 ///
                        2.sex_num = 80 ///
                ) ///
        )
svy: total one i.age_num i.sex_num
(running total on estimation sample)

Survey: Total estimation

Number of strata = 1                     Number of obs   =   5
Number of PSUs   = 5                     Population size = 110
Calibration: regress                     Design df       =   4

--------------------------------------------------------------
             |             Linearized
             |      Total   std. err.     [95% conf. interval]
-------------+------------------------------------------------
         one |        110          .             .           .
             |
     age_num |
          1  |         70   6.15e-15            70          70
          2  |         40   1.88e-14            40          40
             |
     sex_num |
          1  |         30   7.94e-15            30          30
          2  |         80   1.56e-14            80          80
--------------------------------------------------------------

The estimated totals should closely match the specified totals, and
the linearized standard errors should be close to zero (i.e., missing or
tiny, given finite-precision arithmetic).
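
The same kind of check can also be made directly on a generated weight variable by summing it within each category. A minimal sketch, assuming the raked weights are typed in from the weight10 column listed above; tot_age and tot_sex are hypothetical names, and the 1e-6 tolerance is an arbitrary choice that allows for the rounding in the listing:

```stata
* Toy data with the raked weights copied from the listing above
clear
input id sex_num age_num double weight10
1 1 2 7.2508278
2 1 2 7.2508278
3 1 1 15.498344
4 2 2 25.498344
5 2 1 54.501656
end

* Sum the calibrated weights within each age and sex category
egen double tot_age = total(weight10), by(age_num)
egen double tot_sex = total(weight10), by(sex_num)

* Each sum should reproduce the corresponding target total
assert reldif(tot_age, cond(age_num == 1, 70, 40)) < 1e-6
assert reldif(tot_sex, cond(sex_num == 1, 30, 80)) < 1e-6

* Show the category sums for a visual check
tabstat weight10, by(age_num) statistics(sum)
tabstat weight10, by(sex_num) statistics(sum)
```

If either assert fails, the weights do not reproduce the requested margins and the calibration specification is worth revisiting.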


Dominic Peel

Join Date: May 2016

Posts: 7
#4

27 Oct 2022, 17:41

Many thanks for your detailed response; this is very helpful.

Adding design weights to the code has solved the issue in the majority of cases, but I've been experimenting and sometimes it will still produce strange results unless I set the design weight to a value higher than the population size divided by the sample size.

Is there a method for calculating this value?

I've included an example as an attachment as it's probably too long to post in code tags.
Attached Files

Raking.do (10.8 KB, 1 view)