  • -ipfraking- with aggregated data?

    As part of an exercise that will eventually involve n-variate raking, I set out to reproduce the raking example in Swanson and Siegel:
    [Image: Swanson.png]

    I was able to reproduce their results exactly by creating an 8 242-record unit file (attached as simpleraking_Swanson.dta, ignoring the unknown ages; 285 duplicates of males aged less than 5, etc.) and running the following code:
    use "simpleraking_Swanson.dta", clear

    * Control totals by sex; the row name is the variable holding the categories
    matrix define total_sex = (1397073, 1445248)
    matrix rownames total_sex = sex
    matrix colnames total_sex = _one:1 _one:2

    * Control totals by age group
    matrix define total_age = (201421, 411140, 1305492, 532944, 391324)
    matrix rownames total_age = age
    matrix colnames total_age = _one:1 _one:2 _one:3 _one:4 _one:5

    * Rake to both margins, then tabulate the raked cell totals
    ipfraking [pw=_one], ctotal(total_age total_sex) generate(rakedwgt2) iter(20)
    tab age sex [iw=rakedwgt2]
    [Image: ipfraking.png]

    All well and good.

    BUT - the raking I wish to do is not amenable to creating a unit-record data file (it will be too big). So, I have been trying to reproduce the above with a file of 10 records with age and sex, and the count (as above) of elements in each combination of age and sex.
    I cannot, however, work out what the weights should be to obtain the same results. The closest I have been able to get with the aggregated file (..._weighted.dta) is:

    * In this file each record is one age-by-sex cell and _one holds the cell count
    use "simpleraking_Swanson_weighted.dta", clear

    * Control totals by sex
    matrix define total_sex = (1397073, 1445248)
    matrix rownames total_sex = sex
    matrix colnames total_sex = _one:1 _one:2

    * Control totals by age group
    matrix define total_age = (201421, 411140, 1305492, 532944, 391324)
    matrix rownames total_age = age
    matrix colnames total_age = _one:1 _one:2 _one:3 _one:4 _one:5

    ipfraking [pw=_one], ctotal(total_age total_sex) generate(rakedwgt2) iter(20)

    * Multiply the raked weight by the cell count before tabulating
    gen temp = rakedwgt2 * _one
    tab age sex [iw=temp]


    This gives the correct marginals but incorrect cells:

    [Image: ipfraking2.png]

    I am sure I am missing something obvious ... so any suggestions welcome!
    Last edited by Tom Moultrie; 23 Jun 2025, 23:35.

  • #2
    You've confused ipfraking by making _one something other than 1. In your weighted data, keep the counts as the input weight and set _one back to 1:

    Code:
    rename _one input_weight
    gen byte _one = 1
    ipfraking [pw=input_weight], ctotal(total_age total_sex) generate(rakedwgt3) iter(20)
    -- Stas Kolenikov || http://stas.kolenikov.name
    -- Principal Survey Scientist, Abt SRBI
    -- Opinions stated in this post are mine only
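
    For readers following along, here is a minimal sketch that puts the fix from #2 into the full workflow from #1. It assumes the aggregated file has one record per age-by-sex cell with the cell count stored in _one, as described in #1; the names input_weight and rakedwgt3 are taken from #2. My reading of the original problem, which is an interpretation rather than something stated in the thread, is that the column names _one:1, _one:2, ... ask ipfraking to match weighted totals of the variable _one. When _one holds the counts and is also used as the pweight, the counts enter the calculation twice, which would explain margins that match while the cells do not.

```stata
* Sketch only: assumes one record per age-by-sex cell, with the cell
* count stored in _one (as described in #1).
use "simpleraking_Swanson_weighted.dta", clear

* Keep the counts as the starting weight and make _one a true constant,
* as suggested in #2.
rename _one input_weight
gen byte _one = 1

* Control totals, unchanged from #1.
matrix define total_sex = (1397073, 1445248)
matrix rownames total_sex = sex
matrix colnames total_sex = _one:1 _one:2

matrix define total_age = (201421, 411140, 1305492, 532944, 391324)
matrix rownames total_age = age
matrix colnames total_age = _one:1 _one:2 _one:3 _one:4 _one:5

ipfraking [pw=input_weight], ctotal(total_age total_sex) generate(rakedwgt3) iter(20)

* Each record stands for a whole cell, so the raked weight is already the
* raked cell total; it is not multiplied by the original count again.
tab age sex [iw=rakedwgt3]
```

    If this behaves as expected, the table should match the one obtained from the unit-record file in #1.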



    • #3
      Dear Stas

      Thank you (and face-palm!). But hopefully it will clarify matters for others!

      Tom
