  • Creating Weighted Median by 2 Grouping Variables - Labeling Values and Displaying Table

    Hi! I would like to create a table that displays weighted median earnings by two categorical variables, detailed race-ethnicity and sex, using 5-year estimate American Community Survey data. Ultimately, I want a table that looks something like this:

    Race         Men's Median Earnings    Women's Median Earnings
    Chinese      11111                    44444
    Japanese     22222                    55555
    Cambodian    33333                    66666
    I am using Stata 18. So far I have created a variable that holds median earnings (incearn) by race-ethnicity (raced) for my groups of interest (raced = 400-699), applying analytic weights (perwt), using this extremely helpful FAQ:

    Code:
    gen AANHPI_dmed = .
    quietly forvalues i = 400/699 {
        summarize incearn [w=perwt] if raced == `i', detail
        replace AANHPI_dmed = r(p50) if raced == `i'
    }
    What I am not sure of is whether I can also specify that I want medians by gender in the code above or if I should do that in a separate step.

    I would also like to know how to attach race-ethnicity labels to the values generated by the code listed above and how to ultimately produce a table like the example I give above.

    Thank you in advance for your help!

    Here is a sample of my data

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long incearn byte sex int(raced perwt)
      8502 1 610   6
     25506 1 662  16
     24291 1 610   2
     10081 1 610  12
       972 1 671  10
      8502 1 610   9
     13846 2 400  15
     13846 2 400  15
     13846 2 400  13
     25506 1 662  11
     13846 2 400  12
      8502 1 610   7
     10081 1 610   8
      8502 1 610   1
     13846 2 400   9
        24 1 671   8
     25506 1 662  18
     17489 1 630   3
     25506 1 662   2
     24291 1 610  15
     24291 1 610  22
     15668 1 620   1
      9716 1 610  11
     24291 1 610  18
     34007 1 610  15
     10081 1 610  14
     12753 2 400  27
     43724 2 663   1
    103237 1 620  14
     60727 2 620   5
       243 2 620  18
    468816 1 669   7
       668 2 669   4
       243 1 669   6
    213761 1 669   2
    194328 1 400  61
    468816 2 400  17
     43116 2 400   6
     40202 1 610   4
     40202 2 610   3
    468816 2 610   2
    114168 1 610   4
    468816 1 610   2
     78946 1 664   4
     60727 2 664   1
    128742 1 664  10
     13846 2 400   3
     46760 1 610   3
      6073 2 400  11
     72873 1 610   1
     78946 1 620   7
     60727 2 630   2
      6073 2 630  12
      6073 2 630  12
    145746 1 620   1
     24777 2 620   1
     14575 2 640  27
     48582 2 600  25
     29635 2 500   4
     42509 1 500   2
      4858 1 640   1
      3644 2 640   1
     18218 2 640   2
      9595 2 640   1
     29149 1 610  10
     29149 2 610  11
     18218 1 610  14
     66800 1 610  65
     15789 1 680  48
     42874 2 400  78
     43724 2 400   1
     48582 1 640  26
    106880 2 620  15
     19433 2 620   1
     82589 1 685   1
     43724 2 685   1
     20162 1 685   2
    468816 2 400   1
     24291 1 610   1
     99593 1 500  22
      2915 1 400 109
    109309 1 610   7
     66800 2 610   4
    472459 1 669  12
    468816 2 669   3
      1215 2 600   5
     69229 2 610   6
     78946 1 610  22
     92306 1 610  34
     21862 1 400  64
     54655 1 640   9
     29149 1 640  26
     29149 2 640  62
     37651 1 610   1
    222262 2 620  18
     85018 2 600   5
    468816 1 400 108
    115382 1 610  42
     14575 2 610   2
     85018 1 600   1
    end
    label values incearn INCEARN
    label values sex SEX
    label def SEX 1 "male", modify
    label def SEX 2 "female", modify
    label values raced RACED
    label def RACED 400 "chinese", modify
    label def RACED 500 "japanese", modify
    label def RACED 600 "filipino", modify
    label def RACED 610 "asian indian (hindu 1920_1940)", modify
    label def RACED 620 "korean", modify
    label def RACED 630 "hawaiian", modify
    label def RACED 640 "vietnamese", modify
    label def RACED 662 "laotian", modify
    label def RACED 663 "thai", modify
    label def RACED 664 "bangladeshi", modify
    label def RACED 669 "pakistani", modify
    label def RACED 671 "other asian, n.e.c", modify
    label def RACED 680 "samoan", modify
    label def RACED 685 "chamorro", modify



  • #2
    You should be able to build this table in one line.

    Code:
    table raced sex [w=perwt], statistic(median incearn)
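
    If only the groups coded 400-699 are wanted, an if qualifier can restrict the table, and nformat() can tidy the display. The following is a sketch under assumptions rather than part of the one-liner above: the toy data are hypothetical and only stand in for the ACS extract, and code 801 is a made-up out-of-range value included to show the restriction at work.

    ```stata
    * Hypothetical toy data standing in for the ACS extract
    clear
    input long incearn byte sex int(raced perwt)
    10000 1 400 1
    20000 1 400 5
    30000 1 400 1
    15000 2 400 2
    25000 2 400 1
    40000 1 500 3
    50000 2 500 4
    99999 1 801 2
    end
    label define SEX 1 "male" 2 "female"
    label values sex SEX
    label define RACED 400 "chinese" 500 "japanese"
    label values raced RACED

    * Weighted medians for raced 400-699 only, shown as whole numbers with commas
    table raced sex if inrange(raced, 400, 699) [aweight=perwt], statistic(median incearn) nformat(%9.0fc)
    ```

    Because raced and sex carry value labels, the row and column headers show the group names rather than the numeric codes.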



    • #3
      Thanks, that's good to know! I am still interested in whether I can create a variable that holds the weighted median by raced and sex in a single loop, as I would like to have it in variable form and not only as an output table. I'd appreciate any guidance on that point as well!



      • #4
        Sure you can. Here is a general solution:

        Code:
        capture drop AANHPI_dmed
        gen AANHPI_dmed = .
        * Collect only the values of raced and sex that occur in the data
        levelsof raced, local(races)
        levelsof sex, local(sexes)
        qui foreach race in `races' {
            foreach sex in `sexes' {
                * Weighted median for this race-sex cell
                summarize incearn [w=perwt] if raced == `race' & sex == `sex', detail
                replace AANHPI_dmed = r(p50) if raced == `race' & sex == `sex'
            }
        }
        Note that your forvalues loop works, but it is slightly suboptimal because it also iterates over values between 400 and 699 that do not occur in raced. It probably doesn't matter much in practice, but I prefer the technique above because it only considers values of raced that are present in the data.

        Sex only has two cases, so you can equivalently avoid the nested loop and do it like this:

        Code:
        levelsof raced, local(races)
        qui foreach race in `races' {
            summarize incearn [w=perwt] if raced == `race' & sex==1, detail
            replace AANHPI_dmed = r(p50) if raced == `race' & sex==1
            summarize incearn [w=perwt] if raced == `race' & sex==2, detail
            replace AANHPI_dmed = r(p50) if raced == `race' & sex==2
        }
        To test that the code is correct, I'll just use the mean command, knowing that the values are constant within race/sex groups, so the mean equals the desired median.

        Code:
        mean AANHPI_dmed, over(raced sex)
        table raced sex [w=perwt], statistic(median incearn)
        Edit: I leave table at the end so that I can compare the means to the table and make sure they match. Just wanted to make that explicit.
        Last edited by Daniel Schaefer; 19 Mar 2026, 11:43.
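
        If the end goal is a dataset shaped like the table in #1, with one row per group and separate columns for men and women, collapse followed by reshape is another possible route. This is only a sketch under assumptions, not a replacement for the loop above: the toy data are hypothetical, and med_men and med_women are names chosen here for illustration.

        ```stata
        * Hypothetical toy data standing in for the ACS extract
        clear
        input long incearn byte sex int(raced perwt)
        10000 1 400 1
        20000 1 400 5
        30000 1 400 1
        15000 2 400 2
        25000 2 400 1
        40000 1 500 3
        50000 2 500 4
        end
        label define RACED 400 "chinese" 500 "japanese"
        label values raced RACED

        * One row per raced-sex cell holding the weighted median
        collapse (median) incearn [aweight=perwt], by(raced sex)

        * One row per raced, one column per sex (incearn1 = sex 1, incearn2 = sex 2)
        reshape wide incearn, i(raced) j(sex)
        rename (incearn1 incearn2) (med_men med_women)

        * raced keeps its value label, so the group names are displayed
        list raced med_men med_women, noobs
        ```

        On the real data, wrapping the collapse and reshape steps in preserve and restore keeps the original observations in memory.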



        • #5
          Thank you so much!
