
  • Percentage of a certain race in an occupation by county, month, and year

    I want to figure out what percentage of the people working in occupations where the SOC2 variable == 11 are African American or Hispanic, by county, month, and year.

    Should I do this the following way?

    Code:
    bys county month year: egen blackhispan= total(cond(black, hispanic, soc2==11,.))
    bys county month year: egen alltotal= total(soc2==11)
    gen wanted= blackhispan / alltotal
    This is an example of my data.

    Code:
    input int race float(black hispanic soc2)
    100 0 0 53
    200 1 0 43
    100 0 0 47
    100 0 0 43
    100 0 0 43
    100 0 0 11
    100 0 0  .
    100 0 0 47
    200 1 0 47
    100 0 0 25
    100 0 0 11
    200 1 0 43
    200 1 0 41
    200 1 0  .
    200 1 0 37
    100 0 1 53
    100 0 0 53
    100 0 0 49
    100 0 0 41
    200 1 0 41
    100 0 0 51
    200 1 0  .
    100 0 0 35
    100 0 0 11
    100 0 1 51
    100 0 0 49
    100 0 0 43
    100 0 0 35
    200 1 0 47
    100 0 0 53
    100 0 0 11
    100 0 0 43
    100 0 0 35
    100 0 0 33
    100 0 0  .
    100 0 0 11
    100 0 0  .
    100 0 0 47
    100 0 0  .
    100 0 0 25
    200 1 0 53
    100 0 0 29
    100 0 0 41
    200 1 0 53
    200 1 0 53
    200 1 0 25
    Last edited by Tariq Abdullah; 08 Jul 2022, 10:54.

  • #2
    Your example data is useless. It does not contain the variables county, month, and year, which are an important part of what you are trying to calculate. Also, even if it were otherwise complete, it is inconvenient because it does not contain -clear- at the beginning or -end- at the end. Please use real -dataex- on a sample that can support the calculations you want help with, and post it correctly in the future.

    That said, the command -bys county month year: egen blackhispan= total(cond(black, hispanic, soc2==11,.))- is wrong. Writing black, hispanic does not get you "black or hispanic." With four arguments, cond() returns hispanic when black is true, soc2==11 when black is false, and missing when black is missing. So what you are adding up is a mix of whether they are Hispanic if they are Black and whether they work in occupation 11 if they are not Black. The way to get "black or hispanic" is with the | operator. So that first line should be -bys county month year: egen blackhispan= total(cond(black | hispanic, soc2==11,.))-. See -help operator- to learn about all of Stata's logical operators.

    In any case, there is no need to make it so complicated. You can get the result you want more simply with:
    Code:
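    assert !missing(black, hispanic)
    * share of Black or Hispanic among observations with soc2 == 11, within each county-month-year
    bys county month year: egen wanted = mean(cond(soc2 == 11, black | hispanic, .))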
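
    A note on why the -assert- above matters: Stata treats any nonzero value, including missing, as true, so -black | hispanic- evaluates to 1 when either variable is missing. Here is a minimal sketch with made-up values; the variable names either and either_safe are hypothetical and used only for illustration.

    Code:
    * toy data (made up) to show how | treats missing values
    clear
    input float(black hispanic)
    1 0
    0 1
    0 0
    . 0
    end
    * missing counts as true, so the last observation gets either == 1
    gen byte either = black | hispanic
    * one way to guard: compare explicitly and leave the result missing when either input is missing
    gen byte either_safe = (black == 1 | hispanic == 1) if !missing(black, hispanic)
    list, noobs

    If the -assert- fails on the real data, the missing values need to be dealt with before computing any shares.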



    • #3
      Additionally, your example data doesn't actually have any cases where an observation is either Black or Hispanic and soc2 is 11. In this case, "wanted" should always equal 0, because the number of Black or Hispanic observations with a soc2 of 11 is 0.

      cond() does not work the way you seem to expect. cond() is saying that if black is not zero, assign the value of hispanic to the new variable; if black is zero, assign the value of soc2==11, which resolves to 1 if soc2 equals 11 and 0 otherwise; if black is missing, assign missing to the new variable. This means that (e.g.) if black is 0 and hispanic is 0 but soc2==11 is 1, then blackhispan is 1. If black is 1 and hispanic is 0, then blackhispan is 0. Here is how I would do this with a slightly different dataset. I ignore the -bys- since you don't provide county, month, and year. Note that the value of wanted is constant without the by statement.

      Code:
      clear
      input int race float(black hispanic soc2)
      100 0 0 53
      200 1 0 43
      100 0 0 47
      100 0 0 43
      100 0 0 43
      100 1 0 11
      100 0 0  .
      100 0 0 47
      200 1 0 47
      100 0 0 25
      100 0 1 11
      200 1 0 43
      200 1 0 41
      200 1 0  .
      200 1 0 37
      100 0 1 53
      100 0 0 53
      100 0 0 49
      100 0 0 41
      200 1 0 41
      100 0 0 51
      200 1 0  .
      100 0 0 35
      100 1 1 11
      100 0 1 51
      100 0 0 49
      100 0 0 43
      100 0 0 35
      200 1 0 47
      100 0 0 53
      100 0 0 11
      100 0 0 43
      100 0 0 35
      100 0 0 33
      100 0 0  .
      100 0 0 11
      100 0 0  .
      100 0 0 47
      100 0 0  .
      100 0 0 25
      200 1 0 53
      100 0 0 29
      100 0 0 41
      200 1 0 53
      200 1 0 53
      200 1 0 25
      end
      Here is what cond() is doing:

      Code:
      . gen blackhispanwithcond = cond(black, hispanic, soc2==11,.)
      
      . gen equalseleven = soc2==11
      
      . checkvar blackhispanwithcond black hispanic equalseleven
      
      
        Created Variable: blackhispanwithcond
        --------------------------------------------------
        blackh~d |    black  hispanic  equals~n |     Freq
        ---------+------------------------------+---------
               0 |        0         0         0 |       26
               0 |        0         1         0 |        2
               0 |        1         0         0 |       13
               0 |        1         0         1 |        1
        ---------+------------------------------+---------
               1 |        0         0         1 |        2
               1 |        0         1         1 |        1
               1 |        1         1         1 |        1
        --------------------------------------------------
      And here is how I would handle this:

      Code:
      gen blackhispan = 0
      replace blackhispan = 1 if (black == 1 | hispanic == 1) & soc2==11
      egen blackhispantotal = total(blackhispan)
      egen alltotal = total(soc2==11)
      gen wanted = blackhispantotal / alltotal
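
      Once county, month, and year are in the data, the same steps can be run within groups. Here is a rough sketch, assuming made-up values for those three variables (the toy observations and the variable check are hypothetical). A group with no soc2==11 observations gets a missing value for wanted, because division by zero yields missing in Stata.

      Code:
      * toy data (made up): three county-month-year groups
      clear
      input byte(county month) int year byte(black hispanic) float soc2
      1 1 2022 1 0 11
      1 1 2022 0 0 11
      1 1 2022 0 1 43
      1 2 2022 0 1 11
      1 2 2022 0 0 11
      1 2 2022 0 0 11
      2 1 2022 0 0 53
      end
      * flag Black or Hispanic observations in occupation 11
      gen byte blackhispan = (black == 1 | hispanic == 1) & soc2 == 11
      * numerator and denominator within each county-month-year
      bys county month year: egen blackhispantotal = total(blackhispan)
      bys county month year: egen alltotal = total(soc2 == 11)
      gen wanted = blackhispantotal / alltotal
      * one-step cross-check: mean of the indicator over soc2 == 11 observations only
      bys county month year: egen check = mean(cond(soc2 == 11, black == 1 | hispanic == 1, .))
      assert reldif(wanted, check) < 1e-6 if alltotal > 0
      assert missing(wanted) if alltotal == 0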
      Last edited by Daniel Schaefer; 08 Jul 2022, 12:00. Reason: I wanted to explicitly demonstrate what cond() resolves to for values of black, hispanic, and soc2==11.



      • #4
        Those last few lines of mine could easily be simplified using the code provided in #2.



        • #5
          Thanks to everyone for their considerate suggestions. Next time, I'll be more careful when posting my data.

          After carefully following the suggestions you both provided, I successfully coded the variable I was aiming to create. Much obliged.
