  • Create variable of group frequencies

    I have a data set that includes variables on each individual's district (where the person lives) and occupation. I am trying to generate a new variable that gives the occupation shares by district; i.e., if there are 10 people in the 1st district and 3 are farmers, I want the new variable to be 0.3 for everyone who is a farmer in that district. Thus far I have tried:

    bysort district: tab occupation

    However, as there are over 600 districts in the sample, it would be very labor-intensive to enter all the frequencies by hand. I have also tried

    egen freq=pc(occupation), by(district) prop

    but this does not return the correct proportions.

    I would really appreciate help with this. Thanks!

  • #2
    Try:

    Code:
    bysort district: gen denom=_N
    bysort district occupation: gen numerator=_N
    gen proportions=numerator/denom
    Last edited by Carole J. Wilson; 22 Jan 2019, 11:44.
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1
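A minimal sketch of the code above run on made-up data, assuming occupation is a numeric code (1 = farmer here, purely for illustration) that is never missing. It reproduces the example from the question: 10 people in district 1, 3 of them farmers, so each farmer there should get 0.3.

```stata
* Hypothetical toy data: district 1 has 10 people, 3 of them farmers (code 1);
* district 2 has 4 people, 1 of them a farmer. Codes are made up for illustration.
clear
input district occupation
1 1
1 1
1 1
1 2
1 2
1 2
1 2
1 3
1 3
1 3
2 1
2 2
2 2
2 3
end

* number of people in each district
bysort district: gen denom = _N
* number of people in each district-occupation cell
bysort district occupation: gen numerator = _N
* share of the district's people in that occupation
gen proportions = numerator / denom

* farmers in district 1 should get 0.3 (tolerance because proportions is a float)
assert abs(proportions - 0.3) < 1e-6 if district == 1 & occupation == 1
list, sepby(district)
```

The tolerance in assert matters: proportions is stored as float by default, so comparing it with the literal 0.3 using == would fail.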

    • #3
      Carole J. Wilson wrote egen and meant gen.

      • #4
        Thanks for the help! Carole's suggestion worked.

        • #5
          Oops! Thanks, Nick. I’ll edit the original post.

          • #6
            There is still a question of why

            Code:
            egen freq=pc(occupation), by(district) prop
            was wrong. It starts by adding up your occupation codes within groups of observations, i.e. it treats them literally as numbers. If your codes are, say, 1, 2 and 7, then the total is 10 and the results are 0.1, 0.2 and 0.7, not 1/3 three times.

            Here's a demonstration.

            Code:
            . clear
            
            . set obs 3
            number of observations (_N) was 0, now 3
            
            . gen y = cond(_n == 1, 1, cond(_n == 2, 2, 7))
            
            . list y
            
                 +---+
                 | y |
                 |---|
              1. | 1 |
              2. | 2 |
              3. | 7 |
                 +---+
            
            . egen pc = pc(y), prop
            
            . l
            
                 +--------+
                 | y   pc |
                 |--------|
              1. | 1   .1 |
              2. | 2   .2 |
              3. | 7   .7 |
                 +--------+
            That won't be what is wanted here. Naturally, doing all this by() some other variable doesn't change that principle.
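To see the contrast on the same three values, here is a sketch (added for illustration, not part of the original demonstration) that computes both pc() and a count-based share. With everything in one group, counting observations gives 1/3 to each code, whereas pc() gives 1/10, 2/10 and 7/10.

```stata
* Same three observations as in the demonstration: codes 1, 2 and 7 in one group
clear
set obs 3
gen y = cond(_n == 1, 1, cond(_n == 2, 2, 7))

* pc() sums the code values themselves, so each share is y/10
egen pc = pc(y), prop

* counting observations treats each code as a category instead
gen denom = _N
bysort y: gen numer = _N
gen share = numer / denom

list
```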

            • #7
              There might also be an issue if occupation is missing for some people.

              In that case, Carole's code calculates, e.g., the number of people known to be carpenters as a fraction of all observations in the district, including those whose occupation is missing. That can be very different from the fraction of carpenters among the people whose occupation is known.

              Here is an example of the distinction:

              Code:
              . sysuse auto
              (1978 Automobile Data)
              
              . sort rep
              
              . keep in 64/l
              (63 observations deleted)
              
              . keep foreign rep
              
              . bysort foreign: gen denom = _N
              
              . bysort foreign rep: gen numer = _N
              
              . gen ratio = numer / denom
              
              . list, by(foreign)
              option by() not allowed
              r(198);
              
              . list, sepby(foreign)
              
                   +---------------------------------------------+
                   | rep78    foreign   denom   numer      ratio |
                   |---------------------------------------------|
                1. |     5   Domestic       5       1         .2 |
                2. |     .   Domestic       5       4         .8 |
                3. |     .   Domestic       5       4         .8 |
                4. |     .   Domestic       5       4         .8 |
                5. |     .   Domestic       5       4         .8 |
                   |---------------------------------------------|
                6. |     5    Foreign       6       5   .8333333 |
                7. |     5    Foreign       6       5   .8333333 |
                8. |     5    Foreign       6       5   .8333333 |
                9. |     5    Foreign       6       5   .8333333 |
               10. |     5    Foreign       6       5   .8333333 |
               11. |     .    Foreign       6       1   .1666667 |
                   +---------------------------------------------+
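              In this example we calculated fractions for rep78 == 5 that are different from 1 (.2 for Domestic, .8333333 for Foreign), and the missing values got shares of their own. However, within each group rep78 == 5 in fact accounts for 100% of the observations whose rep78 is known.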

              • #8
                If occupation is missing for some people, we can instead count only the nonmissing values, with something like this:

                Code:
                . egen denom2 = count(rep), by(foreign)
                
                . egen numer2 = count(rep), by(foreign rep)
                
                . gen ratio2 = numer2 / denom2
                
                . list, sepby(foreign)
                
                     +------------------------------------------------------------------------+
                     | rep78    foreign   denom   numer      ratio   denom2   numer2   ratio2 |
                     |------------------------------------------------------------------------|
                  1. |     5   Domestic       5       1         .2        1        1        1 |
                  2. |     .   Domestic       5       4         .8        1        0        0 |
                  3. |     .   Domestic       5       4         .8        1        0        0 |
                  4. |     .   Domestic       5       4         .8        1        0        0 |
                  5. |     .   Domestic       5       4         .8        1        0        0 |
                     |------------------------------------------------------------------------|
                  6. |     5    Foreign       6       5   .8333333        5        5        1 |
                  7. |     5    Foreign       6       5   .8333333        5        5        1 |
                  8. |     5    Foreign       6       5   .8333333        5        5        1 |
                  9. |     5    Foreign       6       5   .8333333        5        5        1 |
                 10. |     5    Foreign       6       5   .8333333        5        5        1 |
                 11. |     .    Foreign       6       1   .1666667        5        0        0 |
                     +------------------------------------------------------------------------+
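A variant of the same idea, sketched on made-up data in the question's terms (district and occupation are assumed numeric, with . for an unknown occupation): restrict both counts to people whose occupation is known. Unlike the count() approach above, which gives a ratio of 0 to observations with a missing value, this leaves their share missing, which may be preferable if the new variable is later summarized.

```stata
* Hypothetical toy data: district 1 has 5 people, one with unknown occupation (.)
clear
input district occupation
1 1
1 1
1 2
1 3
1 .
end

* flag observations whose occupation is known
gen byte known = !missing(occupation)

* denominator: people in the district whose occupation is known
bysort district known: gen denom = _N if known
* numerator: people in the same district and occupation (known only)
bysort district occupation: gen numer = _N if known
* share among known occupations; stays missing where occupation is unknown
gen share = numer / denom

list, sepby(district)
```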
