
  • Generating a new numeric variable that gives the proportion of a value count and the total count of another

    Hello Statalist,

    I'm working with an integer variable "ind" (for industry) and a binary (0/1) float variable "skill_type1".

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int ind
    9890
     180
    7780
    9480
    6990
    7870
    9890
    7860
    7390
    8680
    7270
    1990
    9590
    4690
    7860
    7860
    1190
    9590
    9590
    8590
     770
    9370
    8680
    9590
    8180
    6570
    8190
     770
    9590
    8590
    9570
    9590
    8370
     770
    9590
    9160
    6170
    4690
    8660
    9470
    7380
    2370
    8370
    2870
    4470
    3580
    9590
    9590
    8560
    4690
    8470
     670
     770
    9590
    8170
    9590
    1270
    9370
     170
    7860
    9470
    7070
    8170
    6290
     770
    6970
    9590
    8570
    9470
    8680
    4970
    2790
    9470
    9380
    8680
    6170
    9370
    8170
    9590
    7580
    7280
    7460
    7870
    7070
    8590
    9590
    9590
    3390
    8190
    4770
    7080
    2980
    8190
    9380
     380
    4780
     470
    9470
    9590
    8090
    end
    Code:
    . codebook skill_type1
    
    ----------------------------------------------------------------------------------------------------------
    skill_type1                                                                                    (unlabeled)
    ----------------------------------------------------------------------------------------------------------
    
                      type:  numeric (float)
                     label:  skill_type1
    
                     range:  [0,1]                        units:  1
             unique values:  2                        missing .:  0/90,026
    
                tabulation:  Freq.   Numeric  Label
                            70,571         0  no industry change
                            19,455         1  industry change
    I'm trying to generate a new variable that stores the proportion of industry changes (i.e. observations with skill_type1 equal to 1) within each industry. For example, for the industry value "170" below, I would like the new variable to store the value 189/803 (about 0.235).

    Code:
    . tab ind skill_type1
    
               |      skill_type1
      industry | no indust  industry  |     Total
    -----------+----------------------+----------
           170 |       614        189 |       803 
           180 |       565        171 |       736 
           190 |        48         11 |        59
    I'd appreciate your suggestions for how I should proceed.


  • #2
    We can't play with your data, as your example doesn't include both of the variables you care about. But the problem can be replicated: what you want is just the mean of your indicator within each group.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . tab rep78 for, row
    
    +----------------+
    | Key            |
    |----------------|
    |   frequency    |
    | row percentage |
    +----------------+
    
        Repair |
        Record |       Car type
          1978 |  Domestic    Foreign |     Total
    -----------+----------------------+----------
             1 |         2          0 |         2 
               |    100.00       0.00 |    100.00 
    -----------+----------------------+----------
             2 |         8          0 |         8 
               |    100.00       0.00 |    100.00 
    -----------+----------------------+----------
             3 |        27          3 |        30 
               |     90.00      10.00 |    100.00 
    -----------+----------------------+----------
             4 |         9          9 |        18 
               |     50.00      50.00 |    100.00 
    -----------+----------------------+----------
             5 |         2          9 |        11 
               |     18.18      81.82 |    100.00 
    -----------+----------------------+----------
         Total |        48         21 |        69 
               |     69.57      30.43 |    100.00 
    
    . egen wanted = mean(foreign), by(rep78)
    
    . tabdisp rep78, c(wanted)
    
    ----------------------
    Repair    |
    Record    |
    1978      |     wanted
    ----------+-----------
            1 |          0
            2 |          0
            3 |         .1
            4 |         .5
            5 |   .8181818
            . |         .2
    ----------------------
    If you wanted percents, those are given directly by writing mean(100 * foreign) instead. Note that 100 * mean(foreign) is illegal in this context, because egen expects a single egen function immediately after the equals sign.
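
    Applied to the variables in #1, the same idea might look like the sketch below. It rests on assumptions: the toy dataset is rebuilt only from the three rows of the tabulation shown in #1 (it is not the real data), and the names n0, n1, total, prop_change and pct_change are made up for illustration.

```stata
* Toy data rebuilt from the counts tabulated in #1 (assumption: not the real dataset)
clear
input int ind int n0 int n1
170 614 189
180 565 171
190  48  11
end

* One row per observation: n0 zeros followed by n1 ones within each industry
generate long total = n0 + n1
expand total
bysort ind: generate byte skill_type1 = _n > n0

* Proportion of industry changes per industry = group mean of the 0/1 indicator
egen prop_change = mean(skill_type1), by(ind)

* Percent version: multiply inside the egen function, not outside it
egen pct_change = mean(100 * skill_type1), by(ind)

tabdisp ind, c(prop_change pct_change)
```

    On the real data, only the two egen lines should be needed, since ind and skill_type1 already exist there.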



    • #3
      Thanks so much Nick, this works perfectly.
