
  • Maria Chr
    started a topic How to create a percentage variable

    How to create a percentage variable

    Hello,

    I have a categorical variable for ethnicity.

    When I tabulate it, I can see a percentage for each category. I want to create a new variable called availability ratio, which is the number of people belonging to ethnicity A divided by the total population (i.e., the percentage that I see when I use the tab command).

    I used the following command:

    egen avail_ratio=pc(ethnicity)

    but it doesn't work. It doesn't give me the right percentages.

    Could somebody correct my command please?

  • David Benson
    replied
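    Hello Maria Chr,

    It looks like you were really close. You just needed to change "egen avail_ratio = pc(ethnicity)" to "egen avail_ratio = pc(population), by(country)", assuming your data have one row per ethnicity and country with a population count, as in the toy data below.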
    Code:
    // I created some toy data
    * Data shared using the dataex command. To install: ssc install dataex
    clear
    input byte(ethnicity country) long population
    1 1  65049
    2 1 112886
    3 1  77051
    1 2  38592
    2 2  81728
    3 2  50737
    4 2 165055
    5 2 130231
    1 3 109997
    2 3  18738
    3 3 145086
    4 3 135422
    end
    
    order country, first
    format population %9.0gc
    
    list, sepby(country) noobs abbrev(12)
    
      +----------------------------------+
      | country   ethnicity   population |
      |----------------------------------|
      |       1           1       65,049 |
      |       1           2      112,886 |
      |       1           3       77,051 |
      |----------------------------------|
      |       2           1       38,592 |
      |       2           2       81,728 |
      |       2           3       50,737 |
      |       2           4      165,055 |
      |       2           5      130,231 |
      |----------------------------------|
      |       3           1      109,997 |
      |       3           2       18,738 |
      |       3           3      145,086 |
      |       3           4      135,422 |
      +----------------------------------+
    
    egen c_pop = total(population), by(country)  // Creating the country population
    format c_pop %9.0gc
    gen ethnic_pct = population  / c_pop
    
    // Using your formulation above
    egen avail_ratio = pc(population), by(country)  // Looks like you were just missing this last part.  If you have multiple yrs, may need by(country year)
    
    list, sepby(country) noobs abbrev(12)
    
      +-----------------------------------------------------------------------+
      | country   ethnicity   population     c_pop   ethnic_pct   avail_ratio |
      |-----------------------------------------------------------------------|
      |       1           1       65,049   254,986     .2551081      25.51081 |
      |       1           2      112,886   254,986     .4427145      44.27145 |
      |       1           3       77,051   254,986     .3021774      30.21774 |
      |-----------------------------------------------------------------------|
      |       2           1       38,592   466,343     .0827545      8.275454 |
      |       2           2       81,728   466,343      .175253       17.5253 |
      |       2           3       50,737   466,343     .1087976      10.87976 |
      |       2           4      165,055   466,343     .3539348      35.39347 |
      |       2           5      130,231   466,343     .2792601      27.92601 |
      |-----------------------------------------------------------------------|
      |       3           1      109,997   409,243     .2687816      26.87816 |
      |       3           2       18,738   409,243      .045787      4.578698 |
      |       3           3      145,086   409,243     .3545229      35.45229 |
      |       3           4      135,422   409,243     .3309085      33.09085 |
      +-----------------------------------------------------------------------+
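
The toy data above have one row per ethnicity and country with a population count. If the data instead have one row per person, egen's pc() will not reproduce the tab percentages, because pc(ethnicity) scales the category codes themselves as a percentage of their sum. Here is a minimal sketch for that case. The ten-row dataset and the names n_eth and N_total are hypothetical, and I am assuming that missing values of ethnicity should be left out of the denominator, as tab does by default.

```stata
* Hypothetical person-level data: one row per person
clear
input byte ethnicity
1
1
1
2
2
3
3
3
3
3
end

* Denominator: number of non-missing observations, as in -tabulate-
quietly count if !missing(ethnicity)
local N_total = r(N)

* Numerator: number of observations in each category
bysort ethnicity: gen long n_eth = _N

* Percentage of the total, left missing where ethnicity is missing
gen double avail_ratio = 100 * n_eth / `N_total' if !missing(ethnicity)

* Compare against the percentages reported by -tabulate-
tabulate ethnicity
list ethnicity n_eth avail_ratio, sepby(ethnicity) noobs
```

With these ten rows the three categories should come out as 30, 20, and 50 percent, matching the Percent column of tabulate. If you need the percentages within groups such as country or year, the same idea applies with the count and the by-group both restricted to that group.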



  • Scott Knowles
    replied
    Hello Maria,

    Here is some code I ran to understand your question better.

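    **
    clear all
    set obs 100
    set seed 98
    gen u1=runiform()            // uniform random draws between 0 and 1
    gen u2=1                     // middle category by default
    replace u2 = 0 if u1<.2      // draws below the lower cutoff
    replace u2 = 2 if u1>.8      // draws above the upper cutoff
    tab u2
    contract u2, percent(u3)     // one row per category; u3 holds the percentage
    **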

    In the above, I generated random observations and then split them into categories (arbitrarily choosing .2 and .8 as cutoffs). You can then use the tab command to check the split. Finally, contract reduces the data to one row per category, with the percentage stored in u3. Note that contract replaces the dataset in memory.
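
Because contract replaces the data in memory, the percentages are no longer attached to the individual observations afterwards. One possible way around this, sketched below, is to build the table inside preserve/restore and merge it back. The tempfile name pcts is my own choice, and the simulated data repeat the example above with the cutoffs applied to u1.

```stata
* Simulated data: 100 observations in three categories
clear all
set obs 100
set seed 98
gen u1 = runiform()
gen byte u2 = 1
replace u2 = 0 if u1 < .2
replace u2 = 2 if u1 > .8

* Build the percentage table without losing the original data
preserve
contract u2, percent(u3)
tempfile pcts
save `pcts'
restore

* Attach each category's frequency (_freq) and percentage (u3) to every row
merge m:1 u2 using `pcts', nogenerate

tab u2
list u2 _freq u3 in 1/5
```

After the merge, every observation carries the percentage of its own category in u3, which is the kind of variable the original question asks for.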

    This is only meant to serve as an example to help you, but I cannot assist further without seeing your data/code.

    Hope this was useful.

    Sincerely,


    Scott

