Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • calculating proportions by groups

    I’m using administrative data (I’ve included an example below) and I am trying to calculate the proportion (%) of grants by sex and age for each year in my dataset. I am a bit lost on where to start with the code.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int Grants long(year apptype_grp sex age) float european_union
     4 1 1 0 1 0
    19 1 1 0 2 0
     2 1 1 1 1 0
    33 1 1 1 2 0
     2 1 1 1 3 0
     1 1 1 0 1 0
    11 1 1 0 2 0
     3 1 1 0 3 0
     6 1 1 1 1 0
    19 1 1 1 2 0
     4 1 1 1 3 0
     1 1 1 0 1 0
     3 1 1 0 2 0
     3 1 1 1 1 0
     6 1 1 1 2 0
     1 1 1 1 3 0
     1 1 1 0 1 0
    19 1 1 0 1 0
    26 1 1 0 2 0
     4 1 1 1 1 0
    25 1 1 1 2 0
     2 1 1 1 3 0
     7 1 1 0 1 0
    11 1 1 0 2 0
     2 1 1 1 1 0
     5 1 1 1 2 0
     1 1 1 1 3 0
     2 1 1 0 1 0
     5 1 1 0 2 0
     4 1 1 1 1 0
     8 1 1 1 2 0
     1 1 1 1 3 0
     2 1 1 0 1 0
     3 1 1 0 2 0
     2 1 1 1 1 0
     1 1 1 1 2 0
     1 1 1 0 2 0
     1 1 1 1 2 0
     1 1 1 1 2 0
     1 1 1 0 1 0
     1 1 1 1 2 0
     3 1 1 0 1 0
     3 1 1 0 2 0
     1 1 1 0 3 0
     4 1 1 1 2 0
     1 1 1 1 3 0
     2 1 1 0 1 0
     4 1 1 0 2 0
     3 1 1 1 2 0
     1 1 1 0 1 0
     6 1 1 0 2 0
     1 1 1 1 1 0
     8 1 1 1 2 0
     1 1 1 1 2 0
     4 1 1 0 1 0
     3 1 1 0 2 0
     1 1 1 1 1 0
     2 1 1 1 2 0
     1 1 1 1 3 0
     6 1 1 0 1 0
    10 1 1 0 2 0
     3 1 1 1 2 0
     1 1 1 1 3 0
     2 1 1 0 2 0
     3 1 1 0 1 0
    10 1 1 0 2 0
     1 1 1 1 1 0
    11 1 1 1 2 0
    17 1 1 0 1 0
    73 1 1 0 2 0
     5 1 1 0 3 0
     9 1 1 1 1 0
    51 1 1 1 2 0
    13 1 1 1 3 0
     2 1 1 1 2 0
     3 1 1 0 2 0
     2 1 1 1 1 0
     3 1 1 1 2 0
    26 1 1 0 1 0
    60 1 1 0 2 0
     6 1 1 0 3 0
    11 1 1 1 1 0
    39 1 1 1 2 0
    20 1 1 1 3 0
     4 1 1 1 4 0
     3 1 1 0 2 0
     1 1 1 0 1 0
     2 1 1 0 2 0
     1 1 1 1 1 0
     1 1 1 1 2 0
     1 1 1 1 3 0
     5 1 1 0 1 0
     3 1 1 0 2 0
     5 1 1 1 2 0
     1 1 1 1 3 0
     8 1 1 0 1 0
    32 1 1 0 2 0
     9 1 1 0 3 0
     1 1 1 1 1 0
    17 1 1 1 2 0
    end
    label values year year
    label def year 1 "2005", modify
    label values apptype_grp apptype_grp
    label def apptype_grp 1 "Naturalisation", modify
    label values sex gender
    label def gender 0 "Female", modify
    label def gender 1 "Male", modify
    label values age AGE
    label def AGE 1 "18-29", modify
    label def AGE 2 "30-49", modify
    label def AGE 3 "50-69", modify
    label def AGE 4 ">70", modify
    label values european_union EU
    label def EU 0 "non-EU", modify

  • #2
    one solution is to use -collapse-
    Code:
    collapse (sum) Grants ,by(age sex year)
    egen total = total(Grants) ,by(year)
    g share = (Grants / total) * 100

    Comment


    • #3
      Thank you very much Øyvind. I've also managed to calculate it this way, I get the same result which is reassuring.

      Code:
      bysort year: egen total=total(Grants)
      bysort sex year age: egen sex_age_total=total(Grants)
      gen percent=(sex_age_total/total)*100

      Comment


      • #4
        A slightly different solution from #2.

        Code:
        collapse (sum) Grants, by(age sex year)
        bys year: egen wanted = pc(Grants)
        Without collapsing the data:

        Code:
        bys year: egen temp = pc(Grants)
        bys year sex age: egen wanted = total(temp)
        Last edited by Fei Wang; 18 Nov 2021, 04:22.

        Comment


        • #5
          Thank you for the alternative solution, Fei. If I may ask a follow up question - I am trying to graph these results using a stacked bar chart. I've tried this code:

          Code:
          graph bar (sum) Grants, over(age) over(sex) over(year) asyvars percentages stack
          which produces this graph:

          Click image for larger version

Name:	Grants stack.png
Views:	1
Size:	54.1 KB
ID:	1637051


          I am not sure if I have the code right?

          Comment


          • #6
            Alex, I think what you wanted in #1 are percentages for each age-sex pair summing up to 100% within a year. But the graph shows percentages for each age group summing up to 100% within a year-sex pair.

            Comment


            • #7
              Yes, I do need to graph % (of grants) for each age-sex pair summing up to 100% for every year in my dataset - that's where I think the code I have in #5 is incorrect. Basically, I am not sure how to graph those results using the stacked bar chart format

              Comment


              • #8
                Code:
                egen grp = group(age sex)
                label def grplabs 1 "18-29 Female" 2 "18-29 Male" 3 "30-49 Female" 4 "30-49 Male" 5 "50-69 Female" 6 "50-69 Male" 7 ">70 Female" 8 ">70 Male"
                label values grp grplabs
                graph bar (sum) Grants ,over(grp) over(year) asyvars percentages stack

                Comment


                • #9
                  That's brill, thank you Øyvind.

                  Comment


                  • #10
                    For this problem tabplot from the Stata Journal may help. For an executive summary, see https://www.statalist.org/forums/for...updated-on-ssc For the SJ record write-up see results from

                    Code:
                    search tabplot, sj
                    with at the time of writing the 2016 paper being most comprehensive and the 2020 update being the latest public version of the files.



                    SJ-20-3 gr0066_2 . . . . . . . . . . . . . . . . Software update for tabplot
                    (help tabplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
                    Q3/20 SJ 20(3):757--758
                    added new options frame() and frameopts() allowing framing
                    of bars and so-called thermometer plots or charts

                    SJ-16-2 gr0066 . . . . . . Speaking Stata: Multiple bar charts in table form
                    (help tabplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
                    Q2/16 SJ 16(2):491--510
                    provides multiple bar charts in table form representing
                    contingency tables for one, two, or three categorical variables


                    I'd flag or assert various positives here:

                    1. A bar chart with this design makes some comparisons easier while making no comparisons harder.

                    2. It is easier to identify and think about (combinations of) categories that are rare (or absent!).

                    3. There is space to show numbers too and so get the best of both worlds, graph and table.

                    4. tabplot allows full control of which variable(s) are the framework for percent calculations.

                    The example here is indicative, not definitive, and manifestly based on fake data.

                    Code:
                    clear 
                    set obs 136
                    
                    egen year = seq(), from(2005) to(2021) block(8)
                    egen sex = seq(), from(0) to(1) block(4)
                    label def sex 0 male 1 female 
                    label val sex sex 
                    egen age = seq(), from(1) to(4)
                    label def age 1 "18-29" 2 "30-49" 3 "50-69" 4 "70+"
                    label val age age 
                    
                    forval y = 2005/2021 {
                        local this : di %02.0f mod(`y', 100) 
                        label def year `y' "`this'", add 
                    }
                    label val year year 
                    
                    tab year 
                    
                    gen whatever = cond(age == 1, 25, cond(age == 2, 50, cond(age == 3, 22, 3)))
                    
                    egen group = group(age sex), label 
                    
                    set scheme s1color 
                    tabplot group year [iw=whatever] , ytitle("") percent(year) height(0.65) yreverse showval(mlabsize(vsmall))   separate(sex) bar1(color(blue)) bar2(color(orange))
                    Click image for larger version

Name:	agesexyear.png
Views:	1
Size:	41.1 KB
ID:	1637075

                    Comment


                    • #11
                      Thank you for this detailed explanation, Nick.

                      Just one small point (i might be misunderstanding something), I have ran this code on the full dataset and the percentages are consistent across years for age-sex pairs (e.g., 70+ males account for 1.5% of total grants for all years (2005 - 2021). I don't think this should be the case since the percentages vary by year?

                      Comment


                      • #12
                        It sounds as if you must have done something wrong, but without seeing your data or your exact code I can't say what that is.

                        You should post here the results of

                        Code:
                        collapse (sum) Grants, by(age sex year)  
                        
                        dataex
                        and then if you wish the graph code that you used afterwards.

                        Comment


                        • #13
                          This is the code I have run:

                          Code:
                          collapse (sum) Grants, by(age sex year)
                          
                          rename year year2
                          rename sex sex2
                          rename age age2
                          
                          egen year = seq(), from(2005) to(2021) block(8)
                          egen sex = seq(), from(0) to(1) block(4)
                          label def sex 0 male 1 female 
                          label val sex sex 
                          egen age = seq(), from(1) to(4)
                          label def age 1 "18-29" 2 "30-49" 3 "50-69" 4 "70+"
                          label val age age 
                          
                          forval y = 2005/2021 {
                              local this : di %02.0f mod(`y', 100) 
                              label def year `y' "`this'", add 
                          }
                          label val year year 
                           
                          gen whatever = cond(age == 1, 25, cond(age == 2, 50, cond(age == 3, 22, 3)))
                          egen group = group(age sex), label 
                          set scheme s1color

                          and here is an example of my data. I am not quite sure where I am going wrong.

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input long(year sex age) double Grants
                           1 0 1 15203
                           2 0 1 13645
                           3 0 1 16489
                           4 0 1 14495
                           5 0 1 24660
                           6 0 1 21954
                           7 0 1 17972
                           8 0 1 19418
                           9 0 1 20804
                          10 0 1  9892
                          11 0 1  9241
                          12 0 1 11589
                          13 0 1  7464
                          14 0 1  8710
                          15 0 1  8907
                          16 0 1  6936
                          17 0 1  4249
                           1 1 1 12279
                           2 1 1 10515
                           3 1 1 14039
                           4 1 1 12792
                           5 1 1 14842
                           6 1 1 12922
                           7 1 1 14230
                           8 1 1 14168
                           9 1 1 12416
                          10 1 1  6025
                          11 1 1  6201
                          12 1 1  7953
                          13 1 1  6008
                          14 1 1  7282
                          15 1 1  7273
                          16 1 1  5849
                          17 1 1  3643
                           1 0 2 35230
                           2 0 2 34782
                           3 0 2 38205
                           4 0 2 29333
                           5 0 2 51564
                           6 0 2 48100
                           7 0 2 41407
                           8 0 2 47997
                           9 0 2 57126
                          10 0 2 32505
                          11 0 2 30541
                          12 0 2 37768
                          13 0 2 33139
                          14 0 2 40796
                          15 0 2 43916
                          16 0 2 36147
                          17 0 2 21049
                           1 1 2 35512
                           2 1 2 33895
                           3 1 2 38426
                           4 1 2 29794
                           5 1 2 47424
                           6 1 2 44944
                           7 1 2 45557
                           8 1 2 51035
                           9 1 2 54525
                          10 1 2 32169
                          11 1 2 31018
                          12 1 2 37079
                          13 1 2 31286
                          14 1 2 36785
                          15 1 2 38407
                          16 1 2 32780
                          17 1 2 18776
                           1 0 3  6202
                           2 0 3  5319
                           3 0 3  4933
                           4 0 3  3882
                           5 0 3  6405
                           6 0 3  6047
                           7 0 3  5176
                           8 0 3  6390
                           9 0 3  7092
                          10 0 3  3724
                          11 0 3  3749
                          12 0 3  4602
                          13 0 3  4504
                          14 0 3  7126
                          15 0 3  7176
                          16 0 3  6420
                          17 0 3  3715
                           1 1 3  5252
                           2 1 3  4650
                           3 1 3  4337
                           4 1 3  3331
                           5 1 3  5603
                           6 1 3  5400
                           7 1 3  4810
                           8 1 3  5761
                           9 1 3  6465
                          10 1 3  3364
                          11 1 3  3104
                          12 1 3  3891
                          13 1 3  3662
                          14 1 3  6374
                          15 1 3  5905
                          end
                          label values year year
                          label def year 1 "2005", modify
                          label def year 2 "2006", modify
                          label def year 3 "2007", modify
                          label def year 4 "2008", modify
                          label def year 5 "2009", modify
                          label def year 6 "2010", modify
                          label def year 7 "2011", modify
                          label def year 8 "2012", modify
                          label def year 9 "2013", modify
                          label def year 10 "2014", modify
                          label def year 11 "2015", modify
                          label def year 12 "2016", modify
                          label def year 13 "2017", modify
                          label def year 14 "2018", modify
                          label def year 15 "2019", modify
                          label def year 16 "2020", modify
                          label def year 17 "2021", modify
                          label values sex gender
                          label def gender 0 "Female", modify
                          label def gender 1 "Male", modify
                          label values age AGE
                          label def AGE 1 "18-29", modify
                          label def AGE 2 "30-49", modify
                          label def AGE 3 "50-69", modify

                          Comment


                          • #14
                            You're repeating a lot of code that bears only on the fake dataset created for #10. This may be closer to what you want.


                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input long(year sex age) double Grants
                             1 0 1 153
                             2 0 1 13645
                             3 0 1 16489
                             4 0 1 14495
                             5 0 1 24660
                             6 0 1 21954
                             7 0 1 17972
                             8 0 1 19418
                             9 0 1 804
                            10 0 1  9892
                            11 0 1  9241
                            12 0 1 11589
                            13 0 1  7464
                            14 0 1  8710
                            15 0 1  8907
                            16 0 1  6936
                            17 0 1  4249
                             1 1 1 12279
                             2 1 1 10515
                             3 1 1 14039
                             4 1 1 12792
                             5 1 1 14842
                             6 1 1 12922
                             7 1 1 14230
                             8 1 1 14168
                             9 1 1 12416
                            10 1 1  6025
                            11 1 1  61
                            12 1 1  7953
                            13 1 1  6008
                            14 1 1  7282
                            15 1 1  7273
                            16 1 1  5849
                            17 1 1  3643
                             1 0 2 35230
                             2 0 2 34782
                             3 0 2 385
                             4 0 2 29333
                             5 0 2 51564
                             6 0 2 48100
                             7 0 2 41407
                             8 0 2 47997
                             9 0 2 57126
                            10 0 2 32505
                            11 0 2 30541
                            12 0 2 37768
                            13 0 2 33139
                            14 0 2 40796
                            15 0 2 43916
                            16 0 2 36147
                            17 0 2 21049
                             1 1 2 35512
                             2 1 2 33895
                             3 1 2 38426
                             4 1 2 29794
                             5 1 2 47424
                             6 1 2 44944
                             7 1 2 45557
                             8 1 2 51035
                             9 1 2 54525
                            10 1 2 32169
                            11 1 2 31018
                            12 1 2 37079
                            13 1 2 31286
                            14 1 2 36785
                            15 1 2 38407
                            16 1 2 32780
                            17 1 2 18776
                             1 0 3  62
                             2 0 3  5319
                             3 0 3  4933
                             4 0 3  3882
                             5 0 3  6405
                             6 0 3  6047
                             7 0 3  5176
                             8 0 3  6390
                             9 0 3  7092
                            10 0 3  3724
                            11 0 3  3749
                            12 0 3  4602
                            13 0 3  4504
                            14 0 3  7126
                            15 0 3  7176
                            16 0 3  64
                            17 0 3  3715
                             1 1 3  5252
                             2 1 3  4650
                             3 1 3  4337
                             4 1 3  3331
                             5 1 3  5603
                             6 1 3  5400
                             7 1 3  4810
                             8 1 3  5761
                             9 1 3  6465
                            10 1 3  3364
                            11 1 3  3104
                            12 1 3  3891
                            13 1 3  3662
                            14 1 3  6374
                            15 1 3  5905
                            end
                            label values year year
                            
                            label def year 1 "05", modify
                            label def year 2 "06", modify
                            label def year 3 "07", modify
                            label def year 4 "08", modify
                            label def year 5 "09", modify
                            label def year 6 "10", modify
                            label def year 7 "11", modify
                            label def year 8 "12", modify
                            label def year 9 "13", modify
                            label def year 10 "14", modify
                            label def year 11 "15", modify
                            label def year 12 "16", modify
                            label def year 13 "17", modify
                            label def year 14 "18", modify
                            label def year 15 "19", modify
                            label def year 16 "20", modify
                            label def year 17 "21", modify
                            label values sex gender
                            label def gender 0 "Female", modify
                            label def gender 1 "Male", modify
                            label values age AGE
                            label def AGE 1 "18-29", modify
                            label def AGE 2 "30-49", modify
                            label def AGE 3 "50-69", modify
                            
                            
                            collapse (sum) Grants, by(age sex year)
                            
                            egen group = group(age sex), label 
                            set scheme s1color
                            
                            tabplot group year [iw=Grants] , ytitle("") percent(year) height(0.65) yreverse showval(mlabsize(vsmall))   separate(sex) bar1(color(blue)) bar2(color(orange))
                            Click image for larger version

Name:	grants.png
Views:	1
Size:	41.6 KB
ID:	1637215


                            Comment


                            • #15
                              Many thanks for this, Nick. And will definitely be making more use of tabplot.

                              Comment

                              Working...
                              X