  • Adding sample size to graphs

    I want to add sample size for each of the subgraphs in the command below:

    graph bar if adoption == 1, over(category_cat) blabel(bar, format(%9.1f)) ytitle(% of farmers who adopted pulses) by(season_adop, noiytitle noixtitle)

    Please tell me how I can do that.

  • #2
    There is no data example here for us to work on (https://www.statalist.org/forums/help#stata) -- and it's hard to see how that code would yield a display of percents that you want. If you choose one category, nothing is said about the others and nothing in your code calculates percents -- so I will choose my own example.

    graph bar is pretty good at doing what it claims, but it is not infinitely flexible. At some point you are better off just calculating what you want to show and reaching for twoway bar instead.

    Here a percent is just 100 times the mean of an indicator variable -- or equivalently the mean of 100 times an indicator variable. And a sample size is just a count. So what you want to show as text is the concatenation of simple elements.

    This may be some distance from your set-up but

    Code:
    preserve 
    contract category_cat season_adop adoption 
    dataex 
    restore
    would let you give us something more concrete. If that doesn't make sense, do please read the link just given.


    Code:
    sysuse auto, clear 
    
    egen pc = mean(100 * foreign), by(rep78)
    label var pc "% of foreign cars"
    bysort rep78 : gen count = _N 
    
    gen toshow = string(pc, "%2.1f") + "% (" + string(count) + ")"
    
    set scheme s1color 
    
    twoway bar pc rep78, barw(0.8) || scatter pc rep78 , mla(toshow) mlabpos(12) ms(none) legend(off) ysc(r(0 85)) yla(, ang(h))
    [Figure: bar_and_label.png]



    • #3
      Hi Nick,

      Thanks a lot for your help.
      My code runs because I used the Graph Editor, where it asks for "graphs of percent of frequencies within categories". Please advise on this as well.


      Additionally, below is an example of my dataset. Note that season_adop and category_cat are both categorical variables. If you could help with the percentage code for this as well, it would be of great help.

      Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(season_adop category_cat) float adoption int _freq
      1 1 0 1006
      1 1 1 1144
      2 1 0 1485
      2 1 1 665
      3 1 0 1921
      3 1 1 229
      4 1 0 1121
      4 1 1 1029
      5 1 0 1560
      5 1 1 590
      6 1 0 1300
      6 1 1 850
      7 1 0 1811
      7 1 1 339
      8 1 0 1107
      8 1 1 1043
      9 1 0 1435
      9 1 1 715
      10 1 0 1015
      10 1 1 1135
      1 2 0 2755
      1 2 1 1085
      2 2 0 2279
      2 2 1 1561
      3 2 0 3363
      3 2 1 477
      4 2 0 2322
      4 2 1 1518
      5 2 0 2238
      5 2 1 1602
      6 2 0 1672
      6 2 1 2168
      7 2 0 3836
      7 2 1 4
      8 2 0 2290
      8 2 1 1550
      9 2 0 3153
      9 2 1 687
      10 2 0 2779
      10 2 1 1061
      1 3 0 3508
      1 3 1 1548
      2 3 0 3144
      2 3 1 1912
      3 3 0 4417
      3 3 1 639
      4 3 0 2766
      4 3 1 2290
      5 3 0 2774
      5 3 1 2282
      6 3 0 2361
      6 3 1 2695
      7 3 0 5051
      7 3 1 5



      • #4
        Your syntax being legal does not make it correct. How could that code calculate percents?

        Thanks for the data example, but I believe that you only showed part of the output -- and omitted the display of value labels. In order to show

        % of farmers who adopted pulses

        we need to know what codes adoption in your data. But I assume it means adoption == 1 as in #1. There are still ambiguities in your question, but you should be able to adapt this to your goals.


        Code:
        egen denom = total(_freq), by(category_cat season_adop)
        egen numer = total(_freq * (adoption == 1)), by(category_cat season_adop)
        gen pc = 100 * numer/denom 
        label var pc "% adopting"
        
        gen toshow = string(pc, "%2.0f") + "% (" + string(denom) + ")"
        
        set scheme s1color 
        
        twoway bar pc category_cat, barw(0.8) || scatter pc category_cat , xla(1 2 3) mla(toshow) mlabpos(12) ms(none) yla(, ang(h)) by(season_adop, legend(off))
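The arithmetic can be checked by hand on a single cell. What follows is only a minimal sketch: it takes the first two rows of the posted data (season_adop 1, category_cat 1) and applies the same egen calls, so that the denominator, numerator and percent can be read off directly.

```stata
* First cell of the posted data: season_adop 1, category_cat 1
clear
input long(season_adop category_cat) float adoption int _freq
1 1 0 1006
1 1 1 1144
end

* Same calculation as above, applied to this one cell
egen denom = total(_freq), by(category_cat season_adop)
egen numer = total(_freq * (adoption == 1)), by(category_cat season_adop)
gen pc = 100 * numer/denom

* denom should be 1006 + 1144 = 2150 and numer should be 1144
list, noobs
```

Here 100 * 1144 / 2150 is about 53.2, so about 53% of the 2150 farmers in that cell adopted.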



        • #5
          Thanks a lot for this, and yes, adoption is a binary variable with values 0 and 1.

          Last edited by Komal Jain; 24 Oct 2020, 09:05.



          • #6
            Hi,

            The _freq code is not working for me. What should I use instead? I have given a clearer description below; I hope it helps:

            Basically, I have 10 categories for season_adop; adoption is a binary variable; category_cat has 4 categories.

            As you might already know from my post, I want to calculate % of farmers in each category who have adopted pulses in a particular season.

            I also tried the following code:
            bysort season_adop category_cat: gen denom = _N
            bysort season_adop category_cat: gen numer = (_N * (adoption == 1))

            instead of what you suggested using "_freq".

            Yet again, I have attached a snippet of the data with value labels at the end of this message, hope this helps.

            input long(season_adop category_cat) float adoption int _freq
            1 1 0 1006
            1 1 1 1144
            2 1 0 1485
            2 1 1 665
            3 1 0 1921
            3 1 1 229
            4 1 0 1121
            4 1 1 1029
            5 1 0 1560
            5 1 1 590
            6 1 0 1300
            6 1 1 850
            7 1 0 1811
            7 1 1 339
            8 1 0 1107
            8 1 1 1043
            9 1 0 1435
            9 1 1 715
            10 1 0 1015
            10 1 1 1135
            1 2 0 2755
            1 2 1 1085
            2 2 0 2279
            2 2 1 1561
            3 2 0 3363
            3 2 1 477
            4 2 0 2322
            4 2 1 1518
            5 2 0 2238
            5 2 1 1602
            6 2 0 1672
            6 2 1 2168
            7 2 0 3836
            7 2 1 4
            8 2 0 2290
            8 2 1 1550
            9 2 0 3153
            9 2 1 687
            10 2 0 2779
            10 2 1 1061
            1 3 0 3508
            1 3 1 1548
            2 3 0 3144
            2 3 1 1912
            3 3 0 4417
            3 3 1 639
            4 3 0 2766
            4 3 1 2290
            5 3 0 2774
            5 3 1 2282
            6 3 0 2361
            6 3 1 2695
            7 3 0 5051
            7 3 1 5
            8 3 0 3304
            8 3 1 1752
            9 3 0 4112
            9 3 1 944
            10 3 0 3748
            10 3 1 1308
            1 4 0 7608
            1 4 1 3839
            2 4 0 7480
            2 4 1 3967
            3 4 0 9506
            3 4 1 1941
            4 4 0 6253
            4 4 1 5194
            5 4 0 5842
            5 4 1 5605
            6 4 0 5699
            6 4 1 5748
            7 4 0 11220
            7 4 1 227
            8 4 0 7172
            8 4 1 4275
            9 4 0 9216
            9 4 1 2231
            10 4 0 8786
            10 4 1 2661
            end
            label values season_adop season_adop
            label def season_adop 1 "Kharif 2017", modify
            label def season_adop 2 "Kharif 2018", modify
            label def season_adop 3 "Kharif 2019", modify
            label def season_adop 4 "Rabi 2017", modify
            label def season_adop 5 "Rabi 2018", modify
            label def season_adop 6 "Rabi 2019", modify
            label def season_adop 7 "Zaid 2017", modify
            label def season_adop 8 "Zaid 2018", modify
            label def season_adop 9 "Zaid 2019", modify
            label def season_adop 10 "Zaid 2020", modify
            label values category_cat category_cat
            label def category_cat 1 "Reserve", modify
            label def category_cat 2 "Low", modify
            label def category_cat 3 "Medium", modify
            label def category_cat 4 "High", modify



            • #7
              "not working" is no kind of problem report that I can comment on directly. Please see FAQ Advice #12 where we say
              Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.
              To confirm: the code of #4 does work with the data of #6. It produces a crowded graph, but the problem is clear there: You have 40 bars and you want to show the associated sample size as well as the percent concerned. That is difficult without compromising somewhere.

              At a wild guess -- compare #2 -- you may be doing this

              Code:
              preserve  
              contract category_cat season_adop adoption  
              dataex  
              restore
              But the restore does what it says -- it restores the original data and _freq is then no longer part of the data.
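This behaviour can be seen on the auto data. It is only a sketch of the general point, not a reconstruction of what was actually run: _freq exists between contract and restore, and is gone afterwards.

```stata
sysuse auto, clear

preserve
contract foreign rep78
* _freq exists in the contracted data, so this raises no error
confirm variable _freq
restore

* Back in the original data, _freq is no longer there
capture confirm variable _freq
display "return code after restore: " _rc
```

A nonzero return code on the last line means that any later code referring to _freq will fail with a "variable not found" error.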

              The choice is between

              1. Not issuing the restore -- and running code like that I gave in terms of _freq --

              and

              2. Writing your own code for the uncontracted dataset. I am fine with your doing that, but what you show of that goes wrong quickly.

              I am guessing and have not tested this but I think the code should be more like

              Code:
              bysort category_cat season_adop adoption : gen numer = _N if adoption == 1  
              by category_cat season_adop : gen denom = _N
              by category_cat season_adop : gen pc = 100 * numer / denom if _n == _N  
              by category_cat season_adop : replace pc = 0 if pc == . & _n == _N  
              label var pc "% adopting"  
              gen toshow = string(pc, "%2.0f") + "% (" + string(denom) + ")"  
              set scheme s1color  
              
              twoway bar pc category_cat, barw(0.8) || scatter pc category_cat , xla(1 2 3 4) mla(toshow) mlabpos(12) ms(none) yla(, ang(h)) by(season_adop, legend(off))
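A shorter route for the uncontracted data, following the logic of #2, is to take the mean of 100 times an indicator. The sketch below runs on made-up toy data (the counts in n are invented purely for illustration) and assumes adoption is never missing. The variable tag is a helper introduced here so that each cell is labelled once.

```stata
* Toy uncontracted data: one row per farmer (counts are made up)
clear
input byte(season_adop category_cat adoption) int n
1 1 0 3
1 1 1 1
1 2 0 2
1 2 1 2
end
expand n
drop n

* Percent adopting is the mean of 100 * indicator; sample size is the cell count
egen pc = mean(100 * (adoption == 1)), by(category_cat season_adop)
bysort category_cat season_adop : gen denom = _N

* Label just one observation per cell
egen tag = tag(category_cat season_adop)
gen toshow = string(pc, "%2.0f") + "% (" + string(denom) + ")" if tag

list category_cat season_adop pc denom toshow if tag, noobs
```

With these toy counts the two cells come out as 25% (4) and 50% (4). The twoway call above could then be restricted with if tag.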


              But back to #1. I think you can do a little better with tabplot from the Stata Journal -- here with an assist from mycolours from SSC.

              Here is my complete code using data from #6 and producing a poor graph that I don't show and a better graph that I do show.

              Would your seasons better be sorted into time order? Urdu [if that is what it is] alphabetical order may not make substantive sense.

              Code:
              clear
              
              input long(season_adop category_cat) float adoption int _freq
              1 1 0 1006
              1 1 1 1144
              2 1 0 1485
              2 1 1 665
              3 1 0 1921
              3 1 1 229
              4 1 0 1121
              4 1 1 1029
              5 1 0 1560
              5 1 1 590
              6 1 0 1300
              6 1 1 850
              7 1 0 1811
              7 1 1 339
              8 1 0 1107
              8 1 1 1043
              9 1 0 1435
              9 1 1 715
              10 1 0 1015
              10 1 1 1135
              1 2 0 2755
              1 2 1 1085
              2 2 0 2279
              2 2 1 1561
              3 2 0 3363
              3 2 1 477
              4 2 0 2322
              4 2 1 1518
              5 2 0 2238
              5 2 1 1602
              6 2 0 1672
              6 2 1 2168
              7 2 0 3836
              7 2 1 4
              8 2 0 2290
              8 2 1 1550
              9 2 0 3153
              9 2 1 687
              10 2 0 2779
              10 2 1 1061
              1 3 0 3508
              1 3 1 1548
              2 3 0 3144
              2 3 1 1912
              3 3 0 4417
              3 3 1 639
              4 3 0 2766
              4 3 1 2290
              5 3 0 2774
              5 3 1 2282
              6 3 0 2361
              6 3 1 2695
              7 3 0 5051
              7 3 1 5
              8 3 0 3304
              8 3 1 1752
              9 3 0 4112
              9 3 1 944
              10 3 0 3748
              10 3 1 1308
              1 4 0 7608
              1 4 1 3839
              2 4 0 7480
              2 4 1 3967
              3 4 0 9506
              3 4 1 1941
              4 4 0 6253
              4 4 1 5194
              5 4 0 5842
              5 4 1 5605
              6 4 0 5699
              6 4 1 5748
              7 4 0 11220
              7 4 1 227
              8 4 0 7172
              8 4 1 4275
              9 4 0 9216
              9 4 1 2231
              10 4 0 8786
              10 4 1 2661
              end
              label values season_adop season_adop
              label def season_adop 1 "Kharif 2017", modify
              label def season_adop 2 "Kharif 2018", modify
              label def season_adop 3 "Kharif 2019", modify
              label def season_adop 4 "Rabi 2017", modify
              label def season_adop 5 "Rabi 2018", modify
              label def season_adop 6 "Rabi 2019", modify
              label def season_adop 7 "Zaid 2017", modify
              label def season_adop 8 "Zaid 2018", modify
              label def season_adop 9 "Zaid 2019", modify
              label def season_adop 10 "Zaid 2020", modify
              label values category_cat category_cat
              label def category_cat 1 "Reserve", modify
              label def category_cat 2 "Low", modify
              label def category_cat 3 "Medium", modify
              label def category_cat 4 "High", modify
              
              
              egen denom = total(_freq), by(category_cat season_adop)
              egen numer = total(_freq * (adoption == 1)), by(category_cat season_adop)
              gen pc = 100 * numer/denom
              label var pc "% adopting"
              
              gen toshow = string(pc, "%2.0f") + "% (" + string(denom) + ")"
              
              set scheme s1color
              
              * ssc inst mycolours
              mycolours
              
              
              twoway bar pc category_cat, barw(0.8) || scatter pc category_cat , xla(1 2 3) mla(toshow) mlabpos(12) ms(none) yla(, ang(h)) by(season_adop, legend(off))
              
              forval j = 1/4 {
                  local call `call'  bar`j'(fcol("`OK`j''") lcol("`OK`j''"*2))
              }
              
              * tabplot is from the Stata Journal and must be installed first
              
              tabplot category_cat season_adop [iw=pc] , showval(toshow, mlabsize(tiny)) ytitle("") xla(, ang(v)) xtitle("") subtitle("") separate(category_cat) `call'
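On the question of time order raised above, one possibility is to recode season_adop into a new variable whose numeric order follows the calendar. This is only a sketch. The name season_time is hypothetical, and the ordering Zaid, then Kharif, then Rabi within each year is an assumption about the cropping calendar that should be checked against how the survey defines its seasons. The example builds its own ten-row dataset so that it runs on its own.

```stata
* Stand-in data: one row per season code, labelled as in #6
clear
set obs 10
gen long season_adop = _n
label def season_adop 1 "Kharif 2017" 2 "Kharif 2018" 3 "Kharif 2019" ///
    4 "Rabi 2017" 5 "Rabi 2018" 6 "Rabi 2019" ///
    7 "Zaid 2017" 8 "Zaid 2018" 9 "Zaid 2019" 10 "Zaid 2020"
label values season_adop season_adop

* ASSUMED calendar order within a year: Zaid, Kharif, Rabi
recode season_adop ///
    (7 = 1 "Zaid 2017") (1 = 2 "Kharif 2017") (4 = 3 "Rabi 2017") ///
    (8 = 4 "Zaid 2018") (2 = 5 "Kharif 2018") (5 = 6 "Rabi 2018") ///
    (9 = 7 "Zaid 2019") (3 = 8 "Kharif 2019") (6 = 9 "Rabi 2019") ///
    (10 = 10 "Zaid 2020") ///
    , gen(season_time) label(season_time)

sort season_time
list season_adop season_time, noobs
```

With the full data in memory, season_time could then stand in for season_adop in by() or in the tabplot call.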



              [Figure: adoption.png]

              Last edited by Nick Cox; 25 Oct 2020, 05:59.



              • #8
                Thanks a lot, this is really helpful.
