  • how to create histogram for sub-samples by year averages?

    Dear all,

    I use Stata 16 and would appreciate any help with creating comparative histograms by 5-year averages.

    I tried a couple of histogram commands that I am familiar with, but none could plot the histogram by region and 5-year averages.

    Here is my data:
    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte r_id str26 region float(period year) double(gfd shr kme klf)
    1 "East Asia & Pacific"        1 1  8325.703482570614 221713507.51180354 312545535391.22266  9.884943733931932
    1 "East Asia & Pacific"        2 2  9962.102881752555  291994199.0371205  374296140523.1998 10.281044165113197
    2 "Europe & Central Asia"      1 1   24102.8204969368   538919571.786615 538401194443.77637 19.022858084454136
    2 "Europe & Central Asia"      2 2  25799.98590965781  654494471.3396385  581745338410.1671 23.309416601936224
    3 "Latin America & Caribbean"  1 1  9455.280580949593  80081604.26769644  76151826017.24394  6.093767684310448
    3 "Latin America & Caribbean"  2 2   9545.11053660585 106220095.91957214  98404982310.05264  7.355575546132802
    4 "Middle East & North Africa" 1 1  7373.541711250463   89946551.4761219  81922279949.52919 16.905612650025958
    4 "Middle East & North Africa" 2 2 7777.9616878091565  99544773.86252138 106317425321.04172 19.484252374025836
    5 "North America"              1 1  49559.27810369779           83961914       2.209272e+11  7.247645902684366
    5 "North America"              2 2  53454.28768786681           98022165 270455750000.00006  7.239042277091585
    6 "South Asia"                 1 1 1368.5156808676866  12190005.91249332        2.40586e+10  11.19949571592668
    6 "South Asia"                 2 2 1780.9751084779448 23091459.198763825        3.49224e+10 10.671254320423737
    7 "Sub-Saharan Africa"         1 1  1645.103030610854 38776587.938939884 29995171741.764538 10.925456728424516
    7 "Sub-Saharan Africa"         2 2 1682.6993478548713  45598841.52158082  32372205773.37823  10.55520839362721
    end
    label values year fiveyr
    label def fiveyr 1 "2010-2014", modify
    label def fiveyr 2 "2015-2019", modify
    ------------------ copy up to and including the previous line ------------------
    Thank you.
    Ngozi

  • #2
    A region and 5-year period define an observation in your dataset. If you want to compare means, consider a bar graph.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte r_id str26 region float(period year) double(gfd shr kme klf)
    1 "East Asia & Pacific"        1 1  8325.703482570614 221713507.51180354 312545535391.22266  9.884943733931932
    1 "East Asia & Pacific"        2 2  9962.102881752555  291994199.0371205  374296140523.1998 10.281044165113197
    2 "Europe & Central Asia"      1 1   24102.8204969368   538919571.786615 538401194443.77637 19.022858084454136
    2 "Europe & Central Asia"      2 2  25799.98590965781  654494471.3396385  581745338410.1671 23.309416601936224
    3 "Latin America & Caribbean"  1 1  9455.280580949593  80081604.26769644  76151826017.24394  6.093767684310448
    3 "Latin America & Caribbean"  2 2   9545.11053660585 106220095.91957214  98404982310.05264  7.355575546132802
    4 "Middle East & North Africa" 1 1  7373.541711250463   89946551.4761219  81922279949.52919 16.905612650025958
    4 "Middle East & North Africa" 2 2 7777.9616878091565  99544773.86252138 106317425321.04172 19.484252374025836
    5 "North America"              1 1  49559.27810369779           83961914       2.209272e+11  7.247645902684366
    5 "North America"              2 2  53454.28768786681           98022165 270455750000.00006  7.239042277091585
    6 "South Asia"                 1 1 1368.5156808676866  12190005.91249332        2.40586e+10  11.19949571592668
    6 "South Asia"                 2 2 1780.9751084779448 23091459.198763825        3.49224e+10 10.671254320423737
    7 "Sub-Saharan Africa"         1 1  1645.103030610854 38776587.938939884 29995171741.764538 10.925456728424516
    7 "Sub-Saharan Africa"         2 2 1682.6993478548713  45598841.52158082  32372205773.37823  10.55520839362721
    end
    label values year fiveyr
    label def fiveyr 1 "2010-2014", modify
    label def fiveyr 2 "2015-2019", modify
    
    gr hbar gfd, over(year) over(region) asyvars ytitle("Some description") bar(1, color(red%50)) bar(2, color(blue%50)) scheme(s1color)


    [Image: Graph.png]

    Last edited by Andrew Musau; 11 May 2021, 07:39.


    • #3
      I am so very grateful, Andrew...thanks a million!!!
      Please, how can I modify the command to plot the bar chart by variable, year, and region?
      For instance, show GFD for 2010-2014 and 2015-2019 by region, such that the 2010-2014 bars are clustered together and the same for 2015-2019. I believe that will show the comparison distinctly.
      Thanks.


      • #4
        Thanks for the data example. I can't see any histogram code here and, perhaps more crucially, I don't understand why a histogram could be of use or interest for these data.

        There are four outcome variables and I don't know what any of them means, but here is some technique with klf.

        Code:
        set scheme s1color 
        graph dot (asis) klf, over(year) over(region, sort(1) descending) linetype(line) lines(lc(gs12) lw(vthin)) ytitle(klf) ysc(alt)


        Conventionally, time goes on the horizontal axis, but wanting to keep the region names readable implies otherwise. Similarly, asyvars is available as an option, but to me it makes the graph harder to understand. The main imperative is to explain klf.
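
        For comparison, here is a sketch of the asyvars variant mentioned above. It assumes the example data are already in memory and changes nothing in the command except adding asyvars, so the two periods share one line per region and are told apart by marker and legend rather than by separate rows.

        ```stata
        * Sketch only: same dot chart as above, with asyvars added.
        * With asyvars, the first over() variable (year) is plotted as separate
        * series on each region's line instead of as separate rows.
        set scheme s1color
        graph dot (asis) klf, over(year) over(region, sort(1) descending) asyvars ///
            linetype(line) lines(lc(gs12) lw(vthin)) ytitle(klf) ysc(alt)
        ```

        Whether this reads more easily than the stacked version is a matter of taste; with only two periods, the legend lookup is the main extra cost.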

        [Image: dotchart.png]


        • #5
          Originally posted by Ngozi ADELEYE
          I am so very grateful, Andrew...thanks a million!!!
          Please how can I modify the command to plot the bar chart by variables by year and regions?
          For instance show GFD for 2010-2014, and 2015-2019 by regions such that 2010-2014 are clustered together and same for 2015-2019. I believe that will show the comparison distinctly.
          Thanks.

          The scales of the variables are too different for a single graph, so you will need to generate the graphs one by one and then combine them. Below I use grc1leg2 by Mead Over.

          Code:
          net install grc1leg2.pkg, from(http://digital.cgdev.org/doc/stata/MO/Misc/)
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte r_id str26 region float(period year) double(gfd shr kme klf)
          1 "East Asia & Pacific"        1 1  8325.703482570614 221713507.51180354 312545535391.22266  9.884943733931932
          1 "East Asia & Pacific"        2 2  9962.102881752555  291994199.0371205  374296140523.1998 10.281044165113197
          2 "Europe & Central Asia"      1 1   24102.8204969368   538919571.786615 538401194443.77637 19.022858084454136
          2 "Europe & Central Asia"      2 2  25799.98590965781  654494471.3396385  581745338410.1671 23.309416601936224
          3 "Latin America & Caribbean"  1 1  9455.280580949593  80081604.26769644  76151826017.24394  6.093767684310448
          3 "Latin America & Caribbean"  2 2   9545.11053660585 106220095.91957214  98404982310.05264  7.355575546132802
          4 "Middle East & North Africa" 1 1  7373.541711250463   89946551.4761219  81922279949.52919 16.905612650025958
          4 "Middle East & North Africa" 2 2 7777.9616878091565  99544773.86252138 106317425321.04172 19.484252374025836
          5 "North America"              1 1  49559.27810369779           83961914       2.209272e+11  7.247645902684366
          5 "North America"              2 2  53454.28768786681           98022165 270455750000.00006  7.239042277091585
          6 "South Asia"                 1 1 1368.5156808676866  12190005.91249332        2.40586e+10  11.19949571592668
          6 "South Asia"                 2 2 1780.9751084779448 23091459.198763825        3.49224e+10 10.671254320423737
          7 "Sub-Saharan Africa"         1 1  1645.103030610854 38776587.938939884 29995171741.764538 10.925456728424516
          7 "Sub-Saharan Africa"         2 2 1682.6993478548713  45598841.52158082  32372205773.37823  10.55520839362721
          end
          label values year fiveyr
          label def fiveyr 1 "2010-2014", modify
          label def fiveyr 2 "2015-2019", modify
          
          local graphs
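          foreach var in gfd shr kme klf {
              * one horizontal bar chart per variable, saved as <variable>.gph
              gr hbar `var', over(year) over(region) asyvars bar(1, color(red%50)) bar(2, color(blue%50)) scheme(s1color) saving(`var', replace)
              * collect the saved file names for combining
              local graphs "`graphs' `var'.gph"
          }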
          grc1leg2 `graphs', scheme(s1color)

          You need to adjust the labels, y-titles, etc., which I do not do. Therefore it may be easier to create each graph separately. Also, there may be benefits to sorting the graphs, but this may change the order across variables (by categories).
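
          As a sketch of the kind of adjustment meant here, the loop can look up a separate y-axis title for each variable. The titles below are placeholders (the thread does not say what the variables measure), and (asis) is used on the assumption that each region-period cell holds a single observation, as in the example data. It also assumes grc1leg2 is installed and the example data are in memory.

          ```stata
          * Hypothetical axis titles: replace with real descriptions and units.
          local t_gfd "gfd"
          local t_shr "shr"
          local t_kme "kme"
          local t_klf "klf"

          local graphs
          foreach var in gfd shr kme klf {
              * `t_`var'' expands to the title stored for the current variable
              graph hbar (asis) `var', over(year) over(region) asyvars ///
                  bar(1, color(red%50)) bar(2, color(blue%50)) ///
                  ytitle("`t_`var''") scheme(s1color) saving(`var', replace)
              local graphs "`graphs' `var'.gph"
          }
          grc1leg2 `graphs', scheme(s1color)
          ```

          Swapping the order to over(region) over(year) would cluster the regions within each period instead of the periods within each region. With asyvars, the first over() variable is the one that gets the colors, so the bar() options would then need one entry per region.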

          [Image: Graph.png]

          Last edited by Andrew Musau; 11 May 2021, 08:20.


          • #6
            Oh I see, this is so very helpful...thanks Andrew.
            I can use this.
            May God bless you real good.
            ...gracias!!!


            • #7
              This is another take, now understanding that you want to see all four measures.

              I usually want to avoid "mean of", graph combine and messing with legends whenever possible.

              When some values are in billions and some definitely aren't, you may benefit from some work on units.

              This required some fiddling with ranks, and some other details, but it may help.

              multidot is from SSC. See also https://www.statalist.org/forums/for...ailable-on-ssc
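
              For reference, a minimal sketch of the installation step, assuming an internet connection. multidot is on SSC as stated above; labmask, which the code below also uses, is assumed here to come from the labutil package on SSC.

              ```stata
              * Install the community-contributed commands used in the code below.
              ssc install multidot
              * labmask is distributed as part of labutil (assumption: SSC copy)
              ssc install labutil
              ```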

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input byte r_id str26 region float(period year) double(gfd shr kme klf)
              1 "East Asia & Pacific"        1 1  8325.703482570614 221713507.51180354 312545535391.22266  9.884943733931932
              1 "East Asia & Pacific"        2 2  9962.102881752555  291994199.0371205  374296140523.1998 10.281044165113197
              2 "Europe & Central Asia"      1 1   24102.8204969368   538919571.786615 538401194443.77637 19.022858084454136
              2 "Europe & Central Asia"      2 2  25799.98590965781  654494471.3396385  581745338410.1671 23.309416601936224
              3 "Latin America & Caribbean"  1 1  9455.280580949593  80081604.26769644  76151826017.24394  6.093767684310448
              3 "Latin America & Caribbean"  2 2   9545.11053660585 106220095.91957214  98404982310.05264  7.355575546132802
              4 "Middle East & North Africa" 1 1  7373.541711250463   89946551.4761219  81922279949.52919 16.905612650025958
              4 "Middle East & North Africa" 2 2 7777.9616878091565  99544773.86252138 106317425321.04172 19.484252374025836
              5 "North America"              1 1  49559.27810369779           83961914       2.209272e+11  7.247645902684366
              5 "North America"              2 2  53454.28768786681           98022165 270455750000.00006  7.239042277091585
              6 "South Asia"                 1 1 1368.5156808676866  12190005.91249332        2.40586e+10  11.19949571592668
              6 "South Asia"                 2 2 1780.9751084779448 23091459.198763825        3.49224e+10 10.671254320423737
              7 "Sub-Saharan Africa"         1 1  1645.103030610854 38776587.938939884 29995171741.764538 10.925456728424516
              7 "Sub-Saharan Africa"         2 2 1682.6993478548713  45598841.52158082  32372205773.37823  10.55520839362721
              end
              label values year fiveyr
              label def fiveyr 1 "2010-2014", modify
              label def fiveyr 2 "2015-2019", modify
              
              gen axis = _n
              gen label = region + " 2010-2014" if mod(_n, 2)
              replace label = "2015-2019" if missing(label)
              labmask axis, values(label)
              
              * rank on anything interesting that's not region name
              egen rank = rank(-klf) if year == 2, unique
              
              list region year klf rank
              
              replace rank = rank * 2
              bysort region (rank) : replace rank = rank[1] - 1 if _n == 2  
              
              list region year klf rank
              
              gen KME = kme / 1e9
              label var KME "kme (billions)"
              gen SHR = shr / 1e6
              label var SHR "shr (millions)"
              
              set scheme s1color
              multidot gfd SHR KME klf, over(axis) sort(rank) ytitle("") descending

              [Image: dotchart2.png]

              Last edited by Nick Cox; 11 May 2021, 09:42.


              • #8
                More tweaks:

                Different marker symbols.

                Removing the repetition of "20".

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input byte r_id str26 region float(period year) double(gfd shr kme klf)
                1 "East Asia & Pacific"        1 1  8325.703482570614 221713507.51180354 312545535391.22266  9.884943733931932
                1 "East Asia & Pacific"        2 2  9962.102881752555  291994199.0371205  374296140523.1998 10.281044165113197
                2 "Europe & Central Asia"      1 1   24102.8204969368   538919571.786615 538401194443.77637 19.022858084454136
                2 "Europe & Central Asia"      2 2  25799.98590965781  654494471.3396385  581745338410.1671 23.309416601936224
                3 "Latin America & Caribbean"  1 1  9455.280580949593  80081604.26769644  76151826017.24394  6.093767684310448
                3 "Latin America & Caribbean"  2 2   9545.11053660585 106220095.91957214  98404982310.05264  7.355575546132802
                4 "Middle East & North Africa" 1 1  7373.541711250463   89946551.4761219  81922279949.52919 16.905612650025958
                4 "Middle East & North Africa" 2 2 7777.9616878091565  99544773.86252138 106317425321.04172 19.484252374025836
                5 "North America"              1 1  49559.27810369779           83961914       2.209272e+11  7.247645902684366
                5 "North America"              2 2  53454.28768786681           98022165 270455750000.00006  7.239042277091585
                6 "South Asia"                 1 1 1368.5156808676866  12190005.91249332        2.40586e+10  11.19949571592668
                6 "South Asia"                 2 2 1780.9751084779448 23091459.198763825        3.49224e+10 10.671254320423737
                7 "Sub-Saharan Africa"         1 1  1645.103030610854 38776587.938939884 29995171741.764538 10.925456728424516
                7 "Sub-Saharan Africa"         2 2 1682.6993478548713  45598841.52158082  32372205773.37823  10.55520839362721
                end
                label values year fiveyr
                label def fiveyr 1 "2010-2014", modify
                label def fiveyr 2 "2015-2019", modify
                
                gen axis = _n
                gen label = region + " 10-14" if mod(_n, 2)
                replace label = "15-19" if missing(label)
                labmask axis, values(label)
                
                * rank on anything interesting that's not region name
                egen rank = rank(-klf) if year == 2, unique
                
                list region year klf rank
                
                replace rank = rank * 2
                bysort region (rank) : replace rank = rank[1] - 1 if _n == 2  
                
                list region year klf rank
                
                gen KME = kme / 1e9
                label var KME "kme (billions)"
                gen SHR = shr / 1e6
                label var SHR "shr (millions)"
                
                set scheme s1color
                multidot gfd SHR KME klf, over(axis) sort(rank) ytitle("") descending sepby(year) ms(X Oh) msize(medlarge medsmall)

                [Image: dotchart3.png]



                • #9
                  Awesome!!! This is so VERY helpful.
                  Thank you so much, Nick!!!
                  You have always been coming to our rescue.
                  God bless you.
