  • Histogram with multiple bars

    Dear All,
    I have a dataset of two variables like this

    Code:
    list in 1/10
    
         +-------------------------------------------------------+
         | user_id          education                     survey |
         |-------------------------------------------------------|
      1. |     188                PHD                   Only pre |
      2. |     362           graduate              Only Advanced |
      3. |     363   secondary school   Both basic and andvanced |
      4. |     364      undergraduate                   Only pre |
      5. |     365   secondary school   Both basic and andvanced |
         |-------------------------------------------------------|
      6. |     366   secondary school   Both basic and andvanced |
      7. |     367           graduate                 only basic |
      8. |     368   secondary school   Both basic and andvanced |
      9. |     369      undergraduate   Both basic and andvanced |
     10. |     370   secondary school   Both basic and andvanced |
         +-------------------------------------------------------+
    I want to create a histogram like this (I did it in Excel). Thank you all for the help!
    Kind regards,
    William
    [Image attachment: Immagine 2022-07-12 103944.jpg]

  • #2
    Copy and paste the result of

    Code:
    dataex user_id education survey


    • #3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int user_id byte education float survey
      188 4 1
      362 3 3
      363 1 4
      364 2 1
      365 1 4
      366 1 4
      367 3 2
      368 1 4
      369 2 4
      370 1 4
      371 1 4
      372 3 2
      373 1 4
      374 3 4
      375 1 4
      376 1 4
      378 1 4
      379 3 3
      380 3 4
      381 3 4
      382 3 4
      383 3 4
      384 3 4
      385 2 4
      387 1 4
      388 3 4
      389 3 4
      391 1 2
      392 1 4
      393 3 4
      395 3 4
      396 3 4
      397 3 4
      398 3 2
      399 1 4
      400 1 2
      401 3 4
      402 3 4
      405 2 3
      407 3 4
      408 1 1
      409 3 4
      410 2 2
      411 1 4
      412 3 2
      413 3 4
      414 3 1
      415 1 2
      416 1 4
      417 3 4
      418 1 3
      419 1 4
      420 1 2
      421 3 4
      423 1 4
      425 1 1
      426 1 4
      427 1 1
      428 2 1
      429 1 4
      431 3 2
      432 1 2
      433 3 4
      434 3 1
      438 3 4
      440 3 1
      441 1 1
      442 1 2
      444 3 4
      445 1 4
      446 1 2
      447 1 1
      448 1 2
      450 3 4
      451 1 3
      452 2 2
      453 3 4
      454 1 4
      456 2 4
      457 3 4
      460 3 3
      461 4 4
      462 3 4
      463 1 2
      464 1 1
      465 1 1
      466 1 1
      467 1 1
      470 1 1
      471 1 2
      472 1 3
      475 3 1
      481 3 1
      482 3 2
      483 1 2
      487 3 2
      489 3 1
      490 2 4
      493 3 2
      494 3 1
      end
      label values education education
      label def education 1 "high school", modify
      label def education 2 "undergraduate", modify
      label def education 3 "graduate", modify
      label def education 4 "PHD", modify
      label values survey survey
      label def survey 1 "Only pre", modify
      label def survey 2 "only basic", modify
      label def survey 3 "only advanced", modify
      label def survey 4 "basic+advanced", modify


      • #4
        What does bar height encode: frequency or percent (and if percent, relative to what)?

        As a variation on #2, showing the results of

        Code:
        contract education survey 
        dataex
        would be especially helpful.
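
        To make the question concrete, here is a minimal sketch, assuming the raw data from #3 are in memory. It tabulates frequencies and then percents relative to row, column and grand totals, and uses preserve and restore so that contract does not permanently replace the dataset. The variable name _freq is the default that contract creates.

        Code:
        * Sketch only: assumes the raw data from #3 are in memory.
        * Cell frequencies.
        tabulate education survey
        * Percents within each education level (rows sum to 100).
        tabulate education survey, nofreq row
        * Percents within each survey category (columns sum to 100).
        tabulate education survey, nofreq column
        * Percents of the grand total.
        tabulate education survey, nofreq cell

        * contract leaves one observation per combination of categories,
        * so preserve/restore keeps the original data intact.
        preserve
        contract education survey, freq(_freq)
        list, sepby(education) noobs
        restore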


        • #5
          The bar height encodes frequency.

          contract education survey

          . dataex

          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte education float survey byte _freq
          1 1 27
          1 2 22
          1 3 10
          1 4 45
          2 1  7
          2 2  4
          2 3  2
          2 4  9
          3 1 11
          3 2 18
          3 3  6
          3 4 33
          4 1  2
          4 4  1
          end
          label values education education
          label def education 1 "high school", modify
          label def education 2 "undergraduate", modify
          label def education 3 "graduate", modify
          label def education 4 "PHD", modify
          label values survey survey
          label def survey 1 "Only pre", modify
          label def survey 2 "only basic", modify
          label def survey 3 "only advanced", modify
          label def survey 4 "basic+advanced", modify


          • #6
            Thanks for the data examples. You can get quite close to the Excel output with graph bar.


            Here I made some alterations to value labels for consistency and correctness. The colours used are just examples, but to distinguish categories on an ordered scale from high school to Ph.D., I recommend an ordered series of colours.

            I find the second form, using tabplot from the Stata Journal, to be generally clear. A legend, although widely conventional, obliges the reader to go back and forth mentally.

            For tabplot the most focused search for resources is


            Code:
            . search gr0066, entry
            
            Search of official help files, FAQs, Examples, and Stata Journals
            
            SJ-20-3 gr0066_2  . . . . . . . . . . . . . . . .  Software update for tabplot
                    (help tabplot if installed) . . . . . . . . . . . . . . . .  N. J. Cox
                    Q3/20   SJ 20(3):757--758
                    added new options frame() and frameopts() allowing framing
                    of bars and so-called thermometer plots or charts
            
            SJ-17-3 gr0066_1  . . . . . . . . . . . . . . . .  Software update for tabplot
                    (help tabplot if installed) . . . . . . . . . . . . . . . .  N. J. Cox
                    Q3/17   SJ 17(3):779
                    added options for reversing axis scales; improved handling of
                    axis labels containing quotation marks
            
            SJ-16-2 gr0066  . . . . . .  Speaking Stata: Multiple bar charts in table form
                    (help tabplot if installed) . . . . . . . . . . . . . . . .  N. J. Cox
                    Q2/16   SJ 16(2):491--510
                    provides multiple bar charts in table form representing
                    contingency tables for one, two, or three categorical variables
            where the 2016 paper is (for the foreseeable future) the best overview, and the latest update (at the time of writing, that of 2020) is the best place to download the code and help file. There is an overview at https://www.statalist.org/forums/for...updated-on-ssc

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input byte education float survey byte _freq
            1 1 27
            1 2 22
            1 3 10
            1 4 45
            2 1  7
            2 2  4
            2 3  2
            2 4  9
            3 1 11
            3 2 18
            3 3  6
            3 4 33
            4 1  2
            4 4  1
            end
            label values education education
            label def education 1 "high school", modify
            label def education 2 "undergraduate", modify
            label def education 3 "graduate", modify
            label def education 4 "Ph.D.", modify
            label values survey survey
            label def survey 1 "only pre", modify
            label def survey 2 "only basic", modify
            label def survey 3 "only advanced", modify
            label def survey 4 "basic + advanced", modify
            
            graph bar _freq, over(education) over(survey, descending) asyvars ///
                ytitle(frequency) yla(, ang(h)) name(G1, replace) ///
                bar(1, fcolor(red*0.6) lcolor(red)) bar(2, fcolor(red*0.2) lcolor(red)) ///
                bar(3, fcolor(blue*0.2) lcolor(blue)) bar(4, fcolor(blue*0.6) lcolor(blue))
            
            tabplot education survey [w=_freq], subtitle(frequency) yasis showval ///
                name(G2, replace) xsc(reverse) yla(, noticks) separate(education) ///
                bar1(fcolor(red*0.6) lcolor(red)) bar2(fcolor(red*0.2) lcolor(red)) ///
                bar3(fcolor(blue*0.2) lcolor(blue)) bar4(fcolor(blue*0.6) lcolor(blue))
            [Image attachment: rossi_G1.png]

            [Image attachment: rossi_G2.png]
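
            If the data have not been contracted, much the same graphs can be drawn directly from one observation per respondent. What follows is a minimal sketch, assuming the raw data from #3 (variables user_id, education and survey) are in memory and that tabplot is installed. The graph names G3 and G4 are arbitrary choices for illustration, and the /// line continuations work in a do-file, not in the Command window.

            Code:
            * Sketch only: assumes the raw data from #3 are in memory.
            * (count) counts nonmissing values of user_id in each group.
            graph bar (count) user_id, over(education) over(survey, descending) asyvars ///
                ytitle(frequency) yla(, ang(h)) name(G3, replace)

            * tabplot shows frequencies by default, so no weights are needed here.
            tabplot education survey, subtitle(frequency) yasis showval ///
                name(G4, replace) xsc(reverse) yla(, noticks)

            The colour options from the commands above can be added unchanged if the same palette is wanted.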


            • #7
              The most recent update to tabplot is in fact at Stata Journal 22(2) (just published) and will no doubt be mentioned in key files when Stata is next updated.


              Code:
              net describe gr0066_3
              gets you a link to that update.
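
              The overview linked in #6 refers to SSC, so installing from there is another route. This is a minimal sketch, assuming the SSC copy is the version you want; it may not be identical to the Stata Journal files.

              Code:
              * Sketch only: install (or update) tabplot from SSC.
              ssc install tabplot, replace
              * Report which tabplot.ado Stata will run.
              which tabplot
              * Open the help file to check the options available.
              help tabplot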
