  • How to graph bars for several categorical variables

    Dear Stata users,

    Suppose a categorical variable takes the values 1, 2 and 3, representing different groups. To display the frequencies of the groups, I can use the -catplot- command (SSC). However, if there are several similar categorical variables and I want to graph the group frequencies of all of them, -catplot- does not seem to accept a varlist in the way I need. To make this concrete, try the code below: what I want is all three resulting bar plots in a single graph. Example data are provided in the second code block.

    Code:
    catplot a1, blabel(bar, format(%9.1f)) name(a1)
    catplot b1, blabel(bar, format(%9.1f)) name(b1)
    catplot c1, blabel(bar, format(%9.1f)) name(c1)
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(a1 b1 c1)
    2 1 2
    1 2 1
    2 2 2
    2 2 1
    2 2 2
    2 1 1
    1 2 2
    3 3 3
    2 2 2
    2 2 1
    1 2 2
    1 1 1
    3 3 3
    3 3 3
    3 3 3
    3 3 3
    1 1 2
    2 1 1
    2 1 2
    3 3 3
    1 2 2
    2 2 1
    1 1 2
    1 1 1
    1 2 2
    2 2 1
    1 2 2
    1 2 1
    1 2 2
    2 2 1
    end
    label values a1 size
    label values b1 size
    label values c1 size
    label def size 1 "big", modify
    label def size 2 "small", modify
    label def size 3 "medium", modify

  • #2
    An addition, to generalize: what I want is to do this in one command, and perhaps to pick out just one group to display.
    [Attached image: cat.png]


    • #3
      catplot takes a varlist: one, two or three categorical variables, as its help explains. What it doesn't do is produce several graphs at once in the way that you wish with the data structure you have.

      Here is some technique for you:

      Code:
      rename (a1 b1 c1) (size=) 
      gen long id = _n 
      reshape long size, i(id) j(which) string 
      
      catplot size which , blabel(bar) 
      
      catplot which if size == 1, blabel(bar) l1title(# of big values)
      I don't know why you are using %9.1f as a display format for counts. I recommend against that.



      • #4
        Dear Nick Cox, thanks as usual. Your solution is perfect, although it implies that I must reshape first. What I meant by saying that catplot "does not allow a varlist" is that a varlist after catplot is not treated the same way as a varlist after graph bar: the former graphs a cross-tabulation of the variables [catvar1, catvar2, catvar3], while the latter graphs bars for the variables side by side [yvar1, yvar2, yvar3, etc.]. I use %9.1f to display percents in those bar plots.



        • #5
          Below is the resulting plot from Nick's code.

          [Attached image: cat2.png]


          • #6
            Sure; graph bar and catplot syntax is not the same, and what they can do easily is not the same; otherwise there would be little point to catplot.

            catplot was written in 2003 when, to my surprise, the new graphics in Stata 8 did not make these kinds of plots easier. But it is just a wrapper for graph hbar (in the example here).

            I then changed the syntax in 2010.

            Then in 2014 StataCorp added more support for these kinds of graphs. So,

            1. I wasn't prescient enough in 2003 or 2010 to know what StataCorp would do, when, or what syntax additions they would make.

            2. Even if I had guessed the syntax now on offer I couldn't have implemented it in a community-contributed command.

            All that aside, I think you need a different data structure to do what you want to do nicely with graph hbar in just the same way.



            • #7
              I notice that Stata 15 has (percent) in the graph bar command [graph bar (percent) varlist]; however, it seems that graph bar still cannot do what I described in this thread.



              • #8
                The big picture here is that you want a composite display with the same kind of graph repeated for different variables. That is naturally an entirely reasonable question, as are the answers:

                1. Use the same kind of syntax and graph combine (presumably what you used for #2)

                2. A different data structure will make this easier.

                It seems that you would like a third answer:

                3. Here is the different syntax so that you get what you want in one graph call.

                Fair expectation, but I don't think a different syntax would be possible that wasn't itself horrendously complicated. Naturally, you could write a command to do what you want.
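Answer 1 can be sketched as a loop that draws one catplot per variable and then combines the named graphs. This is only a minimal sketch: it assumes catplot is installed from SSC, and the six-row dataset, the graph names g_a1, g_b1 and g_c1, and the local macro gnames are made up for illustration.

```stata
* Illustrative miniature data; NOT the dataset posted in #1
clear
input byte(a1 b1 c1)
1 1 2
1 2 2
2 2 1
2 1 1
3 3 3
3 3 3
end
label define size 1 "big" 2 "small" 3 "medium"
label values a1 b1 c1 size

* One catplot per variable, each stored under its own (hypothetical) name
local gnames
foreach v of varlist a1 b1 c1 {
    catplot `v', blabel(bar) title("`v'") name(g_`v', replace)
    local gnames `gnames' g_`v'
}

* Stack the stored graphs; ycommon puts the frequency axes on one scale
graph combine `gnames', cols(1) ycommon
```

The loop keeps the original wide data structure, at the cost of one graph call per variable plus a final graph combine, so it does not give the single-call syntax asked for in the thread.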

                All that aside, I notice that in #5 you have the same graph twice in essence. All your groups are 30, so whether you show frequencies or % in group, the bars are the same.

                You should not need to do that. Here's another take using tabplot (Stata Journal). I give all the code starting with your data example, so no one is in doubt what starts where. Be warned that aligning text and bar is a dark art.

                BTW I would tend to think that big -- medium -- small was the right sequence.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte(a1 b1 c1)
                2 1 2
                1 2 1
                2 2 2
                2 2 1
                2 2 2
                2 1 1
                1 2 2
                3 3 3
                2 2 2
                2 2 1
                1 2 2
                1 1 1
                3 3 3
                3 3 3
                3 3 3
                3 3 3
                1 1 2
                2 1 1
                2 1 2
                3 3 3
                1 2 2
                2 2 1
                1 1 2
                1 1 1
                1 2 2
                2 2 1
                1 2 2
                1 2 1
                1 2 2
                2 2 1
                end
                label values a1 size
                label values b1 size
                label values c1 size
                label def size 1 "big", modify
                label def size 2 "small", modify
                label def size 3 "medium", modify
                
                rename (a1 b1 c1) (size=)
                gen long id = _n
                reshape long size, i(id) j(which) string
                
                bysort which size : gen freq = _N
                by which : gen pc = 100 * freq / _N
                gen show = string(freq) + "; " + string(pc, "%2.1f") + "% "
                
                tabplot size, by(which, col(1) compact note("")) ///
                showval(show, offset(0.18)) ///
                subtitle(, pos(9) nobox nobexpand fcolor(none)) ///
                xtitle("") ytitle("") ///
                horizontal scheme(s1color) bfcolor(green*0.2) blcolor(green)
                [Attached image: chen.png]


                • #9
                  Dear Nick Cox, thank you very much. Perhaps I was unclear. What I meant in #4, #5 and #7 is:
                  1. It is difficult (or impossible) to graph bars of categorical variables as described in #1 with the official graph (h)bar command in Stata.
                  2. The -catplot- command that you wrote does that job partly: it can graph bars for a categorical variable, but only for one at a time.
                  3. So far it seems that the only way to get the plot I want (in one graph call) is to reshape my data first and then use -catplot-.
                  4. So -catplot- does well, and reshaping the data is a necessary step.
                  5. In #5 I provided a picture composed of two parts. I only wanted to use that picture to show that the (powerful) -catplot- can give whatever you want: frequencies, percents, percents within a given category, etc. And yes, whether I show frequencies or % in group, the bars are the same.
                  P.S. -tabplot- is also fabulous; however, in this case I prefer -catplot-.



                  • #10
                    I find there is a "bug" when I combine the -if- qualifier with the percent display. In the example below, when I add if size == 1, the percents displayed in the bottom-left subgraph look somewhat odd.

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input byte(a1 b1 c1)
                    2 1 2
                    1 2 1
                    2 2 2
                    2 2 1
                    2 2 2
                    2 1 1
                    1 2 2
                    3 3 3
                    2 2 2
                    2 2 1
                    1 2 2
                    1 1 1
                    3 3 3
                    3 3 3
                    3 3 3
                    3 3 3
                    1 1 2
                    2 1 1
                    2 1 2
                    3 3 3
                    1 2 2
                    2 2 1
                    1 1 2
                    1 1 1
                    1 2 2
                    2 2 1
                    1 2 2
                    1 2 1
                    1 2 2
                    2 2 1
                    end
                    label values a1 size
                    label values b1 size
                    label values c1 size
                    label def size 1 "big", modify
                    label def size 2 "small", modify
                    label def size 3 "medium", modify
                    
                    gen id=_n
                    ren (a1 b1 c1) (size=)
                    reshape long size, i(id) j(which) string
                    
                    graph drop _all
                    catplot size which if size == 1, blabel(bar) l1title(# of big values) title("frequency of size==1") name(g1)
                    catplot size which if size == 1, percent blabel(bar, format(%9.1f)) l1title(# of big values) title("percent of size==1") name(g2)
                    catplot size which, percent(which) blabel(bar, format(%9.1f)) title("percent of both size") name(g3)
                    graph combine g1 g2, name(p1) row(2)
                    graph combine p1 g3, name(p2) row(1)
                    [Attached image: c2.png]


                    • #11
                      The percents displayed in the bottom-left subgraph (which sum to 100) are percents of the total frequency of size == 1, pooled over all three variables. This implies that I still cannot treat the yvars as separate or, to put it more correctly, unrelated variables.
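The behaviour described here follows from where the denominator comes from: with if size == 1 only the big values are used, so percent is taken over those observations alone. In the example data a1, b1 and c1 hold 12, 8 and 11 big values out of 30 each, so the pooled percents should be 12/31, 8/31 and 11/31 (about 38.7, 25.8 and 35.5), whereas the within-variable percents are 40.0, 26.7 and 36.7. One way to plot the latter, offered as a sketch rather than as the intended catplot usage, is to compute the percent before selecting anything and then plot it as is with official graph hbar. The variable names pc_big and first are hypothetical.

```stata
* Data example from #1, repeated so the block runs on its own
clear
input byte(a1 b1 c1)
2 1 2
1 2 1
2 2 2
2 2 1
2 2 2
2 1 1
1 2 2
3 3 3
2 2 2
2 2 1
1 2 2
1 1 1
3 3 3
3 3 3
3 3 3
3 3 3
1 1 2
2 1 1
2 1 2
3 3 3
1 2 2
2 2 1
1 1 2
1 1 1
1 2 2
2 2 1
1 2 2
1 2 1
1 2 2
2 2 1
end
label define size 1 "big" 2 "small" 3 "medium"
label values a1 b1 c1 size

rename (a1 b1 c1) (size=)
gen long id = _n
reshape long size, i(id) j(which) string

* Percent of big values within each original variable,
* computed on all observations, before any selection
egen pc_big = mean(100 * (size == 1)), by(which)

* Flag one observation per variable so each gets a single bar
egen first = tag(which)

graph hbar (asis) pc_big if first, over(which) ///
    blabel(bar, format(%9.1f)) ytitle("% big within each variable")
```

Because the percent is stored in a variable before the if qualifier is applied, each bar keeps its own denominator of 30 and the three bars no longer sum to 100.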
