  • Horizontal Bars with Catplot Command and Categorical Vars

    Dear Stata - Community,

    I am using the "catplot" command to create a graph with multiple stacked horizontal bars. The four categorical variables I am using have the same categories. In the end, I would like to have a graph similar to the one attached to this post.

    Yet, after reshaping my data, I am still facing the problem that I have too many binary variables (indicating the respective categorical variable) to run the catplot command (please see a simplified version of my code below). I tried grouping these variables, modifying the settings of the command and using other commands, but in the end it still did not work. Can anyone help?

    Thanks for any help in advance!

    Warm greetings, Bianca

    Code:
    ** four categorical vars with same categories
    local vars1 x1 x2 x3 x4
    
    ** gen binary variables for each category of the four vars
    foreach var of local vars1 {
        tab `var', gen(`var'_cat_)
    }
    
    ** reshape data
    gen id = _n
    
    reshape long x1_cat_ x2_cat_ x3_cat_ x4_cat_,  i(id)
    
    ** define labels of newly created _j var
    label variable _j ""
    label define  _j 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
    label value _j _j
    
    ** catplot
    * for one var, the following command works perfectly
    catplot  _j ,  over(x1_cat_) percent(x1_cat_)  asyvars stack
    
    * yet, I need a command to combine all vars in one hbar --> the following command would be needed, but does not work (too many variables)
    catplot  _j ,  over(x1_cat_ x2_cat_ x3_cat_ x4_cat_) percent(x1_cat_ x2_cat_ x3_cat_ x4_cat_)  asyvars stack
    [Attached image: Unbenannt.PNG]

  • #2
    catplot is from SSC (FAQ Advice #12). Your data example is incomplete as it does not generate the x variables. Here is one way:

    Code:
    clear
    set seed 04242021
    set obs 20
    ** four categorical vars with same categories
    local vars1 x1 x2 x3 x4
    foreach var of local vars1{
        gen `var'= runiformint(1,6)
    }
    gen id=_n
    reshape long x, i(id) j(which)
    tab x which
    tab x, gen(xx)
    reshape long xx, i(id which) j(cat)
    label define  cat 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
    label values cat cat
    keep if xx
    catplot cat, over(which) percent(which) l1title("") blab(bar, pos(inside)) asyvars stack scheme(s1color)
    Results:

    Code:
    . tab x which
    
               |                    which
             x |         1          2          3          4 |     Total
    -----------+--------------------------------------------+----------
             1 |         4          1          2          5 |        12
             2 |         3          2          7          5 |        17
             3 |         4          5          5          3 |        17
             4 |         2          1          1          1 |         5
             5 |         3          6          2          1 |        12
             6 |         4          5          3          5 |        17
    -----------+--------------------------------------------+----------
         Total |        20         20         20         20 |        80
    [Attached graph: Graph.png]
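    For anyone checking what percent(which) shows: within each question, it should be the percent of answers falling in each category, that is, the column percentages of the cross-tabulation above. A minimal sketch using only official Stata, assuming Stata 14 or later for runiformint() and regenerating the simulated data of this post (with forvalues in place of the local macro loop):

    ```stata
    clear
    set seed 04242021
    set obs 20
    * four simulated questions, answers coded 1 to 6
    forvalues j = 1/4 {
        generate x`j' = runiformint(1, 6)
    }
    generate id = _n
    * one observation per person and question
    reshape long x, i(id) j(which)
    * column percentages: percent of each question's answers in each category
    tabulate x which, column nofreq
    ```

    Since each question has 20 answers, each column percentage is 5 times the corresponding count.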

    Last edited by Andrew Musau; 24 Apr 2021, 17:53.



    • #3
      The first error that graph is likely to notice is that you are reaching through catplot to call up the over() option of graph hbar:


      Code:
       
       over(x1_cat_ x2_cat_ x3_cat_ x4_cat_)
      However, the over() option allows only one variable name. Hence the error arises in graph hbar.
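      To see that rule at work in official Stata alone (a sketch, not how catplot is implemented): compute the percentages within each question first, then give graph hbar one over() for the answers and a second over() for the questions. This assumes Stata 14 or later and regenerates the simulated data from #2; the variable names n, total and pct are made up for the example.

      ```stata
      clear
      set seed 04242021
      set obs 20
      forvalues j = 1/4 {
          generate x`j' = runiformint(1, 6)
      }
      generate id = _n
      reshape long x, i(id) j(which)
      * one observation per answer-by-question cell, with its frequency in n
      contract x which, freq(n)
      * percent of each question's answers that fall in each category
      bysort which: egen total = total(n)
      generate pct = 100 * n / total
      label define x 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5 "Strongly disagree" 6 "No answer"
      label values x x
      * each over() names one variable; asyvars turns the answer categories into stacked segments
      graph hbar (asis) pct, over(x) over(which) asyvars stack
      ```

      The (asis) statistic is used because, after contract, there is exactly one observation per answer-by-question cell.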


      I stole Andrew Musau's helpful example. The catplot above can be obtained a little more directly (see code below), but my main concern is to show tabplot from the Stata Journal as an alternative. No design is perfect here, but stacking, although popular, doesn't always seem very helpful.

      Code:
      clear
      set seed 04242021
      set obs 20
      ** four categorical vars with same categories
      local vars1 x1 x2 x3 x4
      foreach var of local vars1{
          gen `var'= runiformint(1,6)
      }
      gen id=_n
      
      list 
      
      reshape long x, i(id) j(which)
      
      label define  x 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
      label values x x 
      
      set scheme s1color 
      
      catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack 
      
      tabplot x which, percent(which) showval(format(%2.0f)) separate(x) ytitle("") xtitle(question) subtitle(percent) ///
      bar1(bfcolor(blue)) bar2(bcolor(blue*0.4)) bar3(bcolor(gs8)) bar4(bcolor(red*0.4)) bar5(bcolor(red)) bar6(bcolor(teal))
      [Attached graph: tabplot2.png]



      • #4
        Thanks to both of you; your answers helped a lot!



        • #5
          Dear Andrew and/or Nick,

          May I ask you another question about the catplot example?

          Is there any way to add another, aggregated hbar to the catplot created above, so that in our example these two commands are combined into one?

          Code:
          catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack
          
          catplot x, percent l1title("") blab(bar, pos(inside)) asyvars stack
          I tried to adjust the variable "which" and also to combine these two catplots with (1) "grc1leg" and (2) "graph combine", but it didn't work.


          Thanks a lot for any help in advance!

          Warm greetings, Bianca



          • #6
            Code:
            expand 2, g(new)
            replace which=99 if new
            catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack
            with appropriate labeling of the categorical axis (99 = "Total").
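            A fuller sketch of this suggestion, assuming the simulated data of #3 (regenerated here; Stata 14 or later for runiformint()) and catplot from SSC. The value 99 is arbitrary; any value not already taken by which would do.

            ```stata
            clear
            set seed 04242021
            set obs 20
            forvalues j = 1/4 {
                generate x`j' = runiformint(1, 6)
            }
            generate id = _n
            reshape long x, i(id) j(which)
            * duplicate every observation; new is 1 for the copies, 0 for the originals
            expand 2, generate(new)
            * the copies become an extra "Total" question
            replace which = 99 if new
            * modify creates the label if needed, or adds 99 to an existing one
            label define which 99 "Total", modify
            label values which which
            catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack
            ```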



            • #7
              it didn't work
              Compare our FAQ Advice:

              Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.
              graph combine will put two graphs side by side, or on top of each other, but it won't look good. I can't speak precisely about what you did with grc1leg (from http://www.stata.com/users/vwiggins, as you are asked to explain) because you don't tell us or show any results.


              But indeed, I think there is a better solution than what I guess you did, as discussed at length in https://www.stata-journal.com/articl...article=gr0058

              Temporarily, double up the dataset. Then relabel the copy as some kind of "all" category. Suppose you had k categories before. Now you have k + 1.

              Here's some code, extending the previous example.

              Code:
              clear
              set seed 04242021
              set obs 20
              ** four categorical vars with same categories
              local vars1 x1 x2 x3 x4
              foreach var of local vars1{
                  gen `var'= runiformint(1,6)
              }
              gen id=_n
              
              list 
              
              reshape long x, i(id) j(which)
              
              label define  x 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
              label values x x 
              
              preserve 
              
              expand 2, gen(new)
              replace which = 5 if new 
              label define which 5 "all questions"
              label val which which 
              
              set scheme s1color 
              
              catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack 
              
              tabplot x which, percent(which) showval(format(%2.0f)) separate(x) ytitle("") xtitle(question) subtitle(percent) ///
              bar1(bfcolor(blue)) bar2(bcolor(blue*0.4)) bar3(bcolor(gs8)) bar4(bcolor(red*0.4)) bar5(bcolor(red)) bar6(bcolor(teal))
              
              restore



              • #8
                Hi guys! I need help with coding. I am making a horizontal bar chart with the following categories on the y-axis, each further broken down into 2000 and 2015:
                C1 (2000, 2015)
                C2 (2000, 2015)
                C3 (2000, 2015)
                C4 (2000, 2015)
                Categories C1-C3 are all in one variable, C_sub; however, C4 is in another variable, D_sub. My main problem is how to add C4 to the y-axis. By the way, C4 is a category for values from C1-C3, and I only need one category, "Emerging", which classifies C1-C3 (please see attached file).

                My x-axis has 3 categories (all percentages, 0 20 40 60 80 100):
                pop above 10M
                pop 10-5
                pop less than 5
                CODE  YEAR  POPABOVE10M  POP5TO10M  POPLESSTHAN5  C_SUB  D_SUB
                1     1900  20%          40%        10%           C1     EMERGING
                1     1950  30%          70%        20%           C2     NOT
                1     2000  50%          60%        30%           C3     EMERGING
                1     2015  60%          20%        40%           C2     NOT
                2     1900  10%          10%        20%           C1     EMERGING
                2     1950  20%          90%        40%           C1     EMERGING
                2     2000  30%          400%       40%           C3     NOT
                2     2015  40%          60%        70%           C3     NOT
                3     1900  20%          10%        60%           C2     EMERGING
                3     1950  40%          20%        20%           C2     EMERGING
                3     2000  80%          30%        10%           C2     NOT
                3     2015  90%          40%        90%           C1     NOT
                4     1900  10%          20%        30%           C1     NOT
                4     1950  40%          10%        50%           C1     EMERGING
                4     2000  70%          10%        60%           C2     EMERGING
                4     2015  60%          20%        10%           C2     EMERGING
                5     1900  20%          30%        20%           C3     NOT
                5     1950  10%          40%        30%           C3     EMERGING
                5     2000  90%          20%        50%           C1     NOT
                5     2015  70%          40%        60%           C2     EMERGING
                My code without C4, because I don't know how to add it (it more or less works, although C3 ends at 80% and the space from 81 to 100 is blank):
                graph hbar pop10m pop5to10m pop<5m if year==2000 | if year==2015, over(year) over(C_sub) nofill asyvars stack

                My code with C4, which I need (DOES NOT WORK):

                graph hbar pop10m pop5to10m pop<5m if year==2000 | year==2015, over(year) over(C_sub)) nofill asyvars stack || hbar pop10m pop5to10m pop<5m if year==2000 | year==2015 & D_sub == "Emerging", over(year) over(dev_sub) nofill asyvars stack

                I need something like this...
                C1 2000 yellow blue blue green
                2015
                C2 2000
                2015
                C3 2000
                2015
                C4 2000
                2015
                0 20 40 60 80 100
                blue green yellow
                POP>10M POP5-10 POP<5
                Please help. Thanks.



                • #9
                  #8: This is very hard for me to follow. Please visit https://www.statalist.org/forums/help#stata and post a data example using dataex.



                  • #10
                    Dear Andrew and Nick,

                    I am sorry to bother you again, but I am still having some trouble with my graphs and hope you can help me again. It is all about very long labels, which I can shorten a bit but unfortunately not enough. Therefore, I would love to split the labels over two lines if they are too long. I noticed that this can easily be done in a legend and also when the command over() is used. However, for my tabplots and catplots I cannot use the over() specification and have no legend. Below you can see the simplified example from above plus my main attempts. Thanks for any help!!! Greetings from Germany, Bianca

                    Code:
                    clear
                    set seed 04242021
                    set obs 20
                    ** four categorical vars with same categories
                    local vars1 x1 x2 x3 x4
                    foreach var of local vars1{
                        gen `var'= runiformint(1,6)
                    }
                    gen id=_n
                    
                    reshape long x, i(id) j(which)
                    
                    label define  x 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
                    label values x x
                    
                    
                    label define which 1 "First very long label" 2 "Second very long label" 3 "Third very long label" 4 "Fourth very long label"
                    label values which which
                    
                    *First try
                    splitvallabels which
                    tabplot x which, percent(which) xlabel(which, relabel(`r(relabel)'))
                    
                    *Second try
                    tabplot x which, percent(which) xlabel(1 "First very long" "label" 2 "Second very long label" 3 "Third very long label" 4 "Fourth very long label")
                    
                    /*Other attempts - graph editor:
                        (1) adjusting label size to small
                        (2) changing label position (45°) */
                        
                    ********************************************************************************
                    * Same issues with the catplot command (even though it works for these rather short labels)
                    catplot x which, percent(which) asyvars stack



                    • #11
                      tabplot is from Stata Journal, as pointed out in #3.

                      splitvallabels is from SSC, as you are asked to explain (FAQ Advice #12).

                      Even more minutely, over() is an option not a command.

                      The issue is splitting long axis labels you want to show on two or more lines.

                      Thanks for the almost reproducible examples. I had to install splitvallabels on my current machine; I already have tabplot installed. In general, a user needs both installed.

                      The first try fails because xlabel(), which is just a standard twoway option, does not support a relabel() suboption. That was a guess based on a hope that some syntax allowed within over() for graph dot, graph bar and graph hbar would apply here, but it doesn't. It's undoubtedly confusing that those commands share some options with twoway, but not all.

                      So you need to use the result of splitvallabels to define a new set of value labels.

                      The second try fails because you need compound double quotes around the double quotes.

                      Here is revised code. Note how you can reclaim some space by omitting axis titles.

                      Code:
                      clear
                      set seed 04242021
                      set obs 20
                      ** four categorical vars with same categories
                      local vars1 x1 x2 x3 x4
                      foreach var of local vars1 {
                          gen `var'= runiformint(1,6)
                      }
                      gen id=_n
                      
                      reshape long x, i(id) j(which)
                      
                      label define  x 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
                      label values x x
                      
                      
                      label define which 1 "First very long label" 2 "Second very long label" 3 "Third very long label" 4 "Fourth very long label"
                      label values which which
                      
                      set scheme s1color 
                      
                      *First try
                      splitvallabels which
                      label def newwhich `r(relabel)'
                      label val which newwhich 
                      
                      tabplot x which, percent(which) name(G1, replace) ytitle("") xtitle("")
                       
                      *Second try
                      tabplot x which, percent(which) xlabel(1 `" "First very long" "label" "'  2 "Second very long label" 3 "Third very long label" 4 "Fourth very long label") name(G2, replace) ytitle("") xtitle("")
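                      If splitvallabels is not installed, a minimal alternative in the same spirit is to type the line breaks into the value labels yourself, using compound double quotes so that each label holds two quoted strings. This is a sketch assuming Stata 14 or later and tabplot from the Stata Journal; the label name manwhich is made up for the example, and the two-line display relies on tabplot passing value labels through to the axis labels, as the splitvallabels route above does.

                      ```stata
                      clear
                      set seed 04242021
                      set obs 20
                      forvalues j = 1/4 {
                          generate x`j' = runiformint(1, 6)
                      }
                      generate id = _n
                      reshape long x, i(id) j(which)
                      * each label holds two quoted strings, intended to show as two lines on the axis
                      label define manwhich 1 `""First very long" "label""' 2 `""Second very long" "label""' ///
                          3 `""Third very long" "label""' 4 `""Fourth very long" "label""'
                      label values which manwhich
                      tabplot x which, percent(which) ytitle("") xtitle("")
                      ```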







                      • #12
                        Hello everyone,

                        I am writing here because mine is a similar problem to the one above. The five categorical variables used have the same categories. I attach the graph produced by the following code:

                        Code:
                        ** graph bar stacked with important/not important, one graph per variable
                        loc i = 1
                        foreach var in Health Plans Children Shareholders Offer {
                            ** options for graph
                            ** note: the locals legg and axis are defined but not used in the catplot call below
                            if `i' == 1 loc legg `"legend(none, region(lcolor(none)))"'
                            if `i' != 1 loc axis `"ylabel(none, nolabels nogrid)"'
                            if `i' == 1 loc axis `"yla(0(25)100, nogrid)"'
                            ** graph
                            catplot `var', percent asyvars stack ///
                                bar(1, color(ltblue) lwidth(medium)) ///
                                bar(2, color(ebblue) lwidth(medium)) ///
                                legend(label(1 "Not Important") label(2 "Important") ///
                                    ring(0) pos(12) col(1) size(small)) ///
                                blabel(bar, format(%3.1f) size(small) position(inside) color(black)) ///
                                name(g`i', replace)
                            ** collect graph names for graph combine
                            loc plots `"`plots' g`i'"'
                            loc ++i
                        }


                        gr combine `plots', colfirst ycommon cols(1) imargin(zero) graphregion(margin(large))

                        This graph has multiple legends and x-axes. I would like to have a single legend under the x-axis, and I would also like to have only one x-axis.

                        I have tried several approaches:

                        For the legend:

                        1) gr combine doesn't allow a legend() option (option legend() not allowed, r(198)).

                        2) Turning the legend off is not working. I tried graph display, legend(label(1 "Not Important") label(2 "Important") ring(0) pos(12) col(1) size(small)), but it doesn't change the graph.

                        For the x-axis:

                        1) I tried xcommon, as for ycommon. Apparently it exists, because it does not give me an error, but it does not change the appearance of the graph as I would like.

                        Could you please help me?

                        Thank you very much!

                        Beatrice
                        Attached Files



                        • #13
                          #12 doesn't include a data example. However, one can be constructed easily that would give exactly the same graph.

                          The strategy in #12 is to produce distinct graphs, combine them and then try to reduce redundancy and clutter. That's hard work and doesn't yield the desired result. I doubt you would get all the way, but there is an easier approach.

                          A better strategy is to reshape your data so that you have just two variables.

                          Other than that, the code here shows some different small choices. You need, I suggest, colours that contrast more and numeric labels that are more readable: if they deserve being shown, they deserve to be very easily readable. But you can make your own choices, naturally.

                          I first follow the idea that the order of variables Health Plans Children Shareholders Offer may be deliberate and have some definite meaning. Then I ignore that and use myaxis from the Stata Journal to order the categories. More at https://journals.sagepub.com/doi/pdf...6867X211045582

                          (The graphs are posted in the opposite order.)

                          Code:
                          clear
                          set obs 1000
                          
                          tokenize "406 416 366 129 153"
                          
                          local j = 0
                          
                          foreach v in Health Plans Children Shareholders Offer {
                              local ++j
                              gen `v' = _n <= ``j''
                          }
                          
                          * you start about here
                          rename (Health Plans Children Shareholders Offer) (Answer=)
                          
                          gen Id = _n
                          
                          reshape long Answer, i(Id) j(Which) string
                          
                          label define Axis 1 Health 2 Plans 3 Children 4 Shareholders 5 Offer
                          
                          encode Which, label(Axis) gen(Axis)
                          
                          label def Answer 0 "Not important" 1 "Important"
                          
                          label val Answer Answer
                          
                          catplot Answer Axis, percent(Axis) bar(1, lcolor(stc1) fcolor(stc1*0.4)) bar(2, lcolor(stc2) fcolor(stc2*0.4)) ///
                          asyvars stack blabel(bar, position(center) size(medlarge) format(%2.1f)) name(G1, replace) ///
                          legend(pos(6) row(1)) ysc(alt)
                          
                          myaxis NewAxis=Axis, sort(mean Answer) descending
                          
                          catplot Answer NewAxis, percent(Axis) bar(1, lcolor(stc1) fcolor(stc1*0.4)) bar(2, lcolor(stc2) fcolor(stc2*0.4)) ///
                          asyvars stack blabel(bar, position(center) size(medlarge) format(%2.1f)) name(G2, replace) ///
                          legend(pos(6) row(1)) ysc(alt)



                          A different issue I don't address with an example graph is that the two percentages add to 100, so why not just show the percent saying that something is important?

                          Code:
                          graph hbar Answer, over(NewAxis)
                          is a start on the code.
                          [Attached graph: whatever_G2.png]
                          [Attached graph: whatever_G1.png]

                          Last edited by Nick Cox; 06 May 2024, 11:43.



                          • #14
                            Extra code for the last idea


                            Code:
                            gen Answer2 = 100 * Answer 
                            
                            graph hbar Answer2, over(NewAxis) ytitle(% saying Important) ysc(r(0 43)) blabel(bar, size(medlarge)) ysc(alt) name(G3, replace)
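                            On the size of the category labels (Health, Plans and so on): in graph hbar these are controlled by the label() suboption of over(). A sketch regenerating the example data of #13, using Axis rather than NewAxis so that myaxis is not needed; the size large and the graph name G4 are arbitrary choices.

                            ```stata
                            clear
                            set obs 1000
                            tokenize "406 416 366 129 153"
                            local j = 0
                            foreach v in Health Plans Children Shareholders Offer {
                                local ++j
                                generate `v' = _n <= ``j''
                            }
                            rename (Health Plans Children Shareholders Offer) (Answer=)
                            generate Id = _n
                            reshape long Answer, i(Id) j(Which) string
                            label define Axis 1 Health 2 Plans 3 Children 4 Shareholders 5 Offer
                            encode Which, label(Axis) generate(Axis)
                            generate Answer2 = 100 * Answer
                            * label(labsize()) inside over() sets the size of the category labels
                            graph hbar Answer2, over(Axis, label(labsize(large))) ytitle("% saying Important") ///
                                blabel(bar, size(medlarge)) ysc(alt) name(G4, replace)
                            ```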



                            • #15
                              Thank you so much, Nick Cox.
                              Is there any way I can change the size of the variable names on the y-axis? I mean: ‘Child’, ‘Health’ etc. Could you please give me a hint?

                              Thank you!
                              Last edited by Beatrice Raspa; 08 May 2024, 08:34.
