Horizontal Bars with Catplot Command and Categorical Vars

Bianca Duelken

Join Date: Dec 2020
Posts: 19

24 Apr 2021, 14:00

Dear Stata community,

I am using the "catplot" command to create a graph with multiple stacked horizontal bars. The four categorical variables being used have the same categories. In the end, I would like to have a graph similar to the one attached to this post.

Yet, after reshaping my data, I am still facing the problem that I have too many binary variables (indicating the respective categorical variable) to run the catplot command (please see a simplified version of my code below). I tried grouping these variables, modifying the settings of the command, and using other commands, but in the end it still did not work. Can anyone help?

Thanks for any help in advance!

Warm greetings, Bianca

Code:

** four categorical vars with same categories
local vars1 x1 x2 x3 x4

** gen binary variables for each category of the four vars
        foreach var of local vars1{
        tab `var' , gen(`var'_cat_)
        }
        
** reshape data
gen id = _n

reshape long x1_cat_ x2_cat_ x3_cat_ x4_cat_,  i(id)

** define labels of newly created _j var
label variable _j ""
label define  _j 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
label value _j _j

** catplot
* for one var, the following command works perfectly
catplot  _j ,  over(x1_cat_) percent(x1_cat_)  asyvars stack

* yet, I need a command to combine all vars in one hbar --> the following command would be needed, but does not work (too many variables)
catplot  _j ,  over(x1_cat_ x2_cat_ x3_cat_ x4_cat_) percent(x1_cat_ x2_cat_ x3_cat_ x4_cat_)  asyvars stack

[Attached image: Unbenannt.PNG]


Andrew Musau

Join Date: Oct 2014
Posts: 10141

24 Apr 2021, 17:03

catplot is from SSC (FAQ Advice #12). Your data example is incomplete as it does not generate the x variables. Here is one way:

Code:

clear
set seed 04242021
set obs 20
** four categorical vars with same categories
local vars1 x1 x2 x3 x4
foreach var of local vars1{
    gen `var'= runiformint(1,6)
}
gen id=_n
reshape long x, i(id) j(which)
tab x which
tab x, gen(xx)
reshape long xx, i(id which) j(cat)
label define  cat 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
label values cat cat
keep if xx
catplot cat, over(which) percent(which) l1title("") blab(bar, pos(inside)) asyvars stack scheme(s1color)

Res.:

Code:

. tab x which

           |                    which
         x |         1          2          3          4 |     Total
-----------+--------------------------------------------+----------
         1 |         4          1          2          5 |        12
         2 |         3          2          7          5 |        17
         3 |         4          5          5          3 |        17
         4 |         2          1          1          1 |         5
         5 |         3          6          2          1 |        12
         6 |         4          5          3          5 |        17
-----------+--------------------------------------------+----------
     Total |        20         20         20         20 |        80

[Attached image: Graph.png]

Last edited by Andrew Musau; 24 Apr 2021, 17:53.


Nick Cox

Join Date: Mar 2014
Posts: 35502

25 Apr 2021, 02:25

The first error that graph is likely to notice is that you are reaching through catplot to call up the option of graph hbar

Code:

 
 over(x1_cat_ x2_cat_ x3_cat_ x4_cat_)

However, the over() option allows only one variable name. Hence the error is in using graph hbar.

I stole Andrew Musau's helpful example. The catplot above can be obtained a little more directly (see code below), but my main concern is to show tabplot from the Stata Journal as an alternative. No design is perfect here, but stacking, although popular, doesn't always seem very helpful.

Code:

clear
set seed 04242021
set obs 20
** four categorical vars with same categories
local vars1 x1 x2 x3 x4
foreach var of local vars1{
    gen `var'= runiformint(1,6)
}
gen id=_n

list 

reshape long x, i(id) j(which)

label define  x 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
label values x x 

set scheme s1color 

catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack 

tabplot x which, percent(which) showval(format(%2.0f)) separate(x) ytitle("") xtitle(question) subtitle(percent) ///
bar1(bfcolor(blue)) bar2(bcolor(blue*0.4)) bar3(bcolor(gs8)) bar4(bcolor(red*0.4)) bar5(bcolor(red)) bar6(bcolor(teal))

[Attached image: tabplot2.png]


Bianca Duelken

Join Date: Dec 2020

Posts: 19
#4

25 Apr 2021, 07:33

Thanks a lot to both of you - your answers helped a lot!
Bianca Duelken

Join Date: Dec 2020

Posts: 19
#5

27 Apr 2021, 00:32

Dear Andrew and/or Nick,

may I ask you another question about the catplot example:

Is there any way that I can add another aggregated hbar to the catplot created above? So that in our example, these two codes are written in one:

Code:

catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack
catplot x, percent l1title("") blab(bar, pos(inside)) asyvars stack

I tried to adjust the variable "which" and also combine these two catplots with (1) "grc1leg" and (2) "graph combine", but it didn't work.

Thanks a lot for any help in advance!

Warm greetings, Bianca
Andrew Musau

Join Date: Oct 2014

Posts: 10141
#6

27 Apr 2021, 02:41

Code:

expand 2, g(new)
replace which=99 if new
catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack

with appropriate labeling of the categorical axis (99= "Total").
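
A minimal sketch of that labeling, assuming the long-format example from #3 is in memory after the expand and replace steps. The label text for values 1 to 4 is a placeholder, not something taken from the original data.

Code:

* attach value labels so that the duplicated block (which == 99) shows as "Total"
* labels "x1" ... "x4" are placeholders chosen for illustration
label define which 1 "x1" 2 "x2" 3 "x3" 4 "x4" 99 "Total"
label values which which
catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack
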
Nick Cox

Join Date: Mar 2014

Posts: 35502
#7

27 Apr 2021, 02:49

it didn't work

Compare our FAQ Advice:

Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.

graph combine will put two graphs side by side, or on top of each other, but it won't look good. I can't speak precisely about what you did with grc1leg (from http://www.stata.com/users/vwiggins, as you are asked to explain) because you don't tell us or show any results.

But indeed, I think there is a better solution than what I guess you did, as discussed at length in https://www.stata-journal.com/articl...article=gr0058

Temporarily, double up the dataset. Then relabel the copy as some kind of "all" category. Suppose you had k categories before. Now you have k + 1.

Here's some code, extending the previous example.

Code:

clear
set seed 04242021
set obs 20
** four categorical vars with same categories
local vars1 x1 x2 x3 x4
foreach var of local vars1{
    gen `var'= runiformint(1,6)
}
gen id=_n

list

reshape long x, i(id) j(which)

label define x 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
label values x x

preserve
expand 2, gen(new)
replace which = 5 if new
label define which 5 "all questions"
label val which which

set scheme s1color

catplot x which, percent(which) l1title("") blab(bar, pos(inside)) asyvars stack

tabplot x which, percent(which) showval(format(%2.0f)) separate(x) ytitle("") xtitle(question) subtitle(percent) ///
bar1(bfcolor(blue)) bar2(bcolor(blue*0.4)) bar3(bcolor(gs8)) bar4(bcolor(red*0.4)) bar5(bcolor(red)) bar6(bcolor(teal))

restore

Zurie Phoenix

Join Date: Apr 2021
Posts: 1

28 Apr 2021, 03:10

Hi guys! I need help with coding. I am making a horizontal bar chart with the following categories on the y-axis.
Each category is further broken down into 2000 and 2015:
C 1 (2000, 2015)
C 2 (2000, 2015)
C 3 (2000, 2015)
C4 (2000, 2015)
Categories C1-C3 are all in one variable, C_sub; however, C4 is in another variable, D_sub. My main problem is how to add C4 to the y-axis. By the way, C4 is a category for values from C1-C3, and I only need one category, "Emerging", which classifies C1-C3 (please see attached file).

My x-axis has 3 categories (all percentages: 0 20 40 60 80 100)
pop above 10M
pop 10-5
pop less than 5

CODE	YEAR	POPABOVE10M	POP5TO10M	POPLESSTHAN5	C_SUB	D_SUB
1	1900	20%	40%	10%	C1	EMERGING
1	1950	30%	70%	20%	C2	NOT
1	2000	50%	60%	30%	C3	EMERGING
1	2015	60%	20%	40%	C2	NOT
2	1900	10%	10%	20%	C1	EMERGING
2	1950	20%	90%	40%	C1	EMERGING
2	2000	30%	400%	40%	C3	NOT
2	2015	40%	60%	70%	C3	NOT
3	1900	20%	10%	60%	C2	EMERGING
3	1950	40%	20%	20%	C2	EMERGING
3	2000	80%	30%	10%	C2	NOT
3	2015	90%	40%	90%	C1	NOT
4	1900	10%	20%	30%	C1	NOT
4	1950	40%	10%	50%	C1	EMERGING
4	2000	70%	10%	60%	C2	EMERGING
4	2015	60%	20%	10%	C2	EMERGING
5	1900	20%	30%	20%	C3	NOT
5	1950	10%	40%	30%	C3	EMERGING
5	2000	90%	20%	50%	C1	NOT
5	2015	70%	40%	60%	C2	EMERGING

My code without C4, because I don't know how to add it (it kind of works, although C3 just ends at 80%; the 81-100 space is blank):
graph hbar pop10m pop5to10m pop<5m if year==2000 | if year==2015, over(year) over(C_sub) nofill asyvars stack

My code with C4 which I need: (DOES NOT WORK)

graph hbar pop10m pop5to10m pop<5m if year==2000 | year==2015, over(year) over(C_sub)) nofill asyvars stack || hbar pop10m pop5to10m pop<5m if year==2000 | year==2015 & D_sub == "Emerging", over(year) over(dev_sub) nofill asyvars stack

I need something like this...

C1	2000	yellow		blue		blue		green
	2015

C2	2000
	2015

C3	2000
	2015

C4	2000
	2015
		0	20	40	60	80	100


		blue	green	yellow
		POP>10M	POP5-10	POP<5

Please help. Thanks.


Nick Cox

Join Date: Mar 2014

Posts: 35502
#9

29 Apr 2021, 01:33

#8 This is very hard for me to follow. Please visit https://www.statalist.org/forums/help#stata and post a data example using dataex.
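
For anyone unfamiliar with dataex, a minimal sketch of such a request follows. The variable names are assumptions read off the table headers in #8 (the real names in the dataset may differ in spelling or case), and the 20-row range simply matches the listing there.

Code:

* dataex ships with Stata 15.1 and later; on older versions it can be installed from SSC
* ssc install dataex

* variable names below are assumed from the table in #8
dataex code year C_sub D_sub in 1/20

The output of dataex can be pasted into a post as is, so that others can recreate the data exactly.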

Bianca Duelken

Join Date: Dec 2020
Posts: 19

#10

06 May 2021, 15:21

Dear Andrew and Nick,

I am sorry to bother you again, but I am still having some trouble with my graphs and hope you can help me again. It is all about very long labels, which I can shorten a bit but unfortunately not enough. Therefore, I would love to split the labels across two lines if they are too long. I noticed that this can be done easily in a legend and also when the command over() is used. However, for my tabplots and catplots I cannot use the over() specification and have no legend. Below you can see the simplified example from above plus my main attempts. Thanks for any help!!! Greetings from Germany, Bianca

Code:

clear
set seed 04242021
set obs 20
** four categorical vars with same categories
local vars1 x1 x2 x3 x4
foreach var of local vars1{
    gen `var'= runiformint(1,6)
}
gen id=_n

reshape long x, i(id) j(which)

label define  x 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
label values x x


label define which 1 "First very long label" 2 "Second very long label" 3 "Third very long label" 4 "Fourth very long label"
label values which which

*First try
splitvallabels which
tabplot x which, percent(which) xlabel(which, relabel(`r(relabel)'))

*Second try
tabplot x which, percent(which) xlabel(1 "First very long" "label" 2 "Second very long label" 3 "Third very long label" 4 "Fourth very long label")

/*Other attempts - graph editor:
    (1) adjusting label size to small
    (2) changing label position (45°) */
    
********************************************************************************
* Same issues with the catplot command (even though it works for these rather short labels)
catplot x which, percent(which) asyvars stack


Nick Cox

Join Date: Mar 2014

Posts: 35502
#11

07 May 2021, 01:48

tabplot is from Stata Journal, as pointed out in #3.

splitvallabels is from SSC, as you are asked to explain (FAQ Advice #12).

Even more minutely, over() is an option not a command.

The issue is splitting long axis labels you want to show on two or more lines.

Thanks for the almost reproducible examples. I had to install splitvallabels on my current machine. I already have tabplot installed. In general, a user needs both installed.

The first try fails because xlabel() which is just a standard twoway option does not support a relabel() suboption. That was a guess based on a hope that some syntax allowed within over() for graph dot, graph bar and graph hbar would apply here, but it doesn't. It's undoubtedly confusing that those commands share some options with twoway, but not all.
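
For comparison, here is a minimal sketch of where relabel() is accepted, namely inside over() for graph hbar. It assumes the example data from #10 are in memory with the value label attached to which, and that splitvallabels is installed from SSC. Using (count) id as the bar height is only for illustration.

Code:

* relabel() is a suboption of over(), so the split labels can be passed directly
splitvallabels which
graph hbar (count) id, over(which, relabel(`r(relabel)'))
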

So you need to use the result of splitvallabels to define a new set of value labels.

The second try fails because you need compound double quotes around the double quotes.

Here is revised code. Note how you can reclaim some space by omitting axis titles.

Code:

clear
set seed 04242021
set obs 20
** four categorical vars with same categories
local vars1 x1 x2 x3 x4
foreach var of local vars1 {
    gen `var'= runiformint(1,6)
}
gen id=_n

reshape long x, i(id) j(which)

label define x 1 "Strongly agree" 2 "Agree" 3 "Neither agree nor disagree" 4 "Disagree" 5"Strongly disagree" 6 "No answer"
label values x x

label define which 1 "First very long label" 2 "Second very long label" 3 "Third very long label" 4 "Fourth very long label"
label values which which

set scheme s1color

*First try
splitvallabels which
label def newwhich `r(relabel)'
label val which newwhich
tabplot x which, percent(which) name(G1, replace) ytitle("") xtitle("")

*Second try
tabplot x which, percent(which) xlabel(1 `" "First very long" "label" "' 2 "Second very long label" 3 "Third very long label" 4 "Fourth very long label") name(G2, replace) ytitle("") xtitle("")
Beatrice Raspa

Join Date: May 2024

Posts: 5
#12

06 May 2024, 10:38

Hello everyone,

I am writing here because mine is a similar problem to the one above. The five categorical variables used have the same categories. I attach the graph reproduced by the following code:

**graph bar stacked with important/not important
loc i = 1
foreach var in Health Plans Children Shareholders Offer {
**options for graph
if `i' == 1 loc legg `"legend(none, region(lcolor(none)))"'
if `i' !=1 loc axis `"ylabel(none, nolabels nogrid) "'
if `i' == 1 loc axis `"yla(0(25)100, nogrid) "'
**graph
catplot `var', percent asyvars stack ///
bar(1, color(ltblue) lwidth(medium)) ///
bar(2, color(ebblue) lwidth(medium)) ///
legend(label(1 "Not Important") label(2 "Important") ///
ring(0) pos(12) col(1) size(small)) ///
blabel(bar, format(%3.1f) size(small) position(inside) color(black)) ///
name(g`i', replace)
loc plots `"`plots' g`i' "'
loc `++i'
}

gr combine `plots', colfirst ycommon cols(1) imargin(zero) graphregion(margin(large))

This graph has multiple legends and x-axes. I would like to have a single legend under the x-axis and only one x-axis.

I have tried several approaches:

For the legend:

1) gr combine doesn't allow an option "legend" : (option legend() not allowed r(198)).

2) Turning the legend off with legend(off) is not working. I used this approach: graph display, legend(label(1 "Not Important") label(2 "Important") ring(0) pos(12) col(1) size(small)), but it doesn't modify the graph.

For the x-axis:

1) I tried xcommon as for ycommon. Apparently it exists because it does not give me an error but does not change the appearance of the graph as I would like.

Could you please help me?

Thank you very much!

Beatrice
Nick Cox

Join Date: Mar 2014

Posts: 35502
#13

06 May 2024, 11:40

#12 doesn't include a data example. However, one can be constructed easily that would give exactly the same graph.

The strategy in #12 is to produce distinct graphs, combine them and then try to reduce redundancy and clutter. That's hard work and doesn't yield the desired result. I doubt you would get all the way, but there is an easier approach.

A better strategy is to reshape your data so that you have just two variables.

Other than that, the code here shows some different small choices. You need, I suggest, colours that contrast more and more readable numeric labels -- if they deserve being shown, they deserve to be very easily readable. But you can make your own choices, naturally.

I follow first the idea that the order of variables Health Plans Children Shareholders Offer may be deliberate and have some definite meaning. Then I ignore that and use myaxis from the Stata Journal to order the categories. More at https://journals.sagepub.com/doi/pdf...6867X211045582

(The graphs are posted in the opposite order.)

Code:

clear
set obs 1000
tokenize "406 416 366 129 153"
local j = 0
foreach v in Health Plans Children Shareholders Offer {
    local ++j
    gen `v' = _n <= ``j''
}

* you start about here
rename (Health Plans Children Shareholders Offer) (Answer=)
gen Id = _n
reshape long Answer, i(Id) j(Which) string

label define Axis 1 Health 2 Plans 3 Children 4 Shareholders 5 Offer
encode Which, label(Axis) gen(Axis)

label def Answer 0 "Not important" 1 "Important"
label val Answer Answer

catplot Answer Axis, percent(Axis) bar(1, lcolor(stc1) fcolor(stc1*0.4)) bar(2, lcolor(stc2) fcolor(stc2*0.4)) ///
asyvars stack blabel(bar, position(center) size(medlarge) format(%2.1f)) name(G1, replace) ///
legend(pos(6) row(1)) ysc(alt)

myaxis NewAxis=Axis, sort(mean Answer) descending

catplot Answer NewAxis, percent(Axis) bar(1, lcolor(stc1) fcolor(stc1*0.4)) bar(2, lcolor(stc2) fcolor(stc2*0.4)) ///
asyvars stack blabel(bar, position(center) size(medlarge) format(%2.1f)) name(G2, replace) ///
legend(pos(6) row(1)) ysc(alt)

A different issue I don't address with an example graph is that the two percentages add to 100, so why not just use the percent who say that something is important?

Code:

graph hbar Answer, over(NewAxis)

is a start on the code.

Last edited by Nick Cox; 06 May 2024, 11:43.

Nick Cox

Join Date: Mar 2014
Posts: 35502

#14

06 May 2024, 12:14

Extra code for the last idea

Code:

gen Answer2 = 100 * Answer 

graph hbar Answer2, over(NewAxis) ytitle(% saying Important) ysc(r(0 43)) blabel(bar, size(medlarge)) ysc(alt) name(G3, replace)


Beatrice Raspa

Join Date: May 2024

Posts: 5
#15

08 May 2024, 08:31

Thank you so much, Nick Cox.
Is there any way I can change the size of the variable names on the y-axis? I mean: ‘Child’, ‘Health’ etc. Could you please give me a hint?

Thank you!

Last edited by Beatrice Raspa; 08 May 2024, 08:34.