Clustered bar chart with N values below x-axis in each cluster

Kim Vaarts

Join Date: May 2025

Posts: 21
#1

18 May 2025, 13:45

Can someone please help me ASAP with making the clustered graph below in Stata? I am struggling to make it look this nice.
Also, how do I insert the N values for each cluster? Thanks in advance!

[Attachment: example clustered bar chart]
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#2

18 May 2025, 13:51

This is not really up my alley, and I rarely respond to graph questions unless they are pretty simple, but there are many others on this forum who can help you. I think your chances of getting a timely and helpful response from one of them would be greatly increased if you provide example data for them to work with.

The most helpful way to give example data is by using the -dataex- command. If you are running version 16 or later, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
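
As an illustration of that workflow, here is a minimal sketch that uses the auto dataset shipped with Stata as a stand-in for your own data. The choice of variables and the restriction to the first five observations are assumptions made only for this example.

Code:

sysuse auto, clear
* Show a small, copy-pasteable extract of selected variables
dataex make price mpg in 1/5

The output is meant to be copied and pasted into a post as is.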
Kim Vaarts

Join Date: May 2025

Posts: 21
#3

18 May 2025, 15:01

Line 1   Line 2   Line 3
A        C        C
B        A        A
D                 B
A        B
C        D        D

Fictive dataset (I am not allowed to share the real data): I have 5 patients receiving different types of medicine, namely A, B, C and D. They get 3 lines, and in each line they get one medicine. I want to plot it exactly like the chart above, with the three lines (line1, line2, line3) on the x-axis and the percent of patients on the y-axis. I also want to include text under the x-axis showing N = 5 for line1, N = 3 for line2 and N = 4 for line3.

Last edited by Kim Vaarts; 18 May 2025, 15:57.
Nick Cox

Join Date: Mar 2014

Posts: 35782
#4

18 May 2025, 15:46

Your graph in #1 shows 5 categories for one variable and 5 categories for another. Now you're referring to 5 and 4, which isn't a big difference in principle.

More puzzlingly your example graph shows several thousand patients but here you seem to be saying that you have 5 patients only.

So I am lost here, or at least puzzled. I can't think of a graph that doesn't improve on the table with some extra numbers that you can calculate by hand.
Kim Vaarts

Join Date: May 2025

Posts: 21
#5

18 May 2025, 16:00

I used a fictive dataset because I cannot share the real data (no permission to share). In the actual dataset I have a similar number of subjects and lines of medicine as in the picture I posted first. There are five types of medicine; subjects are given A, B, C or D. I left some cells empty because I also have missing values in the real dataset. I don't know the code/command in Stata to make such a clustered graph. Could someone please help?

Nick Cox

Join Date: Mar 2014

Posts: 35782
#6

18 May 2025, 16:26

It's fine to invent datasets when the real data are confidential. You don't need to explain or apologise there, as we urge doing exactly that in the FAQ Advice.
You don't seem to have read that FAQ Advice yet, or at least you stopped before the final section.

My problem remains that I still don't follow clearly what you have.

Here are some guesses. You may need to make several small changes to match your set-up.

Code:

clear 
set seed 314159
set obs 1200 
gen line = ceil(_n/400)
gen drug = runiformint(1, 4) if runiform() < 0.8 
label def drug 1 "A" 2 "B" 3 "C" 4 "D" 
label val drug drug 

capture set scheme stcolor 

quietly forval j = 1/3 {
    count if drug < . & line == `j'
    local which = word("first second third", `j')
    label def line `j' `" "`which'" "{it:n} = `r(N)'" "', add 
}

label val line line 

* you must install this
ssc inst catplot 

catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars name(D1, replace)

catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars recast(bar) legend(row(1) pos(12)) name(D2, replace)

[Figures: D1.png and D2.png, the graphs produced by the two catplot commands above]
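
For readers unsure what the loop is doing, its pieces can be run one at a time. This is a minimal sketch on the same kind of simulated data; nothing in it is specific to the real dataset.

Code:

clear
set seed 314159
set obs 1200
gen line = ceil(_n/400)
gen drug = runiformint(1, 4) if runiform() < 0.8

* Count of nonmissing drug values within the first line only;
* -count- leaves the result in r(N), which the loop pastes into the label
count if drug < . & line == 1
display "n for line 1 = " r(N)

* word() picks the j-th word out of a string: here the second one
display word("first second third", 2)

* Cross-check all three counts at once
tabulate line if drug < .

The loop simply repeats the first two steps for j = 1, 2, 3 and stores the resulting two-line text as the value label for that line.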


Kim Vaarts

Join Date: May 2025

Posts: 21
#7

18 May 2025, 17:02

This is exactly what I need! I just don't understand the code. I already have the Line variables in my dataset, Line 1 through Line 5. Why do I need to generate a new line variable with
gen line = ceil(_n/400)? Each line is a separate variable, just like in the fictive example: Line1 is one variable, Line2 is another, Line3 is another. The medicines are recorded within each of these variables, so the variable Line1 contains medicine A, B, C, etc., and the variable Line2 also contains medicine A, B, C. Why do I need to create a new line variable?

P.S. I have string variables only.

Last edited by Kim Vaarts; 18 May 2025, 17:09.
Nick Cox

Join Date: Mar 2014

Posts: 35782
#8

18 May 2025, 17:22

You don't need to generate a new line variable. I do need to do that because you didn't give a very good data example.

You will need to reshape your data to use my code. String variables should not be a great problem.

I guess we're in different time zones, and it is late where I am, so either others may be able and willing to answer any other questions or you will have to wait until I can answer them.
Kim Vaarts

Join Date: May 2025

Posts: 21
#9

18 May 2025, 17:28

Thank you very much, Nick. Please help me when you're awake. It is also late here, but I have a presentation next week, so I am still working at 1:30 am. I will wait for a response from you or anybody else. I am not strong with Stata, but I need to learn it. Sleep well; I hope to hear from you as soon as possible. I appreciate your help, and apologies for my many questions. Goodnight.
Kim Vaarts

Join Date: May 2025

Posts: 21
#10

18 May 2025, 17:36

I tried after reshaping, but it did not work. I cannot work out the part of the code where the loop over j gets the N. I need the whole code. I will wait for you and keep trying in the meantime.

Last edited by Kim Vaarts; 18 May 2025, 17:40.
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#11

18 May 2025, 17:39

The first 7 lines of Nick's code in #6 are just there to create a toy data set that demonstrates the approach. The tableau you set out in #3 to illustrate your data was helpful to show the kind of data you had and the way it was arrayed, but it was not something that could actually be directly used in Stata to develop and test the code to solve your problem. So Nick wrote 7 lines of code that would create a data set similar to yours that he could work with.

So you don't need to run those first 7 lines: you would replace all of those lines just by a command to -use- your data set. The rest of the code, starting from -capture set scheme stcolor- actually provides the solution to your graphing problem.

Your remark that you have string variables only, however, suggests that you will have to modify the code, because the subsequent commands involving the variable line assume it is numeric. I will assume here that the values of the variable line are "first line", "second line", and "third line". Then the solution to your problem would look like this:

Code:

capture set scheme stcolor

levelsof line, local(lines)
foreach l of local lines {
    count if !missing(drug) & line == `"`l'"'
    replace line = line + " {it:n} = `r(N)'" if line == `"`l'"'
}

ssc inst catplot

catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars name(D1, replace)

catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars recast(bar) legend(row(1) pos(12)) name(D2, replace)

Added: Crossed with #8, 9, 10.

Last edited by Clyde Schechter; 18 May 2025, 17:45.
Kim Vaarts

Join Date: May 2025

Posts: 21
#12

18 May 2025, 17:51

Dear Clyde and Nick Cox, I don't have two variables, Lines and Drugs. I just have the Line variables: Line1, Line2 and Line3. The drug categories A, B, C are recorded within these lines. Your code uses two separate variables, Lines and Drugs. Furthermore, the Line variables are string. Please help. I think the answer is in your post and Nick's, but I cannot see it.

Last edited by Kim Vaarts; 18 May 2025, 18:34.

Clyde Schechter

Join Date: Apr 2014
Posts: 30170

#13

18 May 2025, 18:43

So, I think I understand what your data looks like. Run this and take a look in the data browser to see if this resembles your data set in the relevant respects:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str1(line1 line2 line3)
 1 ""  "D" "C"
 2 "C" "B" ""
 3 "C" "B" "D"
 4 "D" ""  "C"
 5 "A" ""  "C"
 6 ""  ""  "C"
 7 ""  "D" "B"
 8 "B" "A" "D"
 9 ""  "A" "D"
10 "B" "A" ""
11 "D" "A" "B"
12 "A" "C" "D"
13 "A" "D" ""
14 "A" ""  "C"
15 "C" ""  "D"
16 "C" ""  "B"
17 "C" "B" ""
18 "A" "C" "B"
19 "C" ""  "D"
20 "C" ""  "B"
21 ""  "C" "D"
22 ""  "B" "A"
23 "C" "D" "B"
24 "C" "B" "A"
25 ""  "C" "D"
26 "B" "C" "D"
27 "D" "A" ""
28 "A" "D" "C"
29 "A" "C" ""
30 "A" ""  "D"
31 "D" "C" ""
32 ""  "D" "B"
33 "D" "C" "A"
34 "D" "C" "B"
35 "C" "B" "A"
36 "C" ""  "A"
37 "A" "D" "C"
38 "D" "C" "A"
39 ""  "C" "A"
40 "A" ""  "B"
41 "D" ""  ""
42 "C" "A" "B"
43 "C" "A" "D"
44 "D" "C" "B"
45 "D" "B" ""
46 "D" "B" ""
47 "C" "B" "D"
48 ""  ""  "A"
49 "D" "C" "A"
50 "A" "D" "C"
end

Note that I am assuming that your data set includes some kind of id variable, perhaps a patient MRN, and that that variable uniquely identifies observations in your data set. If you do not have such a variable, and have only the line1, line2, and line3 variables, then you need to create one, which you can easily do just with:

Code:

gen `c(obs_t)' id = _n
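
If an identifier already exists, it may be worth confirming that it really is unique before reshaping, since -reshape long- stops with an error when the i() variable does not uniquely identify observations. A minimal sketch, using a three-row toy dataset with made-up values:

Code:

clear
input float id str1(line1 line2 line3)
1 "A" "C" "C"
2 "B" "A" "A"
3 "D" ""  "B"
end

* -isid- exits with an error if id does not uniquely identify observations
isid id

* -duplicates report- summarizes any repeated id values
duplicates report id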

Assuming that we are now on the same page about what your data looks like, the best solution is to transform your data set so that it looks like what Nick created in #6. Then we can apply Nick's original solution to that:

Code:

capture set scheme stcolor

//  Nick already suggested -reshape-; I'm just giving explicit code here.
rename line* _drug*
reshape long _drug, i(id) j(line)
encode _drug, gen(drug)
drop _drug

// From here down it's Nick's original code with just one tiny tweak.
quietly forval j = 1/3 {
    count if drug < . & line == `j'
    local which = word("first second third", `j')
    label def line `j' `" "`which' line" "{it:n} = `r(N)'" "', add
}

label val line line

catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars name(D1, replace)

catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars recast(bar) legend(row(1) pos(12)) name(D2, replace)

I have eliminated the -ssc install catplot- command because you have clearly already done that, and there is no reason to do it again.

I will add that the original organization of your data, with three line variables that, I suppose, are drug names, is not conducive to analysis in Stata. It is not just a matter of this particular graphing problem. It is a matter of Stata working better with data in long layout rather than wide for almost everything. It is likely that whatever other analysis of this data you plan, it will be facilitated by using this revised data organization. To avoid having to re-create it each time, I suggest you actually -save- it as a new data set after you have used it for this purpose.
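
As a rough sketch of that last suggestion, the following reshapes the five-patient toy table from #3, cross-checks the percentages with a plain table, and saves the long layout. The file name lines_long.dta is hypothetical; substitute whatever suits your project.

Code:

* Toy data in wide layout (the fictive table from #3, not real data)
clear
input float id str1(line1 line2 line3)
1 "A" "C" "C"
2 "B" "A" "A"
3 "D" ""  "B"
4 "A" "B" ""
5 "C" "D" "D"
end

rename line* _drug*
reshape long _drug, i(id) j(line)
encode _drug, gen(drug)
drop _drug

* Row percentages by line; these should agree with the bar heights in the graph
tabulate line drug, row

* Keep the long layout for later analyses (hypothetical file name)
save "lines_long.dta", replace

Later sessions can then start from -use "lines_long.dta", clear- instead of repeating the -reshape-.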


Kim Vaarts

Join Date: May 2025

Posts: 21
#14

18 May 2025, 19:07

Thank you very much! It works! I am so happy! I have been struggling with this for two weeks straight, and in one night you have helped me! You don't know how much this means to me! THANK YOU! One last question: I get the graph shown below, and the layout needs some work. Is there code somewhere that I can copy and change myself? I am not good with Stata. I need to see the code, copy it, and make changes step by step so that I can see visually what changes; otherwise I cannot do it. The help function in Stata does not help me. This is a programming language and I am bad at it. P.S. I copy-pasted just a part of the graph for privacy reasons; they are very strict with the data. Thanks in advance and God bless!

[Attachment: partial screenshot of the resulting graph]
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#15

18 May 2025, 19:23

The problem is that with the number of lines and drugs you have, the number of bars is large enough that you can't really accommodate all those percentages at the bar ends without them overlapping. All this requires is that you specify a smaller text size for the bar labels. You want to find one that fits gracefully in the available space but is still large enough to read. Try this:

Code:

catplot , over(drug) over(line) percent(line) blabel(bar, size(vsmall) format(%2.1f)) asyvars name(D1, replace)

catplot , over(drug) over(line) percent(line) blabel(bar, size(vsmall) format(%2.1f)) asyvars recast(bar) legend(row(1) pos(12)) name(D2, replace)

If that's not a good size, you can go larger or smaller by choosing from among the sizes you will find by running -help textsizestyle-.
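
One way to compare sizes visually, step by step, is to draw the same graph once per candidate size and keep each version under its own name. This is only a sketch on simulated data of the kind used in #6; the graph names size_tiny, size_vsmall and size_small and the titles are made up for the example, and catplot must already be installed.

Code:

* Simulated data as in #6 (not real data)
clear
set seed 314159
set obs 1200
gen line = ceil(_n/400)
gen drug = runiformint(1, 4) if runiform() < 0.8
label def drug 1 "A" 2 "B" 3 "C" 4 "D"
label val drug drug

* One graph per candidate bar-label size, each stored under its own name
foreach s in tiny vsmall small {
    catplot , over(drug) over(line) percent(line) blabel(bar, size(`s') format(%2.1f)) asyvars title("Label size: `s'") name(size_`s', replace)
}

Any of the stored graphs can be brought back with, for example, -graph display size_vsmall-, which makes it easy to see what a single change does.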
