Adding column labels to stacked catplot

Robert Shaw

Join Date: Nov 2021

Posts: 37
#1


13 Nov 2023, 15:01

Dear Statalisters,

I am tying myself in knots and am hoping someone may be able to provide some guidance. I am plotting a number of bar graphs in a loop. The bar graphs are stacked for different categories. I have elected not to use tabplot, as it is valuable to compare visually, between groups, the overall number of datapoints with any non-missing category rating. Instead, perhaps with poor judgement, I am using catplot recast as hbar. I have two options here: I can plot either the percentages or the frequencies. Both provide valuable information and I am trying to find a way to display both on the graph, because the total number of datapoints varies between the graphs produced in the loop, and seeing that the N is different is important. I have elected to plot percentages. I would like to label each stack of bars with the total number of datapoints within that stack (some of the individual segments are physically too small to carry their own labels). I don't think catplot allows addplot() and, since it is not a twoway graph, it cannot have another plot (e.g. a scatter with invisible points and appropriate labels) added to it. Could anyone please help me?

Kind regards
Robert Shaw

Code:

forval var1=1(1)2 {
    forval var2=0(1)1 {
        colorpalette yellow red, ipolate(4) nograph
        catplot MSS_category if class==`var1' & status==`var2' & MSS_category>0 & MSS_category<. & Episode==1, ///
            over(Grouping) by(MSS_category_type, note("")) stack asyvars ///
            bar(2, col("`r(p1)'")) bar(3, col("`r(p2)'")) bar(4, col("`r(p3)'")) bar(5, col("`r(p4)'")) ///
            graphregion(color(white)) bgcolor(white) legend(order(2 3 4)) missing ///
            ylabel(, labsize(2) alternate nogrid)
    }
}
Nick Cox

Join Date: Mar 2014

Posts: 35762
#2

13 Nov 2023, 15:27

catplot as you are using it is a wrapper for graph hbar. So it is, as you say, not making any use of graph twoway, and for the same reason no addplot() option can be supported. The text labels allowed in various places are precisely those allowed by graph hbar, and there is no scope for using any user-supplied variable there.

That's all negative, but you can just ensure that the categorical variable is string with desired content (as below) or is numeric with value labels with desired content.

I can't follow what your graphs look like, and you don't give a data example, but here is some technique that you may be able to adapt.

Code:

sysuse auto, clear
bysort rep78 : gen freq = _N
gen toshow = strofreal(rep78) + " ({it:n} = " + strofreal(freq) + ")"
catplot foreign toshow if rep78 < ., percent(toshow) stack asyvars ///
    blabel(bar, pos(center) format(%2.1f)) ///
    bar(1, fcolor(stc1*0.2)) bar(2, fcolor(stc2*0.2)) ///
    l1title(Repair record and frequency)
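The numeric alternative mentioned above (value labels carrying the frequencies) might look like the sketch below. It again assumes the auto data; the label name rep78n is made up for illustration, and catplot must be installed (ssc install catplot).

```stata
sysuse auto, clear

* build one value label per observed level of rep78, including its frequency
* (rep78n is a hypothetical label name chosen for this sketch)
levelsof rep78, local(levels)
foreach l of local levels {
    quietly count if rep78 == `l'
    label define rep78n `l' "`l' (n = `r(N)')", add
}
label values rep78 rep78n

* rep78 stays numeric, so the categories keep their numeric order
catplot foreign rep78 if rep78 < ., percent(rep78) stack asyvars ///
    blabel(bar, pos(center) format(%2.1f)) ///
    l1title(Repair record and frequency)
```

Keeping the variable numeric avoids the alphabetical ordering that a string variable would impose once there are ten or more categories.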

Robert Shaw

Join Date: Nov 2021
Posts: 37
#3

13 Nov 2023, 15:44

Dear Nick,
This is a nice way round the issue - putting the 'N' to the left (in the column name) rather than to the right (as a bar label). Thank you
Do you know if there is a way to put this information into the graph - the solution you have provided is more than adequate, but aesthetically (and coding wise over a large number of permutations of graph), it would be nicer (easier) to have it as a label. Part of me also wants to believe that there is a solution and that I wasn't flailing around in the abyss.
Kind regards
Robert Shaw

Example code and data appended below

Code:

forval var1=1(1)2 {
    forval var2=0(1)1 {
        colorpalette yellow red, ipolate(4) nograph
        catplot MaxSymptomScore if Symptomclass==`var1' & Serostatus==`var2' & MaxSymptomScore>0 & MaxSymptomScore<. & Diary==1, ///
            percentage over(Primecode, label(labsize(2))) by(Symptom, note("")) stack asyvars ///
            bar(2, col("`r(p1)'")) bar(3, col("`r(p2)'")) bar(4, col("`r(p3)'")) bar(5, col("`r(p4)'")) ///
            graphregion(color(white)) bgcolor(white) legend(order(2 3 4)) missing ///
            ylabel(, labsize(2) alternate nogrid)
    }
}

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float MaxSymptomScore byte Symptom float(Symptomclass Serostatus) byte Diary float Primecode
0  1 1 0 1 2
0  2 1 0 1 2
0  3 1 0 1 2
0  4 1 0 1 2
0  5 1 0 1 2
1  6 1 0 1 2
0  7 2 0 1 2
0  8 2 0 1 2
0  9 2 0 1 2
0 10 2 0 1 2
0 11 2 0 1 2
0 12 2 0 1 2
0 13 2 0 1 2
0 14 2 0 1 2
0 15 2 0 1 2
0 16 2 0 1 2
0 17 2 0 1 2
0  1 1 0 2 2
0  2 1 0 2 2
0  3 1 0 2 2
0  4 1 0 2 2
0  5 1 0 2 2
0  6 1 0 2 2
0  7 2 0 2 2
0  8 2 0 2 2
0  9 2 0 2 2
0 10 2 0 2 2
0 11 2 0 2 2
0 12 2 0 2 2
1 13 2 0 2 2
0 14 2 0 2 2
0 15 2 0 2 2
0 16 2 0 2 2
0 17 2 0 2 2
0  1 1 2 1 2
3  2 1 2 1 2
1  3 1 2 1 2
1  4 1 2 1 2
0  5 1 2 1 2
1  6 1 2 1 2
0  7 2 2 1 2
1  8 2 2 1 2
1  9 2 2 1 2
2 10 2 2 1 2
2 11 2 2 1 2
0 12 2 2 1 2
2 13 2 2 1 2
1 14 2 2 1 2
0 15 2 2 1 2
0 16 2 2 1 2
0 17 2 2 1 2
0  1 1 2 2 2
3  2 1 2 2 2
2  3 1 2 2 2
0  4 1 2 2 2
0  5 1 2 2 2
3  6 1 2 2 2
0  7 2 2 2 2
3  8 2 2 2 2
3  9 2 2 2 2
3 10 2 2 2 2
2 11 2 2 2 2
2 12 2 2 2 2
3 13 2 2 2 2
3 14 2 2 2 2
0 15 2 2 2 2
0 16 2 2 2 2
0 17 2 2 2 2
0  1 1 0 1 1
0  2 1 0 1 1
0  3 1 0 1 1
1  4 1 0 1 1
0  5 1 0 1 1
1  6 1 0 1 1
0  7 2 0 1 1
2  8 2 0 1 1
2  9 2 0 1 1
1 10 2 0 1 1
1 11 2 0 1 1
2 12 2 0 1 1
2 13 2 0 1 1
2 14 2 0 1 1
0 15 2 0 1 1
0 16 2 0 1 1
0 17 2 0 1 1
2  1 1 0 2 1
2  2 1 0 2 1
0  3 1 0 2 1
0  4 1 0 2 1
0  5 1 0 2 1
1  6 1 0 2 1
0  7 2 0 2 1
0  8 2 0 2 1
0  9 2 0 2 1
1 10 2 0 2 1
0 11 2 0 2 1
2 12 2 0 2 1
1 13 2 0 2 1
0 14 2 0 2 1
1 15 2 0 2 1
0 16 2 0 2 1
0 17 2 0 2 1
3  1 1 0 1 2
3  2 1 0 1 2
1  3 1 0 1 2
3  4 1 0 1 2
0  5 1 0 1 2
1  6 1 0 1 2
0  7 2 0 1 2
0  8 2 0 1 2
0  9 2 0 1 2
0 10 2 0 1 2
0 11 2 0 1 2
1 12 2 0 1 2
1 13 2 0 1 2
0 14 2 0 1 2
0 15 2 0 1 2
0 16 2 0 1 2
1 17 2 0 1 2
0  1 1 0 2 2
0  2 1 0 2 2
0  3 1 0 2 2
0  4 1 0 2 2
0  5 1 0 2 2
1  6 1 0 2 2
0  7 2 0 2 2
2  8 2 0 2 2
2  9 2 0 2 2
2 10 2 0 2 2
0 11 2 0 2 2
2 12 2 0 2 2
2 13 2 0 2 2
2 14 2 0 2 2
0 15 2 0 2 2
0 16 2 0 2 2
0 17 2 0 2 2
0  1 1 0 1 2
0  2 1 0 1 2
1  3 1 0 1 2
0  4 1 0 1 2
0  5 1 0 1 2
0  6 1 0 1 2
0  7 2 0 1 2
0  8 2 0 1 2
0  9 2 0 1 2
0 10 2 0 1 2
0 11 2 0 1 2
1 12 2 0 1 2
0 13 2 0 1 2
0 14 2 0 1 2
0 15 2 0 1 2
0 16 2 0 1 2
0 17 2 0 1 2
0  1 1 0 2 2
0  2 1 0 2 2
0  3 1 0 2 2
0  4 1 0 2 2
0  5 1 0 2 2
0  6 1 0 2 2
0  7 2 0 2 2
0  8 2 0 2 2
0  9 2 0 2 2
0 10 2 0 2 2
0 11 2 0 2 2
1 12 2 0 2 2
1 13 2 0 2 2
0 14 2 0 2 2
0 15 2 0 2 2
0 16 2 0 2 2
0 17 2 0 2 2
0  1 1 2 1 2
0  2 1 2 1 2
0  3 1 2 1 2
0  4 1 2 1 2
0  5 1 2 1 2
1  6 1 2 1 2
0  7 2 2 1 2
0  8 2 2 1 2
0  9 2 2 1 2
0 10 2 2 1 2
0 11 2 2 1 2
1 12 2 2 1 2
0 13 2 2 1 2
0 14 2 2 1 2
0 15 2 2 1 2
0 16 2 2 1 2
0 17 2 2 1 2
0  1 1 2 2 2
0  2 1 2 2 2
0  3 1 2 2 2
0  4 1 2 2 2
0  5 1 2 2 2
1  6 1 2 2 2
0  7 2 2 2 2
0  8 2 2 2 2
0  9 2 2 2 2
0 10 2 2 2 2
0 11 2 2 2 2
2 12 2 2 2 2
1 13 2 2 2 2
1 14 2 2 2 2
0 15 2 2 2 2
0 16 2 2 2 2
0 17 2 2 2 2
0  1 1 0 1 1
0  2 1 0 1 1
0  3 1 0 1 1
0  4 1 0 1 1
0  5 1 0 1 1
1  6 1 0 1 1
0  7 2 0 1 1
1  8 2 0 1 1
1  9 2 0 1 1
1 10 2 0 1 1
0 11 2 0 1 1
1 12 2 0 1 1
0 13 2 0 1 1
1 14 2 0 1 1
0 15 2 0 1 1
0 16 2 0 1 1
0 17 2 0 1 1
0  1 1 0 2 1
0  2 1 0 2 1
0  3 1 0 2 1
0  4 1 0 2 1
0  5 1 0 2 1
1  6 1 0 2 1
0  7 2 0 2 1
0  8 2 0 2 1
0  9 2 0 2 1
1 10 2 0 2 1
1 11 2 0 2 1
1 12 2 0 2 1
0 13 2 0 2 1
0 14 2 0 2 1
0 15 2 0 2 1
0 16 2 0 2 1
0 17 2 0 2 1
0  1 1 0 1 2
0  2 1 0 1 2
0  3 1 0 1 2
0  4 1 0 1 2
0  5 1 0 1 2
0  6 1 0 1 2
0  7 2 0 1 2
0  8 2 0 1 2
0  9 2 0 1 2
0 10 2 0 1 2
0 11 2 0 1 2
1 12 2 0 1 2
1 13 2 0 1 2
0 14 2 0 1 2
0 15 2 0 1 2
0 16 2 0 1 2
0 17 2 0 1 2
0  1 1 0 2 2
0  2 1 0 2 2
0  3 1 0 2 2
0  4 1 0 2 2
0  5 1 0 2 2
0  6 1 0 2 2
0  7 2 0 2 2
1  8 2 0 2 2
0  9 2 0 2 2
3 10 2 0 2 2
2 11 2 0 2 2
2 12 2 0 2 2
2 13 2 0 2 2
1 14 2 0 2 2
0 15 2 0 2 2
0 16 2 0 2 2
1 17 2 0 2 2
2  1 1 0 1 1
0  2 1 0 1 1
1  3 1 0 1 1
0  4 1 0 1 1
1  5 1 0 1 1
1  6 1 0 1 1
0  7 2 0 1 1
2  8 2 0 1 1
1  9 2 0 1 1
2 10 2 0 1 1
1 11 2 0 1 1
3 12 2 0 1 1
2 13 2 0 1 1
2 14 2 0 1 1
0 15 2 0 1 1
0 16 2 0 1 1
0 17 2 0 1 1
0  1 1 0 2 1
0  2 1 0 2 1
1  3 1 0 2 1
0  4 1 0 2 1
0  5 1 0 2 1
1  6 1 0 2 1
0  7 2 0 2 1
0  8 2 0 2 1
0  9 2 0 2 1
1 10 2 0 2 1
0 11 2 0 2 1
1 12 2 0 2 1
1 13 2 0 2 1
0 14 2 0 2 1
0 15 2 0 2 1
0 16 2 0 2 1
0 17 2 0 2 1
0  1 1 1 1 1
0  2 1 1 1 1
1  3 1 1 1 1
0  4 1 1 1 1
0  5 1 1 1 1
0  6 1 1 1 1
0  7 2 1 1 1
2  8 2 1 1 1
1  9 2 1 1 1
1 10 2 1 1 1
0 11 2 1 1 1
2 12 2 1 1 1
3 13 2 1 1 1
2 14 2 1 1 1
1 15 2 1 1 1
0 16 2 1 1 1
0 17 2 1 1 1
0  1 1 1 2 1
0  2 1 1 2 1
0  3 1 1 2 1
0  4 1 1 2 1
0  5 1 1 2 1
1  6 1 1 2 1
0  7 2 1 2 1
0  8 2 1 2 1
0  9 2 1 2 1
0 10 2 1 2 1
0 11 2 1 2 1
1 12 2 1 2 1
1 13 2 1 2 1
0 14 2 1 2 1
0 15 2 1 2 1
0 16 2 1 2 1
0 17 2 1 2 1
0  1 1 0 1 1
2  2 1 0 1 1
0  3 1 0 1 1
0  4 1 0 1 1
0  5 1 0 1 1
0  6 1 0 1 1
0  7 2 0 1 1
0  8 2 0 1 1
0  9 2 0 1 1
0 10 2 0 1 1
0 11 2 0 1 1
0 12 2 0 1 1
0 13 2 0 1 1
0 14 2 0 1 1
0 15 2 0 1 1
0 16 2 0 1 1
0 17 2 0 1 1
0  1 1 0 2 1
2  2 1 0 2 1
0  3 1 0 2 1
0  4 1 0 2 1
0  5 1 0 2 1
1  6 1 0 2 1
0  7 2 0 2 1
0  8 2 0 2 1
0  9 2 0 2 1
1 10 2 0 2 1
1 11 2 0 2 1
0 12 2 0 2 1
2 13 2 0 2 1
0 14 2 0 2 1
0 15 2 0 2 1
0 16 2 0 2 1
0 17 2 0 2 1
0  1 1 0 1 2
0  2 1 0 1 2
0  3 1 0 1 2
0  4 1 0 1 2
0  5 1 0 1 2
1  6 1 0 1 2
0  7 2 0 1 2
0  8 2 0 1 2
0  9 2 0 1 2
0 10 2 0 1 2
0 11 2 0 1 2
0 12 2 0 1 2
1 13 2 0 1 2
0 14 2 0 1 2
0 15 2 0 1 2
0 16 2 0 1 2
0 17 2 0 1 2
0  1 1 0 2 2
0  2 1 0 2 2
1  3 1 0 2 2
0  4 1 0 2 2
0  5 1 0 2 2
1  6 1 0 2 2
0  7 2 0 2 2
1  8 2 0 2 2
0  9 2 0 2 2
end

Last edited by Robert Shaw; 13 Nov 2023, 15:54.


Nick Cox

Join Date: Mar 2014

Posts: 35762
#4

14 Nov 2023, 03:33

Thanks for the data example which gives me a better idea of what you're doing.

graph hbar supports a text() option to add text within the graph. I don't know where you're going to put it, given that the design already includes stacked bars always going from 0 to 100%.
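As a rough illustration of text() only: the sketch below uses the auto data rather than the poster's data, and both position numbers are guesses that would need adjusting by eye. See the help for graph bar and for added text options on how the two positions are interpreted.

```stata
sysuse auto, clear

* text(#_y #_x "text"): the two positions here are illustrative guesses,
* not values worked out for any particular design
graph hbar (percent), over(rep78) ///
    text(40 50 "some added text", placement(east))
```

Because the positions are typed in by hand, this approach scales poorly to graphs produced in a loop, where each graph would need its own coordinates.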
Robert Shaw

Join Date: Nov 2021

Posts: 37
#5

14 Nov 2023, 03:37

Hi Nick,
Since I'm plotting only non-missing and non-zero categories, the stacked bars never reach 100%, as there are usually >50% in the zero category. The aim would be to put the "non-zero-non-missing" 'N' at the right of the bar.
Nick Cox

Join Date: Mar 2014

Posts: 35762
#6

14 Nov 2023, 03:56

I am lost then about what you are doing, or trying to do, because

(1) as the author of catplot I know that stacked percents always sum to 100%

(2) I ran your code and observed the same in the 4 plots that it produces.
Robert Shaw

Join Date: Nov 2021

Posts: 37
#7

14 Nov 2023, 08:58

Dear Nick,
Apologies for delay in response and apologies if my thinking is muddled. Below I have appended one of the four graphs made with the example data above (var1==2, var2==0)
My understanding is that these are stacked bar charts as percentages of the total number of results (and do not reach 100% as I have chosen not to plot missing and zero values, excluded by if conditions).
I was aware that you are the catplot author, and it is more likely than not, that I have misunderstood something or mis-explained what I am trying to achieve. However, in this context, I am aiming to put the total count of non-zero values visually appearing as a "bar label". In the case of the panel labelled 13, this would have bar 1 with a label that is approximately in line with the 10% value and bar 2 having a label that is approximately in line with the 15% label

However, on close inspection of the help file and the output of the results, I noticed a deeper problem with my interpretation of how catplot works: "percent indicates that all frequencies should be shown as percents (with sum 100) of the total frequency of all values being represented in the graph."

It seems that the percentage is percentage of the total of both groups INCLUDING anything excluded by the if conditions, as presumably the if condition dictates display, but the denominator is determined by 'over' and 'by'.

If I wish to make the percentage refer to a denominator by group, presumably I would have to go back to hbar?
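For readers weighing that question: catplot's percent() option also accepts a variable list, as in the percent(toshow) call in #2, in which case percentages are calculated within the categories of those variables. The sketch below uses the auto data (not the data posted above) and sets a by-hand check of the same denominators next to the graph. Note that observations excluded by if do not enter these denominators.

```stata
sysuse auto, clear

* percents sum to 100 within each category of foreign
catplot rep78 foreign if rep78 < ., percent(foreign) stack asyvars

* the same percentages calculated by hand
preserve
contract foreign rep78 if rep78 < .
egen total = total(_freq), by(foreign)
gen pct = 100 * _freq / total
list foreign rep78 _freq total pct, sepby(foreign)
restore
```

If excluded categories should still count in the denominator, the percentages have to be calculated before the exclusion is applied, which is what the later twoway approach in this thread does.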

Kind regards
Robert Shaw

[Attached image]

Last edited by Robert Shaw; 14 Nov 2023, 09:03.
Nick Cox

Join Date: Mar 2014

Posts: 35762
#8

14 Nov 2023, 09:18

I see no graph as yet, but to fix some details:

catplot itself takes no account of any observation excluded by an if qualifier. The calculations and what is shown on the graph should always be consistent. If not, there is a bug and I am optimistic that there isn't.

By default catplot should also ignore missings, but non-standard ways of calling it might subvert that. In fact you are using one such. If you look at the help, the description doesn't mention over() at all -- either in text or in examples -- but it is allowed by virtue of being an option of graph hbar (or its siblings).

This is down to me, as back in 2003 when Stata 8 came out with all-new graphics I was very impressed, but there were a few gaps I tried to fill and catplot was one filler. My starting point was a regress type of syntax in which outcome is named first and then predictors.

In my own practice I use tabplot (Stata Journal) far more than I use catplot, but I am happy if anyone finds the latter useful.

Beyond that, whether you're misunderstanding anything is a matter (please) of showing data and command and graph where there appears to be a contradiction.

Robert Shaw

Join Date: Nov 2021
Posts: 37
#9

14 Nov 2023, 16:33

Dear Nick,
I am not certain why the graph does not show up for you. After fretting about my capabilities to use twoway graphs, I dove in and produced the following with the following code, which is what I am aiming for (the percentage is a percentage of each individual total grouping including non-plotted `0' values), with the raw `N' as an mlabel at the end. A lesson for me in laziness. I am afraid I didn't delve too much into how I was abusing catplot.

Code:

* ereplace (used below) is a community-contributed command: ssc install ereplace
gen Plotsymptoms=0
replace Plotsymptoms=1 if MaxSymptomScore>0 & MaxSymptomScore<.
bysort Serostatus Symptom Diary Primecode Plotsymptoms: egen plottotal=count(MaxSymptomScore) if Plotsymptoms==1
bysort Serostatus Symptom Diary Primecode: egen absolutetotal=count(MaxSymptomScore)
bysort Serostatus Symptom Diary Primecode MaxSymptomScore: egen numerator=count(MaxSymptomScore)
gen percent=numerator/absolutetotal*100

gen bottom=0

gen percent1=0
replace percent1=percent if MaxSymptomScore==1
bysort Serostatus Symptom Diary Primecode: ereplace percent1=max(percent1)

gen percent2=0
replace percent2=percent + percent1 if MaxSymptomScore==2
bysort Serostatus Symptom Diary Primecode: ereplace percent2=max(percent2)
replace percent2=percent1 if percent2==0

gen percent3=0
replace percent3=percent + percent2 if MaxSymptomScore==3
bysort Serostatus Symptom Diary Primecode: ereplace percent3=max(percent3)
replace percent3=percent2 if percent3==0

gen Primecode2=Primecode
replace Primecode2=3 if Primecode2==2

colorpalette yellow red, ipolate(4) nograph
#delimit ;
twoway 
rbar bottom percent1 Primecode2 if Serostatus==0 & Symptom==13 & Diary==1, horizontal col("`r(p1)'") || 
rbar percent1 percent2 Primecode2 if Serostatus==0 & Symptom==13 & Diary==1, horizontal col("`r(p2)'") || 
rbar percent2 percent3 Primecode2 if Serostatus==0 & Symptom==13 & Diary==1, horizontal col("`r(p3)'") ||
scatter Primecode2 percent3 if Serostatus==0 & Symptom==13 & Diary==1,
    msymbol(i) mlabel(plottotal) mlabposition(3) mlabsize(3.5) mlabcolor(black)
    graphregion(color(white)) bgcolor(white)
    ylabel(1 `""ChAdOx1" "nCoV-19""' 3 "BNT162b2")
    legend(order(1 "Grade 1" 2 "Grade 2" 3 "Grade 3"))
    ytitle("") xtitle("Percentage by group") ;
#delimit cr
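For anyone wanting to try the same idea without the data above, here is a minimal sketch on the auto data. It is not the poster's code: the variable names total, pct, upper, lower and last are invented for illustration, and the stacks here run to 100% because no category is left out of the plot.

```stata
sysuse auto, clear

* one observation per cell, with percents within each origin group
contract foreign rep78 if rep78 < .
egen total = total(_freq), by(foreign)
gen pct = 100 * _freq / total

* cumulate within group to get the ends of each stacked segment
bysort foreign (rep78) : gen upper = sum(pct)
gen lower = upper - pct
bysort foreign (rep78) : gen byte last = _n == _N

* one rbar call per category, so each gets its own colour and legend entry
local bars
forvalues k = 1/5 {
    local bars `bars' (rbar lower upper foreign if rep78 == `k', horizontal barwidth(0.6))
}

* invisible markers at the end of each stack carry the group total
twoway `bars' ///
    (scatter foreign upper if last, msymbol(none) mlabel(total) mlabposition(3)), ///
    ylabel(0 1, valuelabel angle(horizontal)) yscale(range(-0.5 1.5)) ///
    xscale(range(0 110)) ytitle("") xtitle("Percent within origin") ///
    legend(order(1 "1" 2 "2" 3 "3" 4 "4" 5 "5") row(1))
```

Cumulating with sum() replaces the separate percent1, percent2 and percent3 variables, at the cost of reducing the data to one observation per cell.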

[Attached image: example.jpg]


Nick Cox

Join Date: Mar 2014

Posts: 35762
#10

15 Nov 2023, 05:36

Thanks for #9. I am interpreting this as that you have the graphs, or at least the approach, that you want.

I think the graph in #7 was added in an edit. I was responding to what was first posted.

You have left the questions about catplot hanging, but for any future readers, I should summarize (and indeed correct).

catplot does not forbid use of over() options and it doesn't itself exclude observations that have missing values on any variables named therein. But the over() code by default will do that anyway.

People unfamiliar with the auto data should know that there are no missing values on foreign but missing values on rep78.

You can try this for yourself, but graphs G1 and G2 are identical and exclude missing values and graphs G3 and G4 are identical and include missing values.

Code:

. sysuse auto, clear
(1978 automobile data)

. catplot foreign, over(rep78) name(G1)

. catplot foreign rep78, name(G2)

. catplot foreign, over(rep78) missing name(G3)

. catplot foreign rep78, missing name(G4)
Robert Shaw

Join Date: Nov 2021

Posts: 37
#11

15 Nov 2023, 05:45

Yes, thanks Nick! In the end, I got there, but it was extremely helpful, nonetheless, to have your input, so thank you for taking the time. A note to myself, rather than to anyone else, and certainly not a criticism: the user-written packages by yourself such as catplot and stripplot are unbelievably helpful when initially looking at the data and trialling visualisation. They are rapid and do what they are designed to do. Ultimately though, in order to produce the detailing that (I feel) is required for adequate visualisation, I invariably go back to twoway plots, which have more flexibility but are less of a readymade package and involve some preparation of the variables to get the desired output.
Kind regards
Robert Shaw
Nick Cox

Join Date: Mar 2014

Posts: 35762
#12

15 Nov 2023, 06:29

I disagree slightly. graph bar, graph hbar and graph dot have flexibility that is often crucial to what you want. Other way round, they have a point of view that is often not at all what people want, which is that one axis is categorical. Not understanding that has led to some frustration and disappointment.

I write wrappers for twoway (stripplot is an example) and wrappers for graph dot, graph bar, or graph hbar. Most of my commands are the former.

A law of programming is simply that if my (*) program doesn't do what you want, you're free to write your own instead.
(*) "my" is generic here and could be the comment of any programmer not employed to follow instructions.

(I don't have to incriminate myself, but I once wrote a wrapper for graph pie.)

Tingting Tan

Join Date: May 2018
Posts: 8

#13

22 Nov 2023, 13:23

Hello,

I am learning how to use 'catplot' and wonder if I can have the race label show up only on the left (instead of both places)? Here is my code:

[Attached image: fig6. statalist.png]

Code:

catplot careerint_bh race, by(female, note("") l1title(" ") title("Fig 5.Career Interest by Race and Gender at the beginning of HS", size(medium))) ///
        blabel(bar, position(outside) format(%3.1f) size(1.5))   ///
        percent(careerint_bh)    ///
        ylab(0(10)50) ///
        ytitle("Proportion(%)")


Maarten Buis

Join Date: Mar 2014
Posts: 3466

#14

23 Nov 2023, 02:27

You are probably looking for twby, which you can get by typing in Stata ssc install twby

You did not provide example data (see the FAQ on how to do that, black bar near the top of this page). I have created some example data myself (I will do this once for you, but I will not do it again):

Code:

// create some example data
clear all
input female race careerint_bh prop
1 1 1 .26
1 1 2 .297
1 1 3 .108
1 2 1 .118
1 2 2 .123
1 2 3 .063
1 3 1 .05
1 3 2 .039
1 3 3 .026
1 4 1 .04
1 4 2 .093
1 4 3 .049
1 5 1 .019
1 5 2 .024
1 5 3 .009
2 1 1 .268
2 1 2 .208
2 1 3 .34
2 2 1 .123
2 2 2 .096
2 2 3 .214
2 3 1 .053
2 3 2 .035
2 3 3 .078
2 4 1 .054
2 4 2 .069
2 4 3 .093
2 5 1 .015
2 5 2 .016
2 5 3 .021
end

gen m = 5000*1.careerint_bh+ 1000*2.careerint_bh + 500*3.careerint_bh
gen freq = round(m*prop)
label define field 1 "non-stem & non-health" ///
                   2 "stem" ///
                   3 "doctor/health"
label define female 2 "female" ///
                    1 "male"
label define race 1 "white" ///
                  2 "hispanic" ///
                  3 "black" ///
                  4 "asian" ///
                  5 "other"
label value careerint_bh field
label value female female
label value race race                  
label var careerint_bh "career interest at end HS"
expand freq
drop freq

So now I have data that probably look like yours. We can start working out what table we want to visualize. I suspect that the graph you showed is not the table you want to show. Instead, I assume you want to see the percentages of interest within both race and sex.

Code:

table (race careerint_bh) (female), stat(percent, across(careerint_bh ))

We are going to recreate that table in the data, so we can plot it. This is going to drastically change the data, so we do this in a different frame.

Code:

frame copy default tograph
frame change tograph
contract careerint_bh female race, zero nomiss
egen tot = total(_freq), by(female race)
gen perc = _freq / tot *100

// to display the percentages we fix the display format and the location
format perc %5.0f
gen y = -5

//two variants of the same graph

twby female race, compact legend(off): ///
    twoway bar perc careerint_bh,      ///
         horizontal ylab(1/3,val)      ///
         barw(.75)                     ///
         xtitle(percent)            || ///
    scatter careerint_bh y ,           ///
        mlab(perc) msymbol(none)       ///
        mlabpos(0) mlabcolor(black)    ///
        name(variant1, replace)
        
twby female careerint_bh, compact legend(off): ///
    twoway bar perc race,                      ///
         horizontal ylab(1/5,val)              ///
         barw(.75)                             ///
         xtitle(percent)                    || ///
    scatter race y ,                           ///
        mlab(perc) msymbol(none)               ///
        mlabpos(0) mlabcolor(black)            ///
        name(variant2, replace)

[Attached images: variant1.png, variant2.png]

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------


Nick Cox

Join Date: Mar 2014

Posts: 35762
#15

23 Nov 2023, 03:25

In addition to the excellent suggestions of Maarten Buis, note that tabplot from the Stata Journal has a very similar flavour. The command name tabplot is a search term for examples in this forum.
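As a pointer only, a minimal tabplot call might look like the sketch below. It uses the auto data rather than the data in #13, and assumes tabplot has been installed (for example with ssc install tabplot).

```stata
sysuse auto, clear

* rows are rep78, columns are foreign; percents are taken within foreign
* and printed next to each bar
tabplot rep78 foreign, percent(foreign) showval
```

Each row and column is labelled once in the margins, so category labels are not repeated across panels.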