Catplot or graph hbar - how to specify appropriate percentages

Joanna Davies

Join Date: Nov 2015

Posts: 57
#1


08 Jun 2016, 02:15

Hello All,

I am trying to produce a horizontal bar graph using 3 categorical variables:
1. akps (measure of a patient's function): 10 categories, 0-100
2. phase3 (phase of palliative illness): 4 cats, stable, unstable, deteriorating and dying
3. setting: 4 cats, hospital, hospice, community

I want to display the percentage distribution for the akps categories in each setting, with a separate graph for each phase. The point is to see whether the distribution for akps in each phase is similar across settings. I want to display the proportions because there is an uneven number of cases in each setting which makes patterns using the counts less clear, and stretches the graph.

Using catplot from SCC in Stata 13, I have produced this:
catplot setting, over(akps) by(phase3) asyvars

And using graph hbar, I have produced exactly the same graph:
gen count=1
replace count=. if akps==.
graph hbar (sum)count, over(setting) over(akps) by(phase3) asyvars

I would be grateful for advice on how to achieve the desired display of percentages - I have been trying various combinations and re-structures for some time now and unfortunately cannot resolve it.

Your help is much appreciated.

Many thanks,
Joanna
Nick Cox

Join Date: Mar 2014

Posts: 35699
#2

08 Jun 2016, 02:53

Dataset example?

With catplot (from SSC, not SCC!) the percent() option is to be thought of as specifying predictors that define separate conditional distributions.

The difference between

Code:

sysuse auto, clear
catplot rep78 foreign, percent(foreign)
catplot foreign rep78, percent(rep78)

should make that clear, or clearer.
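As a rough cross-check without any plotting, the two conditional distributions can also be tabulated and computed by hand. This is only a sketch: the variable names n, n_f, n_r, pct_f and pct_r are illustrative and not part of catplot, and observations with missing rep78 are dropped here as an assumption.

```stata
sysuse auto, clear

* Rows are foreign: each row sums to 100, as with percent(foreign)
tabulate foreign rep78, row nofreq
* Columns are rep78: each column sums to 100, as with percent(rep78)
tabulate foreign rep78, column nofreq

* The same percentages by hand: one observation per cell with its frequency
contract foreign rep78 if !missing(rep78), freq(n)
egen n_f = total(n), by(foreign)
egen n_r = total(n), by(rep78)
gen pct_f = 100 * n / n_f    // distribution of rep78 within foreign
gen pct_r = 100 * n / n_r    // distribution of foreign within rep78
list, sepby(foreign) noobs
```

pct_f adds up to 100 within each category of foreign, and pct_r adds up to 100 within each category of rep78, which is the distinction the percent() option is drawing.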

By the way, you are mixing graph hbar and catplot syntax here; that's allowed because catplot is here just a wrapper for graph hbar, but the syntax of catplot was designed so that people could think

catplot response predictor [predictor]

just as they would for a model fitting command.

I can't see that akps is a predictor here; it sounds like the outcome or response of interest. It's on an ordinal scale, so other graph forms might work better here.

Naturally I don't have your data, but for a graded response that is 0(10)100 (11 categories, not 10!) and 4 phases and 3 settings, I did this as a graph sketch using tabplot (SSC; Stata Journal in press):

Code:

clear
set scheme s1color
set seed 2803
set obs 1200
egen setting = seq(), to(3)
label def setting 1 community 2 hospital 3 hospice
label val setting setting
egen phase3 = seq(), to(4) block(100)
label def phase3 1 stable 2 unstable 3 deteriorating 4 dying
label val phase3 phase3
gen akps = 10 * floor(11 * runiform())
tabplot akps setting, by(phase3, note("")) percent(setting phase3) showval(format("%2.0f") offset(7)) yla(0(20)100) bfcolor(none) horizontal barw(10) yasis
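For the graph hbar route asked about in #1, one possible approach is to calculate the percentages first and then plot them as they are. The sketch below uses the same simulated data, not the real data, and the names n, n_cell and pct are illustrative. It assumes the percentages wanted are those of akps within each combination of setting and phase3.

```stata
* Simulated data only, built as in the tabplot sketch
clear
set seed 2803
set obs 1200
egen setting = seq(), to(3)
label def setting 1 community 2 hospital 3 hospice
label val setting setting
egen phase3 = seq(), to(4) block(100)
label def phase3 1 stable 2 unstable 3 deteriorating 4 dying
label val phase3 phase3
gen akps = 10 * floor(11 * runiform())

* One observation per setting x phase3 x akps cell, with its frequency
contract setting phase3 akps, freq(n)
* Denominator: number of cases in each setting x phase3 combination
egen n_cell = total(n), by(setting phase3)
gen pct = 100 * n / n_cell

* Plot the precomputed percentages rather than a sum of counts
graph hbar (asis) pct, over(setting) over(akps) by(phase3, note("")) asyvars ytitle("percent")
```

Because each setting within each phase now sums to 100, settings with very different numbers of cases are shown on a comparable scale.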

Last edited by Nick Cox; 08 Jun 2016, 03:06.
Joanna Davies

Join Date: Nov 2015

Posts: 57
#3

08 Jun 2016, 03:58

Hi Nick,

Thank you - this is really helpful.

I like the tabplot and I will use it. But I still see value in a catplot for being able to grasp, at a glance, similarities and differences in the distribution of akps across settings according to phase.

Re what is the predictor: akps (or functional status 0=dead; 100=perfect health) is associated with phase of illness (or, we expect to see lower function in patients who are dying or deteriorating). Phase is a new measure for us. I want to see (in a very descriptive, preliminary way) whether phase is being applied in a similar way across settings; that is, does the association between akps and phase look similar or vastly different across settings?

So I think in the catplot, I am trying to predict akps based on setting and phase. Thank you for the example code explaining catplot; I think I understand where I was going wrong.

Using catplot from SSC (!!sorry)
catplot setting, over(akps) by(phase3) percent(setting phase3) asyvars

I think this gets me what I'm after, unless I'm misunderstanding the use of percent()?

I realise it's not easy to see the detail on this graph, but I think the overall patterns are still useful. Any further thoughts or comments you have are much appreciated.

Best,
Joanna

Nick Cox

Join Date: Mar 2014

Posts: 35699
#4

08 Jun 2016, 04:07

Good.

But you don't give us the equivalent tabplot for comparison. I see the three settings mushed up together inside each panel -- in principle they are separated, but in practice grasping each pattern is hard, so I can't make comparisons easily.
Joanna Davies

Join Date: Nov 2015

Posts: 57
#5

08 Jun 2016, 04:59

Ah, I see your point re providing (and actually doing!!) the tabplot for comparison. I take back everything I said about the merit of catplot in this case. Tabplot is a much better option for this data. Using your code above, I produced this:

tabplot akps setting, by(phase3, note("")) percent(setting phase3) showval(format("%2.0f") offset(7)) yla(0(20)100) bfcolor(none) horizontal barw(10) yasis

Thank you Nick - this is much better.
Joanna

Thread closed
Joanna Davies

Join Date: Nov 2015

Posts: 57
#6

17 Jun 2016, 12:22

Hello Nick,

I have a further question - as it is about tabplot I am keeping it on this thread (?).

I'm having trouble formatting a tabplot (SSC; Stata Journal in press) that I'm trying to produce. The problem is that I'm not getting the usual bars; instead I'm getting a marker. I prefer the bars, but I can't work out how to change this (after many attempts and reading the help files).

I have built a dataset of summary statistics showing the proportion of patients who had an improvement in symptom scores at t2. Each symptom has a separate denominator because not all patients are assessed for every symptom, and I need to display the denominators in the tabplot. This is why I have produced this summary data rather than just working with the individual-level data.

Dataset example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float ipos3 str19 ipos_end float prop_improved_atend
 1 "pain (n=69)"          55.07246
 2 "sob (n=47)"           46.80851
 3 "weakness (n=119)"      30.2521
 4 "nausea (n=13)"        76.92308
 5 "vomit (n=14)"         71.42857
 6 "appetite (n=75)"            40
 7 "constipation (n=34)"  64.70588
 8 "mouth (n=25)"               64
 9 "drowsiness (n=66)"    43.93939
10 "mobility (n=108)"     22.22222
11 "anxiety (n=60)"       48.33333
12 "family (n=98)"       1.9525802
13 "depressed (n=36)"     52.77778
14 "peace (n=47)"         38.29787
15 "feelings (n=35)"            40
16 "information (n=20)"         55
17 "practical (n=28)"     78.57143
end
label values ipos3 ipos3
label def ipos3 1 "pain (n=69)", modify
label def ipos3 2 "sob (n=47)", modify
label def ipos3 3 "weakness (n=119)", modify
label def ipos3 4 "nausea (n=13)", modify
label def ipos3 5 "vomit (n=14)", modify
label def ipos3 6 "appetite (n=75)", modify
label def ipos3 7 "constipation (n=34)", modify
label def ipos3 8 "mouth (n=25)", modify
label def ipos3 9 "drowsiness (n=66)", modify
label def ipos3 10 "mobility (n=108)", modify
label def ipos3 11 "anxiety (n=60)", modify
label def ipos3 12 "family (n=98)", modify
label def ipos3 13 "depressed (n=36)", modify
label def ipos3 14 "peace (n=47)", modify
label def ipos3 15 "feelings (n=35)", modify
label def ipos3 16 "information (n=20)", modify
label def ipos3 17 "practical (n=28)", modify

Using the following syntax:
labmask ipos3, values(ipos_end)
format prop_improved_atend %2.0f
tabplot ipos3 prop_improved_atend, xasis showval(prop_improved_atend) horizontal barw(0.9) ///
yla(, noticks) ytitle("") subtitle("percent") xtitle("") xlabel(,labsize(vsmall))

I have produced this....

Is it possible to have the bars instead of these markers? Sorry if there is an obvious resolution to this, but I can't find it.

Thank you.
Joanna

p.s. I realise that in this dataset the denominators are really too small to produce proportions. This is just an example dataset I'm using to develop the code for reporting; the real data is larger, so the proportions will be more appropriate.
Joanna Davies

Join Date: Nov 2015

Posts: 57
#7

17 Jun 2016, 12:26

This additional post was made in error. I can't see how to delete it.

Last edited by Joanna Davies; 17 Jun 2016, 12:42.

Nick Cox

Join Date: Mar 2014
Posts: 35699
#8

17 Jun 2016, 12:49

tabplot is doing exactly as you ask. The problem is that prop_improved_atend is not a categorical variable and makes no sense as a column identifier. It needs to be supplied as a weight.

Further, what tabplot does by default is to count occurrences. In your case there is precisely one occurrence of each cross-combination, so you get bars all of length 1. They really are bars, not markers. Clearly they look small on your scale, which is a side-effect of your specifying xasis.

I guess this is closer to what you want. I would re-order the bars unless there is a psychological/psychiatric/clinical rationale for the order you use.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
set scheme s1color
input float ipos3 str19 ipos_end float prop_improved_atend
 1 "pain (n=69)"          55.07246
 2 "sob (n=47)"           46.80851
 3 "weakness (n=119)"      30.2521
 4 "nausea (n=13)"        76.92308
 5 "vomit (n=14)"         71.42857
 6 "appetite (n=75)"            40
 7 "constipation (n=34)"  64.70588
 8 "mouth (n=25)"               64
 9 "drowsiness (n=66)"    43.93939
10 "mobility (n=108)"     22.22222
11 "anxiety (n=60)"       48.33333
12 "family (n=98)"       1.9525802
13 "depressed (n=36)"     52.77778
14 "peace (n=47)"         38.29787
15 "feelings (n=35)"            40
16 "information (n=20)"         55
17 "practical (n=28)"     78.57143
end
label values ipos3 ipos3
label def ipos3 1 "pain (n=69)", modify
label def ipos3 2 "sob (n=47)", modify
label def ipos3 3 "weakness (n=119)", modify
label def ipos3 4 "nausea (n=13)", modify
label def ipos3 5 "vomit (n=14)", modify
label def ipos3 6 "appetite (n=75)", modify
label def ipos3 7 "constipation (n=34)", modify
label def ipos3 8 "mouth (n=25)", modify
label def ipos3 9 "drowsiness (n=66)", modify
label def ipos3 10 "mobility (n=108)", modify
label def ipos3 11 "anxiety (n=60)", modify
label def ipos3 12 "family (n=98)", modify
label def ipos3 13 "depressed (n=36)", modify
label def ipos3 14 "peace (n=47)", modify
label def ipos3 15 "feelings (n=35)", modify
label def ipos3 16 "information (n=20)", modify
label def ipos3 17 "practical (n=28)", modify
tabplot ipos3 [iw=prop_improved_atend] , showval(format(%2.0f) offset(0.45)) horizontal subtitle("        percent") bfcolor(green*0.2)  ytitle("")

[Attached image: joanna.png]

Last edited by Nick Cox; 17 Jun 2016, 12:54.


Joanna Davies

Join Date: Nov 2015

Posts: 57
#9

18 Jun 2016, 02:01

Thanks Nick, I see where I was going wrong. This is exactly what I am after.

There is a reason for the order of the bars - it follows the order they appear on the measure so makes sense for the clinicians.

Thank you!
j

thread closed.
Nick Cox

Join Date: Mar 2014

Posts: 35699
#10

18 Jun 2016, 02:23

It should be pointed out that graph hbar will work fine here.

Code:

graph hbar (asis) prop_improved_atend, over(ipos3) blabel(total, format(%2.0f)) subtitle("percent") bar(1, bfcolor(green*0.2)) ysc(off)

I sometimes shift the bars away from the y axis with an extra option such as ysc(r(-2 .))
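
As a small sketch of how that extra option might be combined with the command above, here it is with only the first three rows of the example data so that it runs on its own. The cut-down dataset is purely for illustration, and the value -2 is the one suggested above, not a recommendation for any particular graph.

```stata
* First three rows of the example data only, for illustration
clear
input float ipos3 float prop_improved_atend
1 55.07246
2 46.80851
3 30.2521
end
label def ipos3 1 "pain (n=69)" 2 "sob (n=47)" 3 "weakness (n=119)"
label val ipos3 ipos3

* ysc(r(-2 .)) extends the response axis below zero, leaving a gap before the bars start
graph hbar (asis) prop_improved_atend, over(ipos3) blabel(total, format(%2.0f)) subtitle("percent") bar(1, bfcolor(green*0.2)) ysc(off) ysc(r(-2 .))
```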
Nick Cox

Join Date: Mar 2014

Posts: 35699
#11

18 Jun 2016, 05:02

When I say the y axis here I mean the left-hand axis. Stata describes the response axis, here horizontal, as the y axis with graph hbar.