Adding the number of observations to y labels in catplot (or hbar)

David Schneider



24 Mar 2023, 00:41

Hello everyone,

I am not even sure whether this is possible, but here is an explanation of what I would like to achieve:
I am using catplot to plot some data for a report I am writing.
I have several HH datasets which I merged, and now I would like to plot the outcome variables across the datasets and another variable of interest, say the gender of the respondent.
Here is an example to illustrate my problem:

Code:

sysuse auto.dta, clear
gen expensive = .
replace   expensive =1 if price > 7000 & price!=.
replace   expensive =0 if price <= 7000 & price!=.

gen himpg = mpg > 25
label def himpg 1 "mpg > 25" 0 "mpg <= 25"
label val himpg himpg

    catplot  expensive  foreign himp , ///
            percent(foreign himp ) ///
            var1opts(label(labsize(vsmall))) ///
            var2opts(label(labsize(vsmall))) ///
            var3opts(label(labsize(vsmall)))

What I would like is to have the number of observations for each group appear next to the label of foreign (in my case, this would be the gender of the respondent).
So far I have written code that stores the number of observations in each group in a local macro.
My idea was then to loop over

Code:

 gr_edit .grpaxis.edit_tick `i' `x' "Foreign (n=`y')", tickset(major)

to change the label of each subgroup individually.
The problem I ran into is that the y ticks/labels (the positions you need to enter into `x') are not equally spaced on the y axis, so it is hard to find an expression that places the new labels next to the bars.
I do not want to do this manually for each graph I create; I would like to automate the process.
So my main questions are:
1. Is there a way to extract/save the position of each y label/tick on the y axis and feed it back into the

Code:

gr_edit .grpaxis.edit_tick

command?
2. If not, is there a different way to add the number of observations next to each bar or label? (Not manually: I am aware that I can show the number of observations by plotting frequencies instead of percentages.)
3. If not, can someone suggest an expression with which I can calculate the y position of each tick/label on the y axis?

Thank you very much.

Here is the code I have written so far.
Sorry that it is so messy; I am not very familiar with Stata coding practices and used a number of workarounds to get to what I am trying to do.

Code:

local num_obs
            local dataset dataset1 dataset2 dataset3 dataset4 dataset5 dataset6
            local outcome_var expensive  
            local over_var foreign
            tab `outcome_var' `over_var'
            levelsof `over_var', loc(f)
            foreach x of local dataset{
                foreach n in `f' {
                tab `outcome_var' if `over_var'==`n' & DATASET=="`x'" & `outcome_var'!=.
                loc f`n'`x' `"`r(N)'"'
                local num_obs `num_obs' `f`n'`x''
                dis "`f`n'`x''"
                    }
                }
            
            dis "`num_obs'"
            local not 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 // I included this to get rid of the cases where there are 0 observations. There is probably an easier way, however with other approaches the zeros were also removed inside other numbers (e.g. 420 to 42)
            local num_obs : list num_obs - not
            dis "`num_obs'"
            local varcount `:word count `num_obs''
            dis "`varcount'"

            local  quantils2
            local  quantils3
            local quantils =  100/`varcount' // 100 is the total height of the y axis and this is also where the problem lies as the y-labels/ticks are not equally distributed on the y axis
            display "`quantils'"

            forvalues i =  1/`varcount'{
                local quantils2 = `quantils' * `i'
                display "`quantils2'"
                local quantils3 `quantils3' `quantils2'
            }
            display "`quantils3'"
        
            tokenize `quantils3'
            forval j = 1 (2) `varcount'{
                local quantils3_uneven  `quantils3_uneven' ``j''
            }
            display "`quantils3_uneven'"
            tokenize `quantils3'
            forval j = 2 (2) `varcount'{
                local quantils3_even  `quantils3_even' ``j''
            }
            display "`quantils3_even'"



    catplot  expensive  foreign himp , ///
            percent(foreign himp ) ///
            var1opts(label(labsize(vsmall))) ///
            var2opts(label(labsize(vsmall))) ///
            var3opts(label(labsize(vsmall)))
            


forvalues i =  `varcount' (-2) 1 {
                gettoken y num_obs: num_obs
                gettoken x quantils3_even: quantils3_even
                gr_edit .grpaxis.major.num_rule_ticks = 0
                gr_edit .grpaxis.edit_tick `i' `x' "Men (n=`y')", tickset(major)
            }
            
local varcount_uneven = `varcount' - 1
forvalues i =  `varcount_uneven' (-2) 1 {
                gettoken y num_obs: num_obs
                gettoken x quantils3_uneven: quantils3_uneven
                gr_edit .grpaxis.edit_tick `i' `x' "Women (n=`y')", tickset(major)
            }
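For the counting step alone, a somewhat more direct version seems possible: count leaves the cell size in r(N), and appending only when r(N) > 0 means there are no zeros to strip out afterwards, so the long local not would no longer be needed. This is a sketch under assumptions: auto.dta has no DATASET variable, so one is invented here from rep78 purely for illustration, and the dataset names are hypothetical.

```stata
* Sketch: collect non-empty cell sizes with -count- instead of -tab-.
* DATASET is a hypothetical stand-in for the merged-dataset identifier.
sysuse auto.dta, clear
gen expensive = price > 7000 & price < .
gen DATASET = cond(rep78 >= 4 & rep78 < ., "dataset1", "dataset2")

local outcome_var expensive
local over_var    foreign
local dataset     dataset1 dataset2

levelsof `over_var', local(levels)
local num_obs
foreach x of local dataset {
    foreach n of local levels {
        * number of non-missing outcomes in this dataset x group cell
        quietly count if `over_var' == `n' & DATASET == "`x'" & !missing(`outcome_var')
        * append only non-empty cells, so no zeros need removing later
        if r(N) > 0 local num_obs `num_obs' `r(N)'
    }
}
display "`num_obs'"
```

Because the test is on r(N) rather than on the text of the macro, a count such as 420 can never be mangled the way the list-subtraction workaround risked.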

I am happy for any help provided.

Best,
David

Last edited by David Schneider; 24 Mar 2023, 00:46.


Nick Cox


24 Mar 2023, 07:13

catplot is community-contributed and is from SSC.

Does this help?

Code:

sysuse auto.dta, clear
gen expensive = price > 7000 & price!=.
gen himpg = mpg > 25
label def himpg 1 "mpg > 25" 0 "mpg <= 25"
label val himpg himpg

egen groupvar = group(himp foreign)

su groupvar , meanonly 

forval j = 1/`r(max)' { 
    * by construction -foreign- is constant within groups of -groupvar- 
    * we just need to retrieve the constant value programmatically: max, min and mean would all work 
    su foreign if groupvar == `j', meanonly 
    label def  groupvar `j' "`: label (foreign) `r(max)'' ({it:n} = `r(N)')", modify 
}

label val groupvar groupvar 

    catplot  expensive  groupvar himp , ///
            percent(foreign himp) nofill ///
            var1opts(label(labsize(vsmall))) ///
            var2opts(label(labsize(vsmall))) ///
            var3opts(label(labsize(vsmall)))
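The same idea can be set up so that the variable names are given once at the top, which may make it easier to reuse for other outcomes and grouping variables. This is a sketch based on the code above, not a tested general tool; the local macro names outcome, over and panel are illustrative choices. Restricting egen group() to non-missing outcomes keeps the reported n consistent with the observations actually plotted. Since the labelled groupvar is an ordinary variable, it should also work with official graph hbar through over(); there the bars show the mean of the 0/1 outcome, i.e. a proportion rather than a percent.

```stata
* requires catplot from SSC: ssc install catplot
sysuse auto.dta, clear
gen expensive = price > 7000 & price < .
gen himpg = mpg > 25
label def himpg 1 "mpg > 25" 0 "mpg <= 25"
label val himpg himpg

* illustrative names: set these once
local outcome expensive
local over    foreign
local panel   himpg

* one group per panel x over combination, non-missing outcomes only
egen groupvar = group(`panel' `over') if !missing(`outcome')

su groupvar, meanonly
forval j = 1/`r(max)' {
    * -over- is constant within each group, so r(max) recovers its value
    su `over' if groupvar == `j', meanonly
    label def groupvar `j' "`: label (`over') `r(max)'' ({it:n} = `r(N)')", modify
}
label val groupvar groupvar

catplot `outcome' groupvar `panel', percent(`over' `panel') nofill ///
    var1opts(label(labsize(vsmall))) ///
    var2opts(label(labsize(vsmall))) ///
    var3opts(label(labsize(vsmall)))

* official alternative: the mean of a 0/1 variable is the proportion of 1s
graph hbar (mean) `outcome', over(groupvar) over(`panel') nofill
```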


David Schneider


27 Mar 2023, 02:25

Dear Nick,

Amazing, thank you! That's exactly what I needed.
I made it a lot more complicated than necessary; I could have thought of just using egen group() earlier.

Many thanks for your help.

And thanks for pointing out that catplot is community-contributed; I did indeed forget to mention that.

Best,

David
