
  • Adding the number of observations to y labels in catplot (or hbar)

    Hello everyone,

    I am not even sure whether this is possible, but here is an explanation of what I would like to achieve:
    I am using catplot to plot some data for a report I am writing.
    I have merged several HH datasets, and I would now like to plot the outcome variables by dataset and by another variable of interest, say the gender of the respondent.
    Here is an example to illustrate my problem:

    Code:
    sysuse auto.dta, clear
    gen expensive = .
    replace   expensive =1 if price > 7000 & price!=.
    replace   expensive =0 if price <= 7000 & price!=.
    
    gen himpg = mpg > 25
    label def himpg 1 "mpg > 25" 0 "mpg <= 25"
    label val himpg himpg
    
        catplot  expensive  foreign himp , ///
                percent(foreign himp ) ///
                var1opts(label(labsize(vsmall))) ///
                var2opts(label(labsize(vsmall))) ///
                var3opts(label(labsize(vsmall)))
    Now what I would like to achieve is to have the number of observations for each group appear next to the label of foreign (in my case this would be the gender of the respondent).
    So far I have written code that counts the observations in each group and stores the counts in a local macro.
    My idea was to then loop through
    Code:
     gr_edit .grpaxis.edit_tick `i' `x' "Foreign (n=`y')", tickset(major)
    to change the label of each subgroup individually.
    The problem I ran into is that the y ticks/labels (the value you need to enter into `x') are not equally spaced on the y axis, so it is hard to find an expression that places the new labels next to the bars.
    I do not want to do it manually for each graph I am creating, but would like to automate this process.
    So my main questions are:
    1. Is there a way to somehow extract/save the value of each y label/tick on the y axis and loop it back into the
    Code:
    gr_edit .grpaxis.edit_tick
    function?
    2. If not, is there a different way to add the number of observations next to each bar or label? (not manually, I am aware that I can show the number of obs. by plotting the frequencies instead of the percentage)
    3. If not, can someone suggest an expression with which I can calculate the y value of each tick/label on the y axis?

    Thank you very much.

    Here is the code I have written so far.
    Sorry that it is so messy; I am not very familiar with Stata coding practices and used a number of workarounds to get to what I am trying to do.

    Code:
    local num_obs
                local dataset dataset1 dataset2 dataset3 dataset4 dataset5 dataset6
                local outcome_var expensive  
                local over_var foreign
                tab `outcome_var' `over_var'
                levelsof `over_var', loc(f)
                foreach x of local dataset{
                    foreach n in `f' {
                    tab `outcome_var' if `over_var'==`n' & DATASET=="`x'" & `outcome_var'!=.
                    loc f`n'`x' `"`r(N)'"'
                    local num_obs `num_obs' `f`n'`x''
                    dis "`f`n'`x''"
                        }
                    }
                
                dis "`num_obs'"
                local not 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 // I included this to get rid of the cases where there are 0 observations. There is probably an easier way, however with other approaches the zeros were also removed inside other numbers (e.g. 420 to 42)
                local num_obs : list num_obs - not
                dis "`num_obs'"
                local varcount `:word count `num_obs''
                dis "`varcount'"
    
                local  quantils2
                local  quantils3
                local quantils =  100/`varcount' // 100 is the total height of the y axis and this is also where the problem lies as the y-labels/ticks are not equally distributed on the y axis
                display "`quantils'"
    
                forvalues i =  1/`varcount'{
                    local quantils2 = `quantils' * `i'
                    display "`quantils2'"
                    local quantils3 `quantils3' `quantils2'
                }
                display "`quantils3'"
            
                tokenize `quantils3'
                forval j = 1 (2) `varcount'{
                    local quantils3_uneven  `quantils3_uneven' ``j''
                }
                display "`quantils3_uneven'"
                tokenize `quantils3'
                forval j = 2 (2) `varcount'{
                    local quantils3_even  `quantils3_even' ``j''
                }
                display "`quantils3_even'"
    
    
    
        catplot  expensive  foreign himp , ///
                percent(foreign himp ) ///
                var1opts(label(labsize(vsmall))) ///
                var2opts(label(labsize(vsmall))) ///
                var3opts(label(labsize(vsmall)))
                
    
    
    forvalues i =  `varcount' (-2) 1 {
                    gettoken y num_obs: num_obs
                    gettoken x quantils3_even: quantils3_even
                    gr_edit .grpaxis.major.num_rule_ticks = 0
                    gr_edit .grpaxis.edit_tick `i' `x' "Men (n=`y')", tickset(major)
                }
                
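    local varcount_uneven = `varcount' - 1 // assumption: the odd-numbered ticks are meant and varcount is even
    forvalues i =  `varcount_uneven' (-2) 1 {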
                    gettoken y num_obs: num_obs
                    gettoken x quantils3_uneven: quantils3_uneven
                gr_edit .grpaxis.edit_tick `i' `x' "Women (n=`y')", tickset(major)
                }
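
    A side note on the zero-removal step above: one possible simplification is to test each token as a number, so that zeros inside larger numbers such as 420 are left untouched. This is only a sketch; the list of counts and the macro names counts and nonzero are made up for illustration and are not part of the code above.

    ```stata
    * hypothetical list of group sizes, including empty groups
    local counts 12 0 420 0 7 0

    * keep only the tokens that are not zero; each token is compared
    * as a whole number, so the 0 inside 420 is not affected
    local nonzero
    foreach n of local counts {
        if `n' != 0 {
            local nonzero `nonzero' `n'
        }
    }

    * displays: 12 420 7
    display "`nonzero'"
    ```

    The same test could presumably be applied at the point where r(N) is appended to num_obs, which would make the long list of zeros unnecessary.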
    I am happy for any help provided.

    Best,
    David
    Last edited by David Schneider; 24 Mar 2023, 00:46.

  • #2
    catplot is community-contributed and is from SSC.
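
    For anyone following along who does not yet have it, catplot can be installed from SSC from within Stata (an internet connection is needed):

    ```stata
    * install the community-contributed catplot command from SSC
    ssc install catplot
    ```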

    Does this help?

    Code:
    sysuse auto.dta, clear
    gen expensive = price > 7000 & price!=.
    gen himpg = mpg > 25
    label def himpg 1 "mpg > 25" 0 "mpg <= 25"
    label val himpg himpg
    
    egen groupvar = group(himp foreign)
    
    su groupvar , meanonly 
    
    forval j = 1/`r(max)' { 
        * by construction -foreign- is constant within groups of -groupvar- 
        * we just need to retrieve the constant value programmatically: max, min and mean would all work 
        su foreign if groupvar == `j', meanonly 
        label def  groupvar `j' "`: label (foreign) `r(max)'' ({it:n} = `r(N)')", modify 
    }
    
    label val groupvar groupvar 
    
        catplot  expensive  groupvar himp , ///
                percent(foreign himp) nofill ///
                var1opts(label(labsize(vsmall))) ///
                var2opts(label(labsize(vsmall))) ///
                var3opts(label(labsize(vsmall)))
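
    The key step in the loop above is the extended macro function that looks up a value label, combined with the count left behind in r(N). A minimal sketch of just that step for a single category, using the auto data (the macro name txt is an illustrative choice, and count is used here instead of summarize purely for brevity):

    ```stata
    sysuse auto, clear

    * look up the text of the value label attached to foreign == 1
    local txt : label (foreign) 1

    * count leaves the number of matching observations in r(N)
    count if foreign == 1

    * combine label text and count into one string
    display "`txt' (n = `r(N)')"
    ```

    The loop does the same for every value of groupvar and writes the result into a value label with label define, modify, so that catplot shows it on the axis.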



    • #3
      Dear Nick,

      Amazing, thank you! That's exactly what I needed.
      I made it a lot more complicated than necessary; I could have thought of simply using egen group() earlier.

      Many thanks for your help.

      And thanks for adding that catplot is community-contributed; I did indeed forget to mention that.

      Best,

      David
