Hello everyone,
I am not even sure whether this is possible, but here is an explanation of what I would like to achieve:
I am using catplot to plot some data for a report I am writing.
I have merged several HH datasets, and I would now like to plot the outcome variables by dataset and by another variable of interest, say the gender of the respondent.
Here is an example to illustrate my problem:
Code:
sysuse auto.dta, clear
gen expensive = .
replace expensive = 1 if price > 7000 & price != .
replace expensive = 0 if price < 7000 & price != .
gen himpg = mpg > 25
label def himpg 1 "mpg > 25" 0 "mpg <= 25"
label val himpg himpg
catplot expensive foreign himp, ///
    percent(foreign himp) ///
    var1opts(label(labsize(vsmall))) ///
    var2opts(label(labsize(vsmall))) ///
    var3opts(label(labsize(vsmall)))
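What I would like to achieve is to have the number of observations for each group appear next to the label of foreign (in my case this would be the gender of the respondent).
So far I have written code that counts the observations in each group and stores the counts in a local macro.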
My idea was to then loop through
Code:
gr_edit .grpaxis.edit_tick `i' `x' "Foreign (n=`y')", tickset(major)
to change the label of each subgroup individually.
The problem I ran into is that the y ticks/labels (the value that goes into `x') are not evenly spaced along the y axis, so it is hard to find an expression that places the new labels next to the bars.
I do not want to do it manually for each graph I am creating, but would like to automate this process.
So my main questions are:
1. Is there a way to somehow extract/save the value of each y label/tick on the y axis and loop it back into the
Code:
gr_edit .grpaxis.edit_tick
command?
2. If not, is there a different way to add the number of observations next to each bar or label? (Not manually; I am aware that I can show the number of observations by plotting frequencies instead of percentages.)
3. If not, can someone suggest an expression with which I can calculate the y value of each tick/label on the y axis?
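For question 2, one direction that might avoid gr_edit altogether is to put the count into the category text before plotting, so that the axis label already carries it. Below is a minimal, untested sketch on the auto data. The variable names n_cell, foreign_txt and foreign_n are made up for illustration, and it assumes that catplot accepts a string variable and passes nofill through to graph hbar; neither assumption is confirmed anywhere in this post.

```stata
sysuse auto.dta, clear
gen expensive = .
replace expensive = 1 if price > 7000 & price != .
replace expensive = 0 if price < 7000 & price != .
gen himpg = mpg > 25
label def himpg 1 "mpg > 25" 0 "mpg <= 25"
label val himpg himpg

* number of nonmissing outcomes in each himpg x foreign cell
* (n_cell is a hypothetical helper variable)
bysort himpg foreign: egen n_cell = count(expensive)

* turn the value label of foreign into text and append the cell size
decode foreign, gen(foreign_txt)
gen foreign_n = foreign_txt + " (n=" + string(n_cell) + ")"

* assumption: nofill drops label/group combinations that do not occur,
* so each himpg group shows only its own "(n=...)" categories
catplot expensive foreign_n himpg, ///
    percent(foreign_n himpg) nofill ///
    var1opts(label(labsize(vsmall))) ///
    var2opts(label(labsize(vsmall))) ///
    var3opts(label(labsize(vsmall)))
```

If this works as assumed, the labels would come from the data rather than from tick positions, so no y values would need to be computed.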
Thank you very much.
Here is the code I have written so far.
Sorry that it is so messy; I am not very familiar with Stata coding practices and used a number of workarounds to get to where I am.
Code:
local num_obs
local dataset dataset1 dataset2 dataset3 dataset4 dataset5 dataset6
local outcome_var expensive
local over_var foreign

tab `outcome_var' `over_var'
levelsof `over_var', loc(f)

* count the nonmissing outcomes for every dataset x group combination
foreach x of local dataset {
    foreach n in `f' {
        tab `outcome_var' if `over_var'==`n' & DATASET=="`x'" & `outcome_var'!=.
        loc f`n'`x' `"`r(N)'"'
        local num_obs `num_obs' `f`n'`x''
        dis "`f`n'`x''"
    }
}
dis "`num_obs'"

// I included this to get rid of the cases where there are 0 observations.
// There is probably an easier way, however with other approaches the zeros
// were also removed inside other numbers (e.g. 420 to 42)
local not 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
local num_obs : list num_obs - not
dis "`num_obs'"

local varcount `:word count `num_obs''
dis "`varcount'"

local quantils2
local quantils3
// 100 is the total height of the y axis and this is also where the problem
// lies as the y-labels/ticks are not equally distributed on the y axis
local quantils = 100/`varcount'
display "`quantils'"

forvalues i = 1/`varcount' {
    local quantils2 = `quantils' * `i'
    display "`quantils2'"
    local quantils3 `quantils3' `quantils2'
}
display "`quantils3'"

* split the positions into odd-numbered and even-numbered entries
tokenize `quantils3'
forval j = 1 (2) `varcount' {
    local quantils3_uneven `quantils3_uneven' ``j''
}
display "`quantils3_uneven'"

tokenize `quantils3'
forval j = 2 (2) `varcount' {
    local quantils3_even `quantils3_even' ``j''
}
display "`quantils3_even'"

catplot expensive foreign himp, ///
    percent(foreign himp) ///
    var1opts(label(labsize(vsmall))) ///
    var2opts(label(labsize(vsmall))) ///
    var3opts(label(labsize(vsmall)))

forvalues i = `varcount' (-2) 1 {
    gettoken y num_obs: num_obs
    gettoken x quantils3_even: quantils3_even
    gr_edit .grpaxis.major.num_rule_ticks = 0
    gr_edit .grpaxis.edit_tick `i' `x' "Men (n=`y')", tickset(major)
}

// varcount_uneven was never defined in the code as posted;
// varcount - 1 is an assumed value so that the loop can run
local varcount_uneven = `varcount' - 1
forvalues i = `varcount_uneven' (-2) 1 {
    gettoken y num_obs: num_obs
    gettoken x quantils3_uneven: quantils3_uneven
    gr_edit .grpaxis.edit_tick `i' `x' "Women (n=`y')", tickset(major)
}
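The counting step above can probably be written more compactly. The following is only a sketch: it uses count instead of tab and skips empty cells directly, so the long list of zeros would no longer be needed. To make it runnable on auto.dta, DATASET is created here as a made-up two-way split; in the real data it would be the existing string variable that identifies the source dataset.

```stata
sysuse auto.dta, clear
gen expensive = .
replace expensive = 1 if price > 7000 & price != .
replace expensive = 0 if price < 7000 & price != .

* hypothetical stand-in for the dataset identifier in the merged data
gen DATASET = cond(mpg > 20, "dataset1", "dataset2")

local num_obs
levelsof DATASET, local(datasets)
levelsof foreign, local(groups)

foreach d of local datasets {
    foreach g of local groups {
        * nonmissing outcomes in this dataset x group cell
        quietly count if foreign == `g' & DATASET == "`d'" & !missing(expensive)
        * keep only nonempty cells, so no zeros need to be stripped afterwards
        if r(N) > 0 {
            local num_obs `num_obs' `r(N)'
        }
    }
}
display "`num_obs'"
```

Because the zero check is applied to each count as a whole, a value such as 420 stays intact.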
I am happy for any help provided.
Best,
David