Hi,
Is there any way to plot a histogram where the density is computed relative to all observations, including those with missing data, rather than only the observations with non-missing data?
Here is the issue I am having: I want to plot the distribution of retirement age for my treated and control groups. However, for a substantial portion of each group, I do not see them retiring in my data. My variable of interest is "retire7age1_year". "treat5" is a dummy which = 1 if individuals are in the treated group, and 0 if individuals are in the control group. I am using byhist because I am reweighting the control group, and byhist allows for pweights.
The following code compares the distribution of all observed retirements in the treated group vs. all observed retirements in the control group:
Code:
byhist retire7age1_year [pw = treatage_weight], by(treat5) density discrete
However, I also want to count how many individuals in each group have missing data (i.e., have not retired). I've gotten around this in a very rudimentary way: I create a new variable "retire7age1_year_exit" which = 100 if the individual is missing a retirement year. Then I save the tabulate results in a matrix, save the matrix as a text file, load this text file as a new dataset, and plot the tabulate results by group, relabeling 100 as "NR" (not retired).
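For reference, the recoding step described above could look roughly like this. It is a minimal sketch, assuming the new variable is simply a copy of "retire7age1_year" with missing values set to 100; the value 100 is arbitrary and only needs to lie outside the observed retirement ages.
Code:
* copy the retirement age, then flag individuals with no observed retirement
generate retire7age1_year_exit = retire7age1_year
replace retire7age1_year_exit = 100 if missing(retire7age1_year)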
Code:
tab retire7age1_year_exit treat5 [aw = treatage_weight], matcell(treatcontrol_tab)
matrix rownames treatcontrol_tab = 49 50 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 71 100
matrix list treatcontrol_tab
matrix missingages = (0,0\0,0\0,0)
matrix rownames missingages = 51 52 70
matrix treatcontrol_tab = treatcontrol_tab\ missingages
ssc install mat2txt
mat2txt, matrix(treatcontrol_tab) saving(retire7age1_year_exit) replace
preserve
import delimited using "retire7age1_year_exit.txt", clear
rename v1 retireage
rename c1 control_freq
rename c2 treat_freq
drop v4
egen control_tot = sum(control_freq)
egen treat_tot = sum(treat_freq)
gen treat_prop = treat_freq / treat_tot
gen control_prop = control_freq / control_tot
twoway (bar treat_prop retireage if retireage < 100, color(ebg)) ///
    (bar control_prop retireage if retireage < 100, fcolor(none)), ///
    legend(order(1 "Treat" 2 "Control")) ytitle("Percent") xtitle("Retirement Age") graphregion(color(white)) xtick(49(1)71)
graph export retire7age1_treatcontrol_reweight_nonretirescale.png, replace
sort retireage
tostring retireage, replace
replace retireage = "NR" if retireage == "100"
graph bar treat_prop control_prop, over(retireage) asyvars ///
    legend( label(1 "Treat") label(2 "Control")) bar(1, color(ebg)) bar(2, fcolor(erose) lcolor(maroon)) graphregion(color(white)) ///
    b1title("Retirement Age") ytitle("Percent")
graph export retire7age1_treatcontrol_reweight_nonretire.png, replace
restore
This gets me what I want: a bar graph of the share of each group at every retirement age, with an extra bar where NR = not retired, and a second graph which is a "zoomed in" version of the first, excluding NR individuals.
However, it has a lot of steps and is very specific to the data/outcome variable I am using. Is there an easier way? This seems like something that a lot of people might want to do, but I can't seem to find a similar question on Statalist.
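One candidate shortcut, given only as an untested sketch: collapse might replace the matrix, text file, and import steps by computing the weighted counts directly. It assumes "retire7age1_year_exit" already exists with 100 = not retired. The names "one", "n", "total", and "share", and the axis position 73 for the NR bar, are placeholders of my own and not part of the workflow above.
Code:
preserve
generate byte one = 1
* weighted count of individuals per retirement age and group
collapse (sum) n = one [pw = treatage_weight], by(retire7age1_year_exit treat5)
* divide by each group's weighted total, so the shares include the non-retired
bysort treat5: egen double total = total(n)
generate double share = n / total
* move the NR bar next to the observed ages (73 is an arbitrary position)
replace retire7age1_year_exit = 73 if retire7age1_year_exit == 100
twoway (bar share retire7age1_year_exit if treat5 == 1, color(ebg)) ///
    (bar share retire7age1_year_exit if treat5 == 0, fcolor(none)), ///
    legend(order(1 "Treat" 2 "Control")) ytitle("Share of group") ///
    xtitle("Retirement Age") xlabel(50(5)70 73 "NR")
restore
Ages with no observations in either group (51, 52, and 70 here) simply have no bar in this version, so the zero rows would not need to be added by hand. Adding "if retire7age1_year_exit < 73" to each bar plot should give the zoomed-in graph on the same scale.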