Histogram including missing observations

Maggie Shi

#1


25 Sep 2018, 08:02

Hi,

Is there any way to plot a histogram in which the density is computed relative to all observations, including those with missing data, rather than relative to the non-missing observations only?

Here is the issue I am having: I want to plot the distribution of retirement age for my treated and control groups. However, for a substantial portion of each group, I do not see them retiring in my data. My variable of interest is "retire7age1_year". "treat5" is a dummy equal to 1 if an individual is in the treated group and 0 if the individual is in the control group. I am using byhist because I am reweighting the control group, and byhist allows pweights.

The following code compares the distribution of all observed retirements in the treated group vs. all observed retirements in the control group:

Code:

byhist retire7age1_year [pw = treatage_weight], by(treat5) density discrete

However, I also want to count how many observations are missing (non-retired people) in each group. I've gotten around this in a very rudimentary way. I create a new variable "retire7age1_year_exit" that equals 100 if the individual has no retirement year. Then I save the tabulate results in a matrix, save the matrix as a text file, load this text file as a new dataset, and plot the tabulated results by group, relabeling 100 as "Not Retired".

Code:

* Cross-tabulate (missing retirement year coded as 100) and save the cell counts
tab retire7age1_year_exit treat5 [aw = treatage_weight], matcell(treatcontrol_tab)
matrix rownames treatcontrol_tab = 49 50 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 71 100
matrix list treatcontrol_tab

* Add zero rows for ages with no observed retirements
matrix missingages = (0,0\0,0\0,0)
matrix rownames missingages = 51 52 70
matrix treatcontrol_tab = treatcontrol_tab \ missingages

* Export the matrix to a text file (mat2txt is from SSC; install once)
ssc install mat2txt
mat2txt, matrix(treatcontrol_tab) saving(retire7age1_year_exit) replace

preserve
import delimited using "retire7age1_year_exit.txt", clear
rename v1 retireage
rename c1 control_freq
rename c2 treat_freq
drop v4

* Within-group proportions; the denominator includes non-retirees (coded 100)
egen control_tot = total(control_freq)
egen treat_tot = total(treat_freq)
gen treat_prop = treat_freq / treat_tot
gen control_prop = control_freq / control_tot

* "Zoomed-in" graph: observed retirement ages only
twoway (bar treat_prop retireage if retireage < 100, color(ebg)) ///
    (bar control_prop retireage if retireage < 100, fcolor(none)), ///
    legend(order(1 "Treat" 2 "Control")) ytitle("Proportion") ///
    xtitle("Retirement Age") graphregion(color(white)) xtick(49(1)71)
graph export retire7age1_treatcontrol_reweight_nonretirescale.png, replace

* Bar chart with non-retirees shown as "NR"
sort retireage
tostring retireage, replace
replace retireage = "NR" if retireage == "100"
graph bar treat_prop control_prop, over(retireage) asyvars ///
    legend(label(1 "Treat") label(2 "Control")) bar(1, color(ebg)) ///
    bar(2, fcolor(erose) lcolor(maroon)) graphregion(color(white)) ///
    b1title("Retirement Age") ytitle("Proportion")
graph export retire7age1_treatcontrol_reweight_nonretire.png, replace
restore

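This gets me what I want (graphs attached): a bar chart of retirement ages by group that includes an NR category, where NR = not retired, and a "zoomed-in" version of that graph that excludes NR individuals from the plot while still counting them in the denominator.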

However, it has a lot of steps and is very specific to the data/outcome variable I am using. Is there an easier way? This seems like something that a lot of people might want to do, but I can't seem to find a similar question on Statalist.
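One possibly shorter route, sketched here under stated assumptions: code missing values as a distinct non-missing value, compute the weighted within-group percentages directly with egen, and graph them with graph bar, skipping the matrix export and re-import. The sketch uses Stata's built-in auto data, so rep78 stands in for retire7age1_year, foreign stands in for treat5, and wt is a hypothetical all-ones weight in place of treatage_weight. Because the shares are computed by hand with egen total(), any non-negative weight variable can be plugged in without relying on the graph command's own weight support.

```stata
sysuse auto, clear

* Stand-ins (assumptions): rep78 plays retire7age1_year, foreign plays treat5
* Code missing rep78 as a distinct non-missing value, 7, labelled "missing"
clonevar rep78_2 = rep78
replace rep78_2 = 7 if missing(rep78_2)
label define rep78_2 7 "missing"
label values rep78_2 rep78_2

* Hypothetical weight: all ones here; replace with the real weight
* (treatage_weight in #1)
generate double wt = 1

* Weighted percentage of each category within each group,
* with the missing category included in the denominator
bysort foreign: egen double wt_group = total(wt)
bysort foreign rep78_2: egen double wt_cell = total(wt)
generate double pct = 100 * wt_cell / wt_group

* pct is constant within each cell, so its mean is the cell percentage
graph bar (mean) pct, over(rep78_2) over(foreign) ytitle("Percent")
```

Adding if rep78_2 != 7 to the graph command would give the "zoomed-in" version: the missing category is not drawn, but the remaining bars are still percentages of all observations in each group.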
Last edited by Maggie Shi; 25 Sep 2018, 08:08.
Jane Doe

#2

06 May 2019, 01:35

Hello, everyone,
I have a similar problem to Maggie's. I want to create a histogram that takes missing values into account and also displays them as a separate value of the variable, or at least includes them when calculating the percentages (the variable is nominal, with four values). However, Maggie's solution doesn't seem to work for me, so I wonder whether there is a simpler one.
I look forward to your answers and suggestions!
Jane
Nick Cox

#3

06 May 2019, 02:01

If "Jane Doe" is your real name: commiserations on the poor jokes you have endured ever since.

If it is not your real name: we do ask for real names here. Please read and act on http://www.statalist.org/forums/help#realnames and #3 of https://www.statalist.org/forums/help#adviceextras

I don't recall #1. Perhaps I, and maybe others, looked at it, decided it was long and complicated, and left someone else to work on it.

In #2, "doesn't seem to work for me" is not a report that can be discussed without data, code, or specific details. http://www.statalist.org/forums/help#stata gives crucial advice.

If you want to show missing values on a histogram, and want the density or other calculations to include them, then they must be assigned a distinct non-missing value; then you can draw a histogram. Here is a simple example.

Code:

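sysuse auto, clear
* Copy rep78 and give its missing values a distinct non-missing code, 7
clonevar rep78_2 = rep78
replace rep78_2 = 7 if rep78_2 == .
label def rep78_2 7 missing
label val rep78_2 rep78_2
* The density now uses all 74 observations, including the 5 formerly missing
histogram rep78_2, discrete xla(1/5 7, valuelabel)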
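For a nominal variable such as the one in #2, a bar chart of percentages may be simpler than a histogram. A sketch, again with the auto data and assuming graph bar's missing option, which keeps missing values of the over() variable as their own category (labelled ".") instead of dropping them, so that they also count in the denominator of the percentages:

```stata
sysuse auto, clear

* rep78 has 5 missing values; the missing option keeps them as their own
* bar, so every bar is a percentage of all 74 observations
graph bar (percent), over(rep78) missing ytitle("Percent")

* Table check: the same percentages, missing values included
tabulate rep78, missing
```

Recoding the missing values to a labelled code, as in the example above, would replace the "." label with something friendlier such as "missing".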
Jane Doe

#4

06 May 2019, 02:12

Hi Nick,
I am sorry that my post and my name do not meet the requirements; I will act on them!
Thank you very much for your reply and the small example.
Again, sorry!
Antje