Appropriate command (parameters) for density plot

Stephen Okiya

#1

25 Nov 2021, 07:13

Hi Stata users,

I would like to replicate the plot below:

[Original plot not preserved in this copy of the thread]

I used the syntax

Code:

kdensity asset_index

The result is the graph below, which has a pattern similar to the original:

[Graph not preserved in this copy of the thread]

I then decided to use the command

Code:

kdensity asset_index, bwidth(0.07573)

and the plot changed drastically, as shown below:

[Graph not preserved in this copy of the thread]

Any hints on the best approach to getting a similar plot?

P.S.: I would be willing to share the data, but I am using the DHS household and household members' data, which are large files.

Thanks in advance!
Nick Cox

#2

25 Nov 2021, 07:48

The bandwidth is measured in the units of the variable being shown. Whether 0.07573 for the first graph you show is just some default or was chosen deliberately, I can't tell you, but if the intent was to present a smooth picture of the distribution, the graph is a failure. If the intent was to be honest about the lumpiness of the distribution, it is presumably more successful.
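As a rough illustration of the point about units, the sketch below (assuming the DHS-based data containing asset_index are already in memory) compares the range of the variable with the bandwidth that kdensity picks by default. kdensity reports the bandwidth in the graph note and also saves it in r(bwidth).

```stata
* Assumes the data containing asset_index are already in memory.
* The range shows the units in which any bandwidth is measured.
summarize asset_index

* With no bwidth() option, kdensity chooses a default bandwidth,
* shows it in the graph note and saves it in r(bwidth).
kdensity asset_index
display "default bandwidth used: " r(bwidth)
```

A bandwidth of 0.07573 means something quite different for a variable spanning about 0 to 50 than for one spanning about -1 to 3.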

There is no reason whatsoever to suppose that 0.07573 is a good choice for your variable, which evidently is quite different. Indeed, your distribution looks so lumpy that, on the evidence here, I would not reach for a kernel density estimate at all.

There is an intricate and impressive literature on automated choice of supposedly optimal bandwidth, which at a distance mostly shows how different criteria lead to different prescriptions and also how smart the authors are. Some automated choice may be needed by anyone who has a need to produce many, many such estimates without time or inclination to agonise over the choice for each graph.

Other way round, if this is a one-off graph, there is both an opportunity and an obligation to experiment with different bandwidths and to choose one which portrays the distribution helpfully in the light of what you know about the data generation process and of what you are trying to do with the graph.
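One way to do that experimenting is to draw the same variable with several bandwidths and view the results side by side. This is a sketch, assuming asset_index is in memory and the code is run from a do-file (the /// continuation only works there). The bandwidth values and the graph names kd1 to kd4 are arbitrary illustrative choices, not recommendations.

```stata
* Assumes asset_index is in memory; run from a do-file.
* Bandwidths are arbitrary illustrative values for a variable
* spanning roughly 0 to 50; adjust them to your own data.
local i = 0
foreach bw in 0.5 1 2 4 {
    local ++i
    kdensity asset_index, bwidth(`bw') ///
        name(kd`i', replace) title("bandwidth = `bw'")
}

* Show the four estimates together for comparison
graph combine kd1 kd2 kd3 kd4
```

Small bandwidths show more of the lumpiness; large ones smooth it away. Which is "right" depends on the data generation process and the purpose of the graph.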

I don't like the Epanechnikov default, partly because I then have to explain the name downstream, but also because if I show a graph of the kernel, it isn't obvious to people new to these graphs why it is a good idea. I have an irrational fondness for the biweight kernel.
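To see how much the kernel choice matters for a given variable, the two estimates can be overlaid. This is a minimal sketch assuming asset_index is in memory and a do-file is used. The graph name kernels is a hypothetical choice. Each curve uses Stata's default bandwidth because bwidth() is not specified.

```stata
* Assumes asset_index is in memory; run from a do-file.
* Overlays the default Epanechnikov kernel and the biweight kernel.
twoway (kdensity asset_index, kernel(epanechnikov))   ///
       (kdensity asset_index, kernel(biweight)),      ///
       legend(order(1 "Epanechnikov" 2 "biweight"))   ///
       name(kernels, replace)
```

In practice, the choice of bandwidth usually changes the picture far more than the choice of kernel.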

Last edited by Nick Cox; 25 Nov 2021, 07:57.
Fei Wang

#3

25 Nov 2021, 07:50

I think a major reason is that your variable asset_index has a different support from the variable in the original figure (roughly 0 to 50 vs. -1 to 3). You may need to rescale the variable and then make the graph.
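One possible rescaling is to standardize the variable to mean 0 and standard deviation 1 with egen's std() function. This is a sketch, assuming asset_index is in memory. The new variable name z_asset is hypothetical. Nothing in the thread says the index in the original figure was built this way.

```stata
* Assumes asset_index is in memory. Standardizing is only one possible
* rescaling; it is not necessarily how the original index was built.
capture drop z_asset
egen z_asset = std(asset_index)
kdensity z_asset
```

After standardizing, any bandwidth is measured in standard-deviation units. That still gives no guarantee that a value borrowed from another graph suits this variable.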

Add: crossed with #2.
Stephen Okiya

#4

25 Nov 2021, 08:34

Thanks Nick Cox and Fei Wang for your advice. Much appreciated.