
  • Appropriate command (parameters) for density plot

    Hi Stata users,

    I would like to replicate the plot below

    [Image: density plot.png]

    I use the syntax
    Code:
    kdensity asset_index
    and the result is the graph below

    [Image: my graph.jpg]

    which has a pattern similar to the original.

    I then decided to use the bandwidth from the original plot, with the command
    Code:
    kdensity asset_index, bwidth(0.07573)
    and the plot changes drastically as shown below

    [Image: my graph with the same bandwiidth.jpg]

    Any hints on the best approach to get a similar plot?

    P.S.: I would be willing to share the data, but I am using DHS household and household members' data, which are large files.

    Thanks in advance!

  • #2
    The bandwidth is measured in the units of the variable being shown. Whether 0.07573 for the first graph you show is just some default or was chosen deliberately somehow I can't tell you, but if the intent was to present a smooth picture of the distribution the graph is a failure. If the intent was to be honest about the lumpiness of the distribution it is presumably more successful.

    There is no reason whatsoever to suppose that 0.07573 is a good choice for your variable, which evidently is quite different. Indeed, your distribution looks so lumpy that I would not reach for a kernel density estimate at all on this evidence here....

    There is an intricate and impressive literature on automated choice of supposedly optimal bandwidth, which at a distance mostly shows how different criteria lead to different prescriptions and also how smart the authors are. Some automated choice may be needed by anyone who has a need to produce many, many such estimates without time or inclination to agonise over the choice for each graph.

    Other way round, if this is a one-off graph, there is both an opportunity and an obligation to experiment with different bandwidths and to choose one which portrays the distribution helpfully in the light of what you know about the data generation process and of what you are trying to do with the graph.
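    A minimal sketch of that kind of experiment, meant for a do-file: it looks up the bandwidth that kdensity would pick by default (left behind in r(bwidth)) and overlays estimates at half, one and two times that value. The simulated asset_index and the multipliers are assumptions for illustration only; substitute your own variable and whatever bandwidths make sense in its units.

```stata
* Stand-in data so the sketch runs on its own (hypothetical values);
* drop these four lines and use your own asset_index instead
clear
set seed 2021
set obs 2000
generate double asset_index = 50 * rbeta(2, 5)

* Retrieve the default bandwidth without drawing a graph
quietly kdensity asset_index, nograph
local bw = r(bwidth)
display "default bandwidth: " `bw'

* Bandwidths are in the units of asset_index; try narrower and wider
local half  = `bw' / 2
local twice = 2 * `bw'

twoway (kdensity asset_index, bwidth(`half'))  ///
       (kdensity asset_index, bwidth(`bw'))    ///
       (kdensity asset_index, bwidth(`twice')), ///
       legend(order(1 "half default" 2 "default" 3 "twice default")) ///
       ytitle("Density") xtitle("asset_index")
```

    Smaller bandwidths show more of the lumpiness and larger ones smooth it away; which is more helpful depends on the purpose of the graph.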

    I don't like the Epanechnikov default, if only because I then have to explain the name downstream but also because if I show a graph of the kernel it isn't obvious to people new to these graphs why it is a good idea. I have an irrational fondness for the biweight kernel.
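    For anyone who wants to compare kernels directly, a short sketch follows, again for a do-file. The simulated data are a hypothetical stand-in; kernel(epanechnikov) and kernel(biweight) are documented options of twoway kdensity.

```stata
* Stand-in data so the sketch runs on its own (hypothetical values)
clear
set seed 2021
set obs 2000
generate double asset_index = 50 * rbeta(2, 5)

* Same variable, default bandwidth for each kernel, two kernels overlaid
twoway (kdensity asset_index, kernel(epanechnikov)) ///
       (kdensity asset_index, kernel(biweight)),    ///
       legend(order(1 "Epanechnikov" 2 "biweight")) ///
       ytitle("Density") xtitle("asset_index")
```

    In practice the choice of bandwidth usually changes the picture far more than the choice of kernel does.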
    Last edited by Nick Cox; 25 Nov 2021, 07:57.



    • #3
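      I think a major reason is that your variable asset_index has a different support from the variable in the original figure (roughly 0 to 50 versus -1 to 3), so a bandwidth borrowed from that figure is far too small on your scale. You may need to rescale the variable first and then make the graph.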
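      One possible rescaling is sketched below. It standardizes the variable to mean 0 and standard deviation 1, which is an assumption: the thread does not say how the index in the original figure was scaled. The simulated data and the name asset_std are hypothetical.

```stata
* Stand-in data so the sketch runs on its own (hypothetical values)
clear
set seed 2021
set obs 2000
generate double asset_index = 50 * rbeta(2, 5)

* Standardize: subtract the mean and divide by the standard deviation
quietly summarize asset_index
generate double asset_std = (asset_index - r(mean)) / r(sd)

* The borrowed bandwidth is only meaningful if the original figure
* was drawn on a comparable (standardized) scale
kdensity asset_std, bwidth(0.07573)
```

      If the original index was built differently, say on some other linear scale, the same idea applies with a different transformation.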

      Add: crossed with #2.



      • #4
        Thanks Nick Cox and Fei Wang for your advice. Much appreciated.
