  • codensity package available from SSC

    With thanks as ever to Kit Baum, a new package codensity is available from SSC.

    It is billed as requiring Stata 8, but in truth I have not tested it against Stata 8, which has long since been inaccessible to me. Naturally, if I get feedback that it will not work in some version of Stata later than 8, I will do my best to document matters.

    I say new but this package is a revised version of multidensity as announced in 2020 at https://www.statalist.org/forums/for...lable-from-ssc That thread remains relevant as a source of graphical examples.

    As mentioned in that thread, I came quickly to regret the name, as multidensity could too easily be misread as being about multivariate density estimation (even bivariate as a key special case), which never was implemented and indeed was never the intention, immediate or ultimate. Also, the name was too long, at least for me to want to type repeatedly.

    codensity as a shorter name is at least concise, and is intended to convey minimally convenience and comparison. The help file says a little more, but the name itself is only a little deal. Names of commands should be distinct, concise and as far as possible self-explanatory, which can be difficult to achieve all at once. I have a personal preference to avoid command names which look as if they might be Kingon swearwords.

    multidensity is itself withdrawn. If you downloaded it and still use it, I would encourage you to move to codensity. Some functionality is new, and very little has been changed.

    The focus of this package -- just containing a single command with the same name -- is kernel density estimation, long since supported by official commands, but its selling point is as a convenience wrapper. The help file gives details as usual.
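
    As a rough sketch of what the official commands ask for without a wrapper, here is one way to superimpose density estimates for two groups. This uses only official twoway kdensity and the auto dataset shipped with Stata, not codensity itself; the variable and the legend text are just illustrative choices, and the help file remains the guide to codensity syntax.

    Code:
    sysuse auto, clear
    * one kdensity call per group, each using its own default kernel and bandwidth
    twoway (kdensity mpg if foreign == 0) ///
           (kdensity mpg if foreign == 1), ///
           legend(order(1 "Domestic" 2 "Foreign")) ///
           xtitle("Mileage (mpg)") ytitle("Density")

    Each extra group needs its own kdensity call and legend entry, which is the kind of repetition a convenience wrapper is meant to spare.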

    Manifestly, kernel density estimates can be used in all kinds of ways, even graphically. I have absolutely no intention of trying to compete with Stata packages for violin plots or ridgeline plots, which can be found easily, and typically are written by highly expert Stata user-programmers. I do find many uses in the literature to be unconvincing, but none of that is to be blamed on the programmers. The goal of a violin plot is to combine the summary of a box plot with the greater detail of a density estimate, but many examples end up crowded and cryptic. I am often sceptical about precisely why ridgeline plots are regarded as appealing, which I put down partly to evocations of rolling topography, whether of landscapes or of other forms. Letting density estimates obscure or occlude each other partially is not in my view itself a design goal. Superimposition with transparency is in contrast sometimes a great idea.

    What's new or different here, apart from the name?

    An extended help file with more examples.

    codensity generate now has an over() option rather than a by() option to specify different density estimates for different groups. That matches a shift in Stata conventions.

    codensity generate now has a showchar option to show what kernels and bandwidths you just used. The typical need for this is that you delegated choice to the command defaults, but now need to know what you did, so that you can report it or change it.

    There is a new subcommand codensity stack allowing a different graphical style for plotting results.

    You will have known long since whether this might be interesting or useful to you and if so I encourage you not only to download the package with

    Code:
    ssc install codensity
    but also to run the do-file with examples. One example uses the Palmer penguins dataset, which is supplied as ancillary.


  • #2
    Klingon, naturally
