  • Convex hulls on scatter plots

    A search for discussions of convex hulls in Stata forums or outlets reveals various programs from 1995, 1997, and 1998 in the Stata Technical Bulletin (all using Stata's old graphics), an ado file cvxhull posted by Allan Reese in 2004

    https://www.stata.com/statalist/arch.../msg00193.html

    and not much else. This is a surprise to me.

    In what follows, my focus is entirely on what you can do on or with scatter plots -- or points in two dimensions -- and not with one dimension or with three dimensions or more.

    For whatever reasons, convex hulls no longer seem popular or even known about in statistical graphics.

    I should back up, as some people will already be lost if they do not know about convex hulls, or at least do not know the term. The idea is likely to be familiar or at least immediate once exemplified and it may summon distant memories of childhood pastimes in which you connected the dots and Cinderella, or a horse, or something equally interesting emerged from a puzzle book.

    Here is a convex hull as produced by

    Code:
    ssc install cvxhull
    sysuse auto, clear 
    set scheme s1color 
    cvxhull mpg weight, hull(1) noreport

    [Figure: cvxhull.png]


    So, a convex hull is the smallest convex polygon including all the points in a set. Some points are on the hull and the others are inside.

    A standard thought experiment is to imagine the points on the scatter plot as pins on a board. Summon up a giant rubber band (https://en.wikipedia.org/wiki/Rubber_band), stretch it to include all the points, and then let it go. The hull is now marked by the band.

    OK, but why should you find this interesting or useful? It's when there are two or more groups that this becomes of note. I will show some more results before giving the small sales pitch, although if you need the pitch after the pictures, then I have probably failed.

    cvxhull does not give you (and does not promise to give you) everything you may find helpful, but it does leave behind variables that are essential for further processing. Each hull is represented by two variables defining different sides of the hull.
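
    As a minimal sketch of what is left behind, you can inspect the new variables directly. This assumes that cvxhull is already installed and that the variables it creates all share the _cvxh prefix seen in the graph code in this post; check the help file for the full list.

    ```stata
    sysuse auto, clear
    * two hulls per group, the same call as in the graph code in this post
    cvxhull mpg weight, group(foreign) hull(2) noreport
    * show whatever variables cvxhull created (the prefix is an assumption)
    describe _cvxh*
    * look at the first few observations in x-axis order
    sort weight mpg
    list foreign weight mpg _cvxh* in 1/10, noobs
    ```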

    Thus we can do things like this:



    [Figure: cvxhull1.png]

    [Figure: cvxhull2.png]



    To spell it out:

    0. This is pretty easy to explain. In my experience, the story of pins on a board nails it easily for people new to the idea. Thanks to cvxhull it is easy to implement.

    1. Convex hulls look good shown as areas contained. This enhances perception of point patterns as wholes.

    2. Transparency as introduced in Stata 15 is invaluable whenever, as will be common in interesting cases, hulls overlap.

    3. If the reaction is that the hull is unduly influenced by outliers -- indeed being on the hull is one way to identify outliers -- then we can carry out peeling. Onion-like, inside each convex hull lies another that is the convex hull of the remaining points (until we run out of data points). The second graph shows the second hulls.

    4. In the code below, getting the sort order right is a crucial detail.

    5. Old news to some, but orange and blue work well together.

    Here is the complete code for the last two graphs:


    Code:
    sysuse auto, clear
    ssc install cvxhull
    set scheme s1color 
    cvxhull mpg weight , group(foreign) noreport hull(2)
    sort weight mpg 
    local opts legend(off) aspect(1) yla(, ang(h)) ytitle("`: var label mpg'")
    
    twoway rarea _cvxh1l _cvxh1r weight if foreign, color(orange%20) sort /// 
    || rarea _cvxh1l _cvxh1r weight if !foreign, color(blue%20) sort      ///
    || scatter mpg weight if foreign, ms(Oh) mc(orange)                   ///
    || scatter mpg weight if !foreign, ms(+) mc(blue) `opts' name(G1, replace)
    
    twoway rarea _cvxh2l _cvxh2r weight if foreign, color(orange%20) sort ///
    || rarea _cvxh2l _cvxh2r weight if !foreign, color(blue%20) sort      ///
    || scatter mpg weight if foreign, ms(Oh) mc(orange)                   ///
    || scatter mpg weight if !foreign, ms(+) mc(blue) `opts' name(G2, replace)
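
    The same variables also lend themselves to showing successive peels on one graph. What follows is only a sketch for all the cars taken as a single group. It assumes that hull(3) leaves behind _cvxh3l and _cvxh3r on the same naming pattern as the first two hulls, which should be checked against the help file. Because each area is translucent, the inner hulls look darker as the layers stack up.

    ```stata
    sysuse auto, clear
    * three nested hulls for all cars; assumes cvxhull is already installed
    cvxhull mpg weight, hull(3) noreport
    sort weight mpg
    local opts legend(off) aspect(1) yla(, ang(h)) ytitle("`: var label mpg'")

    * _cvxh3l and _cvxh3r are assumed names, by analogy with hulls 1 and 2
    twoway rarea _cvxh1l _cvxh1r weight, color(blue%15) sort ///
    || rarea _cvxh2l _cvxh2r weight, color(blue%15) sort     ///
    || rarea _cvxh3l _cvxh3r weight, color(blue%15) sort     ///
    || scatter mpg weight, ms(Oh) mc(blue) `opts' name(G3, replace)
    ```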

    Detail: The contact address on the help file for cvxhull is out-of-date. Allan has moved twice since then.

  • #2
    Just stumbled across this kind mention. I still use cvxhull/cvxplot occasionally, and am grateful to Nick for pointing out this enhancement. Apart from overlapping hulls for contrasted groups, cvxhull provides nested hulls, which might be coloured with increasing density as contours. For the record, I'm now retired but still interested and my contact is [email protected]



    • #3
      I have a small question. Does this command allow us to include the border points within it?



      • #4
        I don’t understand the question in #3.

        Identifying their convex hull divides a set of points in the plane into disjoint subsets, those on the hull and those inside it.



        • #5
          I was wondering whether it is possible to include all the points of a group inside a rubber-band-like plot. As you mentioned, points can lie on the hull or inside it, but what if we wanted all points to be within the plot? Something like the picture attached:

          [Figure: ggplot_masterlist_2.png]


          Also, I tried using price and mpg as variables to replicate your code, but it seems to exclude one point from the visual, and I am not sure why.



          • #6
            Sorry; I still don't get what you are asking here.
