  • upset_plot now available from SSC

    It is with many thanks to Kit Baum that I introduce upset_plot, a new command for creating UpSet plots in Stata (version 18.5 or higher).
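    For anyone who wants to try it straight away, installation follows the usual SSC route. This is a minimal sketch that assumes the SSC package name matches the command name, upset_plot; the version check is only there because the command requires Stata 18.5 or higher.

    ```stata
    * Confirm the running Stata version (upset_plot needs 18.5 or higher)
    display c(stata_version)

    * Install from SSC; the package name is assumed to match the command name
    ssc install upset_plot

    * Open the help file, which contains further examples
    help upset_plot
    ```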

    UpSet plots, first described by Lex et al., provide an alternative to Venn and Euler diagrams for visualizing intersections between multiple sets[1]. Intersections are represented by a binary membership grid, where each column corresponds to a distinct intersection pattern. The frequency of each intersection is displayed in a vertically aligned bar chart, while a horizontally aligned bar chart displays the sizes of the individual sets.
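    To make the two sets of bars concrete, here is a small sketch using built-in Stata commands only. The variables a, b, c and the frequencies are made up for illustration, and this shows the quantities an UpSet plot displays, not how upset_plot computes them internally.

    ```stata
    * Hypothetical data: one row per membership pattern across sets a, b, c
    clear
    input byte(a b c) float freq
    1 1 1 4
    1 1 0 2
    1 0 0 5
    0 1 1 3
    0 0 1 6
    end

    * Intersection sizes: each row is one column of the membership grid,
    * and freq is the height of the bar above it
    list a b c freq, noobs

    * Set sizes: total frequency over every pattern that includes the set
    foreach v of varlist a b c {
        quietly summarize freq if `v' == 1
        display "`v': " r(sum)
    }
    * displays a: 11, b: 9, c: 13
    ```

    Note that the set sizes are not simply the sum of the intersection bars: a pattern such as 1 1 1 contributes to all three sets.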

    Many of you will be familiar with Tim Morris and Nick Cox's excellent upsetplot command, which offers similar functionality. The main differences between the two are upset_plot's horizontal set-size bar chart and its ability to stack bars over categorical variables. The display of this chart can be suppressed if desired, though users seeking an intersection-only display may be better served by Tim and Nick's upsetplot. A link to the original upsetplot discussion thread on Statalist is given below.

    https://www.statalist.org/forums/for...lable-from-ssc

    A few examples using real-world data are given below. The help file includes several additional examples demonstrating the available customization options.
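    The examples in this post all supply one row per membership pattern together with a frequency weight. If your own data hold one row per observation instead, Stata's built-in contract command is one way to get to that layout. The sketch below uses hypothetical indicator variables a, b, c; the name freq for the new count variable is my choice.

    ```stata
    * Hypothetical observation-level data: one row per unit,
    * with 0/1 indicators for membership of sets a, b, c
    clear
    input byte(a b c)
    1 1 0
    1 1 0
    1 0 0
    0 1 1
    0 1 1
    0 1 1
    end

    * Collapse to one row per distinct pattern, storing the count in freq
    contract a b c, freq(freq)

    * Three patterns remain, with frequencies 3 (0 1 1), 1 (1 0 0), 2 (1 1 0)
    list, noobs
    ```

    The resulting freq variable can then be passed as a frequency weight, in the same [fw=freq] form used in the examples.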


    Example 1

    In our first example, we take data from a Venn diagram published by Emmons et al. and visualize them as an UpSet plot[2]. Emmons' Venn diagram showed the shared microbial genera across bone, soil, and gut samples taken from human remains.

    Code:
    clear
    input byte(sb s hg gbc gbab) float freq
    1 1 1 1 1 9
    1 1 1 1 0 0
    1 1 1 0 1 0
    1 1 1 0 0 1
    1 1 0 1 1 250
    1 1 0 1 0 30
    1 1 0 0 1 7
    1 1 0 0 0 67
    1 0 1 1 1 25
    1 0 1 1 0 1
    1 0 1 0 1 6
    1 0 1 0 0 10
    1 0 0 1 1 344
    1 0 0 1 0 69
    1 0 0 0 1 46
    1 0 0 0 0 357
    0 1 1 1 1 0
    0 1 1 1 0 0
    0 1 1 0 1 0
    0 1 1 0 0 0
    0 1 0 1 1 22
    0 1 0 1 0 20
    0 1 0 0 1 3
    0 1 0 0 0 164
    0 0 1 1 1 5
    0 0 1 1 0 0
    0 0 1 0 1 2
    0 0 1 0 0 107
    0 0 0 1 1 107
    0 0 0 1 0 89
    0 0 0 0 1 28
    0 0 0 0 0 66
    end
    
    label var sb "Surface bone"
    label var s "Soil"
    label var hg "Human gut"
    label var gbc "Grave bone (C)"
    label var gbab "Grave bone (A/B)"
    
    upset_plot sb s hg gbc gbab [fw=freq], set(gap(0.35))

    [Image: 2026-05-23-upset_plot_Emmons.png]

    Example 2

    We do the same with a Venn diagram from Jaun et al.'s 2025 paper on clinical remission in severe asthma[3].

    The figure in question, Figure 2, comprises two Venn diagrams describing which combinations of four remission criteria were met among patients who did and, separately, did not receive biologic therapies.

    Here, we can make use of upset_plot's over() option to present a single, combined UpSet plot that stacks bars over biologic receipt.
    Code:
    clear
    input byte(group a e f o) float freq
    1 1 1 1 1 6
    1 1 1 1 0 1
    1 1 1 0 1 9
    1 1 1 0 0 0
    1 1 0 1 1 6
    1 1 0 1 0 1
    1 1 0 0 1 9
    1 1 0 0 0 3
    1 0 1 1 1 6
    1 0 1 1 0 0
    1 0 1 0 1 1
    1 0 1 0 0 2
    1 0 0 1 1 16
    1 0 0 1 0 1
    1 0 0 0 1 21
    2 1 1 1 1 72
    2 1 1 1 0 6
    2 1 1 0 1 64
    2 1 1 0 0 10
    2 1 0 1 1 26
    2 1 0 1 0 2
    2 1 0 0 1 15
    2 1 0 0 0 28
    2 0 1 1 1 11
    2 0 1 1 0 4
    2 0 1 0 1 14
    2 0 1 0 0 4
    2 0 0 1 1 10
    2 0 0 1 0 6
    2 0 0 0 1 3
    end
    
    label define group 1 "Biologic naïve" 2 "Biologic treated"
    label values group group
    
    label var a "ACT controlled"
    label var e "No exacerbations"
    label var f "FEV ≥80% predicted"
    label var o "No OCS"
    
    upset_plot a e f o [fweight=freq], over(group) set(gap(0.4)) legend(pos(6) rows(1))

    [Image: 2026-05-23-upset_plot_Jaun.png]

    Example 3

    Perhaps the most famous and enduring example of a Venn diagram is that by D'Hont et al., a six-way behemoth with some... aptly shaped groups[4]. A similarly whimsical diagram was published by Neale et al. a couple of years later[5].

    Here, we use D'Hont's data. With a few scaling tweaks to accommodate the considerable number of distinct intersection patterns (referred to as sequence clusters in the context of the study), we can summarize the information in a far more readable, if less charming, UpSet plot.
    Code:
    clear
    input byte(p m b s o a) float freq
    1 1 1 1 1 1 7674
    1 1 1 1 1 0 685
    1 1 1 1 0 1 113
    1 1 1 1 0 0 24
    1 1 1 0 1 1 80
    1 1 1 0 1 0 18
    1 1 1 0 0 1 7
    1 1 1 0 0 0 12
    1 1 0 1 1 1 149
    1 1 0 1 1 0 62
    1 1 0 1 0 1 23
    1 1 0 1 0 0 19
    1 1 0 0 1 1 28
    1 1 0 0 1 0 35
    1 1 0 0 0 1 206
    1 1 0 0 0 0 467
    1 0 1 1 1 1 258
    1 0 1 1 1 0 190
    1 0 1 1 0 1 11
    1 0 1 1 0 0 23
    1 0 1 0 1 1 5
    1 0 1 0 1 0 12
    1 0 1 0 0 1 3
    1 0 1 0 0 0 25
    1 0 0 1 1 1 21
    1 0 0 1 1 0 42
    1 0 0 1 0 1 4
    1 0 0 1 0 0 49
    1 0 0 0 1 1 6
    1 0 0 0 1 0 32
    1 0 0 0 0 1 105
    1 0 0 0 0 0 769
    0 1 1 1 1 1 1458
    0 1 1 1 1 0 368
    0 1 1 1 0 1 54
    0 1 1 1 0 0 13
    0 1 1 0 1 1 29
    0 1 1 0 1 0 28
    0 1 1 0 0 1 7
    0 1 1 0 0 0 9
    0 1 0 1 1 1 71
    0 1 0 1 1 0 64
    0 1 0 1 0 1 21
    0 1 0 1 0 0 49
    0 1 0 0 1 1 13
    0 1 0 0 1 0 29
    0 1 0 0 0 1 155
    0 1 0 0 0 0 759
    0 0 1 1 1 1 206
    0 0 1 1 1 0 2809
    0 0 1 1 0 1 14
    0 0 1 1 0 0 402
    0 0 1 0 1 1 18
    0 0 1 0 1 0 547
    0 0 1 0 0 1 10
    0 0 1 0 0 0 387
    0 0 0 1 1 1 40
    0 0 0 1 1 0 1151
    0 0 0 1 0 1 9
    0 0 0 1 0 0 827
    0 0 0 0 1 1 6
    0 0 0 0 1 0 1246
    0 0 0 0 0 1 1187
    0 0 0 0 0 0 0
    end
    
    label var p "Phoenix"
    label var m "Musa"
    label var b "Brachypodium"
    label var s "Sorghum"
    label var o "Oryza"
    label var a "Arabidopsis"
    
    local intopts ylabel(,tlength(0.01) labgap(0.005)) ytitle(,titlegap(0.06))
    local setopts gap(0.08) ysize(0.2)
    
    upset_plot p m b s o a [fw=freq], xsize(*2) intopts(`intopts') setopts(`setopts')

    [Image: 2026-05-23-upset_plot_DHont without banana.png]

    You can add further elements to the graph via the addplot() option, although the scope for this is limited to immediate commands (e.g., twoway scatteri) and those that do not rely on the underlying data (e.g., twoway function). Behind the scenes, upset_plot significantly rescales the data, so any added elements will likewise need to be rescaled; the details are briefly discussed in the help file.

    An example is given below in the spirit of D'Hont's Venn diagram.
    Code:
    local intopts ylabel(,tlength(0.01) labgap(0.005)) ytitle(,titlegap(0.06))
    local setopts gap(0.08) ysize(0.2)
    
    local addopts range(0.22 0.82) lwidth(vvthick)
    local addplot                                                                 ///
        (function y = 0.5 * (2*x - 0.9)^2 + 1.6, col(gold) `addopts')             ///
        (function y = 0.5 * (2.6*x - 1.3)^2 + 1.5, col(gold*.8) `addopts')        ///
        (function y = 0.7 * (2.4*x -1.2)^2 + 1.4, col(gold*.6) `addopts')         ///
        (scatteri 1.77 0.22, msymbol(O) col(brown) msize(vlarge))
    
    upset_plot p m b s o a [fw=freq], xsize(*2) intopts(`intopts') setopts(`setopts') addplot(`addplot')

    [Image: 2026-05-23-upset_plot_DHont with banana.png]

    References
    1. Lex, A., Gehlenborg, N., Strobelt, H. et al. (2014). UpSet: Visualization of Intersecting Sets. IEEE transactions on visualization and computer graphics, 20(12), 1983–1992. https://doi.org/10.1109/TVCG.2014.2346248
    2. Emmons, AL., Mundorff, AZ., Hoeland, KM. et al. (2022). Postmortem Skeletal Microbial Community Composition and Function in Buried Human Remains. mSystems, 7(2). https://doi.org/10.1128/msystems.00041-22
    3. Jaun, F., Boesing, M., Lüthi-Corridori, G. et al. (2025). Clinical Remission in Severe Asthma: A Comparative Analysis of Patients with and Without Biologics from the Swiss Severe Asthma Registry. Biomedicines, 13(12), 3074. https://doi.org/10.3390/biomedicines13123074
    4. D'Hont, A., Denoeud, F., Aury, JM. et al. (2012). The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature, 488, 213–217. https://doi.org/10.1038/nature11241
    5. Neale, DB., Wegrzyn, JL., Stevens, KA. et al. (2014). Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome biology, 15(3), R59. https://doi.org/10.1186/gb-2014-15-3-r59
    Last edited by Dylan James-Taylor; 23 May 2026, 15:26.

  • #2
    A warm welcome to this! Note that upsetplot stops short of the frequency bar charts at bottom left of Dylan's examples. If they're important to you, head straight for the new command. Meanwhile https://journals.sagepub.com/doi/pdf...6867X241258010 is a write-up and an update is due in Stata Journal 26(2).
