  • Combine box plots sharing one common y-axis

    hi everyone,

    I would like to draw a box plot with two boxes in it. Moreover, because my data are kind of binary distributed, I only want to use part of them, selected with an if qualifier.

    The data looks as follows:
    time_point  A_value  A_label  B_value  B_label
    point 1     1        1        0.1      .
    point 2     0.6      1        0.2      .
    point 3     0.3      1        0.1      1
    point 4     0.2      .        0.1      1
    point 5     0.9      .        0        1
    point 6     0        1        0        1
    point 7     0.7      .        0.9      1
    point 8     0.3      1        1        1
    point 9     0.2      1        0.8      1
    point 10    0.2      .        0.7      1
    point 11    0.1      1        0.3      1
    point 12    0.1      1        0.3      1
    point 13    1        1        1        .
    point 14    0.8      1        0.6      .
    The code I used for making the plots is very straightforward:
    Code:
    graph box A_value if A_value>0.6 & A_value!=. & A_label!=., ytitle("A_value") name(g1, replace) noout
    graph box B_value if B_value>0.6 & B_value!=. & B_label!=., ytitle("B_value") name(g2, replace) noout
    graph combine g1 g2, row(1) ycom
    // The values range from 0 to 1, but I only want to plot values greater than 0.6 that are also "labeled" (i.e. the label column is 1).
    The plot looks like:
    [Attached image: untitled.png]

    Because A_value and B_value are equivalent, I only need one y-axis for both boxes, but the code above generates two y-axes. Please give me some advice on this problem. Thanks so much.
    Last edited by Benny Hsieh; 11 Feb 2016, 03:20.
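
One way to get a single y-axis without combining graphs is to pass both variables to one graph box call. The following is only a sketch under stated assumptions: A_show and B_show are hypothetical helper variables that do not appear in the thread, and the data are a handful of rows copied from the example above. Numeric missing values sort above every number in Stata, which is why the conditions exclude them explicitly.

```stata
* Sketch only: a few rows from the example data in this thread
clear
input time_point A_value A_label B_value B_label
1   1    1  0.1  .
5   0.9  .  0    1
7   0.7  .  0.9  1
8   0.3  1  1    1
9   0.2  1  0.8  1
13  1    1  1    .
14  0.8  1  0.6  .
end

* Hypothetical helper variables: missing wherever the value should not be plotted
generate A_show = A_value if A_value > 0.6 & !missing(A_value, A_label)
generate B_show = B_value if B_value > 0.6 & !missing(B_value, B_label)

* Several variables in one graph box call are drawn against one shared y-axis
graph box A_show B_show, nooutsides ytitle("value") legend(order(1 "A_value" 2 "B_value"))

* Remove the helpers so the dataset is left as it was
drop A_show B_show
```

Because each helper variable carries its own condition, the two boxes can use different subsets of observations while still sharing the axis.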

  • #2
    Thanks for providing a clear data example. I'd restructure the data at least temporarily. Then excluding values you don't want to show is more easily done. Presumably the real data are much more interesting than the example implies, but it seems possible that you can produce a graph much more informative than two box plots side by side (which is becoming the most over-rated form of display in statistical science).

    Code:
    clear
    input time_point    A_value    A_label    B_value    B_label
    1    1    1    0.1    .
    2    0.6    1    0.2    .
    3    0.3    1    0.1    1
    4    0.2    .    0.1    1
    5    0.9    .    0    1
    6    0    1    0    1
    7    0.7    .    0.9    1
    8    0.3    1    1    1
    9    0.2    1    0.8    1
    10    0.2    .    0.7    1
    11    0.1    1    0.3    1
    12    0.1    1    0.3    1
    13    1    1    1    .
    14    0.8    1    0.6    .
    end
    
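    * Work on a temporary copy; restore brings back the original wide layout
    preserve
    
    * reshape long needs the common stubs (value, label) at the start of each name
    rename (A_value B_value A_label B_label) (valueA valueB labelA  labelB)  
    reshape long value label, i(time_point) j(which) string
    * Keep only labeled, non-missing values above 0.6
    drop if missing(value, label)
    keep if value > 0.6
    graph box value, over(which)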
    * ssc inst stripplot
    stripplot value, over(which) box centre vertical cumul xla(, noticks) xtitle("") 
    
    save reshaped
    restore
    http://www.stata-journal.com/article...article=gr0062 may also be relevant to people interested in this thread.
    Last edited by Nick Cox; 11 Feb 2016, 03:38.
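
To make the restructuring step concrete, here is a minimal sketch of what reshape long produces, using three rows taken from the example data and already renamed as in the code above. It is an illustration only, not part of the original reply.

```stata
* Three rows from the example, with the stubs (value, label) first in each name
clear
input time_point valueA labelA valueB labelB
7   0.7  .  0.9  1
8   0.3  1  1    1
13  1    1  1    .
end

* Each wide row becomes two long rows, one with which == "A" and one with which == "B"
reshape long value label, i(time_point) j(which) string
list, sepby(time_point) noobs

* One pair of conditions now filters both series at once
drop if missing(value, label)
keep if value > 0.6
list, noobs
```

In the long layout the unwanted observations are simply dropped, so three observations remain here (7 B, 8 B and 13 A) and over(which) draws one box per remaining group on a common axis.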

    • #3
      Hi Nick,

      Thanks for the detailed reply, including the example code. Now I know that restructuring the data is probably the easiest way. And you are right, I think the real data are more interesting. From the twoway kdensity plot, I found that all these samples (i.e. valueA, valueB, valueC, valueD, etc.) are binary distributed, but the widths of their shoulders differ. So I tried to look at the data using box plots. Thanks again for your help.

      • #4
        Hi Nick,

        Is there a way to make a box plot over only some of my groups?
        For example, after restructuring my data, the "which" variable contains four groups: A, B, C and D. Is there a way to draw a box plot over A and C only, without subsetting the data permanently?
        For now, I want to simplify my plots by comparing 2 or 3 groups at a time, but over(which) plots all of the groups. Thanks for your help.

        • #5
          Answering my own question:

          graph box value if (which=="A"|which=="C"), over(which)
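
The same selection can be written more compactly with Stata's inlist() function, which becomes handier as the number of groups to keep grows. This is a small sketch with made-up illustrative values (not data from the thread), assuming which is a string variable as in the reshaped data.

```stata
* Made-up illustrative data: four groups in long layout
clear
input str1 which value
"A" 0.7
"A" 0.9
"B" 0.8
"C" 1
"C" 0.65
"D" 0.75
end

* inlist() is true when which matches any of the listed strings;
* the if qualifier affects only this graph, so the data are left unchanged
graph box value if inlist(which, "A", "C"), over(which)
```

Only the groups that survive the if qualifier appear on the axis, so B and D are not drawn and nothing needs to be dropped.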
