Combine box plots sharing one common y-axis

Benny Hsieh

Join Date: Feb 2016
Posts: 5

11 Feb 2016, 03:09

Hi everyone,

I would like to draw a box plot with two boxes in it. Moreover, because my data are roughly bimodally distributed, I only want to use part of them, selected with an if qualifier.

The data looks as follows:

time_point	A_value	A_label	B_value	B_label
point 1	1	1	0.1	.
point 2	0.6	1	0.2	.
point 3	0.3	1	0.1	1
point 4	0.2	.	0.1	1
point 5	0.9	.	0	1
point 6	0	1	0	1
point 7	0.7	.	0.9	1
point 8	0.3	1	1	1
point 9	0.2	1	0.8	1
point 10	0.2	.	0.7	1
point 11	0.1	1	0.3	1
point 12	0.1	1	0.3	1
point 13	1	1	1	.
point 14	0.8	1	0.6	.

The code I used for making the plots is very straightforward:

Code:

graph box A_value if A_value > 0.6 & A_value != . & A_label != ., ytitle("A_value") name(g1, replace) noout
graph box B_value if B_value > 0.6 & B_value != . & B_label != ., ytitle("B_value") name(g2, replace) noout
graph combine g1 g2, row(1) ycom
// The values range from 0 to 1, but I only want to plot values greater than 0.6 that are also "labeled" (i.e. the corresponding label column is 1).

The plot looks like:

Because A_value and B_value are measured on the same scale, I only need one y-axis for both boxes, but the code above produces two y-axes. Please give me some advice on this problem. Thanks so much.

Last edited by Benny Hsieh; 11 Feb 2016, 03:20.

Nick Cox

Join Date: Mar 2014
Posts: 35731

11 Feb 2016, 03:31

Thanks for providing a clear data example. I'd restructure the data at least temporarily. Then excluding values you don't want to show is more easily done. Presumably the real data are much more interesting than the example implies, but it seems possible that you can produce a graph much more informative than two box plots side by side (which is becoming the most over-rated form of display in statistical science).

Code:

clear
input time_point    A_value    A_label    B_value    B_label
1    1    1    0.1    .
2    0.6    1    0.2    .
3    0.3    1    0.1    1
4    0.2    .    0.1    1
5    0.9    .    0    1
6    0    1    0    1
7    0.7    .    0.9    1
8    0.3    1    1    1
9    0.2    1    0.8    1
10    0.2    .    0.7    1
11    0.1    1    0.3    1
12    0.1    1    0.3    1
13    1    1    1    .
14    0.8    1    0.6    .
end

preserve

rename (A_value B_value A_label B_label) (valueA valueB labelA  labelB)  
reshape long value label, i(time_point) j(which) string
drop if missing(value, label)
keep if value > 0.6
graph box value, over(which)
* ssc inst stripplot
stripplot value, over(which) box centre vertical cumul xla(, noticks) xtitle("") 

save reshaped
restore

http://www.stata-journal.com/article...article=gr0062 may also be relevant to people interested in this thread.
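
A side note on the 0.6 cutoff, offered as a hedged sketch rather than as part of the reply above: assuming the default storage type, input stores value as a float, and the float closest to 0.6 is slightly larger than the double-precision literal 0.6. A comparison such as value > 0.6 can therefore keep observations that are exactly 0.6. Wrapping the literal in float() is one common way around this. The three-row dataset below is made up for illustration.

Code:

clear
input value
0.6
0.8
1
end

* The stored float 0.6 exceeds the double literal 0.6, so all 3 observations pass
count if value > 0.6

* Comparing float with float excludes the observation equal to 0.6, leaving 2
count if value > float(0.6)

In the example data this matters for point 2, where A_value is 0.6 and A_label is 1.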

Last edited by Nick Cox; 11 Feb 2016, 03:38.

Benny Hsieh

Join Date: Feb 2016

Posts: 5
#3

11 Feb 2016, 12:13

Hi Nick,

Thanks for the detailed reply, including the example code. Now I know that restructuring the data is probably the easiest way. And you are right, I think the real data are more interesting. From the two-way kdensity plots, I found that all these samples (valueA, valueB, valueC, valueD, etc.) are bimodally distributed, but the widths of their shoulders differ. So I tried looking at the data with box plots. Thanks again for your help.
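
For comparison, here is a minimal sketch of an alternative that avoids reshaping, assuming only a handful of variables are involved. Each variable is copied into a new one only where it passes the filter, and both copies are then passed to a single graph box call, which draws them against one y-axis. The names A_sel and B_sel and the five-row dataset are hypothetical.

Code:

clear
input A_value A_label B_value B_label
1    1    0.1    .
0.9    .    0.9    1
0.8    1    1    1
0.3    1    0.8    1
0.7    1    0.7    .
end

* Copy each variable only where it passes the filter; all other observations stay missing
generate A_sel = A_value if A_value > 0.6 & !missing(A_value, A_label)
generate B_sel = B_value if B_value > 0.6 & !missing(B_value, B_label)

* Several variables in one -graph box- call are drawn on a single shared y-axis
graph box A_sel B_sel, ytitle("value") nooutsides

The reshape approach scales better to many variables, since over() then handles the grouping.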
Benny Hsieh

Join Date: Feb 2016

Posts: 5
#4

15 Feb 2016, 20:02

Hi Nick,

Is there a way to make a box plot over only some of my groups?
For example, after restructuring my data, the "which" variable contains four groups: A, B, C and D. Is there a way to draw a box plot over A and C only, without permanently subsetting the data?
For now, I want to simplify my plots by comparing two or three groups at a time, but over(which) plots all the groups. Thanks for your help.
Benny Hsieh

Join Date: Feb 2016

Posts: 5
#5

15 Feb 2016, 22:24

Answering my own question:

graph box value if (which=="A"|which=="C"), over(which)
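
An equivalent way to write the same restriction, as a sketch: inlist() returns true when its first argument matches any of the later ones, which stays readable as the list of groups grows. The small dataset here is invented for illustration.

Code:

clear
input str1 which value
A 0.7
A 0.9
B 0.8
C 0.65
C 1
D 0.95
end

* Same restriction as: if which == "A" | which == "C"
graph box value if inlist(which, "A", "C"), over(which)

Because the if qualifier applies only to this command, no observations are dropped from the data in memory.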