boxplots of two variables by different categorical variables

Yue Qiu

Join Date: Aug 2022

Posts: 10
#1


27 Feb 2023, 06:21

I have a study with data collected at two timepoints, BTX2 (pre) and BTX3 (post): one before an intervention and one after. I would like to plot the results in a boxplot showing pre and post split first by treatment group (control vs. intervention) and then by weight category (normal vs. overweight). What I envision is pre and post in two different colors, shown as side-by-side pairs of boxes (control vs. treatment), with those pairs then grouped side by side by normal vs. overweight. See sketch below.

What I'm struggling with is the grouping by normal vs. overweight, and I'm not sure whether STATA can graph this way. Each timepoint (BTX2 and BTX3) has its own binary weight variable (btxweight_2 and btxweight_3), where 0 = normal weight and 1 = overweight. As far as I can tell, STATA will only let me graph by one weight variable:
graph box btx2 btx3, over(treat) over(btxweight_2)

Is there a way in STATA to split the boxplot so that the pre outcome (BTX2) is paired with btxweight_2 and the post outcome (BTX3) with its corresponding weight variable, btxweight_3?
I appreciate any input or suggestions on how else to represent the data, and thank you for your patience if this is a fairly easy thing to do that I am unaware of as a new learner.

Nick Cox

Join Date: Mar 2014
Posts: 35710

27 Feb 2023, 07:41

This can be done. You just need by() as well as over().

Code:

webuse nlswork, clear 

rename (ln_wage union south c_city) (BTX when c_or_t weight)

label def when 0 pre 1 post 
label val when when 

label def c_or_t 0 control 1 intervention 
label val c_or_t c_or_t 

label def weight 0 normal 1 overweight
label val weight weight 

separate BTX, by(when) veryshortlabel 

graph box BTX? , over(c_or_t) by(weight, note("")) ytitle(BTX)

You can start at the last line as you already have two outcome variables.
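
Applied to the variable names given in the first post, the last line might look like the sketch below. It assumes that btx2, btx3, treat and btxweight_2 exist as described there, and it defines the panels by btxweight_2 alone, since by() is given a single variable here.

```stata
* Sketch only: variable names are taken from the original question.
* Panels are defined by the pre-intervention weight variable alone.
graph box btx2 btx3, over(treat) by(btxweight_2, note("")) ytitle(BTX)
```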

See also https://www.statalist.org/forums/help#spelling


Yue Qiu

Join Date: Aug 2022

Posts: 10
#3

27 Feb 2023, 12:05

Thank you! Also, apologies for misspelling Stata.
Just to clarify, as I am a little lost in the example code: I have two outcome variables, "btx2" and "btx3", one variable "treat" (0 = control, 1 = treatment), and two weight variables corresponding to BTX2 and BTX3 (btxweight_2 and btxweight_3, both coded 0 = normal, 1 = overweight).
In the example code above, weight seems to be one variable rather than two. How should I code the last line? It seems the code below would be wrong:
graph box btx2 btx3, over(treat) by(btxweight_2 btxweight_3)

Or do you mean I should plot each outcome separately over "treat" by "btxweight_2"? For example:
graph box btx2, over(treat) by(btxweight_2)
In that case, is there a way to combine the graphs of pre-intervention "btx2" and post-intervention "btx3", if that is the only way to depict my sketch?
Nick Cox

Join Date: Mar 2014

Posts: 35710
#4

27 Feb 2023, 12:32

Please give a data example. However, as an interim reply, you can run my code to study what it does. by() here calls up just one variable.
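
One way to study what the example does is to look at the variables that separate creates. The lines below are a minimal sketch using the same nlswork dataset and the renamed variables from the code above. The choice of describe and summarize for inspection is an assumption, not part of the original reply.

```stata
* Sketch only: inspect the output of -separate- in the nlswork example.
webuse nlswork, clear
rename (ln_wage union) (BTX when)

* Creates BTX0 (values of BTX where when == 0) and BTX1 (where when == 1);
* each new variable is missing in the other group.
separate BTX, by(when) veryshortlabel

describe BTX0 BTX1
summarize BTX0 BTX1
```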

Yue Qiu

Join Date: Aug 2022
Posts: 10

28 Feb 2023, 09:39

Thank you for being patient. I watched a tutorial on dataex to share my data, and I think this may help show it.
My dataset is longer; I've included the first 25 observations:

Code:

* Example generated by -dataex-. For more info, type help dataex
* Command used: dataex id treat btx2 btx3 btxweight_2 btxweight_3
clear
input int id double(treat btx2 btx3) float(btxweight_2 btxweight_3)
 1 0 12  3 1 1
 2 1  2  3 0 0
 3 0  2  0 1 1
 4 1 15 14 1 1
 5 0  1 10 1 1
 6 1  3  0 0 0
 7 0 16  4 1 1
 8 1  2  0 0 0
 9 0  3  3 0 1
10 1  2 14 1 1
11 0  8  0 0 0
12 0 15  5 1 1
13 1  6  3 0 0
14 0  7  . 1 0
15 0  2  3 0 0
16 0  0  0 0 0
17 1  1  1 1 1
18 1  3  2 1 1
19 0  7  3 1 1
20 0  3  1 1 0
21 0 13  3 1 1
22 0  1  0 0 0
23 1  3  6 0 0
24 1  4 10 1 .
25 0  1  2 1 1
end
label values treat labeltx
label def labeltx 0 "Control", modify
label def labeltx 1 "Treated", modify
label values btxweight_2 btxwt2
label def btxwt2 0 "Normal", modify
label def btxwt2 1 "Overweight", modify
label values btxweight_3 btxwt3
label def btxwt3 0 "Normal", modify
label def btxwt3 1 "Overweight", modify

I have tried the code you kindly shared, but my problem remains that I can only run it with by() on one weight variable.

Initially, by running the code:

Code:

 graph box btx2 btx3, over(treat) over(btxweight_2)

I am able to generate a graph similar to my sketch; however, I worry I am misrepresenting the data because I am separating by only one weight variable. While btxweight_2 and btxweight_3 are almost identical overall, some IDs differ between them. The two variables also correspond to two different lab sampling times.

I ran the code that you have suggested:

Code:

 graph box btx2, over(treat) by(btxweight_2)

Code:

 graph box btx3, over(treat) by(btxweight_3)

This depicts the data, but in two separate graphs. Is there a way to represent it in one graph?

Thank you again, and I hope this is a more helpful version of my question, with examples.


Nick Cox

Join Date: Mar 2014

Posts: 35710
#6

28 Feb 2023, 09:58

Thanks for the extra detail, which helps explain your puzzlement. As you say, in your data example the two normal/overweight variables usually agree but sometimes disagree, so they aren't identical.

Your question isn't a matter of Stata programming but a question about your data and what you want, which I can't answer for you. If you want both graphs, then (1) you have a more complicated problem than my code will tackle, but check out graph combine; and (2) you need better text on your combined graph to explain the difference.

Code:

. tab btxw* , missing

btxweight_ |           btxweight_3
         2 |         0          1          . |     Total
-----------+---------------------------------+----------
         0 |         9          1          0 |        10
         1 |         2         12          1 |        15
-----------+---------------------------------+----------
     Total |        11         13          1 |        25
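
One possible route to a single graph, not proposed in the replies above, is to reshape the data to long layout so that each observation carries the weight recorded at its own timepoint. That puts the data in the same shape as the nlswork example. The sketch below assumes the wide layout from the dataex listing and that id is unique. The names wt and when are hypothetical, introduced only for illustration. Whether pairing each outcome with its own weight variable is the comparison wanted remains a question about the data.

```stata
* Sketch only: assumes the wide layout shown in the dataex example.
* Give the two weight variables a stub that cannot be confused with btx.
rename (btxweight_2 btxweight_3) (wt2 wt3)

* One row per id and timepoint; when takes the values 2 (pre) and 3 (post).
reshape long btx wt, i(id) j(when)

label define when 2 "pre" 3 "post"
label values when when
label define wt 0 "Normal" 1 "Overweight"
label values wt wt

* Same pattern as the nlswork example: one outcome variable per timepoint,
* boxes over treatment, panels by the weight recorded at that timepoint.
separate btx, by(when) veryshortlabel
graph box btx2 btx3, over(treat) by(wt, note("")) ytitle(BTX)
```

With this layout an id whose weight category changes between timepoints contributes to one panel for pre and to the other for post, and observations with a missing weight value drop out of the panels, so the graph would need text explaining that.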
Yue Qiu

Join Date: Aug 2022

Posts: 10
#7

28 Feb 2023, 14:55

Thank you for your help! I will look at the graph combine command. It may be that this is best left as two separate graphs rather than a combined one, as it is hard to represent the data together.
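
For reference, a minimal sketch of how graph combine could put the two graphs side by side. It assumes the variable names from the dataex example. The graph names pre and post and the titles are hypothetical, and over() is used twice instead of by() so that each input is a single-panel graph.

```stata
* Sketch only: variable names follow the dataex example above.
* The /// line continuation works in a do-file, not at the command line.
graph box btx2, over(treat) over(btxweight_2) ///
    title("Pre (btx2), by btxweight_2") ytitle(BTX) name(pre, replace)
graph box btx3, over(treat) over(btxweight_3) ///
    title("Post (btx3), by btxweight_3") ytitle(BTX) name(post, replace)

* ycommon puts both graphs on the same y-axis scale.
graph combine pre post, ycommon
```

The titles state which weight variable defines the groups in each graph, which is one way to supply the explanatory text mentioned in #6.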
