  • boxplots of two variables by different categorical variables

    I have a study with data collected at two timepoints, BTX2 (pre) and BTX3 (post): one before an intervention and one after. I would like to plot the results in a boxplot showing pre and post split first by treatment group (control vs. intervention) and then by weight status (normal vs. overweight). What I envision is pre and post in two different colors, shown as side-by-side pairs of boxes (control vs. treatment), with these pairs in turn grouped side by side by normal vs. overweight. See sketch below.
    [Image: boxplotsketch.jpg]

    But what I'm struggling with is the grouping of normal vs. overweight, and I'm not sure if STATA can graph this way. Each time point (BTX2 and BTX3) has its own binary weight variable (btxweight_2 and btxweight_3), where 0 = normal weight and 1 = overweight. As far as I can tell, STATA will only let me graph by one weight variable:
    graph box btx2 btx3, over(treat) over(btxweight_2)

    Is there a way in STATA to split the boxplot so that the pre outcome (BTX2) is grouped by btxweight_2 and the post outcome (BTX3) by its corresponding weight variable?
    I appreciate any input or suggestions on how else to represent the data in this way, and thank you for your patience if this is a fairly easy thing to do that I am unaware of as a new learner.

  • #2
    This can be done. You just need by() as well as over()

    Code:
    webuse nlswork, clear 
    
    rename (ln_wage union south c_city) (BTX when c_or_t weight)
    
    label def when 0 pre 1 post 
    label val when when 
    
    label def c_or_t 0 control 1 intervention 
    label val c_or_t c_or_t 
    
    label def weight 0 normal 1 overweight
    label val weight weight 
    
    separate BTX, by(when) veryshortlabel 
    
    graph box BTX? , over(c_or_t) by(weight, note("")) ytitle(BTX)

    You can start at the last line as you already have two outcome variables.
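
    As a minimal sketch, the last line might look like this with the variable names from the question (btx2, btx3, treat, btxweight_2). It assumes that splitting the panels by the pre-intervention weight variable alone is acceptable, because by() here takes a single variable:

    ```stata
    * Assumes btx2, btx3, treat and btxweight_2 exist as described in the question.
    * Panels are defined by btxweight_2 only; btxweight_3 is not used.
    graph box btx2 btx3, over(treat) by(btxweight_2, note("")) ytitle(BTX)
    ```

    The two outcomes appear as differently colored boxes within each treatment group, with one panel per weight category.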

    See also https://www.statalist.org/forums/help#spelling

    • #3
      Thank you! Also, apologies for incorrectly spelling Stata.
      Just to clarify, as I am a little lost in the example code: I have two outcome variables, "btx2" and "btx3", one variable "treat" (0 = control, 1 = treatment), and two weight variables corresponding to BTX2 and BTX3 (btxweight_2 and btxweight_3, both coded 0 = normal, 1 = overweight).
      In the example code above, weight seems to be one variable rather than two. How should I code the last line? It seems the code below would be wrong:
      graph box btx2 btx3, over(treat) by(btxweight_2 btxweight_3)

      Or do you mean I should plot each outcome separately over "treat" by its own weight variable? Like
      graph box btx2, over(treat) by(btxweight_2)
      In that case, is there a way to combine the graphs of pre-intervention "btx2" and post-intervention "btx3", if that is the only way to depict my sketch?

      • #4
        Please give a data example. However, as an interim reply, you can run my code to study what it does. by() here calls up just one variable.

        • #5
          Thank you for being patient. I watched a tutorial on dataex to share my data, and I think this may help show it.
          My dataset is longer; I've included the first 25 observations:
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          * dataex id treat btx2 btx3 btxweight_2 btxweight_3
          clear
          input int id double(treat btx2 btx3) float(btxweight_2 btxweight_3)
           1 0 12  3 1 1
           2 1  2  3 0 0
           3 0  2  0 1 1
           4 1 15 14 1 1
           5 0  1 10 1 1
           6 1  3  0 0 0
           7 0 16  4 1 1
           8 1  2  0 0 0
           9 0  3  3 0 1
          10 1  2 14 1 1
          11 0  8  0 0 0
          12 0 15  5 1 1
          13 1  6  3 0 0
          14 0  7  . 1 0
          15 0  2  3 0 0
          16 0  0  0 0 0
          17 1  1  1 1 1
          18 1  3  2 1 1
          19 0  7  3 1 1
          20 0  3  1 1 0
          21 0 13  3 1 1
          22 0  1  0 0 0
          23 1  3  6 0 0
          24 1  4 10 1 .
          25 0  1  2 1 1
          end
          label values treat labeltx
          label def labeltx 0 "Control", modify
          label def labeltx 1 "Treated", modify
          label values btxweight_2 btxwt2
          label def btxwt2 0 "Normal", modify
          label def btxwt2 1 "Overweight", modify
          label values btxweight_3 btxwt3
          label def btxwt3 0 "Normal", modify
          label def btxwt3 1 "Overweight", modify
          I have tried the code you kindly shared, but my problem continues to be that by() lets me use only one weight variable.

          Initially, by running the code:
          Code:
           graph box btx2 btx3, over(treat) over(btxweight_2)
          I am able to generate a graph similar to my sketch. However, I worry that I am misrepresenting the data because I am separating by only one weight variable. While btxweight_2 and btxweight_3 are almost identical overall, they differ for some IDs. The two variables also represent two different lab sampling times.

          I ran the code that you have suggested:
          Code:
           graph box btx2, over(treat) by(btxweight_2)
          Code:
           graph box btx3, over(treat) by(btxweight_3)
          This depicts the data, but in two separate graphs. Is there a way to represent it in one graph?

          Thank you again, and I hope I have posted a more helpful post of my question with examples.

          • #6
            Thanks for the extra detail, which helps explain your puzzlement. As you say, in your data example the two normal/overweight variables usually agree, but they are not identical.

            Your question isn't a matter of Stata programming but a question about your data and what you want, which I can't answer for you. If you want both graphs, then (1) you have a more complicated problem than my code will tackle, but check out graph combine; and (2) you need better text on your combined graph to explain the difference.

            Code:
            . tab btxw* , missing 
            
            btxweight_ |           btxweight_3
                     2 |         0          1          . |     Total
            -----------+---------------------------------+----------
                     0 |         9          1          0 |        10 
                     1 |         2         12          1 |        15 
            -----------+---------------------------------+----------
                 Total |        11         13          1 |        25
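
            A minimal sketch of the graph combine route, assuming the variable names from the data example in #5. The graph names pre and post and the title text are illustrative choices, not anything required by Stata:

            ```stata
            * Each outcome is grouped by its own weight variable and stored in memory.
            graph box btx2, over(treat) over(btxweight_2) ytitle(BTX) ///
                title("Pre (BTX2), weight at BTX2") name(pre, replace)
            graph box btx3, over(treat) over(btxweight_3) ytitle(BTX) ///
                title("Post (BTX3), weight at BTX3") name(post, replace)

            * Put the two stored graphs side by side in one figure.
            graph combine pre post, rows(1)
            ```

            The titles carry the explanation that each panel uses a different weight classification. The two panels may not share a y-axis scale unless the same ylabel() is set in both commands.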

            • #7
              Thank you for your help! I will look at the graph combine command. It may be that this is best left as two separate graphs rather than a combined one, as it is hard to represent the data together.
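
              If a single graph is still wanted, one possible route, not suggested in the thread, is to reshape the data to long layout so that each measurement carries its own weight status. This is only a sketch: it assumes id uniquely identifies subjects, and the names wt and time are introduced here for illustration.

              ```stata
              preserve

              * Shorten the weight variable names so the stubs btx and wt are unambiguous.
              rename (btxweight_2 btxweight_3) (wt2 wt3)

              * One row per subject and time point; time takes the values 2 and 3.
              reshape long btx wt, i(id) j(time)

              label define time 2 "Pre (BTX2)" 3 "Post (BTX3)", replace
              label values time time
              label define wt 0 "Normal" 1 "Overweight", replace
              label values wt wt

              * asyvars gives pre and post different colors; panels split by the
              * weight status recorded at the same time point as the outcome.
              graph box btx, over(time) asyvars over(treat) by(wt, note("")) ytitle(BTX)

              restore
              ```

              Under this layout a subject whose weight status changed between time points contributes to different panels for pre and post, and measurements with missing weight are left out of the panels by default, so the graph needs text saying so.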
