
  • Several boxplots in an x-y plot

    Hi All,

    I need to create several boxplots of v2, one for each bin of v1, showing the mean, median and spread of v2 within that bin. I then need to draw them in a single x-y plot with v2 on the y axis and the v1 bins on the x axis. I need one boxplot for all negative values of v1 (a single bin), one for missing values of v1, and one each for v1 in 0-50, 50-100, 100-150, and so on up to 350-400.

    Here is a sample of my data:

    Code:
    clear
    input float v1 float v2
    
    .    8.33
    .    17
    .    2.94
    .    12.3
    .    7.87
    .    10.2
    87    7.01
    162    10.3
    162    7.23
    -74.09    11.4
    -74.09    9.8
    42    5.41
    42    11.7
    42    5.81
    42    18.9
    42    6.58
    79    14.4
    79    7.41
    79    9.84
    79    11.1
    149    10.1
    -74.09    9.01
    -74.09    10.7
    -74.09    10.1
    37    11.3
    37    10.4
    37    7.14
    -23    8.55
    -23    9.55
    -23    11.9
    -172    0
    -74.09   9.46
    -74.09    9.01
    -74.09    5.41
    56    9.78
    56    7.69
    56    11.5
    56    13.3
    82    9.38
    82    17.9
    82    7.81
    82    6.35
    62    4.44
    62    9.47
    -74    5.41
    -74    15.3
    -97    8.14
    -97    6.19
    -97    6.9
    -84    8.99
    -84    6.67
    -133    12.1
    -133    3.31
    -133    8.5
    -93    2.63
    -74    5.88
    104    6.6
    104    10.9
    104    8.4
    104    11.1
    82    10.6
    82    9.47
    82    12.6
    82    8
    44    4.94
    44    5.83
    -74.09   8.42
    -74.09    6.95
    -31    4.44
    -31    11.7
    -31    12
    -31    5.65
    150    9.2
    150    1.23
    81    11.3
    
    end
    I really appreciate your help

  • #2
    Be aware that you also have negative values for v1.
    Try something like this:
    Code:
    clear
    input float v1 float v2
    
    .    8.33
    .    17
    .    2.94
    .    12.3
    .    7.87
    .    10.2
    87    7.01
    162    10.3
    162    7.23
    -74.09    11.4
    -74.09    9.8
    42    5.41
    42    11.7
    42    5.81
    42    18.9
    42    6.58
    79    14.4
    79    7.41
    79    9.84
    79    11.1
    149    10.1
    -74.09    9.01
    -74.09    10.7
    -74.09    10.1
    37    11.3
    37    10.4
    37    7.14
    -23    8.55
    -23    9.55
    -23    11.9
    -172    0
    -74.09   9.46
    -74.09    9.01
    -74.09    5.41
    56    9.78
    56    7.69
    56    11.5
    56    13.3
    82    9.38
    82    17.9
    82    7.81
    82    6.35
    62    4.44
    62    9.47
    -74    5.41
    -74    15.3
    -97    8.14
    -97    6.19
    -97    6.9
    -84    8.99
    -84    6.67
    -133    12.1
    -133    3.31
    -133    8.5
    -93    2.63
    -74    5.88
    104    6.6
    104    10.9
    104    8.4
    104    11.1
    82    10.6
    82    9.47
    82    12.6
    82    8
    44    4.94
    44    5.83
    -74.09   8.42
    -74.09    6.95
    -31    4.44
    -31    11.7
    -31    12
    -31    5.65
    150    9.2
    150    1.23
    81    11.3
    end
    
    
    recode v1 (.=1 "missing") (min/-1=2 "<0") (0=3 "0") (0/50=4 "0-50") (50/100=5 "50-100") (100/300=6 "100-300") (300/max=7 ">300"), generate(x_axis)
    graph box v2, over(x_axis)
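
    The recode above collapses 100-300 into one class. As a hedged sketch only (not part of the original reply), the same approach could be spelled out for the bins requested in #1. The variable name x_axis2 and the label text are illustrative assumptions. As far as I understand recode, when rules overlap the first matching rule wins, so a boundary value such as 50 would land in the lower class here, and values above 400 would be left unchanged.
    Code:
    * sketch: one class per 50-wide bin of v1, plus "missing" and "negative"
    * the negative rule comes last so that 0 is claimed by the 0-50 rule first
    recode v1 (. = 1 "missing") (0/50 = 3 "0-50") (50/100 = 4 "50-100") ///
        (100/150 = 5 "100-150") (150/200 = 6 "150-200") (200/250 = 7 "200-250") ///
        (250/300 = 8 "250-300") (300/350 = 9 "300-350") (350/400 = 10 "350-400") ///
        (min/0 = 2 "negative"), generate(x_axis2)
    * boxes are ordered by the numeric codes 1, 2, 3, ...
    graph box v2, over(x_axis2)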



    • #3
      How do you intend to show the mean and median of the distribution in the same box? If the data are normally distributed or if the departure from normality is still relatively symmetrical the mean and median would likely be indistinguishable if plotted in the same box (unless you're able to have an extremely oversized graph so minimal differences could be perceived). Can you define what you mean by spread? Do you mean the minimum and maximum values, the difference between those values, or something else? Do you need to include the hinges as well or are you only looking for the box itself?

      When you say one box plot per category, do you literally mean a separate image with a single box, or do you mean to say that you need a box plot of v2 conditional on the categories of v1 (e.g., several boxes in a single image)?

      Do you have access to Michael Mitchell or Nick Cox's books on Stata graphics? If not, I'd recommend getting copies. The information in them may not all be directly related to your current problem, but will give you a better understanding of how graphics in Stata work more generally and how to manipulate them as you see fit.



      • #4
        Thanks Oded. I'll try it.

        wbuchanan, sorry for being confusing. What I'm looking for is a regular boxplot. Including the mean in the description was just a careless mistake, and by spread I mean exactly the information a boxplot gives: the range and the interquartile range. What I need is a y-x plot with several boxplots side by side, so it is probably the second option you described.

        Sorry for confusing you, and thanks for the book recommendation. I will look at them.



        • #5
          Note that your bin boundaries such as 0-50 and 50-100 are ambiguous: what happens with 50? Here I use [ ) intervals.
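
          A small sketch of what the [ ) convention means in practice, assuming bins of width 50 built with floor(), as in the code later in this post: a value exactly on a boundary goes into the upper class.
          Code:
          * class index k means v1 is in [50k, 50(k+1))
          display floor(49.9/50)
          display floor(50/50)
          display floor(99.9/50)
          These display 0, 1 and 1, so 50 falls in the class starting at 50, not in the one starting at 0.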

          I spent a few decades telling people they should draw box plots routinely and the last decade or so telling them (usually not the same people) that box plots are oversold unless you have many, many groups: it is usually a better idea to show more detail than box plots usually do.

          In that spirit here are some quantile-box plots showing the usual boxes based on median and quartiles, PLUS all the data too, PLUS reference lines showing the means. Various other posts in the archive use a similar design. You need to install stripplot from SSC to run this.
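
          A minimal sketch of the install step, assuming an internet connection and a Stata version that supports ssc:
          Code:
          * one-time install of the community-contributed command from SSC
          ssc install stripplot
          * read its help file for the options used in this post
          help stripplot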

          Code:
          set scheme s1color
          clear
          input float v1 float v2
          .    8.33
          .    17
          .    2.94
          .    12.3
          .    7.87
          .    10.2
          87    7.01
          162    10.3
          162    7.23
          -74.09    11.4
          -74.09    9.8
          42    5.41
          42    11.7
          42    5.81
          42    18.9
          42    6.58
          79    14.4
          79    7.41
          79    9.84
          79    11.1
          149    10.1
          -74.09    9.01
          -74.09    10.7
          -74.09    10.1
          37    11.3
          37    10.4
          37    7.14
          -23    8.55
          -23    9.55
          -23    11.9
          -172    0
          -74.09   9.46
          -74.09    9.01
          -74.09    5.41
          56    9.78
          56    7.69
          56    11.5
          56    13.3
          82    9.38
          82    17.9
          82    7.81
          82    6.35
          62    4.44
          62    9.47
          -74    5.41
          -74    15.3
          -97    8.14
          -97    6.19
          -97    6.9
          -84    8.99
          -84    6.67
          -133    12.1
          -133    3.31
          -133    8.5
          -93    2.63
          -74    5.88
          104    6.6
          104    10.9
          104    8.4
          104    11.1
          82    10.6
          82    9.47
          82    12.6
          82    8
          44    4.94
          44    5.83
          -74.09   8.42
          -74.09    6.95
          -31    4.44
          -31    11.7
          -31    12
          -31    5.65
          150    9.2
          150    1.23
          81    11.3
          end
          gen binned_v1 = cond(missing(v1), -2, cond(v1 < 0, -1, floor(v1/50)))
          label def binned_v1 -2 "missing" -1 "negative"
          
          forval i = 0/8 {
              local I = 50 * `i'
              label def binned_v1 `i' "`I'+", modify
          }
          
          label val binned_v1 binned_v1
          
          stripplot v2, over(binned_v1) vertical cumul cumprob box(blcolor(gs8)) refline centre ///
          mc(green) xla(, noticks) yla(, ang(h)) xtitle(classes of v1) ytitle(, orient(horizontal)) xsc(titlegap(*5))
          [Image: qboxplot3.png]
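
          If only conventional box plots are wanted, as clarified in #4, a hedged alternative (not from the original post) is official graph box with the same binned variable. This sketch assumes binned_v1 and its value labels were created by the code above; the axis titles are illustrative.
          Code:
          * one box per class of v1, side by side, using only official Stata
          graph box v2, over(binned_v1) ytitle("v2") b1title("classes of v1")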



          • #6
            Thank you very much Nick. I really appreciate it.

