  • Scale break with box plot

    I have a dataset with 79 observations and am using Stata 13.1.
    I want to plot two variables.
    The independent variable, pYAP, is dichotomized and goes on the x axis. The second variable, AFP, is continuous and goes on the y axis.

    Code:
    graph box AFP, over(pyap_dic) title(AFP and pYAP)
    and I obtained this graph:

    [Image: Graph_pyap-AFP-1.png]

    There is a big gap between 1500 and 4200 so I would like to create a scale break before showing the two "outliers".
    I did read http://www.stata.com/support/faqs/gr.../scale-breaks/ and other previous posts on scale breaks. My understanding from those websites is that although scale breaks are not recommended, they can still be created.

    So I tried:
    Code:
    gen AFP_break = cond(AFP == 4000, 0, AFP)
    label def AFP_break 0 "4000"
    label val AFP_break AFP_break
    label var AFP_break AFP
    graph box AFP_break, over(pyap_dic) ylabel (0 3000 7000, valuelabel) title(AFP and pYAP) yline(4000)
    And it gave me this:

    [Image: Graph.png]

    This is not what I want. I only want the scale to be shortened between 1500 and 4200.

    Is it possible to do so?

    Thank you,

    JM Giard

  • #2
    I'd underline the "not recommended" despite my role in the FAQ cited. I can't believe that there isn't a much better graph than either of the graphs you've shown, but concrete recommendations would depend on the details of the data, including whether there are any exact zeros. If you could post the data, that would make suggestions easier.

    • #3
      Thank you Mr Cox.

      Here are my data:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double AFP float pyap_dic
         . 0
       3.3 0
       117 1
        74 0
         9 1
        34 0
         . 0
         6 0
         1 1
         7 1
       631 1
       550 1
         . 0
        26 1
        80 1
       489 0
        24 0
        31 0
         . 0
         6 0
        11 0
        28 1
         5 0
        34 1
       602 1
         . 1
         3 1
         . 0
         . 0
         . 1
         . 0
         8 1
       628 0
         . 0
         . 1
         6 1
         . 1
         3 1
         6 1
         . 0
         7 0
         3 0
        36 0
        67 0
         7 0
         9 0
       374 0
      6199 1
        10 0
         7 0
         2 0
         4 0
         9 0
         2 0
         . 0
         3 1
         . 0
         . 0
         9 0
         . 0
        32 1
         5 1
         . 0
         6 0
       128 1
       162 1
         6 0
       406 0
         . 0
         5 0
         . 0
         . 0
         . 0
         . 0
        13 1
         5 0
         . 0
       232 0
      1337 1
        11 0
         6 0
         6 0
         . 0
         9 0
       125 1
         1 0
         4 1
         . 1
         3 0
         3 0
         . 0
        16 0
       198 1
         . 0
         . 1
      4370 1
         9 0
         6 1
         8 0
        10 0
      end
      label values pyap_dic pyaplabel
      label def pyaplabel 0 "Less than 2.5", modify
      label def pyaplabel 1 "2.5 or higher", modify
      Thank you again,

      JM Giard

      • #4
        Thanks for the data. I used stripplot, which is user-written and can be installed from SSC (ssc install stripplot). See for example http://www.statalist.org/forums/foru...updated-on-ssc

        I used a quantile-box plot: see the help for several references. With a variation in the response from 1 to 6199 and severe skewness, a logarithmic scale is surely indicated. (For the record, I also tried a negative reciprocal scale, but it goes too far.)

        The FAQ http://www.stata.com/support/faqs/gr...ithmic-scales/ outlines at length how the criterion of plotting points individually if beyond upper quartile + 1.5 IQR or lower quartile - 1.5 IQR meshes awkwardly with a logarithmic scale. I now increasingly favour plotting whiskers to selected percentiles, for which it is helpful that log of percentile = percentile of logs.
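To make both points concrete, here is a minimal sketch on toy data. The seven values, the variable names y and logy, and the scalar names are all made up for illustration; this is not the AFP dataset. It checks that quartiles computed on the log scale back-transform to the raw-scale quartiles, and that the 1.5 IQR fence does not: the fence computed on the raw scale and the fence computed on the log scale disagree about the largest value. The percentile equality is exact here because each quartile falls on a single observed value; where Stata's default rule averages two adjacent values, small differences can appear.

```stata
* Toy data (hypothetical values): 7 positive, strongly skewed observations
clear
input double y
1
3
6
9
34
550
6199
end
generate double logy = ln(y)

* Quartiles on the raw scale
_pctile y, percentiles(25 50 75)
scalar q25 = r(r1)
scalar q50 = r(r2)
scalar q75 = r(r3)

* Quartiles on the log scale
_pctile logy, percentiles(25 50 75)
scalar lq25 = r(r1)
scalar lq50 = r(r2)
scalar lq75 = r(r3)

* Percentiles commute with the (monotone) log transform
display "median: " scalar(q50) "   exp(median of logs): " exp(scalar(lq50))

* The upper fence (upper quartile + 1.5 IQR) does not commute
scalar fence_raw = scalar(q75) + 1.5 * (scalar(q75) - scalar(q25))
scalar fence_log = exp(scalar(lq75) + 1.5 * (scalar(lq75) - scalar(lq25)))
display "upper fence, raw scale: " scalar(fence_raw)
display "upper fence, log scale (back-transformed): " scalar(fence_log)
```

On these toy values the largest observation lies beyond the raw-scale fence but well inside the log-scale fence, so whether it is drawn as an individual point depends on which scale the rule is applied to. Whiskers drawn to fixed percentiles avoid that ambiguity.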

        With just two groups, as here, there is space for much more detail than a standard box plot shows, without it becoming burdensome.

        Code:
        stripplot AFP, over(pyap_dic) vertical box pctile(5) cumul cumprob centre ysc(log) yla(1 2 5 10 20 50 100 200 500 1000 2000 5000, ang(h)) xla(, noticks) scheme(s1color)
        [Image: giard.png]

        • #5
          I've given several similar answers to earlier threads on Statalist. Further, Googling

          Code:
          quantile-box plot site:stats.stackexchange.com
          will point to further examples (and give some false positives).
          Last edited by Nick Cox; 22 Mar 2017, 07:19.

          • #6
            Thank you very much Mr Cox. I learned a new command and it will be very helpful in the future for other projects.
