
  • stripplot with deciles

    Dear Statalist:

    I'm playing around with the excellent user-written command -stripplot- by Nick Cox (I hope I did that right).

    I've gotten most of what I want, but I'm trying to put deciles on it instead of a box plot. This is non-standard, but I think it probably gives a better sense of what is going on in the data.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(income female)
     95704.73 0
     81228.63 0
     87905.97 0
     82703.98 0
     75585.75 0
     57527.77 0
     86251.52 0
     73205.29 0
     83702.41 0
     88789.44 0
     74163.18 0
     68735.63 0
     75812.84 0
     60658.98 0
     63960.63 0
      66904.6 0
     76574.77 0
    73425.234 0
     64538.88 0
    37272.945 0
     65383.36 0
     76498.24 0
     83368.27 0
      69704.7 0
     61884.38 0
     63960.63 0
    71255.195 0
     71899.38 0
     60416.83 0
     61514.18 0
     57355.45 0
     60780.42 0
    73132.125 0
     65514.25 0
     52629.02 0
    35597.227 0
      77965.6 0
    101724.45 0
      79064.8 0
     70264.58 0
     82373.82 0
     96377.02 0
      73498.7 0
     74833.66 0
     63641.63 0
     73719.52 0
     70899.81 0
     63324.21 0
     73278.53 0
      73498.7 0
     70264.58 0
     34683.63 0
     59815.67 0
     47907.29 0
     52471.38 0
     51329.61 0
      46771.2 0
     39223.17 0
     70194.34 0
     79143.91 0
     56388.64 0
     59935.43 0
     49316.95 0
     59339.05 0
     77887.68 0
     57931.87 0
     65122.34 0
     45343.54 0
     65842.65 0
     64024.62 0
     62443.85 0
     78356.41 0
     102849.6 0
     66770.92 0
      46213.3 0
     66238.88 0
     82456.23 0
      80823.5 0
     76498.24 0
      68598.3 0
    66637.516 0
     57126.48 0
     64862.38 0
     59995.39 0
     53799.69 0
      68598.3 0
     65383.36 0
    70828.945 0
      72549.4 0
     42532.47 0
      51073.6 0
     63197.69 0
     56332.28 0
    66504.375 0
      59220.5 0
      61452.7 0
     60537.79 0
    66837.734 0
      61146.2 0
     57585.32 0
    end

    And then I ran something like this:

    Code:
    stripplot income, over(female) vertical ms(oh) mc(red%30) msize(tiny) jitter(10) refline(lw(medthick)) reflinestretch(0.4)

    I've been trying to do something with the reflevel() option (with either pctile or xtile), and maybe the box option offers a way? But I'm increasingly thinking I will need to lay another graph over this one. Does anyone have an idea of how to do this?

    Thanks,
    Jonathan

  • #2
    stripplot does not offer any specific support for display of deciles. It’s hard for me to imagine any display of deciles that wouldn’t be very crowded somewhere.



    • #3
      You can do things like this with qplot from the Stata Journal.

      Code:
      . sysuse auto, clear
      (1978 automobile data)
      
      . xtile mpgd1=mpg if foreign, nq(10)
      
      . xtile mpgd0=mpg if !foreign, nq(10)
      
      . gen mpgd = min(mpgd0, mpgd1)
      
      . sort foreign mpg
      
      . l foreign mpg*
      
           +---------------------------------------+
           |  foreign   mpg   mpgd1   mpgd0   mpgd |
           |---------------------------------------|
        1. | Domestic    12       .       1      1 |
        2. | Domestic    12       .       1      1 |
        3. | Domestic    14       .       1      1 |
        4. | Domestic    14       .       1      1 |
        5. | Domestic    14       .       1      1 |
           |---------------------------------------|
        6. | Domestic    14       .       1      1 |
        7. | Domestic    14       .       1      1 |
        8. | Domestic    15       .       2      2 |
        9. | Domestic    15       .       2      2 |
       10. | Domestic    16       .       2      2 |
           |---------------------------------------|
       11. | Domestic    16       .       2      2 |
       12. | Domestic    16       .       2      2 |
       13. | Domestic    16       .       2      2 |
       14. | Domestic    17       .       3      3 |
       15. | Domestic    17       .       3      3 |
           |---------------------------------------|
       16. | Domestic    18       .       3      3 |
       17. | Domestic    18       .       3      3 |
       18. | Domestic    18       .       3      3 |
       19. | Domestic    18       .       3      3 |
       20. | Domestic    18       .       3      3 |
           |---------------------------------------|
       21. | Domestic    18       .       3      3 |
       22. | Domestic    18       .       3      3 |
       23. | Domestic    19       .       5      5 |
       24. | Domestic    19       .       5      5 |
       25. | Domestic    19       .       5      5 |
           |---------------------------------------|
       26. | Domestic    19       .       5      5 |
       27. | Domestic    19       .       5      5 |
       28. | Domestic    19       .       5      5 |
       29. | Domestic    19       .       5      5 |
       30. | Domestic    19       .       5      5 |
           |---------------------------------------|
       31. | Domestic    20       .       6      6 |
       32. | Domestic    20       .       6      6 |
       33. | Domestic    20       .       6      6 |
       34. | Domestic    21       .       7      7 |
       35. | Domestic    21       .       7      7 |
           |---------------------------------------|
       36. | Domestic    21       .       7      7 |
       37. | Domestic    22       .       7      7 |
       38. | Domestic    22       .       7      7 |
       39. | Domestic    22       .       7      7 |
       40. | Domestic    22       .       7      7 |
           |---------------------------------------|
       41. | Domestic    22       .       7      7 |
       42. | Domestic    24       .       8      8 |
       43. | Domestic    24       .       8      8 |
       44. | Domestic    24       .       8      8 |
       45. | Domestic    25       .       9      9 |
           |---------------------------------------|
       46. | Domestic    26       .       9      9 |
       47. | Domestic    26       .       9      9 |
       48. | Domestic    28       .      10     10 |
       49. | Domestic    28       .      10     10 |
       50. | Domestic    29       .      10     10 |
           |---------------------------------------|
       51. | Domestic    30       .      10     10 |
       52. | Domestic    34       .      10     10 |
       53. |  Foreign    14       1       .      1 |
       54. |  Foreign    17       1       .      1 |
       55. |  Foreign    17       1       .      1 |
           |---------------------------------------|
       56. |  Foreign    18       2       .      2 |
       57. |  Foreign    18       2       .      2 |
       58. |  Foreign    21       3       .      3 |
       59. |  Foreign    21       3       .      3 |
       60. |  Foreign    23       4       .      4 |
           |---------------------------------------|
       61. |  Foreign    23       4       .      4 |
       62. |  Foreign    23       4       .      4 |
       63. |  Foreign    24       5       .      5 |
       64. |  Foreign    25       6       .      6 |
       65. |  Foreign    25       6       .      6 |
           |---------------------------------------|
       66. |  Foreign    25       6       .      6 |
       67. |  Foreign    25       6       .      6 |
       68. |  Foreign    26       7       .      7 |
       69. |  Foreign    28       8       .      8 |
       70. |  Foreign    30       8       .      8 |
           |---------------------------------------|
       71. |  Foreign    31       9       .      9 |
       72. |  Foreign    35       9       .      9 |
       73. |  Foreign    35       9       .      9 |
       74. |  Foreign    41      10       .     10 |
           +---------------------------------------+
      
      . qplot mpg, over(foreign) ms(none ..) mla(mpgd mpgd)
      
      . qplot mpg, by(foreign) ms(none ..) mla(mpgd mpgd)

      The idea is that choosing different marker symbols or colours for different deciles (whether to indicate decile bin membership or decile values) is doomed to failure. But self-explanatory marker labels might work.

      Using mpg does expose a problem that has often been flagged: with small samples or numerous ties, silly side-effects on which values go into which bins are inevitable. Here, for example, no domestic car falls into decile bin 4.


      [Figures: decile1.png, decile2.png]
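The pair of xtile calls combined with min() in #3 works because min() ignores missing arguments, and every observation is missing on one of mpgd0 and mpgd1. A minimal sketch of the same idea for any number of groups, looping over the levels of foreign (the helper variable name bin is my own choice), followed by a cross-tabulation that makes the tie problem visible:

```stata
* Decile bins of mpg computed separately within each level of foreign
sysuse auto, clear
generate mpgd = .
levelsof foreign, local(groups)
foreach g of local groups {
    * xtile respects the if condition, so bins are formed within group `g'
    xtile bin = mpg if foreign == `g', nquantiles(10)
    replace mpgd = bin if foreign == `g'
    drop bin
}
* Counts per bin and group: ties leave some bins unequal or empty
tabulate mpgd foreign
```

This reproduces mpgd from #3. As the listing there shows, domestic bin 3 holds nine cars while bin 4 is empty, which is the tie effect just described.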



      • #4
        This is very interesting, but what I was looking for was a way to have the reference lines on the stripplot drawn at the 10th and 90th percentiles, instead of at the mean, which is what reflevel() gives me. Something like:

        Code:
         
        stripplot income, over(female) vertical ms(oh) mc(red%30) msize(tiny) jitter(10) refline(lw(medthick)) reflinestretch(0.4) reflevel(pctile(10))

        That would only put the 10th percentile on the graph, and in any case I get an error:

        Code:
        pctile(10)() not known as egen function

        I thought there might be a way to lay another plot over this, but maybe not.

        Thanks again for all of your help.



        • #5
          So you want just two of the nine deciles. We can make progress on that.

          Your code mixes up details of quite different options. reflevel() requires the name of an egen function and nothing else, as the error message explains. But pctile(10) is itself an allowed option call: as explained, it draws the whiskers of a box plot out to the 10% and 90% points.
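For comparison, egen's pctile() function takes the percentile in its own p() option, which is why pctile(10) cannot be passed through reflevel(). A minimal sketch that computes the per-group 10th and 90th percentiles as variables, using the auto data rather than the income data from the first post (the variable names p10 and p90 are my own):

```stata
* Per-group 10th and 90th percentiles with egen's pctile() function
sysuse auto, clear
* the percentile goes in egen's own p() option, not inside reflevel()
bysort foreign: egen p10 = pctile(price), p(10)
bysort foreign: egen p90 = pctile(price), p(90)
* the values are constant within each group, so the group means are the values
tabstat p10 p90, by(foreign) nototal
```

Variables like these are the natural starting point if you do decide to lay another plot over the stripplot.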

          I've never (well, hardly ever https://www.youtube.com/watch?v=kBK39BKWuQg) seen serious data on incomes that didn't benefit from a logarithmic scale.

          It is germane that, to a good enough approximation for most graphical purposes, log(any percentile(data)) = any percentile(log(data)).
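The approximation is easy to check. Because log is monotone increasing it preserves the ordering of the data, so whenever a percentile is a single order statistic the identity is exact; it is only approximate when two neighbouring order statistics are averaged. With 74 observations, the default 10% and 90% points of price are single order statistics, so the two columns below should agree up to rounding. A minimal sketch (the scalar names are hypothetical):

```stata
* Percentiles of price versus back-transformed percentiles of log price
sysuse auto, clear
generate double lnprice = ln(price)
_pctile price, percentiles(10 90)
scalar p10 = r(r1)
scalar p90 = r(r2)
_pctile lnprice, percentiles(10 90)
scalar bt10 = exp(r(r1))
scalar bt90 = exp(r(r2))
display "10%: " p10 "  vs  " bt10
display "90%: " p90 "  vs  " bt90
```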

          So, this example gets as close to your set-up as I can get without hard work.

          Code:
          . sysuse auto, clear
          (1978 automobile data)
          
          . stripplot price, over(foreign) box(barw(0.1)) boffset(-0.45) pctile(10) vertical cumul cumprob centre ysc(log) yla(2000 5000 10000 20000) xla(, tlc(none))

          As you know, many options are possible. Although jittering is sometimes helpful graphically, I've come to think it's the worst way to deal with close or identical values. In the next public version of the help, it's going to be mentioned less enthusiastically.
          [Figure: stripplot_with_deciles.png]



          • #6
            Thanks! I was hoping to get a cap at the end of each whisker, of similar width to the box, but if the whiskers are drawn with rspike rather than rcapsym, then I guess that isn't easy to do.



            • #7
              Clearly I prefer (much prefer) spikes. And I don't think that whisker ends deserve or require so much emphasis.

              But you can have caps at each whisker end. That takes an extra option:

              Code:
              whiskers(recast(rcap))

              Given the recast(), you should be able to add other options as well.

              But there is then a cap at both ends of each whisker, and the inner one will sometimes be visible on top of the quartiles. That is one reason rcap is not the default.
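A sketch combining the command from #5 with capped whiskers. It assumes stripplot is installed (it is on SSC); the lwidth(medthick) inside whiskers() is my own illustrative addition, relying on the remark above that other options can follow recast(). The /// continuation lines need to be run from a do-file.

```stata
* stripplot is user-written; if needed: ssc install stripplot
sysuse auto, clear
* #5's command plus capped whiskers; lwidth() here is an illustrative extra
stripplot price, over(foreign) box(barw(0.1)) boffset(-0.45) pctile(10) ///
    vertical cumul cumprob centre ysc(log) yla(2000 5000 10000 20000)  ///
    xla(, tlc(none)) whiskers(recast(rcap) lwidth(medthick))
```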
