
  • box plot with a dummy variable

    Hi, I want to draw a box plot with a dummy variable, like the one in the image. In other words, I want to graph a box plot using the percentage of people with 0 or 1.
    Thank you
    [Image: boxplot.png]

  • #2
    A box plot shows the median (the line in the middle of the box) and the first and third quartiles (the edges of the box); the whiskers are also based on the quartiles. To find the quartiles (the median is the second quartile), we order the observations from smallest to largest on a variable; a quartile is then the value on that variable such that one, two or three quarters of the observations lie below it.

    An indicator (dummy) variable takes only two values, 0 or 1. Let's say our data are 0 0 0 1. Then our first quartile is the value of the first observation (0), our second quartile lies somewhere between the second and third observations (0 and 0, so 0), and our third quartile will be the third observation (0). That would be a very uninteresting box plot.
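    The worked example can be checked directly in Stata. This is a minimal sketch: the variable name d is purely illustrative, and it uses the quartiles reported by summarize, detail. Quantile conventions differ when the position falls exactly between two observations; summarize averages the third and fourth ordered values for the 75th percentile here, so it reports Q3 = 0.5 rather than 0. Either way the box can only sit on 0, 0.5 or 1, and the quantity the question is really about, the proportion of 1s (the mean), is not part of a box plot at all.

```stata
* Illustration only: the four indicator values from the example (d is a made-up name)
clear
input byte d
0
0
0
1
end

* summarize, detail stores the quartiles in r(p25), r(p50) and r(p75)
summarize d, detail
display "Q1 = " r(p25) "   median = " r(p50) "   Q3 = " r(p75)

* the mean of a 0/1 variable is the proportion of 1s (here 0.25)
display "proportion of 1s = " r(mean)
```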

    All this is to say that box plots are not appropriate for indicator variables.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Maarten Buis is bang on. The numbers shown on a box plot of such a variable will typically be 0 or 1 (and just possibly 0.5).
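      A sketch with a real indicator, assuming the auto data shipped with Stata: foreign is 0 for 52 domestic cars and 1 for 22 foreign cars. The quartiles that summarize, detail reports are 0, 0 and 1, so the box spans the whole range and says nothing about the roughly 30% (22/74) of cars that are foreign. The graph name indicator_box is arbitrary.

```stata
sysuse auto, clear

* box plot of the indicator itself (for comparison only)
graph box foreign, name(indicator_box, replace)

* quartiles and mean of foreign; summarize runs last so r() holds its results
summarize foreign, detail
display "Q1 = " r(p25) "   median = " r(p50) "   Q3 = " r(p75)
display "proportion foreign = " r(mean)
```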

      Another possibility is that Allan is not explaining himself very well. Perhaps he wants a box plot of one variable, shown separately for each value of an indicator (he says "dummy", a term better avoided in my view (*)), that also shows the fraction of observations taking each of the indicator's values.

      I don't think such a plot is provided by any official command, but here is one alternative from stripplot (SSC).

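      Code:
      sysuse auto, clear
      
      * quantile plots of mpg for each value of foreign, with a slim box plot offset beside each;
      * pctile(5) makes the whiskers run to the 5th and 95th percentiles
      stripplot mpg, over(foreign) yla(, ang(h)) cumul box(barw(0.05)) boffset(-0.1) pctile(5) vertical name(G1, replace)
      
      * group size: number of nonmissing mpg values for each value of foreign
      bysort foreign : egen count = count(mpg)
      * y coordinate at which the labels will be placed
      gen where = 40
      
      * label text such as "n = 52", with n in italics
      gen toshow = "{it:n} = " + strofreal(count)
      
      * same plot plus invisible markers at (foreign, 40) that carry the labels
      stripplot mpg, over(foreign) yla(, ang(h)) cumul box(barw(0.05)) boffset(-0.1) pctile(5) vertical addplot(scatter where foreign, ms(none) mla(toshow) mlabsize(medlarge)) name(G2, replace)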

      The first plot gives a box plot (I use a variant in which the whiskers go only to specified percentiles, but nothing compulsory there), and consistently aligned quantile plots give extra detail and convey the frequency of data points in each group.


      [Image: qboxG1.png]

      The second plot shows how just a little extra work gives you an explicit display of group size. That's just a number that Stata can give you and text you can add where you want.
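      Since the original question asks about percentages, the same labelling trick can show each group's share as well as its size. A minimal sketch, assuming the percentage wanted is each group's share of all nonmissing mpg values; total and pct are hypothetical helper variables not in the code above. The resulting toshow variable can be passed unchanged to mla() inside addplot() in the second stripplot command.

```stata
sysuse auto, clear

* group size, computed as in the code above
bysort foreign : egen count = count(mpg)

* hypothetical helpers: overall count and each group's percentage of it
egen total = count(mpg)
gen pct = 100 * count / total

* label text such as "n = 52 (70.3%)", with n in italics
gen toshow = "{it:n} = " + strofreal(count) + " (" + strofreal(pct, "%4.1f") + "%)"

* y position for the labels, as before
gen where = 40

* show one label per group (data are sorted by foreign after bysort)
list foreign toshow if foreign != foreign[_n-1], noobs
```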


      [Image: qboxG2.png]


      I spent about 20 years trying to persuade students and colleagues to draw box plots as well as, or instead of, histograms. And about 20 years trying to persuade them to draw something else too. For the very common situation of just a few groups, there is always space to show more detail helpfully. It's only when the number of groups is more like 30 or 300 that the extreme compression of box plots can be ideal.

      (*) For a view that the term "dummy variables" should be deprecated, see https://www.stata-journal.com/articl...article=dm0099
