  • Adding means of two categories to stripplot

    Hello, I have generated a stripplot with separate markers for two categories and would like to add symbols or lines to represent the mean of each category.
    (Plotting the means is the problem, not calculating them.)

    Code:
    sysuse auto, clear
    stripplot weight, over(rep78) stack height(0.1) vertical sep(foreign) centre msymbol(circle) msize(tiny tiny) mcolor(black red)
    [Figure: Graph.gif]


    My x variable is a string, however, so addplot(scatter...) hasn't worked for me.
    Any pointers on a better way to do this would be much appreciated.
    --
    OS X Yosemite, (learning) Stata 13

  • #2
    stripplot is from SSC, as you are asked to explain (FAQ Advice on explaining where user-written programs come from). The help explains about four different ways to show means as well as what is shown by default. The easiest is the refline option.
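
    For anyone following along without it installed, a minimal sketch of fetching stripplot from SSC and opening its help, where the options for showing means are described:

    Code:
    * install the user-written command from SSC, then read its help
    ssc install stripplot
    help stripplot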

    As your x variable is a string, there is little point in citing an example from the auto dataset showing numeric variables. You would be better advised to give us a reproducible example based on your data or on similar data but showing what you tried.

    Adding the options

    Code:
     
    refline legend(order(6 7))
    would help in the example given.
    Last edited by Nick Cox; 02 Jun 2015, 08:01.



    • #3
      It is possible to create a similar graph with scatter. You wrote that your x variable is a string, so I start by converting rep78 to a string and then recreate the graph in your post.
      Code:
      sysuse auto, clear
      lab def rep78 1 "1" 2 "2" 3 "3" 4 "4" 5 "5"
      lab val rep78 rep78
      decode rep78, gen(rep78string)
      drop rep78
      #delimit ;
      stripplot weight,
        over(rep78string) stack height(0.1) vertical sep(foreign) centre
        msymbol(circle) msize(small small) mcolor(black red) name(g1, replace)
      ;
      #delimit cr
      graph export "stripplot.png", width(500) replace
      [Figure: stripplot.png]


      Next, I convert rep78string to a numeric variable so that it can be used with scatter. Then I create the four variables that will be shown in the graph: weight of domestic cars, weight of foreign cars, and the mean of each. I cannot recreate the stack option of stripplot in scatter, so I substituted the jitter option, but perhaps someone else has a better idea.
      Code:
      encode rep78string, gen(rep78)
      separate weight, by(foreign)
      egen weight0mean = mean(weight0), by(rep78)
      egen weight1mean = mean(weight1), by(rep78)
      
      #delimit ;
      twoway
        (scatter weight0 rep78, msymbol(c) mcolor(black) msize(small) jitter(1))
        (scatter weight1 rep78, msymbol(c) mcolor(red) msize(small) jitter(1))
        (scatter weight0mean rep78, msymbol(+) mcolor(black) msize(vlarge) mlwidth(medthick))
        (scatter weight1mean rep78, msymbol(+) mcolor(red) msize(vlarge) mlwidth(medthick)),
        xscale(r(0.8 5.2))
        ytitle("Weight (lbs.)")
        legend(lab(1 "Domestic") lab(2 "Foreign") lab(3 "Domestic mean")
          lab(4 "Foreign mean") order(1 3 2 4) row(1))
        name(g2, replace)
      ;
      #delimit cr
      graph export "scatter.png", width(500) replace
      [Figure: scatter.png]



      • #4
        Nick informed us that stripplot has several options to add means, so your best solution may be to convert your string to a numeric variable (see my post) so that you can use the addplot option with stripplot.



        • #5
          Friedrich: stripplot should be able to deal with a string variable automatically so long as refline is used to show means.



          • #6
            Friedrich: thanks, I think converting to a numeric variable may be my best option too.

            Nick: Apologies, I missed that in the FAQ. refline gives one mean for each rep78, whereas I am looking to show separate means for domestic and foreign.
            Having worked on this some more, addplot will allow me to include both means:
            Code:
            sysuse auto, clear
            drop if rep78 ==.
            generate str1 rep78str = string(rep78)
            egen mweight = mean(weight), by(foreign rep78str)
            stripplot weight, over(rep78str) stack height(0.1) vertical sep(foreign) centre msymbol(circle) msize(tiny tiny) mcolor(black red) addplot(scatter mweight rep78, ms(T) mcolor(green))
            [Figure: stripplot-means.gif]


            I am running into problems making the two means look different. My first attempt at modifying the above to change one symbol gives a type mismatch error:
            Code:
            sysuse auto, clear
            drop if rep78 ==.
            generate str1 rep78str = string(rep78)
            egen mweight = mean(weight), by(foreign rep78str)
            stripplot weight, over(rep78str) stack height(0.1) vertical sep(foreign) centre msymbol(circle) msize(tiny tiny) mcolor(black red) addplot(scatter mweight78 if foreign =="Domestic", ms(T) mcolor(blue))
            Is this poor syntax or not possible?
            --
            OS X Yosemite, (learning) Stata 13



            • #7
              Ah! I see.

              You can refer to observations by their value labels. This is documented at [U] 13.11 in Stata 14 and (under possibly different chapter and section numbers) in earlier versions. See also http://www.stata-journal.com/article...article=dm0009
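
              A minimal sketch of that label syntax, under some assumptions: the auto dataset, where the value label attached to foreign is named origin, and numeric rep78 in over() rather than a string copy. The comparison that failed in #6 can be written against the label or against the underlying number, and two scatters inside addplot() with different symbols then mark the two sets of means separately. I have not checked how the legend comes out, so it will probably need tidying with legend(order()).

              Code:
              sysuse auto, clear
              drop if missing(rep78)

              * foreign is numeric 0/1; "Domestic":origin looks up the value labelled Domestic
              * the two counts should agree
              count if foreign == "Domestic":origin
              count if foreign == 0

              * one mean per combination of origin and repair record
              egen mweight = mean(weight), by(foreign rep78)

              stripplot weight, over(rep78) stack height(0.1) vertical sep(foreign) centre ///
              msymbol(circle circle) msize(tiny tiny) mcolor(black red) ///
              addplot(scatter mweight rep78 if foreign == "Domestic":origin, ms(T) mcolor(black) ///
              || scatter mweight rep78 if foreign == "Foreign":origin, ms(D) mcolor(red))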

              On the main point: I would do something rather different. Here are some possibilities:

              Code:
              sysuse auto, clear
              
              gen mygroup = cond(foreign == 0, rep78, foreign + rep78 + 3)
              
              * labmask is part of the labutil package on SSC (ssc install labutil)
              labmask mygroup, values(rep78)
              
              egen mean = mean(weight), by(mygroup)
              
              stripplot weight, width(100) over(mygroup) centre stack height(0.6) vertical ///
              xmla(3 "Domestic" 8 "Foreign", tlc(none) tl(*7) labsize(medium)) xtitle("")  ///
              addplot(scatter mean mygroup, ms(Dh) msize(*2))
              
              stripplot weight, width(100) over(rep78) by(foreign) centre stack height(0.6) vertical ///
              addplot(scatter mean rep78, ms(Dh) msize(*2))
              Here the first graph uses over() alone and the second uses over() with by(). (I don't know why they are imported at different sizes.)



              Last edited by Nick Cox; 02 Jun 2015, 11:05.



              • #8
                Thanks, this looks great. I hadn't come across labmask/labutil before either; very useful.
                --
                OS X Yosemite, (learning) Stata 13
