
  • sepscatter available from SSC

    Thanks as always to Kit Baum, a new program sepscatter is available from SSC, to be installed using ssc inst sepscatter.

    As usual, the help file gives the documentation, so the better thing for me to do is to give some examples so you can see whether you care.

    The canonical example here uses the auto data. Imagine you want to separate foreign and domestic cars on a scatter plot of mpg versus weight.

    A little study of the manual makes it clear that you can superimpose a plot of foreign cars and one of domestic cars.

    sysuse auto, clear
    set scheme s1color
    scatter mpg weight if foreign || scatter mpg weight if !foreign

    The legend that springs up is dopey as far as you are concerned, but that can be fixed, and while you are doing that you can spell out precisely what you wish in the way of marker symbols and colours and so forth. So far, not so bad, but when you start to wonder about distinguishing the five levels of repair record, most of us protest inwardly at the thought of spelling out the commands for five separate graphs, all superimposed.
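
    A minimal sketch of that fix follows. The legend text, marker symbols and colours are merely illustrative choices, not recommendations:

    ```stata
    sysuse auto, clear
    * one scatter per group, each with its own marker; legend keys labelled explicitly
    scatter mpg weight if foreign, ms(Oh) mc(red) ///
        || scatter mpg weight if !foreign, ms(X) mc(blue) ///
        legend(order(1 "Foreign" 2 "Domestic")) ytitle("Mileage (mpg)")
    ```

    With two groups this is tolerable; the typing grows with every extra group.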

    You may know another way to approach this. This kind of problem was one of the main reasons for adding the separate command to Stata back in Stata 6. The keen Stata user will naturally have memorised and be able to add the extra flourish of the veryshortlabel option (that is, very short label).

    separate mpg, by(rep78) veryshortlabel
    scatter mpg? weight
    This does most of the work but scatter is in despair about what ytitle() you might want and the default choice of markers may not be especially suitable for the purpose (depending naturally on what graph scheme you have in place).
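
    As a sketch of how those two loose ends might be tidied by hand (the marker choices here are only an assumption about what reads well, and the axis title is simply copied from the variable label of mpg):

    ```stata
    sysuse auto, clear
    separate mpg, by(rep78) veryshortlabel
    * take the y-axis title from the original variable label and pick markers by hand
    scatter mpg? weight, ytitle("`: var label mpg'") ms(Oh plus X Th Sh)
    ```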

    So, after some years of these half-fudged solutions I got round to a convenience command with my idea of a fair first approximation, in which the separation is done on the fly and there is some inbuilt intelligence on markers, legend and axis title. Here are two examples:

    sepscatter mpg weight, separate(foreign) legend(pos(1) col(1) ring(0))
    sepscatter mpg weight, separate(rep78) myla(rep78)

    In the second case, you can play with different marker symbols for the five levels of repair record, but it's hard to improve on just showing the values 1 to 5. Usually scatter would insist that to show five distinct variables using marker label variables you need to spell out (here) mla(rep78 rep78 rep78 rep78 rep78) and more details too, but sepscatter automates that for you, given the mylabel() option.
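
    For comparison, here is a rough sketch of the longhand that the mylabel() option saves you. It assumes the auto data and uses separate first; the axis title is typed in by hand:

    ```stata
    sysuse auto, clear
    separate mpg, by(rep78) veryshortlabel
    * suppress the markers and show the value of rep78 at each point instead
    scatter mpg1 mpg2 mpg3 mpg4 mpg5 weight,          ///
        ms(none none none none none)                  ///
        mla(rep78 rep78 rep78 rep78 rep78)            ///
        mlabpos(0 0 0 0 0) legend(off) ytitle("Mileage (mpg)")
    ```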

    Running sepscatter with combineplot (also SSC) is a way of getting several such graphs all at once.

    Last edited by Nick Cox; 07 May 2014, 04:23.

  • #2
    Both sepscatter and combineplot seem lovely tools. Thank you, Nick.
    However, I spotted a problem with combineplot:
    It does not seem to handle variable lists.

    . combineplot price- foreign: qnorm @y
    nothing found where name expected


    • #3
      Your first token is

      price-

      which doesn't qualify as a name. Remove the space and specify price-foreign and you will get what you want. Usually Stata indulges the space, but combineplot does not do that here.
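
      Spelled out with the auto data, the corrected call would look something like this (a sketch that assumes combineplot has been installed from SSC):

      ```stata
      * assumes: ssc install combineplot
      sysuse auto, clear
      * no space inside the variable range, so the first token is a valid varlist
      combineplot price-foreign: qnorm @y
      ```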


      • #4
        Hi Nick,

        I love it, really helpful! One thing came immediately to mind with the use of mylabel(): what happens if you have value labels assigned to a categorical variable? So I tried the following:
        sepscatter mpg weight, separate(foreign) myla(foreign)
        and it returned

        Sometimes with a categorical variable where the value labels have meaning, it would be better if we were still able to show the category numbers instead of the value labels, and then we could do the legend as you did in your example. Is that something we can currently do?

        Thanks again for the code,
        Last edited by Alfonso Sánchez-Peñalver; 07 May 2014, 16:54.
        Alfonso Sanchez-Penalver


        • #5
          Thanks. If you

          gen origin = foreign
          then the numeric values get copied but not the value labels, so you could use that. But I guess what you want should be an option.
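
          A sketch of that workaround with the auto data, assuming sepscatter is installed from SSC (origin is just an illustrative name for the copy):

          ```stata
          * assumes: ssc install sepscatter
          sysuse auto, clear
          * the copy carries the values 0 and 1 but no value label
          gen origin = foreign
          sepscatter mpg weight, separate(foreign) myla(origin)
          ```

          The syntax statement in the code posted later in this thread lists a mynumeric() option, which appears to make this copy automatically.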


          • #6
            Thank you for the code.

            I want to add a regression line to my sepscatter plot. However, Stata tells me that "option | not allowed".

            Is it possible to add an lfit line to the sepscatter command?

            sepscatter var1 var2, separate(crisis_countries) || lfit var1 var2


            • #7
              Christian: That syntax is only allowed with twoway calls, as is implied by the syntax diagram of sepscatter.

              Your question raises a thought that sepscatter might support addplot(), which I will think about.
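
              In the meantime, one way to get a similar graph is to do the separation yourself and call twoway directly. This is a sketch on the auto data rather than your variables, and the marker choices are arbitrary:

              ```stata
              sysuse auto, clear
              separate mpg, by(foreign) veryshortlabel
              * twoway accepts ||, so the separated scatters and a fit line can be superimposed
              twoway scatter mpg0 mpg1 weight, ms(Oh X) ///
                  || lfit mpg weight, ytitle("`: var label mpg'") legend(order(1 2))
              ```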


              • #8
                Thank you for the code. I am trying to make a scatter plot, and I was hoping that sepscatter would let me produce something like this. However, I am having trouble fitting the best-fit line as well as separating the data over months.
                My code is

                sepscatter percen year, separate(lab)

                The output is a graph with all the months stacked up, rather than a line like this.


                • #9
                  You don't say how that "best fit line" for each is produced, so I have no suggestions on how to produce it. Or rather there are many ways to do something similar.

                  Given your intent to add such lines, sepscatter is a dead end for you. Also, as you say very little about your data layout and content, it really would be best if you said more about that first.

                  Please do read and act on FAQ Advice #12.


                  • #10
                    Dear Nick, So, the option addplot() is not available now, is it?

                    Ho-Chuan (River) Huang
                    Stata 16.1, MP(4)


                    • #11
                      River Huang Thanks for the prompt. It seems that I never did this, but here it is now.

                      *! 1.1.0 NJC 5 December 2018
                      *! 1.0.2 NJC 9 May 2014
                      *! 1.0.1 NJC 8 May 2014
                      *! 1.0.0 NJC 29 April 2014
                      program sepscatter
                          version 9

                          * tolerate the misspelling seperate(), with a note
                          capture syntax anything [if] [in] [aweight fweight pweight] ///
                          , seperate(varname) [ * ]

                          if _rc == 0 {
                              noisily di _n "note: sep" as err "a" as  txt "rate() is correct spelling"
                              local 0 `anything' `if' `in' [`weight' `exp'] ///
                                      , separate(`seperate') `options'
                          }

                          syntax varlist(numeric min=2 max=2) [if] [in] ///
                          [aweight fweight pweight] , SEParate(varname) ///
                          [MYLAbel(varname) MYNUmeric(varname) MISSing addplot(str asis) *]

                          capture noisily {

                          quietly {
                              if "`mylabel'" != "" & "`mynumeric'" != "" {
                                  di as err "choose mylabel() or mynumeric()"
                                  exit 198
                              }

                              tokenize `varlist'
                              args y x

                              marksample touse
                              if "`missing'" == "" markout `touse' `separate', strok
                              count if `touse'
                              if r(N) == 0 exit 2000

                              * separate the y variable on the fly
                              tempname stub
                              separate `y' if `touse', `missing' by(`separate') ///
                              gen(`stub') veryshortlabel
                              local Y `r(varlist)'
                              local nY : word count `Y'
                          }

                          local ytitle : var label `y'
                          if `"`ytitle'"' == "" local ytitle "`y'"

                          if "`mylabel'`mynumeric'" != "" {
                              if "`mynumeric'" != "" {
                                  * copy the values without the value label
                                  if "`: value label `mynumeric''" != "" {
                                      tempvar mylabel
                                      gen `mylabel' = `mynumeric'
                                  }
                                  else local mylabel `mynumeric'
                              }

                              * repeat the marker label settings once per separated variable
                              local mylabel : di _dup(`nY') "`mylabel' "
                              local mypos : di _dup(`nY') "0 "
                              local mynone : di _dup(`nY') "none "
                              local mylabel ///
                              ms(`mynone') mla(`mylabel') mlabpos(`mypos') legend(off)
                          }

                          scatter `Y' `x' if `touse' [`weight' `exp'], ///
                              ytitle(`"`ytitle'"') ms(Oh plus X Th Sh Dh) `mylabel' ///
                              `options' || `addplot'

                          }

                          drop `Y'
                      end

                      Here is a silly example:

                      sysuse auto, clear
                      sepscatter mpg weight , sep(foreign) mc(red blue) addplot(qfit mpg weight if !foreign, lc(red) || qfit mpg weight if foreign, lc(blue)) legend(order(1 2))
                      I'll update the help and post the code to Kit Baum for inclusion on SSC.


                      • #12
                        Dear Nick, Many thanks for this new feature.
                        Ho-Chuan (River) Huang
                        Stata 16.1, MP(4)


                        • #13
                          Now the revised version is up on SSC, thanks to Kit Baum as ever.


                          • #14
                            Dear Nick, Many thanks for the update.
                            Ho-Chuan (River) Huang
                            Stata 16.1, MP(4)


                            • #15
                              Hello, Thank you for the sepscatter code. Is there a limit to the number of levels in a variable used in separate()? I have been using sepscatter to look at the trajectory of pain over time among 50+ patients and have multiple data points for most of those patients.
                              My simple syntax is
                              sepscatter pain injuryduration, separate( PatientID)
                              The legend shows 21 unique combinations of symbol/color, and then begins to double and triple up cases attributed to each marker. I know Nick stated, "The legend that springs up is dopey," but was not sure if the issue of multiple cases being assigned to one type of marker could be resolved with manual modification of marker symbols and colors, or even if more than 21 levels could be mapped using sepscatter.
                              Ultimately I would like to use sepscatter pain injuryduration, separate(PatientID) recast (line) to visualize the trajectory among those patients for whom I have more than 2 points, but this is not sensible at present given that multiple cases are attributed to the same marker. Thank you for your time and information.