  • Invalid syntax for mlabel for too many text values in scatter plot

    Hi

    I am trying to make a scatter plot and I am getting an error message with the mlabel() option:

    twoway scatter difference seqnum , yscale(log) mlab(movies) m(i)
    invalid syntax
    r(198);

    When I drop the mlabel() option, the graph works fine. Is there a way to fix this problem?

    Thanks

    Veeresh



  • #2
    I see no obvious problem. Tell us more about your data (e.g. describe movies) or give us a reproducible example using a standard Stata dataset.



    • #3
      The values of movies are strings. The variable has 7126 unique values, its storage type is str29, and there are 7126 observations in total (so any list of about 7000 strings, such as company or car names expanded from the auto dataset, would reproduce the setup). When I plot the graph, I expect Stata to show these values as marker labels. Thanks for your help.



      • #4
        I tried restricting the data to 2000 observations and it works fine, but if I go beyond that, say to 4000, it fails with the same error.



        • #5
          I don't think it's the number of strings that's causing the problem.
          I expanded the auto data and created unique strings (and added some variability to length and weight to allow for more data points on the graph).
          The resulting graph is unreadable--which I would pretty much expect with any graph with 7,000 strings as labels--but there's no error.

          Code:
          sysuse auto.dta, clear
          
          **create ~7,000 observations
          expand 100
          
          **create a bunch of unique names by adding random string of letters at the end
          **create some random numbers from 1 to 26 to use to assign letters
          forv i=1/10 {
              gen r26_`i'=1+int((26-1+1)*runiform())
          }
          
          local a=1
          foreach let in a b c d e f g h i j k l m n o p q r s t u v w x y z {
              forv i=1/10 {
                  replace make=make+"`let'" if r26_`i'==`a'
              }
              local ++a
          }
          
          **add some more variability to weight and length as well to get more data points
          gen w=1+int((300-1+1)*runiform())
          gen l=1+int((50-1+1)*runiform())
          replace weight=weight+w
          replace length=length+l
          
          twoway scatter weight length, yscale(log) mlab(make) m(i)
          
          **graph is useless in terms of presenting data, but no error



          • #6
            Is there a way to display marker labels (mlabel) only for the extreme values? For instance, only for the 5 smallest and 5 largest values in the graph.



            • #7
              Naturally. How readable the graph is may remain a problem, but there is no difficulty with syntax.

              Code:
               
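              sysuse auto
              * sort so the lowest mpg values come first and the highest last
              sort mpg
              * flag the 5 lowest and 5 highest observations (auto has 74 observations)
              gen extreme = _n <= 5 | _n >= 70
              * overlay: plain markers for the rest, labels only (no marker) for the extremes
              scatter mpg weight if !extreme || scatter mpg weight if extreme, mlabel(make) mlabpos(0) ms(none)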
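
              The cutoff 70 above is specific to the 74 observations in auto. A minimal sketch of the same idea for a dataset of any size (such as the roughly 7000 observations in the original question) follows; it assumes the code is run from a do-file, and the local macro name n is just an illustrative choice. Counting only nonmissing values guards against missings, which sort places last.

              Code:

              sysuse auto, clear
              sort mpg
              * number of nonmissing values of the sort variable
              count if !missing(mpg)
              local n = r(N)
              * flag the 5 smallest and the 5 largest nonmissing values
              gen byte extreme = _n <= 5 | (_n > `n' - 5 & _n <= `n')
              twoway (scatter mpg weight if !extreme) ///
                     (scatter mpg weight if extreme, mlabel(make) mlabpos(0) ms(none))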



              • #8
                Thanks, Nick. I was able to use the text() option, but this looks like a better way to address the problem.



                • #9
                  I am wondering if there is a way to mark some points of interest. With around 7000 data points the graph is quite unreadable, so I am thinking of highlighting specific points. For instance, say I want to highlight observations 600, 732 and 834 as one group (in one color) and 228, 1 and 330 as a second group. Is there a way I can achieve this? Perhaps I could also specify legend text identifying the groups, to make better sense of the data.



                  • #10
                    It's the same answer. Select particular groups using if and/or logical operators and overlay graphs. Only the details vary (e.g. you can specify observation numbers using _n). Look up legend options to see how to control legend details.

                    Code:
                      
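                    sysuse auto
                    * indicator variables for the observation numbers to highlight
                    gen group1 = inlist(_n, 42, 66)
                    gen group2 = inlist(_n, 13, 31, 73)
                    * overlay: unhighlighted points, then group 1 (red labels, no markers), then group 2 (blue labels)
                    scatter mpg weight if !group1 & !group2 || scatter mpg weight if group1, mlabel(make) mlabcolor(red) mlabpos(0) ms(none) || scatter mpg weight if group2, mlabel(make) mlabcolor(blue)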
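
                    As for the legend, here is a minimal sketch of one way to control it, assuming a do-file; the labels "Group 1" and "Group 2" and the color choices are only placeholders. Within legend(order()), each number refers to a plot in the order it was specified, so 2 and 3 pick out the two highlighted groups and the background points are left out of the legend.

                    Code:

                    sysuse auto, clear
                    gen byte group1 = inlist(_n, 42, 66)
                    gen byte group2 = inlist(_n, 13, 31, 73)
                    * plot 1 = background points, plot 2 = group 1, plot 3 = group 2
                    twoway (scatter mpg weight if !group1 & !group2, mcolor(gs10)) ///
                           (scatter mpg weight if group1, mcolor(red) mlabel(make) mlabcolor(red)) ///
                           (scatter mpg weight if group2, mcolor(blue) mlabel(make) mlabcolor(blue)), ///
                           legend(order(2 "Group 1" 3 "Group 2"))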

