  • Binscatter: Ordering x axis according to size of dots.

    Hello.
    I have data similar to the following:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int price float group
     4099 4
     4749 4
     3799 4
     4816 4
     7827 2
     5788 4
     4453 4
     5189 4
    10372 3
     4082 4
    11385 3
    14500 1
    15906 1
     3299 4
     5705 4
     4504 4
     5104 4
     3667 4
     3955 4
     3984 4
     4010 4
     5886 4
     6342 4
     4389 4
     4187 4
    11497 3
    13594 1
    13466 1
     3829 4
     5379 4
     6165 4
     4516 4
     6303 4
     3291 4
     8814 2
     5172 4
     4733 4
     4890 4
     4181 4
     4195 4
    10371 3
     4647 4
     4425 4
     4482 4
     6486 4
     4060 4
     5798 4
     4934 4
     5222 4
     4723 4
     4424 4
     4172 4
     9690 2
     6295 4
     9735 2
     6229 4
     4589 4
     5079 4
     8129 2
     4296 4
     5799 4
     4499 4
     3995 4
    12990 1
     3895 4
     3798 4
     5899 4
     3748 4
     5719 4
     7140 2
     5397 4
     4697 4
     6850 4
    11995 3
    end
    label values group group_label1
    label def group_label1 1 "A", modify
    label def group_label1 2 "B", modify
    label def group_label1 3 "C", modify
    label def group_label1 4 "D", modify
    I am making a binned scatter plot of price against group, and I want to order the x axis by the value plotted for each bin, in descending order. So after running:

    Code:
      binscatter price group, xlabel(1 "A" 2 "B" 3 "C" 4 "D",  noticks)
    I want the axis to be ordered A C B D and not A B C D.

    [Image: stata_forum.png]

  • #2
    binscatter is a community-contributed command, so by FAQ Advice #12, you are asked to say where it comes from.

    I know that many people find it useful, and it may well offer a solution, but the task here of calculating and plotting means is also possible directly. The twist of getting the groups in order of the means was discussed in detail earlier today. See https://www.statalist.org/forums/for...occupation-sum

    Here is your graph -- and what might possibly be considered an even more helpful one, showing each distribution too.

    Code:
    clear
    input int price float group
     4099 4
     4749 4
     3799 4
     4816 4
     7827 2
     5788 4
     4453 4
     5189 4
    10372 3
     4082 4
    11385 3
    14500 1
    15906 1
     3299 4
     5705 4
     4504 4
     5104 4
     3667 4
     3955 4
     3984 4
     4010 4
     5886 4
     6342 4
     4389 4
     4187 4
    11497 3
    13594 1
    13466 1
     3829 4
     5379 4
     6165 4
     4516 4
     6303 4
     3291 4
     8814 2
     5172 4
     4733 4
     4890 4
     4181 4
     4195 4
    10371 3
     4647 4
     4425 4
     4482 4
     6486 4
     4060 4
     5798 4
     4934 4
     5222 4
     4723 4
     4424 4
     4172 4
     9690 2
     6295 4
     9735 2
     6229 4
     4589 4
     5079 4
     8129 2
     4296 4
     5799 4
     4499 4
     3995 4
    12990 1
     3895 4
     3798 4
     5899 4
     3748 4
     5719 4
     7140 2
     5397 4
     4697 4
     6850 4
    11995 3
    end
    label values group group_label1
    label def group_label1 1 "A", modify
    label def group_label1 2 "B", modify
    label def group_label1 3 "C", modify
    label def group_label1 4 "D", modify
    
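    * negating price makes the group with the highest mean price sort first
    egen mean = mean(-price), by(group)
    * group2 takes values 1, 2, ... in ascending order of mean, i.e. descending mean price
    egen group2 = group(mean)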
    
    * labmask is from the Stata Journal
    labmask group2, values(group) decode
    
    label var group2 " "
    set scheme s1color
    * restore the original sign before plotting
    replace mean = -mean
    scatter mean group2, xla(1/4, valuelabel) name(G1, replace)
    
    * stripplot is from SSC; for an overview see https://www.statalist.org/forums/forum/general-stata-discussion/general/209911-stripplot-updated-on-ssc
    stripplot price, over(group2) cumul cumprob refline vertical name(G2, replace)
    [Image: notbinscatter1.png]

    [Image: notbinscatter2.png]


    Note: adding the option centre (or center) to the stripplot call places each cluster of points in the second graph directly above its axis label.
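
    As a sketch of that note, assuming the data and group2 from the code above are still in memory and stripplot is installed from SSC, the second graph could be redrawn with centre added. The graph name G2centre is made up here so that G2 is not overwritten.

    Code:
    * same call as for G2, with centre added
    stripplot price, over(group2) cumul cumprob refline vertical centre name(G2centre, replace)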
    Last edited by Nick Cox; 12 Feb 2021, 10:28.



    • #3
      Thank you, Nick! That works perfectly on my dataset. stripplot from SSC is also quite useful. A follow-up query: suppose I wanted to show the mean price by foreign as well, so that for A there would be one dot for foreign and one dot for domestic, and so on. The axis would still be ordered on the means for only one of the groups, say foreign == 0.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte foreign int price float group
      0  4099 4
      0  4749 4
      0  3799 4
      0  4816 4
      0  7827 2
      0  5788 4
      0  4453 4
      0  5189 4
      0 10372 3
      0  4082 4
      0 11385 3
      0 14500 1
      0 15906 1
      0  3299 4
      0  5705 4
      0  4504 4
      0  5104 4
      0  3667 4
      0  3955 4
      0  3984 4
      0  4010 4
      0  5886 4
      0  6342 4
      0  4389 4
      0  4187 4
      0 11497 3
      0 13594 1
      0 13466 1
      0  3829 4
      0  5379 4
      0  6165 4
      0  4516 4
      0  6303 4
      0  3291 4
      0  8814 2
      0  5172 4
      0  4733 4
      0  4890 4
      0  4181 4
      0  4195 4
      0 10371 3
      0  4647 4
      0  4425 4
      0  4482 4
      0  6486 4
      0  4060 4
      0  5798 4
      0  4934 4
      0  5222 4
      0  4723 4
      0  4424 4
      0  4172 4
      1  9690 2
      1  6295 4
      1  9735 2
      1  6229 4
      1  4589 4
      1  5079 4
      1  8129 2
      1  4296 4
      1  5799 4
      1  4499 4
      1  3995 4
      1 12990 1
      1  3895 4
      1  3798 4
      1  5899 4
      1  3748 4
      1  5719 4
      1  7140 2
      1  5397 4
      1  4697 4
      1  6850 4
      1 11995 3
      end
      label values foreign origin
      label def origin 0 "Domestic", modify
      label def origin 1 "Foreign", modify
      label values group group_label1
      label def group_label1 1 "A", modify
      label def group_label1 2 "B", modify
      label def group_label1 3 "C", modify
      label def group_label1 4 "D", modify
      The "by" option isn't allowed with scatter, and my attempt to overlay two separate graphs failed because of how we defined the groups.



      • #4
        Hi Nick, I think your most recent response was deleted while the server was down. Do you have any insight on my follow-up question above?



        • #5
          Indeed. I don't have a record of what I said, but it was probably similar to this.

          The "by" option isn't allowed with scatter


          Not so. Perhaps what you mean is that you tried to use it but got an error for some other reason.
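
          A minimal check of that point, using the auto dataset shipped with Stata rather than the data from #3 (an assumption made only so that the lines run on their own; the graph name bydemo is also made up here):

          Code:
          sysuse auto, clear
          * by() is a documented option of twoway graphs, scatter included
          scatter price mpg, by(foreign) name(bydemo, replace)

          This should draw one panel for Domestic and one for Foreign, so an error from by() with scatter is likely to come from something else in the command.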

          However, starting from your example data in #3, I did this. There are many other ways of showing the same or similar results using variations on the options over(), by() and separate() in stripplot.


          Code:
          * for more on this see section 9 in https://www.stata-journal.com/article.html?article=dm0055
          * the mean uses domestic cars only (foreign == 0) but is spread to every observation in each group
          egen mean = mean(cond(foreign == 0, -price, .)), by(group)
          
          egen group2 = group(mean)
          
          * labmask is from the Stata Journal
          labmask group2, values(group) decode
          
          set scheme s1color 
          
          * stripplot is from SSC; for an overview see https://www.statalist.org/forums/forum/general-stata-discussion/general/209911-stripplot-updated-on-ssc
          stripplot price, over(foreign) by(group2, legend(off) row(1) note("") compact) xtitle("") centre cumul cumprob refline vertical separate(foreign)


          [Image: twowaystripchart.png]


          My main point is that wanting to show the means -- in your examples so far 4 or 8 of them -- is fair enough, but you have enough space to show more detail without serious risk of confusion.
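
          If only the 8 means are wanted, one possible sketch overlays two scatter plots on the same ordered axis. It assumes the example data from #3 are in memory and that labmask is installed; mean0, gorder, meanprice, tag and the graph name means_by_origin are names made up for this sketch, chosen so that they do not clash with mean and group2 above. The /// line continuations need a do-file or the do-file editor.

          Code:
          * order the groups on the domestic mean only, highest first
          egen mean0 = mean(cond(foreign == 0, -price, .)), by(group)
          egen gorder = group(mean0)
          labmask gorder, values(group) decode

          * one mean per group and origin; tag marks one observation for each pair
          egen meanprice = mean(price), by(group foreign)
          egen tag = tag(group foreign)

          twoway (scatter meanprice gorder if foreign == 0 & tag)  ///
                 (scatter meanprice gorder if foreign == 1 & tag), ///
                 xla(1/4, valuelabel) xtitle("") ytitle("Mean price") ///
                 legend(order(1 "Domestic" 2 "Foreign")) name(means_by_origin, replace)

          With these data the axis should read A C B D, the order of the domestic means.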
