  • Third over() variable in graph dot

    Stata allows only two over() variables in "graph dot ...". Is there a way to mimic a third over() variable in a graph dot? Below are the code and data I used to create a Cleveland dot plot. I want to label "City" and "Village" on the y-axis, on the left, for the group of cities and the group of villages. If that is not possible, I want a gap between the cities and the villages so that each set reads as a distinct group.

    graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) over(population, sort(seqCode)) marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) marker(3, msymbol(+)) title("Income shares") scheme(s1mono)

    The data are below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
    1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
    2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
    3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
    4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
    5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
    6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
    1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
    2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
    3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
    4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
    5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
    6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
    end
    label var seqCode "seqCode" 
    label var popgroup "popGroup" 
    label var population "Population" 
    label var year "Year" 
    label var _50 "0_50" 
    label var _90 "50_90" 
    label var _100 "90_100"

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
    1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
    2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
    3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
    4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
    5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
    6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
    1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
    2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
    3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
    4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
    5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
    6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
    end
    label var seqCode "seqCode"
    label var popgroup "popGroup"
    label var population "Population"
    label var year "Year"
    label var _50 "0_50"
    label var _90 "50_90"
    label var _100 "90_100"
    
    egen cityyear= group(population year), label
    
    graph dot (asis) _50 _90 _100, over(cityyear, label(labsize(small))) ///
    over(popgroup, label(angle(vert)) sort(seqCode)) ///
    marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
    marker(3, msymbol(+)) title("Income shares") scheme(s1mono) nofill
    [Image: Graph.png]

    Last edited by Andrew Musau; 03 Sep 2024, 04:57.



    • #3
      I'd change the marker symbols and some other details if this were my graph. But I don't see that you really need a third over() option. If you are sure that you do, reach for by() instead.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
      1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
      2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
      3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
      4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
      5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
      6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
      1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
      2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
      3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
      4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
      5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
      6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
      end
      label var seqCode "seqCode" 
      label var popgroup "popGroup" 
      label var population "Population" 
      label var year "Year" 
      label var _50 "0-50" 
      label var _90 "50-90" 
      label var _100 "90-100"
      
      graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) ///
      over(population, sort(seqCode)) ///
      marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
      marker(3, msymbol(+)) title("Income shares") scheme(s1mono) nofill ///
      legend(row(1))

      [Image: city_village.png]



      • #4
        Code:
        graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) ///
        over(population, sort(seqCode)) by(popgroup, note("")  col(1) title("Income shares")) ///
        marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
        marker(3, msymbol(+))  scheme(s1mono) nofill ///
        legend(row(1)) subtitle(, pos(9) nobox nobexpand fcolor(none))
        [Image: city_village2.png]



        • #5
          As these are income shares, a stacked bar chart might appeal -- or some twist on one, as here. I used tabplot from the Stata Journal.

          The offset should be tweaked slightly.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
          1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
          2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
          3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
          4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
          5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
          6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
          1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
          2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
          3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
          4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
          5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
          6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
          end
          label var seqCode "seqCode" 
          label var popgroup "popGroup" 
          label var population "Population" 
          label var year "Year" 
          label var _50 "0-50" 
          label var _90 "50-90" 
          label var _100 "90-100"
          
          reshape long _, i(population year) j(which)
          label def which 50 "0-50" 90 "50-90" 100 "90-100"
          label val which which 
          label var which "some explanation here"
          
          set scheme stcolor 
          tabplot year which [iw=_] , horizontal ///
          by(population, note("") t1title(Income shares) col(1)) ytitle("") ///
          showval(offset(0.27) mlabsize(medsmall) format(%2.1f)) ///
          subtitle(, nobox nobexpand fcolor(none) pos(9)) separate(which) xsc(r(0.8 .))
          [Image: incomeshares.png]



          • #6
            Also if categories of something are 0-50, 50-90, 90-100, which way would 50 or 90 jump?



            • #7
              Another take

              Code:
              clear
              input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
              1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
              2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
              3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
              4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
              5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
              6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
              1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
              2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
              3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
              4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
              5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
              6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
              end
              label var seqCode "seqCode" 
              label var popgroup "popGroup" 
              label var population "Population" 
              label var year "Year" 
              label var _50 "0-50" 
              label var _90 "50-90" 
              label var _100 "90-100"
              
              reshape long _, i(population year) j(which)
              label def which 50 "0-50" 90 "50-90" 100 "90-100"
              label val which which 
              rename _ Percent 
              
              set scheme stcolor 
              graph dot (asis) Percent, over(year) over(which) ///
              by(population, compact note("") t1title(Income shares) col(1)) ///
              ytitle("") subtitle(, nobox nobexpand fcolor(none) pos(9)) asyvars ///
              marker(1, ms(Oh) msize(large)) marker(2, ms(+) msize(large)) ///
              linetype(line) lines(lw(thin) lc(gs12)) ytitle(Percent)
              [Image: city_village3.png]



              • #8
                Originally posted by Nick Cox
                Also if categories of something are 0-50, 50-90, 90-100, which way would 50 or 90 jump?
                The categories 0-50, 50-90 and 90-100 are percentile ranges: the bottom 50% of income earners (households), the households between the 50th and 90th percentiles, and the top 10% of households. The values plotted are their income shares.



                • #9
                  Thanks! I like the tabplot and will give it a try. However, I need it in black and white, so I will have to figure out which scheme works well for that. Any suggestions?



                  • #10
                    Originally posted by Nick Cox
                    I'd change the marker symbols and some other details if this were my graph. But I don't see that you really need a third over() option. If you are sure that you do, reach for by() instead.

                    Thanks. How would you change it? Any suggestions are welcome. I would like readers to be able to read and understand the graph easily. I think that the light-grey marker symbols should be changed to a dark color. As far as I can tell, the scheme determines how dark or light the marker symbols are.



                    • #11
                      Originally posted by Nick Cox
                      Code:
                      graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) ///
                      over(population, sort(seqCode)) by(popgroup, note("") col(1) title("Income shares")) ///
                      marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
                      marker(3, msymbol(+)) scheme(s1mono) nofill ///
                      legend(row(1)) subtitle(, pos(9) nobox nobexpand fcolor(none))
                      [Image: city_village2.png]
                      This is interesting. I did not know this was possible.



                      • #12
                        Originally posted by Andrew Musau
                        Code:
                        * Example generated by -dataex-. For more info, type help dataex
                        clear
                        input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
                        1 "City" "City 1" "2004-05" 15.47 44.12 40.42
                        2 "City" "City 2" "2004-05" 21.29 43.25 35.47
                        3 "City" "City 3" "2004-05" 18.04 45.72 36.24
                        4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
                        5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
                        6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
                        1 "City" "City 1" "2011-12" 17.43 44.39 38.18
                        2 "City" "City 2" "2011-12" 17.99 44.08 37.93
                        3 "City" "City 3" "2011-12" 18.04 43.62 38.34
                        4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
                        5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
                        6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
                        end
                        label var seqCode "seqCode"
                        label var popgroup "popGroup"
                        label var population "Population"
                        label var year "Year"
                        label var _50 "0_50"
                        label var _90 "50_90"
                        label var _100 "90_100"
                        
                        egen cityyear= group(population year), label
                        
                        graph dot (asis) _50 _90 _100, over(cityyear, label(labsize(small))) ///
                        over(popgroup, label(angle(vert)) sort(seqCode)) ///
                        marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
                        marker(3, msymbol(+)) title("Income shares") scheme(s1mono) nofill
                        [Image: Graph.png]
                        Thanks. That is an interesting trick; however, I would prefer not to have the place name (city or village) repeated for each year.



                        • #13
                          #8 misses the point. If a value were exactly equal to the 50th percentile or the 90th percentile would it go up or down? Perhaps the issue doesn't arise -- if the percentile isn't a value in the data -- or is trivial, but in principle bin limits should be unambiguous. See for example Section 5 of https://journals.sagepub.com/doi/pdf...867X1801800311 and the references given there. You need only an explanation such as "Lower limits are inclusive" somewhere.
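
                          A minimal sketch of the "lower limits are inclusive" convention, assuming a hypothetical variable pctile (a percentile rank from 0 to 100) that is not part of the data in this thread. Under this rule a value of exactly 50 falls in the 50-90 bin and a value of exactly 90 in the 90-100 bin:

                          ```stata
                          * Hypothetical illustration: pctile is an assumed variable,
                          * not one of the variables posted in this thread.
                          clear
                          input float pctile
                          0
                          49.9
                          50
                          89.9
                          90
                          100
                          end

                          * Lower limits are inclusive: [0, 50), [50, 90), [90, 100]
                          * The if qualifier keeps missing values out of the top bin.
                          generate byte bin = cond(pctile < 50, 1, cond(pctile < 90, 2, 3)) if !missing(pctile)
                          label define bin 1 "0 to <50" 2 "50 to <90" 3 "90 to 100"
                          label values bin bin
                          list, noobs
                          ```

                          Whichever rule is chosen, stating it once in a note or caption is enough to remove the ambiguity.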

                          #9 scheme(s1mono) is one to try as in your original post.

                          #10 I suggest that two essentials are that the three marker symbols are of equal size and of equal visual impact. If they might ever occlude each other, you need open or hollow symbols such as Oh or Th, and + works well with any of those.

                          The most striking detail is that Village 3 is different. You can judge graph designs partly on how clearly they make that point.
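
                          The marker suggestions under #10 might be sketched as follows, reusing the by() layout from #4 and assuming the example data from this thread are already in memory. The msize(medium) and mcolor(black) settings are illustrative assumptions rather than anything stated above; mcolor(black) is one way to avoid light-grey markers under s1mono.

                          ```stata
                          * Assumes the dataex example data from this thread are in memory.
                          * Hollow circle, hollow triangle and plus: same size, same color.
                          graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) ///
                              over(population, sort(seqCode)) ///
                              by(popgroup, note("") col(1) title("Income shares")) ///
                              marker(1, msymbol(Oh) msize(medium) mcolor(black)) ///
                              marker(2, msymbol(Th) msize(medium) mcolor(black)) ///
                              marker(3, msymbol(+) msize(medium) mcolor(black)) ///
                              scheme(s1mono) nofill legend(row(1)) ///
                              subtitle(, pos(9) nobox nobexpand fcolor(none))
                          ```

                          Keeping size and color identical across the three series leaves shape as the only distinguishing cue, so no one series dominates visually.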



                          • #14
                            Originally posted by Anup Tyagi

                            Thanks. That is an interesting trick, however, I prefer not to have the place (city, village) name repeated for each year.
                            Doesn't Nick's illustration in #4 using a -by()- option achieve this?



                            • #15
                              Originally posted by Andrew Musau

                              Doesn't Nick's illustration in #4 using a -by()- option achieve this?
                              Yes, it does.
