Third over() variable in graph dot

Anup Tyagi

Join Date: Sep 2024
Posts: 14
#1

03 Sep 2024, 03:42

Stata allows only two over() variables in "graph dot ...". Is there a way to mimic a third over() variable in a graph dot? Below are the code and data I used to create a Cleveland dot plot. I want to label "City" and "Village" on the left of the y-axis, one label for the group of cities and one for the group of villages. If that is not possible, I want a gap between the group of cities and the group of villages so that the two groups are visually distinct.

graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) ///
over(population, sort(seqCode)) ///
marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
marker(3, msymbol(+)) title("Income shares") scheme(s1mono)

The data are below:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
end
label var seqCode "seqCode" 
label var popgroup "popGroup" 
label var population "Population" 
label var year "Year" 
label var _50 "0_50" 
label var _90 "50_90" 
label var _100 "90_100"

Andrew Musau

Join Date: Oct 2014
Posts: 10199
#2

03 Sep 2024, 04:45

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
end
label var seqCode "seqCode"
label var popgroup "popGroup"
label var population "Population"
label var year "Year"
label var _50 "0_50"
label var _90 "50_90"
label var _100 "90_100"

egen cityyear= group(population year), label

graph dot (asis) _50 _90 _100, over(cityyear, label(labsize(small))) ///
over(popgroup, label(angle(vert)) sort(seqCode)) ///
marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
marker(3, msymbol(+)) title("Income shares") scheme(s1mono) nofill

[Image: Graph.png]

Last edited by Andrew Musau; 03 Sep 2024, 04:57.

Nick Cox

Join Date: Mar 2014
Posts: 35699
#3

03 Sep 2024, 05:28

I'd change the marker symbols and some other details if this were my graph. But I don't see that you really need a third over() option. If you are sure that you do, reach for by() instead.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
end
label var seqCode "seqCode" 
label var popgroup "popGroup" 
label var population "Population" 
label var year "Year" 
label var _50 "0-50" 
label var _90 "50-90" 
label var _100 "90-100"

graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) ///
over(population, sort(seqCode)) ///
marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
marker(3, msymbol(+)) title("Income shares") scheme(s1mono) nofill ///
legend(row(1))

[Image: city_village.png]

Nick Cox

Join Date: Mar 2014
Posts: 35699
#4

03 Sep 2024, 07:44

Code:

graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) ///
over(population, sort(seqCode)) by(popgroup, note("")  col(1) title("Income shares")) ///
marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
marker(3, msymbol(+))  scheme(s1mono) nofill ///
legend(row(1)) subtitle(, pos(9) nobox nobexpand fcolor(none))

[Image: city_village2.png]

Nick Cox

Join Date: Mar 2014
Posts: 35699
#5

03 Sep 2024, 11:24

As these are income shares, a stacked bar chart might appeal -- or some twist on one, as here. I used tabplot from the Stata Journal.

The offset should be tweaked slightly.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
end
label var seqCode "seqCode" 
label var popgroup "popGroup" 
label var population "Population" 
label var year "Year" 
label var _50 "0-50" 
label var _90 "50-90" 
label var _100 "90-100"

reshape long _, i(population year) j(which)
label def which 50 "0-50" 90 "50-90" 100 "90-100"
label val which which 
label var which "some explanation here"

set scheme stcolor 
tabplot year which [iw=_] , horizontal ///
by(population, note("") t1title(Income shares) col(1)) ///
ytitle("") showval(offset(0.27) mlabsize(medsmall) format(%2.1f)) ///
subtitle(, nobox nobexpand fcolor(none) pos(9)) ///
separate(which) xsc(r(0.8 .))

[Image: incomeshares.png]

Nick Cox

Join Date: Mar 2014

Posts: 35699
#6

03 Sep 2024, 11:45

Also if categories of something are 0-50, 50-90, 90-100, which way would 50 or 90 jump?

Nick Cox

Join Date: Mar 2014
Posts: 35699
#7

03 Sep 2024, 14:26

Another take

Code:

clear
input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
end
label var seqCode "seqCode" 
label var popgroup "popGroup" 
label var population "Population" 
label var year "Year" 
label var _50 "0-50" 
label var _90 "50-90" 
label var _100 "90-100"

reshape long _, i(population year) j(which)
label def which 50 "0-50" 90 "50-90" 100 "90-100"
label val which which 
rename _ Percent 

set scheme stcolor 
graph dot (asis) Percent, over(year) over(which) ///
by(population, compact note("") t1title(Income shares) col(1)) ///
ytitle("") subtitle(, nobox nobexpand fcolor(none) pos(9)) ///
asyvars marker(1, ms(Oh) msize(large)) marker(2, ms(+) msize(large)) ///
linetype(line) lines(lw(thin) lc(gs12)) ytitle(Percent)

[Image: city_village3.png]

Anup Tyagi

Join Date: Sep 2024

Posts: 14
#8

03 Sep 2024, 22:22

Originally posted by Nick Cox

Also if categories of something are 0-50, 50-90, 90-100, which way would 50 or 90 jump?

0-50, 50-90 and 90-100 are percentile ranges of households ranked by income: the bottom 50% of income earners, the next 40% (50th to 90th percentile) and the top 10%. The values plotted are the income shares of these three groups.
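
Because the three groups together cover all households, the three shares in each row should add up to 100, apart from rounding. A minimal sketch of that check on the example data from this thread follows; the helper variable total and the 0.05 tolerance are assumptions made for illustration, not part of the original post.

```stata
clear
input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
end

* total is a hypothetical helper variable, not part of the original data
generate double total = _50 + _90 + _100

* The shares are given to two decimals, so allow for rounding error
* (the 0.05 tolerance is an assumed value)
assert abs(total - 100) < 0.05

list population year _50 _90 _100 total, noobs sepby(year)
```

If the assert fails on other data, that would suggest the groups overlap or leave some households out.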
Anup Tyagi

Join Date: Sep 2024

Posts: 14
#9

03 Sep 2024, 22:25

Thanks! I like the tabplot. I will give it a try. However, I need it in black and white, so I will have to figure out which scheme works well for that. Any suggestions?
Anup Tyagi

Join Date: Sep 2024

Posts: 14
#10

03 Sep 2024, 22:33

Originally posted by Nick Cox

I'd change the marker symbols and some other details if this were my graph. But I don't see that you really need a third over() option. If you are sure that you do, reach for by() instead.

Thanks. How would you change it? Any suggestions are welcome. I would like readers to be able to read and understand the graph easily. I think that the light-grey marker symbols should be changed to a dark color. As far as I can tell, the darkness or lightness of the marker symbols is set by the scheme.

Anup Tyagi

Join Date: Sep 2024
Posts: 14

#11

03 Sep 2024, 22:44

Originally posted by Nick Cox

Code:

graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) ///
over(population, sort(seqCode)) by(popgroup, note("") col(1) title("Income shares")) ///
marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
marker(3, msymbol(+)) scheme(s1mono) nofill ///
legend(row(1)) subtitle(, pos(9) nobox nobexpand fcolor(none))

[Image: city_village2.png]

This is interesting. I did not know this was possible.

Anup Tyagi

Join Date: Sep 2024
Posts: 14

#12

04 Sep 2024, 00:21

Originally posted by Andrew Musau

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
1 "City" "City 1" "2004-05" 15.47 44.12 40.42
2 "City" "City 2" "2004-05" 21.29 43.25 35.47
3 "City" "City 3" "2004-05" 18.04 45.72 36.24
4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
1 "City" "City 1" "2011-12" 17.43 44.39 38.18
2 "City" "City 2" "2011-12" 17.99 44.08 37.93
3 "City" "City 3" "2011-12" 18.04 43.62 38.34
4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
end
label var seqCode "seqCode"
label var popgroup "popGroup"
label var population "Population"
label var year "Year"
label var _50 "0_50"
label var _90 "50_90"
label var _100 "90_100"

egen cityyear= group(population year), label

graph dot (asis) _50 _90 _100, over(cityyear, label(labsize(small))) ///
over(popgroup, label(angle(vert)) sort(seqCode)) ///
marker(1, msymbol(circle)) marker(2, msymbol(circle_hollow)) ///
marker(3, msymbol(+)) title("Income shares") scheme(s1mono) nofill

[Image: Graph.png]

Thanks. That is an interesting trick; however, I prefer not to have the place (city, village) name repeated for each year.

Nick Cox

Join Date: Mar 2014

Posts: 35699
#13

04 Sep 2024, 02:41

#8 misses the point. If a value were exactly equal to the 50th percentile or the 90th percentile, would it go up or down? Perhaps the issue doesn't arise -- if the percentile isn't a value in the data -- or is trivial, but in principle bin limits should be unambiguous. See, for example, Section 5 of https://journals.sagepub.com/doi/pdf...867X1801800311 and the references given there. You need only an explanation such as "Lower limits are inclusive" somewhere.
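
The convention "lower limits are inclusive" can be made concrete with a short sketch. The variable pctrank and its values are invented for illustration and are not part of the data in this thread; the only point is that 50 and 90 each land in exactly one bin.

```stata
clear
* pctrank is a hypothetical percentile rank, made up for this sketch
input float pctrank
0
49.9
50
89.9
90
100
end

* Lower limits inclusive: [0, 50), [50, 90), [90, 100]
* Missing values are left missing rather than falling into the top bin
generate byte bin = cond(pctrank < 50, 1, cond(pctrank < 90, 2, 3)) if !missing(pctrank)

label define bin 1 "0-50" 2 "50-90" 3 "90-100"
label values bin bin

list, noobs
```

With this rule a rank of exactly 50 goes up into 50-90 and a rank of exactly 90 goes up into 90-100; the opposite convention would be equally valid so long as it is stated.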

#9 scheme(s1mono) is one to try as in your original post.

#10 I suggest that two essentials are that the three marker symbols be of equal size and of equal visual impact. If they might ever occlude each other, you need open or hollow symbols such as Oh or Th, and + works well with any of those.
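
As a sketch of that advice, the by() command from #4 can be rerun with the open symbols Oh and Th plus +, all at one size and in one dark color. The choices msize(medium) and mcolor(black) are assumptions made here for illustration, not settings given in this thread.

```stata
clear
input byte seqCode str7 popgroup str9 population str7 year float(_50 _90 _100)
1 "City"    "City 1"    "2004-05" 15.47 44.12 40.42
2 "City"    "City 2"    "2004-05" 21.29 43.25 35.47
3 "City"    "City 3"    "2004-05" 18.04 45.72 36.24
4 "Village" "Village 1" "2004-05" 15.62 46.15 38.24
5 "Village" "Village 2" "2004-05" 18.89 43.84 37.27
6 "Village" "Village 3" "2004-05" 24.37 46.24 29.39
1 "City"    "City 1"    "2011-12" 17.43 44.39 38.18
2 "City"    "City 2"    "2011-12" 17.99 44.08 37.93
3 "City"    "City 3"    "2011-12" 18.04 43.62 38.34
4 "Village" "Village 1" "2011-12" 14.95 42.89 42.16
5 "Village" "Village 2" "2011-12" 16.33 44.57 39.11
6 "Village" "Village 3" "2011-12" 28.98 47.14 23.88
end
label var _50 "0-50"
label var _90 "50-90"
label var _100 "90-100"

* Same layout as #4; only the marker() options differ.
* Open symbols (Oh, Th) and + stay readable when they overlap;
* msize(medium) and mcolor(black) are assumed values.
graph dot (asis) _50 _90 _100, over(year, label(labsize(small))) ///
over(population, sort(seqCode)) ///
by(popgroup, note("") col(1) title("Income shares")) ///
marker(1, msymbol(Oh) msize(medium) mcolor(black)) ///
marker(2, msymbol(Th) msize(medium) mcolor(black)) ///
marker(3, msymbol(+) msize(medium) mcolor(black)) ///
scheme(s1mono) nofill ///
legend(row(1)) subtitle(, pos(9) nobox nobexpand fcolor(none))
```

Setting mcolor() explicitly also addresses the light-grey markers raised in #10, since the color no longer depends on the scheme's defaults.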

The most striking detail is that Village 3 is different. You can judge graph designs partly on how clearly they make that point.
Andrew Musau

Join Date: Oct 2014

Posts: 10199
#14

04 Sep 2024, 03:57

Originally posted by Anup Tyagi

Thanks. That is an interesting trick; however, I prefer not to have the place (city, village) name repeated for each year.

Doesn't Nick's illustration in #4 using a -by()- option achieve this?
Anup Tyagi

Join Date: Sep 2024

Posts: 14
#15

04 Sep 2024, 09:17

Originally posted by Andrew Musau

Doesn't Nick's illustration in #4 using a -by()- option achieve this?

Yes, it does.