Tabplot: How to get legend for variable used in separate()

Ulrich Sieberer

Join Date: Feb 2018

Posts: 4
#1

22 Feb 2018, 17:12

Dear list,

I used the excellent tabplot command by Nick Cox (from SSC) in Stata 15.1 to create a frequency plot that graphically reproduces a twoway table of frequencies. I further use the separate() option to get different coloring for the bars associated with the categories of a third variable.
The following example reproduces my code using the auto dataset (admittedly a substantively meaningless graph...):

Code:

sysuse auto, clear
tabplot headroom trunk, separate(foreign) bar1(bcol(gs2)) bar2(bcol(gs12))

This yields the following graph:

Now my question: Is there a way to include a legend that explains the meaning of the different bar colors as defined by separate() (in my case these are the two categories of foreign)?

Thank you very much for any suggestions and best regards,
Ulrich

Nick Cox

Join Date: Mar 2014
Posts: 35697
#2

22 Feb 2018, 17:43

tabplot is from the Stata Journal as well as SSC.

The trick is to specify legend(on) and then see which variables are being plotted.

In this case and others tabplot isn't smart about occlusion. It certainly doesn't stack bars within cells. So watch out for cases where your row and column variables have differing values on the third variable. I should document this in more detail. I am reluctant to implement side-by-side bars.

Code:

sysuse auto, clear
set scheme s1color 

tabplot headroom trunk, separate(foreign) bar1(bcol(gs2)) bar2(bcol(gs12)) legend(on) 

tabplot headroom trunk, separate(foreign) bar1(bcol(gs2)) bar2(bcol(gs12)) ///
legend(on) legend(order(2 "Domestic" 3 "Foreign")) 

tabplot headroom trunk, separate(foreign) bar1(bcol(red)) bar2(bcol(blue)) ///
legend(on) legend(order(2 "Domestic" 3 "Foreign") pos(7) ring(0) col(1)) ///
barall(bfcolor(none))

Problems with occlusion (groups is from Stata Journal):

Code:

. bysort head trunk (foreign) : gen diff = foreign[1] != foreign[_N]

. groups head trunk foreign if diff, sepby(head trunk) show(f)

  +-------------------------------------+
  | headroom   trunk    foreign   Freq. |
  |-------------------------------------|
  |      2.0       8   Domestic       1 |
  |      2.0       8    Foreign       1 |
  |-------------------------------------|
  |      2.0      11   Domestic       1 |
  |      2.0      11    Foreign       1 |
  |-------------------------------------|
  |      2.0      16   Domestic       4 |
  |      2.0      16    Foreign       1 |
  |-------------------------------------|
  |      2.5      11   Domestic       2 |
  |      2.5      11    Foreign       2 |
  |-------------------------------------|
  |      3.0       9   Domestic       1 |
  |      3.0       9    Foreign       1 |
  |-------------------------------------|
  |      3.0      10   Domestic       1 |
  |      3.0      10    Foreign       2 |
  |-------------------------------------|
  |      3.0      15   Domestic       1 |
  |      3.0      15    Foreign       3 |
  +-------------------------------------+
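Before colouring bars by a third variable, it may help to count how many cells would have to hold more than one bar. The following is only a minimal sketch on the auto data, using official commands; the variable names mixed and cell are made up for illustration. A nonzero count signals that occlusion is possible.

```stata
sysuse auto, clear

* flag cells (headroom x trunk combinations) that contain both values of foreign
* (mixed is a hypothetical name chosen for this sketch)
bysort headroom trunk (foreign): gen byte mixed = foreign[1] != foreign[_N]

* tag one observation per cell so that each cell is counted only once
* (cell is likewise a hypothetical name)
egen byte cell = tag(headroom trunk)

* number of cells in which bars for different groups would overlap
count if cell & mixed
```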

Ulrich Sieberer

Join Date: Feb 2018

Posts: 4
#3

23 Feb 2018, 02:19

Thank you, Nick, for the quick and valuable advice!

I have one follow-up question on a complication I left out in my earlier post: I further specify the by() option to produce separate graphs for categories of yet another variable. When adding your suggestion on the legend, I get one legend for each subgraph. I tried the usual advice of specifying the content of the legend in the overall code and the position of the legend within the by() option (as indicated in help legend_options under Remarks: Use of legends with by()), but the command does not accept legend() within the by() option.

Here is what I tried continuing my fake example and using a binary variable of above/below average price in by(). This is the code that produces a graph with one legend per subgraph:

Code:

sysuse auto, clear
set scheme s1mono
summ price
gen price_bin=cond(price>`r(mean)', 1, 0)
tabplot headroom trunk, by(price_bin) ///
separate(foreign) bar1(bcol(gs2)) bar2(bcol(gs12)) ///
legend(on) legend(order(2 "Domestic" 3 "Foreign"))

This is the resulting graph

In my substantive case, the by-variable has six and the separate-variable three values so that having one legend per subgraph is not really an option.
Here is my extended version of the last command, which tries to create a single legend by adding a legend position within the by() option. This command fails with the error message "option legend() not allowed".

Code:

tabplot headroom trunk, by(price_bin, legend(pos(6))) ///
separate(foreign) bar1(bcol(gs2)) bar2(bcol(gs12)) ///
legend(on) legend(order(2 "Domestic" 3 "Foreign"))

I tried the following work-around: plotting the graphs for the subgroups separately (via a loop) and combining them with Vince Wiggins's grc1leg (from http://www.stata.com/users/vwiggins). But this yields other problems, because the scales of the graphs for the subgroups differ and the xcommon and ycommon options of grc1leg do not seem to work:

Code:

forvalues i=0/1 {
    tabplot headroom trunk if price_bin==`i', ///
    separate(foreign) bar1(bcol(gs2)) bar2(bcol(gs12)) ///
    legend(on) legend(order(2 "Domestic" 3 "Foreign")) xtitle("") ytitle("") ///
    name(graph_`i', replace)
}
grc1leg graph_0 graph_1, col(2) l1("Headroom(in.)") b1("Trunk space (in.)") ///
ycommon xcommon

Is there some other trick to obtain a single legend when using by()?

Thank you in advance for any suggestions!

Nick Cox

Join Date: Mar 2014

Posts: 35697
#4

23 Feb 2018, 05:27

I've looked at this for a while and I don't have an answer for your immediate want. As the program author I can speak to its intent and specifically what the separate() option is designed to do.

The main idea of tabplot is to use one design only, single bars within each combination of row and/or column (and possibly panel too) and to let as many of row, column and panel identifiers as are used indicate what is what.

Underlying that design one premise is that legends are at best a necessary evil and so should be avoided wherever possible. Hence some of the code is devoted to suppressing legends.

The separate() option was intended to allow more colourful graphs where desired -- and I've wanted this too -- whereby different values of row or column or panel variables could be shown differently. If that is done, then there is still no need for a legend, as what is what is explained on the margins of each graph.

What you're doing is using a different variable as argument to separate() that isn't the row or column or panel variable. So, understandably you now need a legend to explain.
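As a rough illustration of the intended use described above, here is a minimal sketch in which the variable given to separate() is also the row variable, so the axis labels already explain the colours. It assumes tabplot is installed (ssc install tabplot); the choice of variables and colours is arbitrary.

```stata
* requires tabplot (ssc install tabplot)
sysuse auto, clear

* colour the bars by the row variable itself;
* the row labels (Domestic, Foreign) explain the colours, so no legend is needed
tabplot foreign rep78, separate(foreign) bar1(bcolor(gs2)) bar2(bcolor(gs12))
```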

But, but, but:

1. Despite a fair amount of experience I still struggle with legends and by(). I take the existence of grc1leg (which counts as community-contributed, except that the author is Vince Wiggins, who is a leading Stata developer and was heavily involved in programming the current Stata graphics) as an admission that there are reasonable things you might want of the legend that are hard or even impossible to get through the graph syntax.

2. The killer for me is that you are raising the possibility that a user asks implicitly to put two or more bars into the same cell on the graph. As the example in #1 showed, there is then a serious possibility that bars may occlude each other. I am not going to complicate (frustrate) the main design by stacking such bars or putting them side-by-side. Although this wasn't your purpose, I am convinced by your example that separate() should only allow a variable that appears on an axis or panel. I don't want to allow misleading or ambiguous plots, and so the next version of tabplot will be strict on this point.

Thanks for an example that showed the need to tighten up the code (and also the help).

Ulrich Sieberer

Join Date: Feb 2018

Posts: 4
#5

26 Feb 2018, 00:53

Thanks for the clarification, Nick! I understand your point even though it is a pity for my purpose which is simply to distinguish graphically between different groups that are not involved in the aggregation for the graph (actually, in my substantive research, I plot levels of party unity in voting across different parties (rows) and legislative periods (columns) and would like to color the bars to indicate whether the party in question was in government or opposition). Anyways, thanks again for your help and for providing the tabplot command.

Nick Cox

Join Date: Mar 2014

Posts: 35697
#6

26 Feb 2018, 01:55

stripplot (SSC) may be of help. Its separate() option really does lead to separate displays.

Chen Samulsion

Join Date: Jan 2018
Posts: 914
#7

17 Jul 2018, 20:13

Thank you, Nick Cox. This post resolved my problem. However, the trick that Sieberer was looking for is hard for users to discover, so I think the -legend(on)- option should be documented in the help files.
In addition, I still have a problem with the legend. When I set the legend for the third variable, the one given in the -separate()- option (or, as you may call it, supercolvar), the legend symbols were invisible. I had to change the keys in order() from 1 and 2 to 3 and 4 to get the result I wanted. So I would like to know the mechanism behind this. Thank you.
The data I use as an example are from Aitkin et al. (1989); see https://www.statalist.org/forums/for...updated-on-ssc. I modified the data to avoid overlap of sex with the other two variables. P.S. -tablecol- (SSC) is by Nicholas Winter. The commands and data are below.

Code:

tablecol policy year sex [w=freq], row col scol overall rowpct nofreq
tabplot policy year [w=freq], separate(sex) percent(policy) showval legend(on) legend(order(1 "Male" 2 "Female"))
tabplot policy year [w=freq], separate(sex) percent(policy) showval legend(on) legend(order(3 "Male" 4 "Female"))

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 sex str9 year str1 policy int freq
"male"   "1" "A" 175
"male"   "1" "B" 116
"male"   "1" "C" 131
"male"   "1" "D"  17
"male"   "2" "A" 160
"male"   "2" "B" 126
"male"   "2" "C" 135
"male"   "2" "D"  21
"male"   "3" "A" 132
"male"   "3" "B" 120
"male"   "3" "C" 154
"male"   "3" "D"  29
"male"   "4" "A" 145
"male"   "4" "B"  95
"male"   "4" "C" 185
"male"   "4" "D"  44
"female" "5" "A"  13
"female" "5" "B"  19
"female" "5" "C"  40
"female" "5" "D"   5
"female" "6" "A"   5
"female" "6" "B"   9
"female" "6" "C"  33
"female" "6" "D"   3
"female" "7" "A"  22
"female" "7" "B"  29
"female" "7" "C" 110
"female" "7" "D"   6
"female" "8" "A"  12
"female" "8" "B"  21
"female" "8" "C"  58
"female" "8" "D"  10
end

[Image: tabplot with legend wrong.png]

[Image: tabplot with legend.png]

Last edited by Chen Samulsion; 17 Jul 2018, 20:18.

Nick Cox

Join Date: Mar 2014

Posts: 35697
#8

18 Jul 2018, 02:06

I take the point about better documentation of how to use a legend with tabplot. From my point of view as author of tabplot there is one and only one defensible use of a legend, when the separate() option is used in a way that doesn't correspond to what is otherwise explained on the axes.

In the first graph you show there are no marker symbols because none are used to show the corresponding variables. The numeric values are shown with marker labels.

In any case you've mangled the data example here as the females aren't in years 5 6 7 8 but in years 1 2 3 4.

From my point of view the by() option is the natural way to show and explain males and females separately (so that the legend can be dispensed with). The advice is the same for any similar rows x columns x panels structure. That is discussed in the help.
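A minimal sketch of that by() design, shown on the auto data rather than the policy data and assuming tabplot is installed (ssc install tabplot): each category of the third variable gets its own panel, and the panel titles do the work a legend would otherwise do.

```stata
* requires tabplot (ssc install tabplot)
sysuse auto, clear

* one panel per category of foreign; the panel titles identify the groups,
* so no legend is needed and no bars share a cell
tabplot headroom trunk, by(foreign)
```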

Last edited by Nick Cox; 18 Jul 2018, 02:12.

Chen Samulsion

Join Date: Jan 2018

Posts: 914
#9

18 Jul 2018, 02:48

Nick, thank you very much. You are right that I should have used the original data of Aitkin et al. However, with the original data the results generated by -tablecol- show that the genders overlap in each bar. For example, the first bar at top left has a value of 188, which consists of 175 males and 13 females, both belonging to year 1 and policy A, although the bar is coloured blue, which suggests it is a "male" bar (and the legend symbol suggests the same). I mangled the data to avoid that confusion, although that now seems an unnecessary distortion.