tabplot updated on SSC

Nick Cox

Join Date: Mar 2014

Posts: 35651
#1

12 Apr 2016, 08:56

Thanks as always to Kit Baum, the package tabplot on SSC has been
updated with new ado and help files for that program, which goes back to
1999. Stata 8 is required. tabplot is billed as supporting one-, two-
and three-way bar charts for tables, which understates its possibilities
a little, but the whole story need not be given here.

"Multiple bar charts" would be a good umbrella term, except for the need
to explain that it doesn't mean stacked or divided bars and it doesn't mean
bars side by side on the same axis (and except for the puzzle that a
single bar would just get lonely, so don't all bar charts have multiple
bars?). (A single bar does not mean a "singles bar".)

The update in code fixes some awkward, indeed deficient, parsing of
calls to the by() option, which ruled out adjustment of a note() call
together with the by() option.

A bigger deal by comparison is much re-writing of the help file, with
restructured explanation of syntax, better-explained and more numerous
examples, and many more references since the last update several months
ago.

If interested, then use

Code:

ssc inst tabplot

to install afresh or

Code:

ssc inst tabplot, replace

to update an existing installation; some readers may be using

Code:

adoupdate

instead.
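To check what is installed before or after updating, something like the following may help. It is a minimal sketch using only official Stata commands; the output depends on your own installation.

```stata
* Show which tabplot.ado Stata finds along the ado-path, with its version line
which tabplot

* Describe the package as currently posted on SSC, without installing anything
ssc describe tabplot

* Report whether the installed copy is out of date ...
adoupdate tabplot

* ... and fetch the newer files if it is
adoupdate tabplot, update
```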

Bar charts are basic, and may seem very well supported in Stata, as only
a little acquaintance with the documentation reveals four commands,
graph bar, graph hbar, twoway bar and twoway rbar, which might seem
already three more than one might need.

Another command for bar charts (or more; I have others) thus needs a
little explanation. This one is itself just a wrapper for twoway rbar,
but it can do various plots more easily than you could do yourself,
unless you were willing to do a little programming and a lot of fiddling
around.
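To give a flavour of the fiddling involved, here is a minimal sketch of a table-like bar plot drawn directly with twoway rbar. It is not tabplot's own code: the toy data, the variable names row, col, pct and top, and the scaling factor 0.8 are all assumptions made for illustration.

```stata
clear
* Toy data (hypothetical): percents in a 2 x 2 table
input byte(row col) double pct
1 1 20
2 1 80
1 2 55
2 2 45
end

* Scale each bar so that the tallest occupies 0.8 of the gap between rows
summarize pct, meanonly
generate double top = row + 0.8 * pct / r(max)

* Each bar runs from its row baseline up to top, at the x position of its column
twoway rbar row top col, barwidth(0.5) ///
    ylabel(1 2, angle(h) nogrid) xlabel(1 2) ///
    ytitle("row") xtitle("column")
```

Even this small case needs a decision about scaling and explicit axis labels, and it shows no numeric values, which is the kind of detail a wrapper can take care of.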

The main conceit of tabplot is table-like plots. The name is intended to
evoke commands like tabulate with their structured output of tables in
rows and columns.

Incidentally, I note that there is a tabplot package for R with its main
command tableplot; an old Stata command of mine called tableplot also
exists on SSC, but its main capabilities have long since been folded
into tabplot. I don't doubt that tabplot on R is good, but I've never
used it or studied its documentation closely. I am pretty sure that I
used the name first, not that I mind so long as the name remains
distinct within Stata.

Clearly the help file is there with the details you are expected to
want, so the best I can now do for anyone curious is to give a couple of
self-contained examples, together with a moderate sales pitch.

Other applications of tabplot can be found at

http://www.statalist.org/forums/foru...-and-subgraphs

http://www.statalist.org/forums/foru...something-else

http://www.statalist.org/forums/foru...d-with-grc1leg

http://www.statalist.org/forums/foru...lot-or-tabplot

http://stats.stackexchange.com/quest...inal-variables

http://stats.stackexchange.com/quest...ical-variables

Greenacre (2007, p.42; full reference below) gave these data from the
Encuesta Nacional de la Salud (Spanish National Health Survey), 1997.
They are interesting in themselves, but for my purposes they are useful
as an example large enough to be challenging. As with many tables, the
main handle for understanding is to look at the probability distribution
of the response health given the predictor age. tabplot offers options
to calculate percent or proportional/fractional breakdowns on the fly.
Aesthetic preferences or conventions often encourage presentation in
terms of percents. ("Percentage" seems to me too long a word, whatever
dictionaries may say.)

Code:

clear
input byte(agegroup health) long freq
1 1 243
1 2 789
1 3 167
1 4 18
1 5 6
2 1 220
2 2 809
2 3 164
2 4 35
2 5 6
3 1 147
3 2 658
3 3 181
3 4 41
3 5 8
4 1 90
4 2 469
4 3 236
4 4 50
4 5 16
5 1 53
5 2 414
5 3 306
5 4 106
5 5 30
6 1 44
6 2 267
6 3 284
6 4 98
6 5 20
7 1 20
7 2 136
7 3 157
7 4 66
7 5 17
end

label values agegroup agegroup
label def agegroup 1 "16-24", modify
label def agegroup 2 "25-34", modify
label def agegroup 3 "35-44", modify
label def agegroup 4 "45-54", modify
label def agegroup 5 "55-64", modify
label def agegroup 6 "65-74", modify
label def agegroup 7 "75+", modify

label values health health
label def health 1 "very good", modify
label def health 2 "good", modify
label def health 3 "regular", modify
label def health 4 "bad", modify
label def health 5 "very bad", modify

tabplot health agegroup [w=freq] , percent(agegroup) showval subtitle(% of age group) xtitle("") bfcolor(none)

What particularly bites here are some very small percents, which are
perfectly credible and not at all unusual for such data. A merit of the
multiple bar charts design is that small values are discernible as such.
Note especially the showval option, which insists on showing values too.
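
As a cross-check on what percent(agegroup) is asking for, the same percents can be calculated by hand and compared with a conventional table. This is a sketch only: it assumes the Greenacre data from the code above are still in memory, and agetotal and pct are hypothetical variable names introduced here.

```stata
* Assumes the Greenacre data above are in memory
confirm variable agegroup health freq

* Percent of each age group falling in each health category
bysort agegroup: egen double agetotal = total(freq)
generate double pct = 100 * freq / agetotal
format pct %4.1f

* The same column percents as a conventional two-way table
tabulate health agegroup [fw=freq], column nofreq

* The smallest percents are the ones that are hardest to show in other designs
sort pct
list agegroup health freq pct in 1/5
```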

The graph thus deliberately uses table ideas and graph ideas together.
Sometimes people say to me, "But you shouldn't do that!" and some
prohibition emerges that graphs are graphs and tables are tables, and
ne'er the twain shall meet, which seems to me no more than superstition.

Digression. An intriguing suggestion, which I have borrowed elsewhere,
is that the conventional distinction between graphs and tables was a
side-effect of the development of printing. Before printing there were
manuscripts -- those scripted manually, or written by hand -- to which
writers could add illustrations, say of knights, or dragons, or of
sinners being tormented, or something equally entertaining, as they
liked and where they liked. Printed documents encouraged, or even
enforced, a division of labour between typesetters and those who
prepared illustrations. But now that's obsolete.

A detailed objection to numeric values too is that they clutter up the
graph, to which the answers are that it depends on how you do it and
that, if you strongly object, it's not compulsory. But tabplot gives up
on labelling axes with bar magnitudes, so that reduces clutter too.

Given this dataset, how else would you represent the patterns
graphically? Setting aside any temptation to draw multiple pie charts,
one alternative is a stacked bar chart:

Code:

* ssc inst catplot needed before
catplot health agegroup [w=freq], percent(agegroup) asyvars stack subtitle(% of age group)

In recent Stata versions, graph hbar could also do this directly, but the syntax
differs.

I have not tried too hard to optimise this: the colour scheme and legend both need work,
and so forth. Some would prefer vertical bars here.

The key point is whether it could be made better (clearer, more effective,
more attractive) than the previous graph. I note three key issues:

1. Stacking is a well-understood design but very small amounts are hard to
discern.

2. A legend necessarily springs into being, but a legend obliges mental "back
and forth" from readers (or else readers give up on looking at the detail).

3. The program would let you add numeric values on top of the bars, but that would
be at least a little messy.

Naturally this is a straw graph that I set up to knock down again, but are there good
alternatives? I've had better results with unstacked bars for this example, but I
will move on.

Let's look at graphs for a three-way table.

Aitkin et al. (1989, p.242; full reference below) reported data from a
survey of student opinion on the Vietnam War taken at the University of
North Carolina in Chapel Hill in May 1967. Students were classified by
sex, year of study, and the policy they supported, given choices of

A. The United States should defeat the power of North Vietnam by
widespread bombing of its industries, ports, and harbors and by land
invasion.

B. The United States should follow the present policy in Vietnam.

C. The United States should de-escalate its military activity, stop
bombing North Vietnam, and intensify its efforts to begin negotiation.

D. The United States should withdraw its military forces from Vietnam
immediately.

The labels A ... D are fairly dopey, but even at this distance
suggesting better ones might be thought contentious politically, so I
will desist.

Code:

clear
input str6 sex str8 year str1 policy int freq
"male" "1" "A" 175
"male" "1" "B" 116
"male" "1" "C" 131
"male" "1" "D" 17
"male" "2" "A" 160
"male" "2" "B" 126
"male" "2" "C" 135
"male" "2" "D" 21
"male" "3" "A" 132
"male" "3" "B" 120
"male" "3" "C" 154
"male" "3" "D" 29
"male" "4" "A" 145
"male" "4" "B" 95
"male" "4" "C" 185
"male" "4" "D" 44
"male" "Graduate" "A" 118
"male" "Graduate" "B" 176
"male" "Graduate" "C" 345
"male" "Graduate" "D" 141
"female" "1" "A" 13
"female" "1" "B" 19
"female" "1" "C" 40
"female" "1" "D" 5
"female" "2" "A" 5
"female" "2" "B" 9
"female" "2" "C" 33
"female" "2" "D" 3
"female" "3" "A" 22
"female" "3" "B" 29
"female" "3" "C" 110
"female" "3" "D" 6
"female" "4" "A" 12
"female" "4" "B" 21
"female" "4" "C" 58
"female" "4" "D" 10
"female" "Graduate" "A" 19
"female" "Graduate" "B" 27
"female" "Graduate" "C" 128
"female" "Graduate" "D" 13
end

tabplot policy year [w=freq], by(sex, subtitle(% by sex and year, place(w)) note("")) percent(sex year) showval

The way to plot three-way tables is unsurprisingly by using a by() option to repeat two-way tables.
The syntax for tabplot matches standard conventions such that (as in regress and scatter, for
example) it is usually best to mention the response or outcome variable first (as defining rows of
the plot, and as to be shown on the y axis). There can be trade-offs or compromises,
as no layout is best for all purposes, but big differences can safely be put at a distance (so
males and females here differ markedly in their mix of views), while finer distinctions are
easier to make if bars are close. On top of all that, any ordinal scales should naturally be
respected as such.
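
In this example the string values of year happen to sort into the intended order. Where category text would not sort that way, one way to keep an ordinal scale in order is to encode the string variable against a value label defined in the order wanted. The following is a sketch under assumptions: the Vietnam War data above are in memory, string categories would otherwise be taken in alphanumeric order, and yearorder and yearnum are hypothetical names introduced here.

```stata
* Assumes the Vietnam War survey data above are in memory
confirm string variable year

* Define the intended order explicitly, then encode against that label;
* noextend makes encode stop if year holds any text not listed in the label
label define yearorder 1 "1" 2 "2" 3 "3" 4 "4" 5 "Graduate"
encode year, generate(yearnum) label(yearorder) noextend

* Same plot as before, but using the numeric variable with ordered labels
tabplot policy yearnum [w=freq], by(sex, subtitle(% by sex and year, place(w)) note("")) percent(sex yearnum) showval
```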

Aitkin, M., D. Anderson, B. Francis, and J. Hinde. 1989. Statistical
Modelling in GLIM. Oxford: Oxford University Press.

Greenacre, M. 2007. Correspondence Analysis in Practice. Boca Raton, FL:
Chapman & Hall/CRC.
Nick Cox

#2

14 Apr 2016, 05:31

For comparison, here is a spineplot (many people say "mosaic plot") for the first dataset. The program should be downloaded from the Stata Journal site, except that I am cheating in using an updated version not yet publicly available.

Code:

spineplot health age [w=freq], bar1(color(gs4)) bar2(color(gs8)) bar3(color(blue*0.2)) bar4(color(blue*0.6)) bar5(color(blue)) xla(, labsize(*0.8) axis(2)) percent xla(0(20)100) yla(0(20)100, axis(2))

I worked harder on the colour scheme than on the corresponding stacked bar chart.

This does a better job of showing the differences in age group frequencies than any other design shown, because the others ignore them.

The overall pattern of change comes over quite well.

Nick Cox

#3

25 Apr 2016, 11:18

I tried a bit harder with the stacked bar chart.

Code:

graph bar (count) [fw=freq], over(health, descending) over(agegroup) percent subtitle(% of age group) stack asyvars bar(5, bfcolor(red*0.8)) bar(4, bfcolor(red*0.3) blcolor(red*0.8)) bar(3, bfcolor(blue*0.2) blcolor(blue*1.2)) bar(2, bfcolor(blue*0.7) blcolor(blue*1.2)) bar(1, bcolor(blue*1.2)) legend(pos(3) col(1)) ysc(r(-5 100)) yla(, ang(h))

This syntax requires Stata 14, or Stata 13 with updates at least to 9 October 2014.

[Image: stackedbar.png]

Nick Cox

#4

12 Jul 2016, 07:50

Now written up at http://www.stata-journal.com/article...article=gr0066
Hassen Ali

Join Date: May 2018

Posts: 39
#5

01 Jul 2018, 08:27

Dear respected Nick Cox, thank you very much!! I have learned a lot from your daily posts. I wish you endless happiness and success in your life!!
Respectfully, Hassen
Nick Cox

#6

01 Jul 2018, 23:46

Hassen: Thanks for those kind words, which I much appreciate.
Jonas Jakobi

Join Date: Sep 2018

Posts: 19
#7

23 Sep 2018, 03:42

Nick Cox, I was desperately looking for a feasible solution to graph the relationship between an ordinal response and an ordinal predictor variable. Until today I didn't have a satisfactory solution. However, after reading this post and the article you linked, I found something I liked. Thank you so much for your help!
Nick Cox

#8

23 Sep 2018, 04:13

Jonas Jakobi Excellent! Thanks for writing in.

Nick Cox

#9

15 Jul 2019, 12:21

Thanks to Kit Baum, version 2.8.0 of tabplot has been posted on SSC. For the moment, this is the most up-to-date public version. The most notable change is the addition of a frame() option, illustrated here:

[Image: tabplot_frame2.png]

Each bar is shown framed. Here's sample data and code to make it reproducible:

Code:

clear
input str6 sex str8 year str1 policy int freq
"male" "1" "A" 175
"male" "1" "B" 116
"male" "1" "C" 131
"male" "1" "D" 17
"male" "2" "A" 160
"male" "2" "B" 126
"male" "2" "C" 135
"male" "2" "D" 21
"male" "3" "A" 132
"male" "3" "B" 120
"male" "3" "C" 154
"male" "3" "D" 29
"male" "4" "A" 145
"male" "4" "B" 95
"male" "4" "C" 185
"male" "4" "D" 44
"male" "Graduate" "A" 118
"male" "Graduate" "B" 176
"male" "Graduate" "C" 345
"male" "Graduate" "D" 141
"female" "1" "A" 13
"female" "1" "B" 19
"female" "1" "C" 40
"female" "1" "D" 5
"female" "2" "A" 5
"female" "2" "B" 9
"female" "2" "C" 33
"female" "2" "D" 3
"female" "3" "A" 22
"female" "3" "B" 29
"female" "3" "C" 110
"female" "3" "D" 6
"female" "4" "A" 12
"female" "4" "B" 21
"female" "4" "C" 58
"female" "4" "D" 10
"female" "Graduate" "A" 19
"female" "Graduate" "B" 27
"female" "Graduate" "C" 128
"female" "Graduate" "D" 13
end
set scheme s1color

tabplot policy year [w=freq], by(sex, subtitle(% by sex and year, place(w)) note("")) percent(sex year) showval name(G1)
tabplot policy year [w=freq], by(sex, subtitle(% by sex and year, place(w)) note("")) percent(sex year) showval frame(100) name(G2)

Notice the frame(100) option on the second version (shown above).

Beyond that, the help file continues to grow quietly, with extra references as I find them.

River Huang

Join Date: Mar 2016

Posts: 1908
#10

15 Jul 2019, 17:04

Dear Nick, Thanks for this extra interesting feature.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Marc Kaulisch

Join Date: Jan 2016

Posts: 182
#11

19 Jul 2019, 02:18

Dear Nick, I am puzzled. In which repository do you update and maintain -tabplot-? I was somehow under the impression that you prefer the Stata Journal repository for -tabplot-, but it looks as if this update is only available at SSC. Is that right?
At least gr0066_1 (http://www.stata-journal.com/software/sj17-3) has not seen an update yet.
Nick Cox

#12

19 Jul 2019, 02:28

tabplot was maintained on SSC over most of its history until I wrote it up in the Stata Journal in 2016. Then I updated it in the same place in 2017. I will send another update to the Editors shortly, but updates there are subject to a 3-month cycle.

Updates on SSC are subject to a delay of more like 3 hours or 3 days, depending on Kit Baum's travels and how busy he is and the position of the Sun over Boston, MA.

I hinted at tabplot 2.8.0 in a recent post here

https://www.statalist.org/forums/for...-subtitle-size

so I was minded to get an update out quickly on SSC for anybody who cared. Indeed, it was you who expressed interest in that, so there you go: the main reason I put this on SSC quickly is your own comment.

Last edited by Nick Cox; 19 Jul 2019, 02:31.
Marc Kaulisch

#13

19 Jul 2019, 04:32

Nick, thank you very much. I switched to SSC. I appreciate your care about user feedback.
Nick Cox

#14

05 Sep 2020, 04:40

The update alluded to in #12 is forthcoming in Stata Journal 20(3) 2020. As you can tell, I didn't treat the task as urgent once the code was updated on SSC.
Nick Cox

#15

10 Apr 2022, 01:51

Following a bug report in https://www.statalist.org/forums/for...rcent-fraction a fixed version of tabplot (2.8.1) is now available at SSC. Thanks as usual to Kit Baum for prompt posting.

A formal update will follow in the Stata Journal.

The bug might bite you if you use set dp comma. Even if you don't, anyone using tabplot at all would also benefit from an extended help file with further examples and references.