How to visualize a two-way table

Gobinda Natak

#1

28 Oct 2018, 11:17

Dear Stata list,

I would like to use Stata to create Figure 3 of this new article:

Lawrence, Matthew. 2018. "Visualizing Income Inequality and Mobility Together." Socius 4:1-3. doi: 10.1177/2378023118805646 (Open Access)

It is basically a two-way table in which the rows and columns have different sizes and the cells are colored according to a third variable (sorry, I lack the vocabulary to describe the figure in more technical terms; maybe this type of figure has a name):

Is there any way to do such a thing in Stata?

Thanks for your consideration
Go
Clyde Schechter

#2

28 Oct 2018, 14:07

I don't know of any way to do this in Stata, although there are many other Forum members who are far better at graphics than I am and might respond with a way.

But why reproduce that utterly dreadful graph? It's truly hideous. To the extent that it conveys any information at all, it does so using colors. 10% of males are colorblind, and of them, about a fifth specifically will be unable to distinguish the shades shown here. It is almost malicious to make a graph whose information content is obscured in that way. On top of that, even for those who have normal color vision, colors are a poor way of presenting numeric data because there is no natural cognitive ordering of colors. There is a physical ordering, of course, based on wavelength, but that is not perceived, so any attempt to read this graph forces the reader to keep referring back to the legend to know which color represents which probability. Moreover, transition probabilities are inherently continuous, but have been arbitrarily broken into categories. And the categories aren't even informatively labeled: does 0.1 mean exactly 0.1, or 0.05-0.15, or something else?

Going further, the vertical and horizontal scales of the graph are misleading, or the variables are misnamed. Quintiles are, by definition, of (approximately) equal size. So why is so much more space afforded to each quintile than the one just beneath it? Why is the spacing on the axes unequal? Does that convey something else? If so, why is there no information explaining that, and why is nothing quantitative shown about it?

You can do much, much better than that!
Maarten Buis

#3

28 Oct 2018, 14:24

ssc desc spineplot

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Gobinda Natak

#4

28 Oct 2018, 15:03

Clyde, thank you very much, I agree with most of the shortcomings you have pointed out.

Maarten, thank you very much, -spineplot- does create very pretty figures -- but does it allow me to plot a third variable (other than as a number plotted into the bars)?

Thanks again
Gobinda
William Lisowski

#5

28 Oct 2018, 15:03

(Added in edit: crossed with post #4, but this was written in response to post #3).

Nope, the plot in post #1 is not a spineplot (also known as a mosaic plot).

I looked at the linked paper and understand better what the author of the figure is trying to do. I'm not sure it's the right thing to do, however. I thought I could use spineplot to improve on the plot, or perhaps bend spineplot to produce the plot.

The first problem with the plot is that we weren't given the underlying data it represents, so there was nothing to experiment with. I followed the link to the article, and from there the link to the data source, and was able to gather data to reconstruct "Figure 2" from the linked paper, which is a 5x5 table (presented as a picture rather than as copyable text) of "transition probabilities" from the parent's income quintile to the child's income quintile. But I couldn't get as far as "Figure 3", quoted in post #1.
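For anyone who wants to build a 5x5 transition table of the kind William describes, here is a minimal Stata sketch. It uses simulated log-normal incomes because the thread does not include the paper's data, and the variable names `parent_inc`, `child_inc`, `parent_q` and `child_q` are hypothetical. The row percentages of a parent-by-child quintile cross-tabulation are the transition probabilities (times 100).

```stata
* Minimal sketch with SIMULATED data (not the paper's data).
* All variable names are hypothetical.
clear
set seed 2018
set obs 5000

* Log-normal incomes with a positive parent-child association
generate double parent_inc = exp(rnormal(10, 0.8))
generate double child_inc  = exp(0.4*ln(parent_inc) + rnormal(6, 0.7))

* Income quintiles: 1 = lowest fifth, 5 = highest fifth
xtile parent_q = parent_inc, nquantiles(5)
xtile child_q  = child_inc, nquantiles(5)

* Row percentages = transition probabilities x 100:
* row i shows where children of parents in quintile i end up
tabulate parent_q child_q, row nofreq
```

With real data, only the two `generate` lines for the incomes would be replaced by your own variables.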

What's missing from the description of the plot is that the horizontal and vertical axes are scales of "family income", so that each band lies between the lower and upper income values represented in the quintile. That's why the fifth quintile is so large in each dimension: a few very high family incomes stretch its range. (Also, I expect that's why the parents' fifth quintile is so much larger than the children's fifth quintile; regression to the mean strikes me as one possibility.)

With that said, then, we can understand that the author of the plot is concerned that the magnitude of the difference in income between the first and fifth quintiles is obscured by plotting equal-width bands for each quintile. So he has tried to do better by making each band represent the income range (without actually labelling what that range is) for each quintile.

So while I was able to compute the "transition probabilities" represented by each of the 5x5 blocks, it was going to take some further digging to figure out what the income quintile cutoffs were for the data, and that's where I gave up.
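Finding the quintile cutoffs is straightforward once the individual incomes are at hand. The following Stata sketch uses simulated incomes and a hypothetical variable name: `_pctile` leaves the cutoffs in `r(r1)` to `r(r4)`, and `collapse` gives the lowest and highest income within each quintile, which is the band extent William describes. Whether the paper's figure used exactly these boundaries is an assumption.

```stata
* Sketch with SIMULATED incomes; variable names are hypothetical.
clear
set seed 2018
set obs 5000
generate double parent_inc = exp(rnormal(10, 0.8))

* Quintile cutoffs: r(r1) ... r(r4) are the 20th, 40th, 60th, 80th percentiles
_pctile parent_inc, nquantiles(5)
forvalues k = 1/4 {
    display "cutoff `k': " %12.0fc r(r`k')
}

* Band for each quintile: from the lowest to the highest income in it
xtile parent_q = parent_inc, nquantiles(5)
collapse (min) lo=parent_inc (max) hi=parent_inc, by(parent_q)
generate double width = hi - lo
list parent_q lo hi width, noobs
```

With right-skewed incomes like these, the fifth band comes out far wider than the first, which is the pattern William points out in the figure.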

But along with Clyde's concerns, I find the disproportionate size of the 5th x 5th quintile disturbing. In this regard, it's the antithesis of the spineplot, which is designed so that each block is proportional in area to the number of observations it represents, and each vertical band is proportional in width to the proportion of observations the band as a whole represents. (That's really poorly written but I can't quickly do better. I've been working with mosaic plots recently but haven't had to write (or find something to cite) a two-sentence explanation.)
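The spineplot geometry can be written down directly. In this Stata sketch (simulated data, hypothetical variable names), each parent-quintile band has a width equal to that quintile's share of all observations. Within a band, each tile's height is the conditional share of children in that child quintile, so each tile's area equals the cell's share of the total. It illustrates the idea and is not spineplot's internal code.

```stata
* Sketch with SIMULATED data; variable names are hypothetical.
clear
set seed 2018
set obs 5000
generate double parent_inc = exp(rnormal(10, 0.8))
generate double child_inc  = exp(0.4*ln(parent_inc) + rnormal(6, 0.7))
xtile parent_q = parent_inc, nquantiles(5)
xtile child_q  = child_inc, nquantiles(5)

* One observation per cell of the 5x5 table; counts go in _freq
contract parent_q child_q, zero

* Band width = share of all observations in the parent quintile
egen double grand  = total(_freq)
egen double coltot = total(_freq), by(parent_q)
generate double width = coltot / grand

* Tile height = conditional share within the band; stack tiles upwards
sort parent_q child_q
by parent_q: generate double p  = _freq / coltot
by parent_q: generate double y1 = sum(p)
generate double y0 = y1 - p

* Band positions along the x axis: add each band's width once
by parent_q: generate byte firstrow = _n == 1
generate double x1 = sum(width * firstrow)
generate double x0 = x1 - width

* Tile area (x1-x0)*(y1-y0) equals _freq/grand, the cell's share of the total
list parent_q child_q x0 x1 y0 y1, noobs sep(5)
```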

One thing that astonishes me is that the figure shown in post #1, and thus in the paper, puts the lowest values in the upper left corner of the figure, rather than the lower left corner. So parents' income increases as you go down the plot, counterintuitively, while children's income increases as you go right, as you would expect. The figure thus discards nearly four centuries of experience with the Cartesian coordinate system: down is now up, although right remains right.
Gobinda Natak

#6

28 Oct 2018, 15:16

William, note that my intention was not to replicate the exact graph shown in the paper -- I would like to be able to create a graph where I can specify the size of rows and columns somehow (like with -spineplot-) and plot a third variable via a reasonably chosen color scheme.

Regarding the last point, "the figure ... puts the lowest values in the upper left corner of the figure, rather than the lower left corner" is probably due to a long-standing convention in that area of research, but what you are saying is actually eye-opening for me. Thanks a lot
Nick Cox

#7

28 Oct 2018, 17:03

Although I would like to give you a different answer, I agree that this is not a spineplot. You could use the same underlying graph commands to draw tiles of various sizes with varying colours.

I would much rather see the transition probabilities shown directly: decoding colours is not very effective so far as I am concerned.
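Drawing tiles of varying sizes, while also showing the transition probabilities directly, might look like this in Stata. This is a hedged sketch, not spineplot's own code. It simulates data, computes the tile geometry described for spineplot (hypothetical variable names), draws each tile as a separate `twoway rarea` layer shaded on a grey scale, and prints each probability at the tile's centre. Grey levels at least have a natural light-to-dark ordering, and the printed numbers mean nobody has to decode the shading. Low values sit at the bottom left.

```stata
* Sketch with SIMULATED data; variable names are hypothetical.
clear
set seed 2018
set obs 5000
generate double parent_inc = exp(rnormal(10, 0.8))
generate double child_inc  = exp(0.4*ln(parent_inc) + rnormal(6, 0.7))
xtile parent_q = parent_inc, nquantiles(5)
xtile child_q  = child_inc, nquantiles(5)
contract parent_q child_q, zero

* Tile geometry: width = parent share, height = transition probability
egen double grand  = total(_freq)
egen double coltot = total(_freq), by(parent_q)
sort parent_q child_q
by parent_q: generate double p  = _freq / coltot
by parent_q: generate double y1 = sum(p)
generate double y0 = y1 - p
by parent_q: generate byte firstrow = _n == 1
generate double x1 = sum(coltot / grand * firstrow)
generate double x0 = x1 - coltot / grand

* Label position (tile centre) and label text
generate double xm = (x0 + x1) / 2
generate double ym = (y0 + y1) / 2
generate str4 plab = string(p, "%4.2f")

* rarea needs two points per tile: one at x0 and one at x1
generate int tile = _n
expand 2
bysort tile: generate double x = cond(_n == 1, x0, x1)
bysort tile: generate byte labelrow = _n == 1

* One rarea layer per tile, darker grey for higher probability
local plots
forvalues k = 1/25 {
    summarize p if tile == `k', meanonly
    local g = 15 - round(15 * r(mean))
    local plots `plots' (rarea y0 y1 x if tile == `k', color(gs`g') lcolor(white))
}
twoway `plots' (scatter ym xm if labelrow, msymbol(none) mlabel(plab) mlabposition(0)), legend(off) xtitle("Parent quintile (width = share of parents)") ytitle("Child quintile (height = transition probability)")
```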

The source for spineplot is the Stata Journal, not SSC.

I haven't read the original paper, but presumably the point is that the bins are not of equal width in terms of income. This is not surprising -- or even very interesting -- but it still dominates the graph. But why bin anyway? If you have the incomes, then exploit the detail!
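If the individual incomes are available, one unbinned alternative is to plot them directly. The Stata sketch below uses simulated data and hypothetical names. It draws a scatter of log child income against log parent income with a lowess smooth on top, then reports the correlation of the logs as a single continuous summary. The log scale and the choice of lowess are assumptions for illustration, not something proposed in the thread. Run it from a do-file, because of the `///` line continuations.

```stata
* Sketch with SIMULATED incomes; variable names are hypothetical.
clear
set seed 2018
set obs 5000
generate double parent_inc = exp(rnormal(10, 0.8))
generate double child_inc  = exp(0.4*ln(parent_inc) + rnormal(6, 0.7))

* Work on the log scale so the long right tail does not dominate
generate double ln_parent = ln(parent_inc)
generate double ln_child  = ln(child_inc)

* Every observation, plus a locally weighted smooth through them
twoway (scatter ln_child ln_parent, msize(tiny) mcolor(gs10)) ///
       (lowess ln_child ln_parent), legend(off)                ///
       xtitle("log parent income") ytitle("log child income")

* One continuous summary instead of 25 binned cells
correlate ln_child ln_parent
display "correlation of logs: " %5.3f r(rho)
```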