new package stdtable available on SSC

Maarten Buis

Join Date: Mar 2014
Posts: 3449


19 May 2016, 02:18

Thanks to Kit Baum, a new package, stdtable, is now available from SSC. To install it, type ssc install stdtable in Stata.

stdtable standardizes a cross-tabulation such that the marginal distributions (row and column totals) correspond to some pre-specified distribution, a technique that goes back to at least Yule (1912). The purpose is to display the association that exists in the table net of the marginal distributions. Consider the example below:

Code:

use "http://www.maartenbuis.nl/software/mob.dta", clear
(mobility table from the USA collected in 1973)

tab row col [fw=pop]

       Father's |                    Son's occupation
     occupation | upper non  lower non  upper man  lower man       farm |     Total
----------------+-------------------------------------------------------+----------
upper nonmanual |     1,414        521        302        643         40 |     2,920
lower nonmanual |       724        524        254        703         48 |     2,253
   upper manual |       798        648        856      1,676        108 |     4,086
   lower manual |       756        914        771      3,325        237 |     6,003
           farm |       409        357        441      1,611      1,832 |     4,650
----------------+-------------------------------------------------------+----------
          Total |     4,101      2,964      2,624      7,958      2,265 |    19,912

Many more people moved from farm to lower manual occupations (1,611) than the other way around (237). However, the number of people in agriculture declined strongly, so sons had to leave the farm. Moreover, the number of people in lower manual occupations was on the increase, offering room for those sons who had to leave their farm. We may want to know whether this asymmetry is completely explained by these changes in the marginal distributions, or whether there is more to it.

Code:

stdtable row col [fw=pop], cellwidth(9)

-----------------------------------------------------------------------------
Father's        |                      Son's occupation                      
occupation      | upper non lower non upper man lower man      farm     Total
----------------+------------------------------------------------------------
upper nonmanual |      41.7      23.6      17.3      13.1      4.23       100
lower nonmanual |        27        30      18.4      18.1      6.42       100
   upper manual |      15.9      19.9      33.2      23.2      7.73       100
   lower manual |      11.1      20.6        22      33.8      12.5       100
           farm |       4.3      5.78      9.03      11.7      69.1       100
                |
          Total |       100       100       100       100       100       500
-----------------------------------------------------------------------------

These standardized counts can be interpreted as the row and column percentages that would occur if, for both fathers and sons, each occupation were equally likely. It appears that the asymmetry was almost entirely due to changes in the marginal distributions: after standardization, farm to lower manual (11.7) and lower manual to farm (12.5) are nearly equal. Also, it is now much clearer that farming is much more persistent over generations than the other occupations.
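
In symbols, the standardization described at the top of this post can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the stdtable help file: n_ij is the observed count, s_ij the standardized count, and r_i and c_j the target row and column totals (all equal to 100 in the table above).

```latex
% Assumed notation: n_{ij} observed, s_{ij} standardized,
% a_i and b_j unknown row and column multipliers.
s_{ij} = a_i \, n_{ij} \, b_j, \qquad
\sum_j s_{ij} = r_i, \qquad
\sum_i s_{ij} = c_j .

% The multipliers cancel in every odds ratio, so the association is unchanged:
\frac{s_{ij}\, s_{kl}}{s_{il}\, s_{kj}}
  = \frac{a_i n_{ij} b_j \; a_k n_{kl} b_l}{a_i n_{il} b_l \; a_k n_{kj} b_j}
  = \frac{n_{ij}\, n_{kl}}{n_{il}\, n_{kj}} .
```

This is one way to read "association net of the marginal distributions": only the margins are changed, while every odds ratio in the table stays as it was.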

Standardizing cross-tabulations also helps when comparing tables across groups. In the example below we look at the race of husbands and wives in the USA for married couples whose husbands were born between 1821 and 1989. We can see that the racial boundaries have become a bit more permeable over time, but that the USA is still very far removed from being a melting pot. In this example I also use Nick Cox's tabplot, which is also available from SSC, to graph the results:

Code:

. use "http://www.maartenbuis.nl/software/interracial.dta", clear
(husband's and wife's race in the USA from the census and ACS 1880-2014)

. qui stdtable hrace wrace [fw=_freq], by(coh) replace

. tabplot hrace coh [iw=std],                       ///
>    by(wrace, compact cols(3) note(""))            ///
>    xtitle("husband's birth cohort" "wife's race") ///
>    xlab(1(2)18,angle(35) labsize(vsmall))

[Figure: Graph.png. tabplot of the standardized counts (std) of husband's race by husband's birth cohort, one panel per wife's race.]

Yule, G. U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society 75(6): 579-652.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------


Nick Cox

Join Date: Mar 2014

Posts: 35651
#2

19 May 2016, 03:17

Interesting! See also mstdize (SSC).
Maarten Buis

Join Date: Mar 2014

Posts: 3449
#3

19 May 2016, 03:27

I did not know mstdize. A quick look suggests that mstdize and stdtable use the same algorithm. stdtable seems to have a few more bells and whistles. Everybody has to decide for themselves whether that is a good or a bad thing.
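
For readers who want to see the idea in code, here is a minimal sketch of iterative proportional fitting (the IPF named later in this thread), written in Mata. It is not the code of either package. The target of 100 per row and column, the cap of 1000 passes, and the tolerance are assumptions chosen for this illustration.

```stata
* Sketch only: a bare-bones IPF loop, not stdtable's or mstdize's own code
use "http://www.maartenbuis.nl/software/mob.dta", clear
quietly tabulate row col [fw=pop], matcell(N)   // store observed counts in matrix N

mata:
S = st_matrix("N")                      // start from the observed table
for (it = 1; it <= 1000; it++) {
    S = S :* (100 :/ rowsum(S))         // rescale so every row sums to 100
    S = S :* (100 :/ colsum(S))         // rescale so every column sums to 100
    // columns now fit exactly; stop once the rows still fit as well
    if (max(abs(rowsum(S) :- 100)) < 1e-8) break
}
round(S, .01)                           // compare with the stdtable output
end
```

Each pass only rescales rows and columns, so the odds ratios of the observed table are left untouched. If the sketch is right, the displayed matrix should be close to the standardized counts in the first post.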

Nick Cox

Join Date: Mar 2014

Posts: 35651
#4

19 May 2016, 04:24

Maarten: I am sure you are right. This algorithm has been reinvented or rediscovered many times.

I am pleased you noticed G.U. Yule in 1912.

Deming and Stephan were there long before categorical data analysis started rediscovering it in the 1970s. I think you can get the results out of a Poisson regression with offsets somehow. Entropy-maximising is another buzzword. Economists will think of Richard Stone (RAS method) and biproportional matrices.

Kruithof, J. 1937. Calculation of telephone traffic. De Ingenieur 52: E15–E25 is a fairly early reference often omitted from statistical discussions.

Last edited by Nick Cox; 19 May 2016, 04:51.

Maarten Buis

Join Date: Mar 2014
Posts: 3449

19 May 2016, 07:37

I happened to start out using poisson, but it turned out that iterative proportional fitting (IPF) is quicker (it requires more iterations, but each iteration is very quick) and more stable when you have 0s in your table. Here is an example of how the trick with poisson works:

Code:

. use "http://www.maartenbuis.nl/software/mob.dta", clear
(mobility table from the USA collected in 1973)

. stdtable row col [fw=pop], cellwidth(9)

----------------------------------------------------------------------------------
Father's        |                         Son's occupation                        
occupation      | upper non  lower non  upper man  lower man       farm      Total
----------------+-----------------------------------------------------------------
upper nonmanual |      41.7       23.6       17.3       13.1       4.23        100
lower nonmanual |        27         30       18.4       18.1       6.42        100
   upper manual |      15.9       19.9       33.2       23.2       7.73        100
   lower manual |      11.1       20.6         22       33.8       12.5        100
           farm |       4.3       5.78       9.03       11.7       69.1        100
                |
          Total |       100        100        100        100        100        500
----------------------------------------------------------------------------------

. gen target = 100/5

. qui poisson target i.row i.col, exposure(pop)

. predict mu
(option n assumed; predicted number of events)

. tabdisp row col, cell(mu) cellwidth(9) format(%9.3g)

-----------------------------------------------------------------------
Father's        |                   Son's occupation                   
occupation      | upper non  lower non  upper man  lower man       farm
----------------+------------------------------------------------------
upper nonmanual |      41.7       23.6       17.3       13.1       4.23
lower nonmanual |        27         30       18.4       18.1       6.42
   upper manual |      15.9       19.9       33.2       23.2       7.73
   lower manual |      11.1       20.6         22       33.8       12.5
           farm |       4.3       5.78       9.03       11.7       69.1
-----------------------------------------------------------------------
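
A short derivation of why the two tables agree, as a sketch. The notation is an assumption introduced here: y_ij is the variable target (20 in every cell), n_ij is pop, and the data are taken to hold one observation per cell.

```latex
% exposure(pop) adds log n_{ij} as an offset with coefficient 1:
\log \mu_{ij} = \log n_{ij} + \alpha + \rho_i + \gamma_j
\quad\Longrightarrow\quad
\mu_{ij} = n_{ij}\, e^{\alpha + \rho_i}\, e^{\gamma_j}

% Poisson likelihood equations for the row and column indicators:
\sum_j \left( y_{ij} - \mu_{ij} \right) = 0 \;\; \text{for each row } i, \qquad
\sum_i \left( y_{ij} - \mu_{ij} \right) = 0 \;\; \text{for each column } j

% With y_{ij} = 100/5 = 20 in all 25 cells:
\sum_j \mu_{ij} = \sum_i \mu_{ij} = 5 \times 20 = 100
```

So the fitted values are the observed counts multiplied by a row factor and a column factor, and their margins all equal 100, which is the defining property of the standardized table.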


Maarten Buis

Join Date: Mar 2014
Posts: 3449

26 Jan 2017, 03:05

Thanks to Kit Baum, a new version of the stdtable package is now available on SSC. To install it, type ssc install stdtable, replace in Stata. It adds the row and col options, which produce standardized row or column percentages for non-square tables. In square tables (the same number of rows as columns) the standardized counts can be interpreted as both row and column percentages. Here is an example of a non-square table:

Code:

. use "http://www.maartenbuis.nl/software/husb.dta", clear
(based on Cumulated German General Social Survey 1980-2012)

. tab east husb_career [fw=freq], cel nofreq

 region of |    wife should support husband's career
 residence | strongly       agree   disagree  strongly  |     Total
-----------+--------------------------------------------+----------
      west |      8.69      15.92      24.45      19.70 |     68.77
      east |      2.27       5.22      12.12      11.62 |     31.23
-----------+--------------------------------------------+----------
     Total |     10.96      21.14      36.57      31.32 |    100.00

It is hard to compare the cell percentages with one another because there are more people in West Germany than in East Germany, and in general people are more likely to disagree with that statement. We can take out the effect of the marginal distribution of region by asking for row percentages, and take out the effect of the marginal distribution of opinion by computing column percentages. However, to take out the effect of both margins simultaneously we need to use the stdtable package:

Code:

. stdtable east husb_career [fw=freq], cellwidth(10)

----------------------------------------------------------------------
region of |            wife should support husband's career          
residence | strongly a       agree    disagree  strongly d       Total
----------+-----------------------------------------------------------
     west |       15.1        13.7        11.1        10.1          50
     east |       9.92        11.3        13.9        14.9          50
          |
    Total |         25          25          25          25         100
----------------------------------------------------------------------

These standardized counts can be interpreted as the cell percentages that would have occurred if there were an equal number of respondents in the east and the west and an equal number of respondents who strongly agreed, agreed, disagreed, and strongly disagreed. However, at least in my field, cell percentages are fairly uncommon; row or column percentages are more commonly used. That is what the new row and col options are for.

Code:

. stdtable east husb_career [fw=freq], cellwidth(10) row

----------------------------------------------------------------------
region of |            wife should support husband's career          
residence | strongly a       agree    disagree  strongly d       Total
----------+-----------------------------------------------------------
     west |       30.2        27.4        22.3        20.1         100
     east |       19.8        22.6        27.7        29.9         100
          |
    Total |         25          25          25          25         100
----------------------------------------------------------------------
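
The numbers in the two tables are consistent with the row option simply rescaling the cell-standardized counts so that each row sums to 100 instead of 50. This is a reading of the output, not a statement about how stdtable computes it internally. A quick check on the west row:

```stata
* west row of the cell-standardized table, rescaled from a row total of 50 to 100
mata: (15.1, 13.7, 11.1, 10.1) :/ 50 :* 100
```

The inputs are the rounded values printed above, so the last digit can differ slightly from the table that stdtable produces from the unrounded counts.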

Last edited by Maarten Buis; 26 Jan 2017, 03:06. Reason: how to install stdtable


Kenneth Macdonald

Join Date: Apr 2023

Posts: 1
#7

04 Apr 2023, 03:36

Anyone interested in following up Nick Cox's reference to Kruithof's 1937 paper, written in antique Dutch, will find an intelligent English translation here: https://wwwhome.ewi.utwente.nl/~ptde...anslation.html
