  • new package stdtable available on SSC

    Thanks to Kit Baum a new package, stdtable, is now available from SSC. It can be installed by typing ssc install stdtable in Stata.

    stdtable standardizes a cross-tabulation such that the marginal distributions (row and column totals) correspond to some pre-specified distribution, a technique that goes back to at least Yule (1912). The purpose is to display the association that exists in the table net of the marginal distributions. Consider the example below:

    Code:
    use "http://www.maartenbuis.nl/software/mob.dta", clear
    (mobility table from the USA collected in 1973)
    
    tab row col [fw=pop]
    
           Father's |                    Son's occupation
         occupation | upper non  lower non  upper man  lower man       farm |     Total
    ----------------+-------------------------------------------------------+----------
    upper nonmanual |     1,414        521        302        643         40 |     2,920
    lower nonmanual |       724        524        254        703         48 |     2,253
       upper manual |       798        648        856      1,676        108 |     4,086
       lower manual |       756        914        771      3,325        237 |     6,003
               farm |       409        357        441      1,611      1,832 |     4,650
    ----------------+-------------------------------------------------------+----------
              Total |     4,101      2,964      2,624      7,958      2,265 |    19,912
    Many more people went from farm to lower manual than the other way around. However, the number of people in agriculture declined strongly, so sons had to leave the farm. Moreover, the number of people in lower manual occupations was on the increase, offering room for those sons who had to leave their farm. We may be interested in knowing whether this asymmetry is completely explained by these changes in the marginal distributions, or whether there is more to it.

    Code:
    stdtable row col [fw=pop], cellwidth(9)
    
    -----------------------------------------------------------------------------
    Father's        |                      Son's occupation                      
    occupation      | upper non lower non upper man lower man      farm     Total
    ----------------+------------------------------------------------------------
    upper nonmanual |      41.7      23.6      17.3      13.1      4.23       100
    lower nonmanual |        27        30      18.4      18.1      6.42       100
       upper manual |      15.9      19.9      33.2      23.2      7.73       100
       lower manual |      11.1      20.6        22      33.8      12.5       100
               farm |       4.3      5.78      9.03      11.7      69.1       100
                    |
              Total |       100       100       100       100       100       500
    -----------------------------------------------------------------------------
    These standardized counts can be interpreted as the row and column percentages that would occur if for both fathers and sons each occupation was equally likely. It appears that the apparent asymmetry was almost entirely due to changes in the marginal distributions. Also, it is now much clearer that farming is much more persistent over generations than the other occupations.
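
    The standardization can be illustrated with iterative proportional fitting (IPF), the algorithm mentioned later in this thread. What follows is only a minimal sketch in Mata, assuming the counts are typed in by hand from the tab output above and that both sets of margins are set to 100. The names r1 to r5, N, m and it, the tolerance and the iteration cap are illustrative choices, and the actual implementation in stdtable may differ in its details.

    ```stata
    mata:
    // observed counts, copied by hand from the tab output above
    // (rows: father's occupation, columns: son's occupation)
    r1 = (1414, 521, 302,  643,   40)
    r2 = ( 724, 524, 254,  703,   48)
    r3 = ( 798, 648, 856, 1676,  108)
    r4 = ( 756, 914, 771, 3325,  237)
    r5 = ( 409, 357, 441, 1611, 1832)
    N  = r1 \ r2 \ r3 \ r4 \ r5

    m = N
    for (it = 1; it <= 1000; it++) {
        m = m :* (100 :/ rowsum(m))    // rescale every row to sum to 100
        m = m :* (100 :/ colsum(m))    // rescale every column to sum to 100
        // columns are exact after the last step, so only the rows need checking
        if (max(abs(rowsum(m) :- 100)) < 1e-8) break
    }
    round(m, 0.01)

    // cross-product (odds) ratio of the top-left 2x2 corner, before and after;
    // IPF only multiplies rows and columns by constants, so the two should agree
    (N[1,1]*N[2,2]) / (N[1,2]*N[2,1]), (m[1,1]*m[2,2]) / (m[1,2]*m[2,1])
    end
    ```

    Because every step multiplies a whole row or a whole column by a constant, the odds ratios of the original table are left untouched while the margins change. That is the sense in which the standardized table shows the association net of the marginal distributions.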

    Standardizing cross-tabulations also helps when comparing tables across groups. In the example below we look at the race of husbands and wives in the USA for married couples whose husbands were born between 1821 and 1989. We can see that the racial boundaries have become a bit more permeable over time, but that the USA is still very far removed from being a melting pot. In this example I also use Nick Cox's tabplot, which is also available from SSC, to graph the results.

    Code:
    . use "http://www.maartenbuis.nl/software/interracial.dta", clear
    (husband's and wife's race in the USA from the census and ACS 1880-2014)
    
    . qui stdtable hrace wrace [fw=_freq], by(coh) replace
    
    . tabplot hrace coh [iw=std],                       ///
    >    by(wrace, compact cols(3) note(""))            ///
    >    xtitle("husband's birth cohort" "wife's race") ///
    >    xlab(1(2)18,angle(35) labsize(vsmall))

    [Figure: Graph.png. tabplot of the standardized counts by husband's race and husband's birth cohort, one panel per wife's race.]

    Yule, G. U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, 75(6): 579-652.


    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

  • #2
    Interesting! See also mstdize (SSC).


    • #3
      I did not know about mstdize. A quick look suggests that mstdize and stdtable use the same algorithm. stdtable seems to have a few more bells and whistles. Everybody has to decide for themselves whether that is a good or a bad thing.
      Maarten L. Buis


      • #4
        Maarten: I am sure you are right. This algorithm has been reinvented or rediscovered many times.

        I am pleased you noticed G.U. Yule in 1912.

        Deming and Stephan were there long before categorical data analysis started rediscovering it in the 1970s. I think you can get the results out of a Poisson regression with offsets somehow. Entropy-maximising is another buzzword. Economists will think of Richard Stone (RAS method) and biproportional matrices.

        Kruithof, J. 1937. Calculation of telephone traffic. De Ingenieur 52: E15–E25 is a fairly early reference often omitted from statistical discussions.
        Last edited by Nick Cox; 19 May 2016, 04:51.


        • #5
          I happened to start out using poisson, but it turned out that IPF is quicker (it requires more iterations, but each iteration is very quick) and more stable when you have 0s in your table. Here is an example of how the trick with poisson works:

          Code:
          . use "http://www.maartenbuis.nl/software/mob.dta", clear
          (mobility table from the USA collected in 1973)
          
          . stdtable row col [fw=pop], cellwidth(9)
          
          ----------------------------------------------------------------------------------
          Father's        |                         Son's occupation                        
          occupation      | upper non  lower non  upper man  lower man       farm      Total
          ----------------+-----------------------------------------------------------------
          upper nonmanual |      41.7       23.6       17.3       13.1       4.23        100
          lower nonmanual |        27         30       18.4       18.1       6.42        100
             upper manual |      15.9       19.9       33.2       23.2       7.73        100
             lower manual |      11.1       20.6         22       33.8       12.5        100
                     farm |       4.3       5.78       9.03       11.7       69.1        100
                          |
                    Total |       100        100        100        100        100        500
          ----------------------------------------------------------------------------------
          
          . gen target = 100/5
          
          . qui poisson target i.row i.col, exposure(pop)
          
          . predict mu
          (option n assumed; predicted number of events)
          
          . tabdisp row col, cell(mu) cellwidth(9) format(%9.3g)
          
          -----------------------------------------------------------------------
          Father's        |                   Son's occupation                   
          occupation      | upper non  lower non  upper man  lower man       farm
          ----------------+------------------------------------------------------
          upper nonmanual |      41.7       23.6       17.3       13.1       4.23
          lower nonmanual |        27         30       18.4       18.1       6.42
             upper manual |      15.9       19.9       33.2       23.2       7.73
             lower manual |      11.1       20.6         22       33.8       12.5
                     farm |       4.3       5.78       9.03       11.7       69.1
          -----------------------------------------------------------------------
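
          A short note on why the trick works, with a check that can be run in the same session. With exposure(pop) the model is log(mu) = log(pop) + a row effect + a column effect, so every fitted value is the observed count multiplied by a row factor and a column factor, which is the same form that IPF produces. In addition, the likelihood equations of a Poisson model with a full set of row and column indicators force the fitted row and column totals to equal the totals of the outcome, here 5 x 20 = 100. The sketch below assumes that mu from the session above is still in memory; it only displays the margins and makes no claim about the exact digits shown.

          ```stata
          * fitted values summed within each row of the table
          * (each sum should be 100, up to convergence tolerance and float precision)
          tabstat mu, by(row) statistics(sum)

          * the same check for the columns
          tabstat mu, by(col) statistics(sum)
          ```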
          Maarten L. Buis


          • #6
            Thanks to Kit Baum a new version of the stdtable package is now available on SSC. To install it, type ssc install stdtable, replace in Stata. It adds the row and col options. These result in standardized row or column percentages in the case of non-square tables. In square tables (the same number of rows as columns) the standardized counts can be interpreted as both row and column percentages. Here is an example of such a non-square table:

            Code:
            . use "http://www.maartenbuis.nl/software/husb.dta", clear
            (based on Cumulated German General Social Survey 1980-2012)
            
            . tab east husb_career [fw=freq], cel nofreq
            
             region of |    wife should support husband's career
             residence | strongly       agree   disagree  strongly  |     Total
            -----------+--------------------------------------------+----------
                  west |      8.69      15.92      24.45      19.70 |     68.77
                  east |      2.27       5.22      12.12      11.62 |     31.23
            -----------+--------------------------------------------+----------
                 Total |     10.96      21.14      36.57      31.32 |    100.00
            It is hard to compare the cell percentages with one another because there are more people in West Germany than in East Germany and, in general, people are more likely to disagree with that statement. We can take out the effect of the marginal distribution of region by asking for row percentages, and take out the effect of the marginal distribution of opinion by computing column percentages. However, to take out the effect of both margins simultaneously we need the stdtable package:

            Code:
            . stdtable east husb_career [fw=freq], cellwidth(10)
            
            ----------------------------------------------------------------------
            region of |            wife should support husband's career          
            residence | strongly a       agree    disagree  strongly d       Total
            ----------+-----------------------------------------------------------
                 west |       15.1        13.7        11.1        10.1          50
                 east |       9.92        11.3        13.9        14.9          50
                      |
                Total |         25          25          25          25         100
            ----------------------------------------------------------------------
            These standardized counts can be interpreted as the cell percentages that would have occurred if there were an equal number of respondents in the east and the west and an equal number of respondents who strongly agreed, agreed, disagreed, and strongly disagreed. However, at least in my field, cell percentages are fairly uncommon; row or column percentages are more commonly used. That is what the new row and col options are for.

            Code:
            . stdtable east husb_career [fw=freq], cellwidth(10) row
            
            ----------------------------------------------------------------------
            region of |            wife should support husband's career          
            residence | strongly a       agree    disagree  strongly d       Total
            ----------+-----------------------------------------------------------
                 west |       30.2        27.4        22.3        20.1         100
                 east |       19.8        22.6        27.7        29.9         100
                      |
                Total |         25          25          25          25         100
            ----------------------------------------------------------------------
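
            The link between the two tables above can be sketched in Mata. This is an illustration under stated assumptions, not the code stdtable runs: the input is the set of cell percentages from the first tab output (the raw counts are not shown in this thread, and IPF does not depend on the overall scale of the input), the targets are 50 per region and 25 per answer category, and the names p1, p2, P, m and it are made up for the example. Since the inputs are rounded to two decimals, the results can only be expected to come close to the stdtable output.

            ```stata
            mata:
            // cell percentages copied by hand from the tab output above
            // (rows: west, east; columns: the four answer categories)
            p1 = ( 8.69, 15.92, 24.45, 19.70)
            p2 = ( 2.27,  5.22, 12.12, 11.62)
            P  = p1 \ p2

            m = P
            for (it = 1; it <= 1000; it++) {
                m = m :* (50 :/ rowsum(m))     // each region gets half of the total
                m = m :* (25 :/ colsum(m))     // each answer gets a quarter of the total
                // columns are exact after the last step, so only the rows need checking
                if (max(abs(rowsum(m) :- 50)) < 1e-8) break
            }
            round(m, 0.1)                        // standardized cell percentages
            round(100 :* m :/ rowsum(m), 0.1)    // the same table as row percentages
            end
            ```

            In this sketch the row percentages are just the standardized cell percentages divided by their row totals, which is consistent with the row output above being twice the default output.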
            Last edited by Maarten Buis; 26 Jan 2017, 03:06. Reason: how to install stdtable


            • #7
              Anyone interested in following up Nick Cox's reference to Kruithof's 1937 paper in antique Dutch will find an intelligent English translation here: https://wwwhome.ewi.utwente.nl/~ptde...anslation.html
