  • Missing data in tables and chi-square computations

    Recently I ran some simple bivariate tables using tabulate via the tab2 wrapper. I wanted to show the missing data values in the tables but compute the chi-square statistics based only on the valid entries. There doesn't seem to be an easy way to do that. If you specify the missing option along with chi2, Stata computes the statistic based on the whole table, with the missing category counted as an extra row. Here's an example.

    Code:
    . tabulate rep78 foreign, chi2
    
        Repair |
        Record |       Car type
          1978 |  Domestic    Foreign |     Total
    -----------+----------------------+----------
             1 |         2          0 |         2 
             2 |         8          0 |         8 
             3 |        27          3 |        30 
             4 |         9          9 |        18 
             5 |         2          9 |        11 
    -----------+----------------------+----------
         Total |        48         21 |        69 
    
              Pearson chi2(4) =  27.2640   Pr = 0.000
    
    . tabulate rep78 foreign, chi2 miss
    
        Repair |
        Record |       Car type
          1978 |  Domestic    Foreign |     Total
    -----------+----------------------+----------
             1 |         2          0 |         2 
             2 |         8          0 |         8 
             3 |        27          3 |        30 
             4 |         9          9 |        18 
             5 |         2          9 |        11 
             . |         4          1 |         5 
    -----------+----------------------+----------
         Total |        52         22 |        74 
    
              Pearson chi2(5) =  27.8735   Pr = 0.000

    I figured there had to be a routine that would do this, and so I went searching through Stata's rather confusing set of tabulation routines -- table, tabulate, tabdisp, tabstat, even epitab. Each has its own list of options, but, so far as I can tell, none will let you do what Stata's main competitors do -- print a table showing missing values while computing statistics only on valid values. I looked through a few of the many user-written routines for tables and I don't see any that do this either.

    It was easy enough to work around the problem. I just got the table with the missing option, reran it quietly without the missing option, asking for chi-square, and then used the display command to print the result. I was surprised, though, to see that tabulate returns chi-square and the p value but not the df. Have I missed something here? It hardly seems worth trying to get this on a wishlist for Stata 16, but on the other hand, particularly for a new user, Stata's table routines are, in my opinion, confusing and unnecessarily difficult to navigate.
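
    The workaround can be sketched roughly as follows, using the auto data from the example above. This is only a sketch: it assumes the quiet tabulate run leaves r(r), r(c), r(chi2) and r(p) behind, and it rebuilds the degrees of freedom as (rows - 1) x (columns - 1) because no df is returned. The display formats are my own choice, and the /// continuation means it should be run from a do-file rather than typed interactively.

    ```stata
    sysuse auto, clear

    * Table for display: missing values of rep78 appear as their own row
    tabulate rep78 foreign, missing

    * Rerun without -missing- so the test uses complete observations only
    quietly tabulate rep78 foreign, chi2

    * tabulate does not return the df, so rebuild it from the table dimensions
    local df = (r(r) - 1) * (r(c) - 1)
    display as text "Pearson chi2(" as result `df' as text ") = " ///
        as result %8.4f r(chi2) as text "   Pr = " as result %5.3f r(p)
    ```

    For rep78 by foreign the quiet run has 5 rows and 2 columns, so the local df evaluates to 4, in line with the chi2(4) reported for the table without missing values.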

    Richard T. Campbell
    Emeritus Professor of Biostatistics and Sociology
    University of Illinois at Chicago

  • #2
    See also tabchi from tab_chi (SSC). Right now I can't check its behaviour which I doubt is different in this respect, but it may be helpful. Also, I can probably add this feature faster than Stata 16 would do it.


    • #3
      Hi everyone,
      I have the same problem. I want to present percentages based on the whole sample per group but calculate the chi-square statistic based on valid values only. Does that make sense?


      • #4
        You could write your own program for this, but I don't know a way to do this at present without running two commands.

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . tab rep78 foreign, missing
        
            Repair |
            Record |       Car type
              1978 |  Domestic    Foreign |     Total
        -----------+----------------------+----------
                 1 |         2          0 |         2 
                 2 |         8          0 |         8 
                 3 |        27          3 |        30 
                 4 |         9          9 |        18 
                 5 |         2          9 |        11 
                 . |         4          1 |         5 
        -----------+----------------------+----------
             Total |        52         22 |        74 
        
        . quietly tab rep78 foreign, chi
        
        . ret li
        
        scalars:
                          r(N) =  69
                          r(r) =  5
                          r(c) =  2
                       r(chi2) =  27.26396103896104
                          r(p) =  .0000175796084266
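
        A program along those lines might look like the sketch below. The name tabvalidchi is hypothetical (it is not an existing command), and the program simply wraps the same two tabulate calls; the df is reconstructed from r(r) and r(c), since it is not among the returned scalars listed above.

        ```stata
        capture program drop tabvalidchi
        program define tabvalidchi
            version 13
            syntax varlist(min=2 max=2) [if] [in]

            * novarlist: keep observations with missing row/column values,
            * so that they can still be shown in the displayed table
            marksample touse, novarlist

            * displayed table includes the missing categories
            tabulate `varlist' if `touse', missing

            * test statistic computed on complete observations only
            quietly tabulate `varlist' if `touse', chi2
            local df   = (r(r) - 1) * (r(c) - 1)
            local chi2 = r(chi2)
            local p    = r(p)
            local N    = r(N)

            display as text _n "Pearson chi2(" as result `df' as text ") = " ///
                as result %8.4f `chi2' as text "   Pr = " as result %5.3f `p'
            display as text "(test based on " as result `N' ///
                as text " observations with no missing values)"
        end
        ```

        After defining it, the call would be tabvalidchi rep78 foreign. It still runs tabulate twice internally; it just hides the second run and prints a note on how many observations the test used.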


        • #5
          Thank you Nick for your response,

          I'm presenting a table for a paper. Is it correct to present frequencies and percentages that include missing values in the denominator, but a chi-square p value based only on the valid observations? I'm not sure...
