  • How to clear cells with less than 5 observations in a cross table?

    Hi there,

    I'm working with Danish register data, which is a massive panel dataset with several hundred thousand observations per year over 14 years. According to Statistics Denmark's discretion policy, I'm not allowed to show cells with fewer than 5 individuals.

    Since I need a lot of cross tables, many of which have cells with values below 5, my question is:

    - Does Stata have an option or a command that lets me condition the table output so that the tab command blanks cells with values below 5?

    My code:
    tab variable2013 variable2012

    In the attached example1, I would like the STATA tab command to empty the following cells: C2, F2, E3, and E6, as shown in the attached example2.

    I hope my question is clear, otherwise please let me know so that I can try to improve my question. I know you guys spend a lot of time helping people like me, and I truly admire and respect that.

    Thank you very much in advance,


    Best Regards,

    Mikkel

  • #2
    How can it breach confidentiality to show that cross-combinations are rare? At one extreme, if no individuals are both one thing and another, no one is being outed.

    That said, the question yields to a two-step approach: count the observations in each cell, then tabulate only the cells with at least 5.

    Code:
    bysort variable2013 variable2012: gen N = _N
    tab variable2013 variable2012 if N >= 5

    NB: Spelling is Stata, not STATA, as explained in http://www.statalist.org/forums/help#spelling



    • #3
      Hi Neil!

      Thanks for your quick reply. The solution works, but doesn’t really solve my problem, as it drops every combination of the two variables with a cell value below 5.

      What I need are cross tables where cells below 5 are not dropped but left empty/blank, as shown in my attached example2.

      Does that make sense? It might sound silly, but I need to show that there are few observations when certain variables are crossed, though I cannot show exactly how few.



      • #4
        I can't speak for Neil, but I see the problem. Consider installing tabcount (SSC). See also http://www.stata-journal.com/sjpdf.h...iclenum=pr0011

        You can automate the recording of the values that occur using (e.g.) levelsof:

        Code:
        . sysuse auto
        (1978 Automobile Data)
        
        . bysort foreign rep78 : gen N = _N
        
        . tab foreign rep78 if N >= 5
        
                   |             Repair Record 1978
          Car type |         2          3          4          5 |     Total
        -----------+--------------------------------------------+----------
          Domestic |         8         27          9          0 |        44
           Foreign |         0          0          9          9 |        18
        -----------+--------------------------------------------+----------
             Total |         8         27         18          9 |        62
        
        
        . levelsof foreign, local(row)
        0 1
        
        . levelsof rep78, local(col)
        1 2 3 4 5
        
        . tabcount foreign rep78 if N >= 5, v1(`row') v2(`col')
        
        ----------------------------------------
                  |      Repair Record 1978    
         Car type |    1     2     3     4     5
        ----------+-----------------------------
         Domestic |          8    27     9      
          Foreign |                      9     9
        ----------------------------------------
        
        . ssc desc tabcount
        
        --------------------------------------------------------------------------------
        package tabcount from http://fmwww.bc.edu/repec/bocode/t
        --------------------------------------------------------------------------------
        
        TITLE
              'TABCOUNT': module to tabulate frequencies, with zeros explicit
        
        DESCRIPTION/AUTHOR(S)
              
              tabcount tabulates frequencies for up to 7 variables.   Its main
              distinctive features are that zero frequencies  of one or more
              specified values or conditions are always  shown in the table
              (i.e. entirely empty rows, columns, etc.  are not omitted) and
              that reduced datasets and/or matrices  containing the frequencies
              may also be saved. This  version of tabcount is 2.0.0 and
              requires Stata 8.  A previous version, much restricted, of
              tabcount for Stata 7 is included in this package as tabcount7.
              
        <stuff omitted>  
        
        (type -ssc install tabcount- to install)
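
        If installing a user-written command is not possible, a rough alternative that uses only official commands is to reduce the data to cell counts with contract, set counts below 5 to missing, and display them with tabdisp, which leaves missing cells blank by default. This is only a sketch with the auto data, not something tried on register data; the variable name freq is chosen here for illustration, and the threshold of 5 comes from the original question.

        Code:
        sysuse auto, clear
        preserve
        * one observation per foreign-by-rep78 cell;
        * zero adds the empty combinations, nomiss drops missing categories
        contract foreign rep78, freq(freq) zero nomiss
        * suppress counts below 5 (this blanks the zero cells as well)
        replace freq = . if freq < 5
        * tabdisp leaves missing cells blank unless the missing option is given
        tabdisp foreign rep78, cellvar(freq)
        restore

        To keep true zeros visible and blank only the counts 1 to 4, the condition could instead be written as freq > 0 & freq < 5.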



        • #5
          Sorry, I meant Nick:-)

          Your solution seems to work for me! Though I have no idea what 'levelsof' and 'tabcount' actually do. But thanks a lot!



          • #6
            Thanks. In each case, there is a help file to help! You hardly need it, but there is also a manual entry for levelsof.
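
            As a quick illustration of what levelsof did in the earlier example: it lists the distinct values of a variable and can store them in a local macro, which is what the v1() and v2() options of tabcount were then given. A minimal sketch with the auto data (help tabcount only works once tabcount is installed):

            Code:
            sysuse auto, clear
            * list the distinct nonmissing values of rep78 and store them in local macro col
            levelsof rep78, local(col)
            * show what the macro now holds
            display "`col'"
            * open the help files mentioned above
            help levelsof
            help tabcount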



            • #7
              Dear Nick.

              I'm now facing another difficulty: when I try to install tabcount with "ssc install tabcount", Stata replies:

              host not found
              http://fmwww.bc.edu/repec/bocode/t/ either:
              1) is not a valid URL, or
              2) could not be contacted, or
              3) is not a Stata download site (has no stata.toc file).

              Do you know what's wrong?



              • #8
                I can see the files on SSC. So, I suspect a problem with the network connections from your Stata. For example, you may be using a proxy server. See

                Code:
                help netio
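
                If a proxy server does turn out to be the cause, the settings described in that help file can be sketched roughly as follows. The host name and port below are placeholders, not real values; the actual ones would have to come from whoever runs your network.

                Code:
                * hypothetical proxy host and port: replace with your site's values
                set httpproxyhost "proxy.example.org"
                set httpproxyport 8080
                set httpproxy on
                * then retry the installation
                ssc install tabcount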



                • #9
                  You were right once again, it was a question of the proxy server. I had the person responsible for the server install tabcount for me, and now it all works fine. Especially as it also removes the totals, which I don't need.

                  Thanks again
