  • Unexpected result from table command when missing observations of string variable

    Hello,

    Apologies if this has been pointed out before; I searched the forum for previous posts on this but was unable to find one. I am encountering some unexpected behavior when trying to report missing values using Stata's new table command. The "missing" option does not appear to change the table output when a string variable has missing values. This is easy enough to work around by explicitly labeling missing values, but it was unexpected nonetheless and can mask missing values if you use the "table" command without double-checking.

    Is this intended behavior or a bug? If it is intended, is there a different way I should be coding missing string values?

    Thanks,
    -Noah

    Version:
    Stata/SE 17.0 for Windows (64-bit x86-64)
    Revision 10 May 2022

    Code:
    . version 17.0
    
    . clear
    
    . sysuse auto
    (1978 automobile data)
    
    . tostring rep78, gen(rep78_str)
    rep78_str generated as str1
    
    . replace rep78_str = "" if missing(rep78)
    (5 real changes made)
    
    . 
    . //Confirm both variables contain 5 missing values
    . count if missing(rep78)
      5
    
    . count if missing(rep78_str)
      5
    
    . 
    . tab rep78, missing
    
         Repair |
    record 1978 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          2        2.70        2.70
              2 |          8       10.81       13.51
              3 |         30       40.54       54.05
              4 |         18       24.32       78.38
              5 |         11       14.86       93.24
              . |          5        6.76      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    
    . //5 missing values shown
    . table rep78, missing
    
    -------------------------------
                       |  Frequency
    -------------------+-----------
    Repair record 1978 |           
      1                |          2
      2                |          8
      3                |         30
      4                |         18
      5                |         11
      .                |          5
      Total            |         74
    -------------------------------
    
    . //5 missing values shown
    . tab rep78_str, missing
    
         Repair |
    record 1978 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
                |          5        6.76        6.76
              1 |          2        2.70        9.46
              2 |          8       10.81       20.27
              3 |         30       40.54       60.81
              4 |         18       24.32       85.14
              5 |         11       14.86      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    
    . //5 missing values shown
    . table rep78_str, missing
    
    -------------------------------
                       |  Frequency
    -------------------+-----------
    Repair record 1978 |           
      1                |          2
      2                |          8
      3                |         30
      4                |         18
      5                |         11
      Total            |         69
    -------------------------------
    
    . //No missing values shown
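
    The workaround mentioned above (explicitly labeling missing string values) might be sketched as follows. This is only an illustration: the variable name rep78_lab and the placeholder text "(missing)" are hypothetical choices, not anything the table command requires.

    ```stata
    sysuse auto, clear
    tostring rep78, generate(rep78_str)
    replace rep78_str = "" if missing(rep78)

    * Copy the string variable, substituting a visible placeholder
    * for empty strings (the placeholder text is an arbitrary choice)
    generate rep78_lab = cond(missing(rep78_str), "(missing)", rep78_str)

    * The placeholder is an ordinary nonempty string, so it should be
    * tabulated without needing the missing option
    table rep78_lab
    ```

    Because the placeholder is a regular value, the total should then cover all 74 observations rather than 69.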

  • #2
    Looks like a bug to me. I suggest you report it to tech support (Stata Technical Support <[email protected]>) and then report back on what they tell you.

    And yes, I get the same thing on my Stata (version 17 on a Mac with birth date of 5/10/22).

    • #3
      Stata's technical support states that the current "table" command only supports the "missing" option for numeric values. I asked whether they planned on including the option to report non-numeric values using the missing option and will update when I get a reply.

      I'm slightly confused by the reply because the description of the "missing" option in the documentation doesn't specify anything about numeric vs. non-numeric values; it just says "treat missing values like other values." Maybe I'm missing something from deeper in the manual.

      In the meantime, they suggest using a wrapper program like the one below to include missing non-numeric values:

      Code:
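      capture program drop mytable

      program mytable
          // Wrapper around -table- that also reports missing string values
          syntax varlist [if] [in] [, missing *]
          if "`missing'" != "" {
              // Changes are undone automatically when the program exits
              preserve
              // Collect the string variables in varlist (there may be none)
              quietly ds `varlist', has(type string)
              local strlist `r(varlist)'
              foreach var of local strlist {
                  // Give empty strings a nonempty placeholder (a Unicode space)
                  quietly replace `var' = uchar(8287) if missing(`var')
              }
          }
          table `varlist' `if' `in', `missing' `options'
      end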
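
      For reference, a call to the wrapper on the example data from the first post might look like the sketch below. It assumes mytable has already been defined as above and that rep78_str is built the same way as in the original post.

      ```stata
      sysuse auto, clear
      tostring rep78, generate(rep78_str)
      replace rep78_str = "" if missing(rep78)

      * Empty strings are temporarily replaced inside the wrapper,
      * so they should appear as their own (blank-looking) row
      mytable rep78_str, missing

      * preserve restores the data when the program ends, so the
      * empty strings should still be present afterwards
      count if missing(rep78_str)
      ```

      Since the replacement happens under preserve, the data in memory are not permanently modified; the placeholder exists only while the table is being produced.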
