  • -codebook, problems- with problems indicating incompletely labeled vars

    -codebook, problems- flags variables that are incompletely labeled. However, it also flags a variable as incompletely labeled when a label is identical to the value it is attached to. I suspect this is unintended; at the very least it is not documented. For example:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(smident1 smident2)
    . .
    . .
    . .
    5 5
    2 2
    4 4
    4 5
    4 4
    1 1
    1 2
    1 1
    2 3
    1 3
    2 3
    3 4
    4 3
    1 2
    3 3
    4 3
    2 3
    3 4
    2 3
    2 3
    4 5
    2 3
    3 5
    3 2
    1 2
    1 1
    4 5
    3 2
    5 5
    1 1
    3 3
    3 3
    3 3
    4 4
    3 3
    2 3
    2 2
    3 5
    3 3
    1 4
    1 2
    1 2
    2 1
    2 2
    5 5
    . .
    3 4
    1 1
    2 3
    3 4
    3 2
    2 2
    . .
    2 3
    . .
    3 4
    4 4
    4 4
    4 4
    4 3
    4 3
    2 2
    4 2
    3 4
    3 3
    3 4
    1 2
    3 4
    2 2
    2 2
    4 5
    3 4
    3 3
    3 3
    5 4
    5 5
    3 4
    2 3
    4 4
    1 2
    1 1
    1 3
    3 5
    3 3
    4 4
    3 3
    5 5
    3 4
    1 2
    4 3
    1 1
    3 3
    3 2
    4 4
    2 1
    1 5
    5 5
    end
    label values smident1 smident1
    label def smident1 1 "not at all", modify
    label def smident1 5 "completely", modify
    label values smident2 smident2
    label def smident2 1 "not at all", modify
    label def smident2 2 "2", modify
    label def smident2 3 "3", modify
    label def smident2 4 "4", modify
    label def smident2 5 "completely", modify
    
    recode smident1 smident2 (. = .a)
    
    cap which fre
    if _rc ssc install fre
    
    codebook, problems
    fre smident1 smident2
    label list smident1 smident2
    -codebook, problems- flags both variables as incompletely labeled, although (to my mind) smident2 is completely labeled:
    Code:
    . codebook, problems
    
    Potential problems in dataset [unnamed]
    
                   Potential problem   Variables
    ----------------------------------------------------
           incompletely labeled vars   smident1 smident2
    ----------------------------------------------------
    
    . fre smident1 smident2
    
    smident1
    -------------------------------------------------------------------
                          |      Freq.    Percent      Valid       Cum.
    ----------------------+--------------------------------------------
    Valid   1  not at all |         19      19.00      20.21      20.21
            2             |         19      19.00      20.21      40.43
            3             |         30      30.00      31.91      72.34
            4             |         19      19.00      20.21      92.55
            5  completely |          7       7.00       7.45     100.00
            Total         |         94      94.00     100.00           
    Missing .a            |          6       6.00                      
    Total                 |        100     100.00                      
    -------------------------------------------------------------------
    
    smident2
    -------------------------------------------------------------------
                          |      Freq.    Percent      Valid       Cum.
    ----------------------+--------------------------------------------
    Valid   1  not at all |          9       9.00       9.57       9.57
            2  2          |         20      20.00      21.28      30.85
            3  3          |         29      29.00      30.85      61.70
            4  4          |         22      22.00      23.40      85.11
            5  completely |         14      14.00      14.89     100.00
            Total         |         94      94.00     100.00           
    Missing .a            |          6       6.00                      
    Total                 |        100     100.00                      
    -------------------------------------------------------------------
    
    . label list smident1 smident2
    smident1:
               1 not at all
               5 completely
    smident2:
               1 not at all
               2 2
               3 3
               4 4
               5 completely
    This is not because the missing values are unlabeled (-codebook, problems- does not care whether extended missing values are labeled). If I change the labels of smident2 so that they no longer equal the values, the variable is treated as completely labeled, even though the extended missing value .a still has no label:
    Code:
    . lab def smident2 2 "2nd" 3 "3rd" 4 "4th", modify
    
    . codebook, problems
    
    Potential problems in dataset [unnamed]
    
                   Potential problem   Variables
    --------------------------------------------------
           incompletely labeled vars   smident1
    --------------------------------------------------
    
    . fre smident1 smident2
    
    smident1
    -------------------------------------------------------------------
                          |      Freq.    Percent      Valid       Cum.
    ----------------------+--------------------------------------------
    Valid   1  not at all |         19      19.00      20.21      20.21
            2             |         19      19.00      20.21      40.43
            3             |         30      30.00      31.91      72.34
            4             |         19      19.00      20.21      92.55
            5  completely |          7       7.00       7.45     100.00
            Total         |         94      94.00     100.00           
    Missing .a            |          6       6.00                      
    Total                 |        100     100.00                      
    -------------------------------------------------------------------
    
    smident2
    -------------------------------------------------------------------
                          |      Freq.    Percent      Valid       Cum.
    ----------------------+--------------------------------------------
    Valid   1  not at all |          9       9.00       9.57       9.57
            2  2nd        |         20      20.00      21.28      30.85
            3  3rd        |         29      29.00      30.85      61.70
            4  4th        |         22      22.00      23.40      85.11
            5  completely |         14      14.00      14.89     100.00
            Total         |         94      94.00     100.00           
    Missing .a            |          6       6.00                      
    Total                 |        100     100.00                      
    -------------------------------------------------------------------

  • #2
    Technically, what happens is this:

    codebook.ado uses the extended macro function label lblname # to obtain the label attached to # in value label lblname. This extended macro function returns # itself if no label is attached to # (unless the option strict is specified). Thus, without strict, there is no way of distinguishing between

    Code:
    . label list foo
    foo:
    and

    Code:
    . label list foo
    foo:
              42 42
    Whether the latter should be considered labeled depends on how we define a label. If the definition is purely technical, e.g., any string, then codebook should not report missing labels here. Note that labelbook with the problems option reports numeric-to-numeric mappings as a potential problem. Personally, I do not like numeric labels. If the label matches the underlying value, there is no need for a label because all commands will simply output the numeric value. If the label does not match the underlying value, confusion is only a matter of time.
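    The behavior of the extended macro function can be seen in a small sketch. The label names foo and bar and the value 42 are made up for illustration; foo here has a mapping for 1 only, which stands in for "no label attached to 42". Without strict, both lookups should come back as 42; with strict, the lookup in foo should come back empty. The final line asks labelbook to report on bar, where I would expect the numeric-to-numeric mapping to be listed.

```stata
* Illustrative labels (hypothetical names): foo has no entry for 42,
* bar maps 42 to the string "42"
capture label drop foo
capture label drop bar
label define foo 1 "one"
label define bar 42 "42"

* Without strict: an unlabeled value is returned as the number itself,
* so both lookups look the same
local a : label foo 42
local b : label bar 42
display `"default: foo=[`a']  bar=[`b']"'

* With strict: nothing is returned when no label is attached
local a : label foo 42, strict
local b : label bar 42, strict
display `"strict:  foo=[`a']  bar=[`b']"'

* labelbook's view of a label whose text equals its value
labelbook bar, problems
```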

    Anyway, I believe this should be clearly documented.
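    Until then, a check that treats "2" as a valid label for 2 can be built on the strict option. This is only a sketch under my own assumptions, not what codebook does: the variable x and the label xlbl are toy examples, and levelsof skips missing values by default, so extended missing values are ignored just as in -codebook, problems-. With the data below, only the value 3 should be reported.

```stata
* Toy data: 1 and 2 are labeled (2 with the text "2"), 3 is not
clear
input byte x
1
2
3
end
label define xlbl 1 "low" 2 "2"
label values x xlbl

* Name of the value label attached to x
local lbl : value label x

* Distinct nonmissing values of x
quietly levelsof x, local(levels)

* strict returns an empty string only when the value has no entry at all
foreach v of local levels {
    local txt : label `lbl' `v', strict
    if `"`txt'"' == "" {
        display as text "x: value `v' has no entry in value label `lbl'"
    }
}
```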
    Last edited by daniel klein; 24 Apr 2023, 02:40.
