  • The -labellacking- command can't run under Stata 16

    Dear Stata users,

    The command -labellacking- was written by Nick Cox and Robert Picard (distribution date: 20130702). I often use it to check for variables with values that lack value labels, but I find that it does not run successfully under Stata 16. Could anyone tell me how to modify it? In the example below, every variable in my dataset has value labels, yet labellacking lists all of them, which is a misleading report.

    [Attached screenshot: _20210910101419.png]

  • #2
    Please show the results of

    Code:
    about
    describe a1
    label list `: value label a1'



    • #3
      daniel klein, thanks a lot!
      Code:
      . about
      
      Stata/SE 16.0 for Windows (64-bit x86-64)
      Revision 26 Jun 2019
      Copyright 1985-2019 StataCorp LLC
      
      Total physical memory:        8.00 GB
      Available physical memory:    4.02 GB
      
      Single-user Stata perpetual license:
             Serial number:  xxxxxxxxxx
               Licensed to:  xxxx
                             xxxxxxx
      
      . describe a1
      
                    storage   display    value
      variable name   type    format     label      variable label
      -----------------------------------------------------------------------------------------------------------------------
      a1              byte    %8.0g      a1         Gender
      
      . label list `: value label a1'
      a1:
                 1 male
                 2 female
      
      .
      Last edited by Chen Samulsion; 09 Sep 2021, 23:56.



      • #4
        Thanks for providing more information. There was a bug in the label save command in the initial release of Stata 16 that was fixed in the update of 24jul2019. Perhaps that bug affected decode, which is used in labellacking (SSC). Update your version of Stata and see whether that fixes the problem.
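The update suggested above can be checked and installed from within Stata. A minimal sketch, assuming an internet connection and write access to the Stata installation directory:

```stata
* show the revision date of the running Stata
about

* compare the installed files with those available from StataCorp
update query

* download and install any available official updates
update all
```

Running `about` again after the update shows whether the revision date has moved past 24jul2019.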



        • #5
          Thank you for your professional help.
          Best regards.



          • #6
            daniel klein Thanks!



            • #7
              It seems that if we do not want to update to Stata 16.1, we can make a small modification using a short piece of daniel klein's code (the original thread is here: https://www.statalist.org/forums/for...th-value-label):
              Originally posted by daniel klein

              labelbook might not be sufficient; from the help



              labelbook will not consider it a problem if, e.g., values 1 and 2 are labeled, but values 3, 4, ..., k are not.


              I will point to labellacking (SSC), which is specially designed for this task. A minimal (reduced) version of that program is

              Code:
              tempvar tmp
              
              ds , has(vallabel)
              local varlist `r(varlist)'
              
              foreach v of local varlist {
                  display as txt "unlabeled values in " as res "`v'"
                  decode `v' , generate(`tmp')
                  levelsof `v' if (`v' != .) & mi(`tmp') , missing
                  display
                  drop `tmp'
              }
              By the way, elabel (SSC) could do this, too:

              Code:
              levelsof varname , local(levels) missing separate(,)
              elabel list (varname) iff inlist(#, `levels')
              local values `r(values)'
              local levels : subinstr local levels "," " " , all
              local nolabel : list levels - values
              display "`nolabel'"
              Best
              Daniel



              • #8
                If at all possible, I would recommend keeping your version of Stata up to date.

                If my diagnosis in #4 is correct, then the highlighted code in #7 will fail just as labellacking (SSC) fails, because it takes essentially the same approach that labellacking uses.

                What will work is the code in #7 that uses elabel (preferably SSC) because it does not rely on decode and/or label save and is, thus, not affected by the bug in Stata 16.0. Here is a more general sketch replicating (some of) the functionality of labellacking with elabel:

                Code:
                *! version 0.0.1 27oct2021 daniel klein
                program elabel_cmd_lacking , rclass
                    version 11.2
                    
                    syntax [ varlist ] [ if ] [ in ] [ , Missing ]
                    
                    marksample touse , novarlist
                    
                    foreach var of local varlist {
                        
                        // skip variables with no value label attached
                        // note: this excludes all string variables
                        local lblname : value label `var'
                        if ("`lblname'" == "") continue
                        
                        // get the observed (integer) values
                        quietly levelsof `var' if `touse' & (`var' == int(`var')) ///
                            , `missing' local(var_values)
                        
                        // get the labeled values
                        quietly elabel list `lblname'
                        local lab_values `r(values)'
                        
                        // get the non-labeled values
                        local not_labeled : list var_values - lab_values
                        
                        // strip system missing value; those cannot be labeled
                        if ("`missing'" == "missing") {
                            local sysmis .
                            local not_labeled : list not_labeled - sysmis
                        }
                        
                        if ("`not_labeled'" == "") continue
                        
                        display abbrev("`var'", 12) _col(13) "`not_labeled'"
                        
                        return local varlist `return(varlist)' `var'
                        return local `var' `not_labeled'
                        
                    }
                    
                end
                Last edited by daniel klein; 27 Oct 2021, 01:28.



                • #9
                  Thank you, daniel klein. The label save bug seems to be related to the language used in the value labels. Let's run a small test. Sorry that the following code is so long.

                  Original -labellacking- command:
                  Code:
                  *! 1.3.0 NJC and RP 23 June 2013
                  * 1.2.0 NJC 22 June 2013
                  * 1.0.0 NJC 21 June 2013
                  program labellacking_original
                  version 8.2
                  syntax [varlist] [if] [in] [, All Reportnovaluelabels MISSing]
                   
                  quietly {
                      ds `varlist', has(type numeric)
                      local varlist `r(varlist)'
                      if "`varlist'" == "" error 102
                      
                      marksample touse, novarlist
                      count if `touse'
                      if r(N) == 0 error 2000
                  }
                   
                  local length = 1
                  local I = 0
                  tempvar vuse
                  gen byte `vuse' = 0
                   
                  quietly foreach v of local varlist {
                      if "`: value label `v''" == "" {
                          if "`reportnovaluelabels'" != "" {
                              local ++I
                              local length = max(`length', length("`v'"))
                              local name`I' "`v'"
                              local text`I' "(no value label)"
                          }
                      }
                      else {
                          tempvar work
                          replace `vuse' = `touse' & (`v' != .) & (`v' == int(`v'))
                          decode `v' if `vuse', gen(`work') maxlength(1)
                          levelsof `v' if `vuse' & missing(`work'), local(levels) `missing'
                          if "`levels'" == "" {
                              if "`all'" != "" {
                                  local ++I
                                  local length = max(`length', length("`v'"))
                                  local name`I' "`v'"
                                  local text`I' "(none)"
                              }
                          }    
                          else {
                              local ++I  
                              local length = max(`length', length("`v'"))
                              local name`I' "`v'"
                              local text`I' "`levels'"
                          }
                          drop `work'
                      }
                  }
                  
                  if `I' di
                  local col = `length' + 4
                  forval i = 1/`I' {
                      di "`name`i''{col `col'}`text`i''"
                  }
                   
                  end
                  A Trivial Modification:
                  Code:
                  *! modified version of the original labellacking command written by Nick Cox and Robert Picard
                  program labellacking_modified
                  version 8.2
                  syntax [varlist] [if] [in] [, All Reportnovaluelabels MISSing]
                   
                  quietly {
                      ds `varlist', has(type numeric) //has(vallabel)
                      local varlist `r(varlist)'
                      if "`varlist'" == "" error 102
                      
                      marksample touse, novarlist
                      count if `touse'
                      if r(N) == 0 error 2000
                  }
                   
                  local length = 1
                  local I = 0
                  tempvar vuse
                  gen byte `vuse' = 0
                   
                  quietly foreach v of local varlist {
                      if "`: value label `v''" == "" {
                          if "`reportnovaluelabels'" != "" {
                              local ++I
                              local length = max(`length', length("`v'"))
                              local name`I' "`v'"
                              local text`I' "(no value label)"
                          }
                      }
                      else {
                          tempvar work
                          replace `vuse' = `touse' & (`v' != .) & (`v' == int(`v'))
                          //decode `v' if `vuse', gen(`work') maxlength(1)
                          //levelsof `v' if `vuse' & missing(`work'), local(levels) `missing'
                          decode `v' if `vuse', generate(`work')
                          levelsof `v' if (`v' != .) & missing(`work') , local(levels) `missing'
                          if "`levels'" == "" {
                              if "`all'" != "" {
                                  local ++I
                                  local length = max(`length', length("`v'"))
                                  local name`I' "`v'"
                                  local text`I' "(none)"
                              }
                          }    
                          else {
                              local ++I  
                              local length = max(`length', length("`v'"))
                              local name`I' "`v'"
                              local text`I' "`levels'"
                          }
                          drop `work'
                      }
                  }
                  
                  if `I' di
                  local col = `length' + 4
                  forval i = 1/`I' {
                      di "`name`i''{col `col'}`text`i''"
                  }
                   
                  end
                  And Here Is A Test:
                  Code:
                  sysuse voter
                  label list
                  replace inc = 6 in 6
                  replace inc = 7 in 11
                  replace inc = 9 in 14
                  replace inc = 8 in 2
                  replace inc = .d in 8
                  replace inc = .c in 4
                  labellacking_original
                  labellacking_modified
                  label define inc2 1 aÄGötamål 2 oÖSveamål 3 uÜNorrländska 4 bßGotländska 5 oøsäga, modify
                  labellacking_original
                  labellacking_modified
                  label define inc2 1 ÄaGötamål 2 ÖoSveamål 3 ÜuNorrländska 4 ßbGotländska 5 øosäga, modify
                  labellacking_original
                  labellacking_modified
                  Last edited by Chen Samulsion; 27 Oct 2021, 02:28.



                  • #10
                    OK, I see what is going on here. The problem you show is not related to the bug in Stata 16.0 as I had initially speculated. Instead, the problem arises because labellacking was written at a time when Stata did not support Unicode. The maxlength() option, which your modified version omits, would keep only the first byte of a potentially multiple-byte Unicode character, such as Ä, Ü, Ö, etc. This results in missing values for the decoded variable, which in turn leads labellacking to report the label as lacking. Note that omitting the maxlength() option, or changing it to something like maxlength(4)*, will solve the problem; no further modification is needed.

                    I can only speculate as to why Nick Cox and Robert Picard included the maxlength() option in the first place. If I had to guess, I would say the two authors come from a time when storage space was even scarcer and more precious than it is nowadays, and they did not want to waste more than their purposes required. A string variable that holds 1 byte needs less storage than a string variable that holds potentially 32,000 bytes (or characters); although back in the day, Stata's string variables were actually limited to 244 characters.** Perhaps keeping only the first byte would also slightly speed up the code. Anyway, this is all speculation, and you have shown how to fix the issue. Perhaps the authors will consider modifying their code.

                    * I believe Unicode characters take up to 4 bytes, but I could be wrong.

                    **Edit: On second thought, on June 23, 2013 we might be talking about Stata 13.0, which did indeed increase the limit of string variables from 244 characters to 2 billion characters or so. As I have mentioned, these thoughts are speculation.
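The effect of maxlength() described above can be checked with a small sketch. The variable and label names (x, xlbl, s1, s4) are made up for illustration, and the do-file is assumed to be saved as UTF-8 and run in Stata 14 or newer:

```stata
* compare decode with maxlength(1) and maxlength(4) on a label
* that starts with a multi-byte UTF-8 character
version 14
clear
set obs 3
generate byte x = _n
* value 3 is deliberately left unlabeled
label define xlbl 1 "Ägare" 2 "agare"
label values x xlbl

decode x, generate(s1) maxlength(1)
decode x, generate(s4) maxlength(4)

* inspect the decoded strings next to the raw values
list x s1 s4, nolabel

* values that each setting would flag as lacking a label
levelsof x if missing(s1)
levelsof x if missing(s4)
```

If the diagnosis in this post holds, the first levelsof call should also list the labeled value 1, while the second should list only the genuinely unlabeled value 3.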
                    Last edited by daniel klein; 27 Oct 2021, 04:17.



                    • #11
                      Dear daniel klein, thank you for your attention. I had tried to modify the maxlength() option before posting #9, but I changed the code in bits and pieces and overlooked that maxlength() was the very crux.



                      • #12
                        As is evident from the thread, labellacking was written in a brief flurry of activity in 2013. daniel klein is, I think, right: the motive behind maxlength(1) was that a non-missing result was sufficient for the purposes of the command. My inclination at this point is to make maxlength() an option for those wishing to use it.



                        • #13
                          The historical context for the approach can be found here.

                          It's amusing to note that I initially used:
                          Code:
                          decode n, gen(s) maxlen(4)
                          in that post. This got further reduced to maxlen(1) in labellacking. If the first character of a "decoded" string is missing, then there is no label associated with that numeric value.

                          daniel klein has correctly identified the issue in #10.

                          With Unicode Stata (version 14 and up), strings are encoded in UTF-8, which uses 1 to 4 bytes per character. The encoding is backwards compatible with ASCII; each ASCII character has the same code in UTF-8 and is stored using a single byte. Every other character is stored as a multi-byte sequence of 2 to 4 bytes.

                          Arguably, this is a bug in decode, as maxlength(1) should return the first byte of the value label, even though this may not form a valid UTF-8 character. Given that this is not the case, this will only bite if the first character of a value label is a multi-byte UTF-8 character. As Daniel has pointed out, changing to maxlen(4) will ensure that at least one Unicode character is returned if the value label is defined.
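The difference between bytes and characters mentioned above can be seen directly with Stata's string functions. A minimal sketch, assuming Stata 14 or newer and a do-file saved as UTF-8; the example strings are arbitrary:

```stata
* strlen() counts bytes; ustrlen() counts Unicode characters
version 14

display strlen("a")        // ASCII character: 1 byte
display strlen("Ä")        // non-ASCII character: 2 bytes in UTF-8
display ustrlen("Ä")       // still 1 character

display strlen("Ägare")    // 6 bytes
display ustrlen("Ägare")   // 5 characters
```

A label starting with Ä therefore cannot fit its first character into a 1-byte string, whereas 4 bytes are enough for any single UTF-8 character.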



                          • #14
                            Robert and I conferred and went with his idea of changing to maxlength(4). So the syntax users need is unchanged.

                            Thanks to Kit Baum, revised files may be downloaded from SSC. The help file thanks Chen and Daniel for identifying a problem and diagnosing it, and thanks again here to both of you.



                            • #15
                              labellacking is now updated to fix a different problem. See https://www.statalist.org/forums/for...labeled-values if interested.
