
  • Output all unlabeled values of variables with a value label

    Hi!

    How can I select the unlabeled values of variables that otherwise have a value label, and list only those unlabeled values?
    The thing is, the value labels have up to thousands of labeled values and just a few unlabeled ones, and this is the case for several variables.

    So far, I have found out which variables I have to deal with (1). Knowing the specific values at stake (2) would be even better.
    Code:
    *(1)
    findname *, vallabeltext("") local(missingvl)
    
    *(2)
    foreach v of var `missingvl' {
    fre `v' if (substr(`"` : value label `v''"', 1, 2) == "") // would only output (if present) variables with all values unlabeled
    }
    There should be some macro for the value label content (at the level of individual values), not just the (overarching) name of the value label!?
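Official Stata does provide such a macro function at the value level. `: label (varname) #, strict` returns the label text of value #. With strict, it returns an empty string when # is unlabeled. The sketch below uses only official commands: ds and levelsof take the place of findname. It runs on the auto data with a hypothetical partial label standing in for the real data. The label name replab and the locals nolabel_<varname> are illustrative choices only:

```stata
* Sketch with official Stata only; replab and nolabel_<varname> are
* illustrative names, and the partial label is hypothetical.
sysuse auto, clear
label define replab 1 "Poor" 2 "Fair" 3 "Average"   // 4 and 5 stay unlabeled
label values rep78 replab

quietly ds, has(vallabel)                 // variables with a value label
local vars `r(varlist)'
foreach v of local vars {
    quietly levelsof `v' if !missing(`v'), local(levels)
    local nolabel                         // reset for each variable
    foreach x of local levels {
        // -strict- makes the result "" for values without a label
        local lab : label (`v') `x', strict
        if `"`lab'"' == "" local nolabel `nolabel' `x'
    }
    local nolabel_`v' `nolabel'           // keep the result for later use
    display as txt "unlabeled values in `v': " as res "`nolabel'"
}
```

The loop visits every distinct value, so it may be slow for variables with very many levels. It does, however, need nothing beyond official Stata.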

    I also tried ...
    Code:
    foreach v of var `missingvl' {
    quiet labelbook `: value label `v'' fre `v' if (substr(`"``r(null)''"') // ...but got invalid syntax here :(
    }
    Note: I cannot install new commands (like the labutil stuff). Some basic "additional" commands (e.g., fre) have been installed, however.


    Greetings
    Thank you for reading (and some reply)
    Using Stata 16.1
    Extractions (-dataex-) of the data I'm working with is impossible, sorry!

  • #2
    You can get a list of variables with incompletely labeled values using codebook, problems.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Originally posted by Maarten Buis
      You can get a list of variables with incompletely labeled values using codebook, problems.
      labelbook might not be sufficient; from the help

      1. Value label has gaps in mapped values (for example, values 0 and 2 are labeled, while 1 is not)
      labelbook will not consider it a problem if, e.g., values 1 and 2 are labeled, but values 3, 4, ..., k are not.


      I will point to labellacking (SSC), which is designed specifically for this task. A minimal (reduced) version of that program is:

      Code:
      tempvar tmp

      ds , has(vallabel)                // variables with a value label attached
      local varlist `r(varlist)'

      foreach v of local varlist {
          display as txt "unlabeled values in " as res "`v'"
          decode `v' , generate(`tmp')  // missing ("") where the value has no label
          levelsof `v' if (`v' != .) & mi(`tmp') , missing  // keeps .a-.z, drops .
          display
          drop `tmp'
      }
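The variant below keeps the unlabeled values of each variable in a local instead of only displaying them, so they can be reused later (for example, to draft label definitions). The local names unlab_<varname>, the label replab, and the recoded observations are hypothetical, chosen for the auto data. The condition (`v' != .) together with levelsof's missing option means that an unlabeled extended missing value such as .b is reported, while system missing (.) is ignored:

```stata
* Sketch: the loop above, storing the unlabeled values in locals named
* unlab_<varname> (illustrative) instead of only displaying them.
sysuse auto, clear
label define replab 1 "Poor" 2 "Fair" 3 "Average" .a "refused"  // hypothetical
label values rep78 replab
replace rep78 = .a in 1/2     // hypothetical code, labeled
replace rep78 = .b in 3       // hypothetical code, not labeled

tempvar tmp
quietly ds, has(vallabel)
local vars `r(varlist)'
foreach v of local vars {
    quietly decode `v', generate(`tmp')    // "" where a value has no label
    // (v != .) drops system missing; -missing- keeps .a-.z in the list
    quietly levelsof `v' if (`v' != .) & missing(`tmp'), missing local(unlab_`v')
    display as txt "unlabeled values in `v': " as res "`unlab_`v''"
    drop `tmp'
}
```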
      By the way, elabel (SSC) could do this, too:

      Code:
      levelsof varname , local(levels) missing separate(,)
      elabel list (varname) iff inlist(#, `levels')
      local values `r(values)'
      local levels : subinstr local levels "," " " , all
      local nolabel : list levels - values
      display "`nolabel'"
      Best
      Daniel



      • #4
        Thank you both!
        Maarten Buis, maybe I should have made that clearer: the variables (1) are not the actual problem. Thanks anyway.
        daniel klein, your (reduced) program is pretty cool (though I use findname instead, since I prefer to store each result in a uniquely named local).
        I checked for labellacking (and elabel again): neither is installed in the "restricted area". Sad!
        Last edited by Franz Gerbig; 09 Jan 2020, 08:18.


        • #5
          Originally posted by Franz Gerbig
          [...] though I use findname
          findname is from SSC (or SJ); I like it very much, but I could not be sure what you have installed.

          Edit:

          A better elabel approach (simpler code, and it works for many levels in varname):

          Code:
          elabel list (rep78) iff (# != .)
          local values `r(values)'
          levelsof rep78 if (rep78 != .) , local(levels) missing
          local nolabel : list levels - values
          display "`nolabel'"
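For readers who, like the original poster, cannot install elabel, here is a sketch of the same list difference using official commands only. uselabel turns a value label into a dataset with the variables lname, value, label, and trunc, and the labeled values can be collected from that dataset. The label definition is hypothetical, because rep78 in auto has no value label. Missing values are excluded for simplicity. Because `: list levels - values` compares the values as text, this relies on levelsof writing both lists in the same format, which holds for small integers like these:

```stata
* Sketch with official commands only (no elabel); the label is hypothetical.
sysuse auto, clear
label define rep78 1 "Poor" 2 "Fair" 3 "Average"
label values rep78 rep78

* values that occur in the data (missing values excluded)
quietly levelsof rep78 if !missing(rep78), local(levels)

* values that have a label: -uselabel- replaces the data in memory with one
* observation per mapping (variables lname, value, label, trunc)
preserve
quietly uselabel rep78, clear
quietly levelsof value if !missing(value), local(values)
restore

local nolabel : list levels - values
display "`nolabel'"
```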
          Edit 2:

          Originally posted by Franz Gerbig
          your (reduced) program is pretty cool
          Just to be sure: that code (especially the details, such as the handling of missing values) is borrowed from labellacking; credit is due to Nick Cox and Robert Picard.

          Best
          Daniel
          Last edited by daniel klein; 09 Jan 2020, 08:50.



          • #6
            Thanks to the masters of masters Nick and Robert, then, too! (I had not read down to the credits, sorry.)

            It would be very nice to have some way to run the loop only for variables with at least one label for a positive integer value (integer in the sense of having no decimals, not in the sense of the storage type int). Ideally, the example variables age and books should then not be included ...

            Code:
            clear
            input id age city school college books
            1 18 -98 111111 888888 1000
            2 -97 10456321 -20 887321 -98
            3 103 9987654 112546 -97 3000
            end
            
            lab def age -97 "not answered", modify
            lab val age age
            
            lab def city -97 "not answered" -98 "dunno" ///
            10456321 "Capital of all evil" ///
            9987654 "Downtown", modify
            lab val city city
            
            lab def school -97 "not answered" -98 "dunno" -20 "not relevant" ///
            111111 "Hogward's cradle" ///
            112546 "DWDXTW High School", modify
            lab val school school
            
            lab def college -98 "dunno" -20 "not relevant" ///
            888888 "Whambridge University" ///
            878586 "Towerbridge University", modify
            lab val college college
            
            lab def books -97 "not answered" -98 "dunno", modify
            lab val books books
            
            tempvar tmp
            qui findname *, vallabeltext("") local(missingvlc) //varmidth(32)
            foreach v of local missingvlc {
                local vl: val lab `v'
                decode `v', gen(`tmp') //tmp = missing if value unlabeled
                qui levelsof `v' if (`v' != .) & mi(`tmp'), missing local(ulvalue) //local for unlabeled values
                di `"lab def `vl' `ulvalue' "", modify //in `v'"' //write code draft for adding label content later on
                drop `tmp'
            }
            ... but they are.
            Code:
            *result:
            lab def age 18 103 "", modify //in age
            lab def college -97 887321 "", modify //in college
            lab def books 1000 3000 "", modify //in books
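The drafted lines above would not run as label define commands, because each value needs its own text (e.g. 18 "" 103 ""). The sketch below builds proper value/text pairs. It also skips value labels that map no positive, nonmissing value, using only official Stata and Mata. The data are the example above without school. The helper haspositive() is hypothetical, not part of Stata. It excludes missing codes explicitly because Mata treats missing values as larger than any number. The helper is defined once, outside the loop, and inside the loop Mata is called only through one-line mata: statements:

```stata
* Sketch with official Stata and Mata only; haspositive() is a hypothetical
* helper, and the data are the example above without -school-.
clear
input id age city college books
1  18      -98 888888 1000
2 -97 10456321 887321  -98
3 103  9987654    -97 3000
end
label define age -97 "not answered"
label values age age
label define city -97 "not answered" -98 "dunno" 10456321 "Capital of all evil" 9987654 "Downtown"
label values city city
label define college -98 "dunno" -20 "not relevant" 888888 "Whambridge University" 878586 "Towerbridge University"
label values college college
label define books -97 "not answered" -98 "dunno"
label values books books

* 1 if value label vl maps at least one positive, nonmissing value
* (missing values count as larger than any number, hence the :< . check)
capture mata mata drop haspositive()
mata:
real scalar haspositive(string scalar vl)
{
    real colvector values
    string colvector text
    st_vlload(vl, values, text)
    return(sum((values :> 0) :& (values :< .)) > 0)
}
end

quietly ds, has(vallabel)
local vars `r(varlist)'
foreach v of local vars {
    local vl : value label `v'
    mata: st_local("k", strofreal(haspositive("`vl'")))
    if !`k' continue                      // skips age and books here
    quietly levelsof `v' if !missing(`v'), local(levels)
    local pairs                           // value "" value "" ...
    local nolab_`v'
    foreach x of local levels {
        local lab : label (`v') `x', strict
        if `"`lab'"' == "" {
            local pairs `pairs' `x' ""
            local nolab_`v' `nolab_`v'' `x'
        }
    }
    if `"`pairs'"' != "" display `"label define `vl' `pairs', modify"'
}
```

With the example data, only college produces a line, listing -97 and 887321 with empty texts to be filled in later.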


            • #7
              First of all, understand that findname (like other such programs) loops over the entire list of variables. Actually, findname will also loop over the values in the variables, using the levelsof approach. Therefore, if you are concerned about speed, you might be better off implementing just the main loop yourself.

              Code:
              foreach var of varlist * {
                  local vl : value label `var'
                  
                  // skip variables w/o value labels
                  if ( mi("`vl'") ) continue
                  
                  // at least one positive integer in value label
                  mata {
                      st_vlload("`vl'", values=., text="")
                      st_local("k", strofreal((colsum(values:>0) :>0 )))
                  }
                  if ( !`k' ) continue
                  
                  display "`var'" // <- replace with the decode, etc. stuff
              }

              Edit: In case it is not obvious, you can of course stick with findname and still use the inner (Mata) part of my loop to skip variables whose value label mappings do not contain at least one positive integer.
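The same Mata interface can also report the unlabeled values directly, without decode, because st_vlmap() returns "" for values that have no mapping. The sketch below uses a hypothetical helper unlabeled(), which is not part of Stata. It assumes the variable has a value label attached, and it ignores all missing values. A value whose label text is literally empty would also be reported. The demonstration uses auto with a hypothetical partial label:

```stata
* Sketch: unlabeled() is a hypothetical helper, not part of Stata; it
* assumes varname has a value label attached and ignores missing values.
capture mata mata drop unlabeled()
mata:
real colvector unlabeled(string scalar varname)
{
    real colvector x, lev
    string colvector lab

    x   = st_data(., varname)
    lev = uniqrows(select(x, x :< .))     // distinct nonmissing values
    lab = st_vlmap(st_varvaluelabel(varname), lev)
    return(select(lev, lab :== ""))       // "" means no label
}
end

sysuse auto, clear
label define replab 1 "Poor" 2 "Fair" 3 "Average"   // hypothetical
label values rep78 replab
mata: st_local("nolabel", invtokens(strofreal(unlabeled("rep78"))'))
display "`nolabel'"
```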


              Edit 2: I am not getting tired of advertising elabel (others who read this might be able to install it)

              Code:
              foreach var of ... {
                  local vl : value label `var'
              
                  // at least one positive integer
                  quietly elabel list `vl' iff (# > 0)
                  if ( !r(k) ) continue
                  ...
              }
              Best
              Daniel
              Last edited by daniel klein; 10 Jan 2020, 04:21.



              • #8
                Thanks again for your effort. Unfortunately, I can't get the loop to work on my real data in area 51. But that should not be your business, of course.

                I am not getting tired of advertising elabel
                Seems you never will!


                • #9
                  Originally posted by Franz Gerbig
                  Unfortunately, I can't get the loop to work on my real data in area 51.
                  Are you referring to the first loop that I posted in #7? I do not use any third-party commands or functions in this, so it should work. If you experience problems, please be more specific; ideally, provide an example that reproduces these problems.



                  • #10
                    Yes, that very loop. I wish I could be more precise. "Invalid syntax" is all I get, after having checked each line of it (without the decode stuff). The thing is, I can't export any dataex output from that "area". But it's okay for now! You've done enough!