
  • "ds, not(vallabel)" does NOT select all variables without defined value labels

    Hello

    Using
    Code:
    ds, not(vallabel)
    selects variables without defined value labels (according to "help ds"). Yet, in my dataset some variables are NOT selected, even though they have no value labels.

    Here is a reproducible example:

    Code:
    sysuse citytemp.dta, clear
    describe // we see: "region" and "division" have value labels; all other variables don't.
    ds, not(vallabel) // correctly lists all variables except "region" and "division".
    label list region // works.
    label drop region // drop the value label container "region" for the variable "region".
    label list region // shows message "value label region not found" (as expected)
    ds, not(vallabel) // "region" is still not listed. WHY? How can I get a list that also includes "region"?
    describe // "region" still has a value label container according to describe...

    1. What is happening here?
    2. How can I get a list with ALL variables that don't have value labels according to "label list"?

    Thank you very much in advance
    Andreas
    Last edited by Andreas Fabian Fischer; 05 Sep 2023, 07:43.

  • #2
    We need a reproducible problem as we can't check with a dataset you don't show us. It seems that there is inconsistency somewhere but I can't identify which command, if any, is to blame.



    • #3
      Dear Mr. Cox,

      thanks for your response. I changed my initial post and now it has a reproducible example.

      Best
      Andreas



      • #4
        You have to break the value label association.
        For example,
        Code:
        . label values region
        
        . d
        
        Contains data from /Applications/Stata18/ado/base/c/citytemp.dta
         Observations:           956                  City temperature data
            Variables:             6                  3 Mar 2022 19:17
        --------------------------------------------------------------------------------------------------------------
        Variable      Storage   Display    Value
            name         type    format    label      Variable label
        --------------------------------------------------------------------------------------------------------------
        division        int     %16.0g     division   Census division
        region          int     %13.0g                Census region
        heatdd          int     %8.0g                 Heating degree days
        cooldd          int     %8.0g                 Cooling degree days
        tempjan         float   %9.0g                 Average January temperature
        tempjuly        float   %9.0g                 Average July temperature
        --------------------------------------------------------------------------------------------------------------
        Sorted by: region
             Note: Dataset has changed since last saved.
        
        . ds, not(vallabel)
        region    heatdd    cooldd    tempjan   tempjuly



        • #5
          I agree with #1 that terminology in the help file is slightly misleading. The wording should be

          "has(vallabel) selects variables with attached value labels."

          or

          "has(vallabel) selects variables with associated value labels."

          rather than

          "has(vallabel) selects variables with defined value labels."

          But terminology is generally ambiguous in this area. For example, is a value label a collection of integer-to-text mappings, or (specific) text mapped to an integer?
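
          To make the defined-versus-attached distinction concrete, here is a minimal sketch. The label names yesno and quality are made up for illustration, and the comments describe what I would expect to see rather than pasted output.

          ```stata
          sysuse auto, clear
          label define yesno 0 "no" 1 "yes"   // defined, but not attached to any variable
          label values rep78 quality          // attached to rep78, but "quality" is never defined
          label dir                           // defined labels: yesno and origin, not quality
          ds, has(vallabel)                   // attached labels: should list rep78 and foreign
          ds, not(vallabel)                   // so rep78 is left out here, although no label is defined for it
          ```

          In other words, ds looks at the attachment stored with the variable, while label dir and label list look at the definitions held in memory.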



          • #6
            Thank you Jeff Pitblado. Actually, I already knew that I can get rid of the value label in the way you showed. Yet, that does not really solve my problem. I would have to know for which variables I should "break the value label association" in this way. My actual data has thousands of variables and a lot of them have those "empty" value label containers that only show in the output of the describe command, but don't actually assign any value labels. And I don't know in advance which variables have those empty containers. I don't get them via "ds, not(vallabel)". How can I get them?

            Thanks again
            Andreas



            • #7
              elabel from SSC can help.

              Code:
              . sysuse auto
              (1978 automobile data)
              
              .
              . label values mpg   mpg
              
              . label values rep78 rep78
              
              .
              . elabel dir , nomemory
              origin
              rep78*
              mpg*
              
              Note: * indicates value label is not stored in memory
              
              .
              . return list
              
              macros:
                            r(names) : "origin"
                             r(used) : "origin"
                        r(undefined) : "rep78 mpg"
              
              .
              . elabel swap (`r(undefined)') (.)
              
              .
              . describe
              
              Contains data from C:\Program Files\Stata17\ado\base/a/auto.dta
               Observations:            74                  1978 automobile data
                  Variables:            12                  13 Apr 2020 17:45
                                                            (_dta has notes)
              ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              Variable      Storage   Display    Value
                  name         type    format    label      Variable label
              ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              make            str18   %-18s                 Make and model
              price           int     %8.0gc                Price
              mpg             int     %8.0g                 Mileage (mpg)
              rep78           int     %8.0g                 Repair record 1978
              headroom        float   %6.1f                 Headroom (in.)
              trunk           int     %8.0g                 Trunk space (cu. ft.)
              weight          int     %8.0gc                Weight (lbs.)
              length          int     %8.0g                 Length (in.)
              turn            int     %8.0g                 Turn circle (ft.)
              displacement    int     %8.0g                 Displacement (cu. in.)
              gear_ratio      float   %6.2f                 Gear ratio
              foreign         byte    %8.0g      origin     Car origin
              ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              Sorted by: foreign
                   Note: Dataset has changed since last saved.

              Edit:

              codebook with the problems option will also find undefined value labels. ds would then find variables with those value labels (but only in the current label language).
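
              If installing elabel is not an option, the same check can probably be sketched in base Stata: read each variable's attached label name with the extended macro function ": value label" and test whether label list can find a definition for it. The macro names lbl and orphans are arbitrary, and as far as I know this, like ds, only looks at the current label language.

              ```stata
              sysuse citytemp, clear
              label drop region                 // "region" stays attached, but is no longer defined

              local orphans
              foreach v of varlist _all {
                  local lbl : value label `v'   // name of the attached value label, empty if none
                  if "`lbl'" == "" continue     // nothing attached, nothing to check
                  capture label list `lbl'      // nonzero return code if the label is not defined
                  if _rc local orphans `orphans' `v'
              }

              display "`orphans'"               // variables attached to undefined value labels
              if "`orphans'" != "" label values `orphans' .   // detach them
              ds, not(vallabel)                 // should now include those variables as well
              ```

              The loop only collects variable names. The label values line is what actually breaks the association, so it can be dropped if a list is all that is needed.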
              Last edited by daniel klein; 05 Sep 2023, 08:48.



              • #8
                I checked whether findname from the Stata Journal could find such variables. Short answer: Not as published, but my private version now can, and the public version will be updated in due course from the Stata Journal.



                • #9
                  Originally posted by daniel klein
                  elabel from SSC can help.

                  Thank you very much. That solves my problem!



                  • #10
                    Originally posted by Nick Cox
                    I checked whether findname from the Stata Journal could find such variables. Short answer: Not as published, but my private version now can, and the public version will be updated in due course from the Stata Journal.
                    Nice, thanks!
