  • Can we have a choice on label names when importing from SPSS?

    Dear all,
    I'm using Stata 18.0.
    In the manual entry for "import spss" (https://www.stata.com/manuals/dimportspss.pdf), I read:
    Value labels for numeric variables will be named label# and attached to the corresponding variable.
    (Actually, I find that the labels are named "labels#", with a final "s", but that is not the point.)
    This creates problems for me when I want to compare different versions of a file: adding a value label to an existing variable (or adding a new labeled numeric variable) changes the numbering, and thus the label names. The workaround I've found is to remove all value labels before comparing the datasets. Is there any alternative to this naming scheme when importing an SPSS file that has value labels on numeric variables?

  • #2
    After an email exchange with StataCorp, I found a way to rename the value labels so that each one is named after its corresponding variable.

    These are the commands StataCorp suggested for creating the new labels:

    Code:
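    describe

    * variables that have a value label attached
    ds, has(vallabel)

    foreach var in `r(varlist)' {
            * name of the value label currently attached to this variable
            local `var'_label : value label `var'
            * copy it to a new label named after the variable, then attach the copy
            label copy ``var'_label' `var'Label
            label values `var' `var'Label
    }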
    This is what I added to remove the original label names:

    Code:
    local k = 0
    foreach var in `r(varlist)' {
            * the imported labels are named labels0, labels1, ...;
            * capture skips names that do not exist
            capture label drop labels`k'
            local ++k
    }



    • #3
      Federico Tedeschi, thank you for posting the solution you received from tech support. Despite item 16.1 in the FAQ, not everyone does that. Your follow-up post will be very helpful to anyone who finds this thread when facing the same problem in the future.

      --
      Bruce Weaver
      Version: Stata/MP 18.5 (Windows)



      • #4
        Originally posted by Federico Tedeschi

        This is what I added to remove the original label names:

        Code:
        local k=0
        foreach var in `r(varlist)' {
                capture label drop labels`k'
                local ++k
        }
        I had to read this several times to understand it. Since you are not using `var' inside the loop, at first glance I thought the loop was unnecessary. But then I realized that you are simply looping over the varlist to address every variable that has a value label, right?
        The following would be easier to read, but that's probably just personal style.
        Code:
        local nvars : word count `r(varlist)'
        * labels0, ..., labels(nvars-1), the same names as the loop above
        forvalues k = 0/`=`nvars' - 1' {
                capture label drop labels`k'
        }
        and to make sure you also delete labels whose names are not "labels" followed by a number, you could also try

        Code:
        foreach var in `r(varlist)' {
            capture: label drop `:value label `var'' 
        }
        All the best!



        • #5
          The code in #2 also confused me at first. The key here is that it relies on an idiosyncratic property of the O.P.'s particular data set: the variables, in the order that -ds- lists them, carry the value labels labels0, labels1, ... A more general approach, which renames each label to reflect its variable's name and drops the old labels without relying on any particular pre-existing naming scheme, would be something like this:

          Code:
          ds, has(vallabel)
          local labeled_vars `r(varlist)'
          
          foreach lv of local labeled_vars {
              local lbl: val label `lv' // GET THE NAME OF THIS VARIABLE'S VALUE LABEL
              label copy `lbl' Label_`lv'
              label values `lv' Label_`lv'
              label drop `lbl'
          }
          The key point here is that we loop over the value-labeled variables and determine the name of each value label inside the loop. By the way, this code is not entirely generic: it will break if a variable's name is too long for the prefix Label_ to fit within the 32-character limit for value label names.
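          One way to address that limitation is to check every name before any label is touched, so that the loop above cannot stop halfway through and leave some variables renamed and others not. The sketch below is illustrative only: it uses the auto data shipped with Stata as stand-in data, assumes the Label_ prefix from the code above, and too_long is a hypothetical macro name.

          ```stata
          * demo data (assumption): auto.dta, where only foreign has a value label
          sysuse auto, clear

          * find labeled variables whose names leave no room for the Label_ prefix
          ds, has(vallabel)
          local too_long
          foreach lv in `r(varlist)' {
              if ustrlen("Label_`lv'") > 32 local too_long `too_long' `lv'
          }

          * stop before any label is copied, so the data are left untouched
          if "`too_long'" != "" {
              display as error "names too long for the Label_ prefix: `too_long'"
              exit 198
          }
          display "all labeled variables can take the Label_ prefix"
          ```

          Because the prefix has six characters, this flags any labeled variable whose name is longer than 26 characters.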

          This example is interesting because it contradicts a general rule of thumb: code that is tailored to specific facts about a data set is generally shorter, clearer, and more efficient than code that solves the problem more generically. (This principle is illustrated by the relative slowness and large memory burden of commands like -collapse- and -reshape-, which, being generic, have to distinguish and handle many different situations. The -gtools- suite (SSC) shows that both of those commands can be sped up while retaining generic functionality; even so, it is sometimes possible to write code tailored to a particular data set that leaves even the -gtools- versions in the dust by exploiting the specifics of that data set. Of course, this is only worth the coding effort if the data set in question is very large and the same code will be run with essentially the same data set repeatedly.) In this case, though, the data-set-specific code is longer and more opaque. Odd.



          • #6
            This discussion is now beyond the specific question which has been answered in #2.

            I wish to comment on the more generic problem of renaming value labels and how it turns out to be surprisingly complex. Before I do, let me quickly mention that I think ds is overused (here and elsewhere). Why would you want to loop over all variables to pick the ones with value labels attached and then loop over the selected subset again? Would it not be more straightforward to loop over all variables just once and skip the ones with no value label? That is, instead of coding
            Code:
            ds , has(vallabel)
            local labeled_vars `r(varlist)'
            
            foreach var of local labeled_vars {
                ...
            }
            wouldn't it be more straightforward to code
            Code:
            foreach var of varlist _all {
                
                local lblname : value label `var'
                
                if ("`lblname'" == "") /// skip variables with no value label
                    continue
                
                ...
              
            }
            Anyway, that's just a minor issue.

            Regarding the last block of code in #4, it is - as it stands - neither useful for the specific problem nor for the more generic one. When executed before the value labels have been copied, there will not be any value labels to copy; when executed after the value labels have been copied, the newly assigned value labels will be dropped again.

            Generally speaking, it is unnecessary to drop unused value labels because Stata will not save them (unless explicitly instructed to do so). As there are many pitfalls (see below), I recommend not dropping any value labels, which StataCorp tech support has chosen to do implicitly.
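            A quick way to see this behaviour is to define a value label that is attached to no variable, save the data with and without the orphans option of -save-, and reload each file. A sketch, assuming the auto data as stand-in data and a hypothetical label named unused:

            ```stata
            * demo data (assumption): auto.dta plus one hypothetical unattached label
            sysuse auto, clear
            label define unused 1 "one"      // defined, but attached to no variable

            tempfile plain kept
            save `plain'                     // default: unattached labels are not written
            save `kept', orphans             // orphans: all value labels are written

            use `plain', clear
            capture label list unused
            display "after save:          " cond(_rc, "unused was dropped", "unused was kept")

            use `kept', clear
            capture label list unused
            display "after save, orphans: " cond(_rc, "unused was dropped", "unused was kept")
            ```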

            The code in #5 also has problems beyond invalid names. It will fail if two variables have the same value label attached. The value label will be copied once and then will be dropped. Because dropping the value label will not detach it from any other variables that might have it attached, the label copy command will fail the second time the value label name comes up. The code will also fail when value labels are undefined. Remember: you may attach value labels to variables without defining the value label's contents (i.e., the integer-to-text mappings). You cannot copy such value labels, and you cannot drop them either. These problems are easily detected. There are more serious problems that, while less likely, will go unnoticed when they occur. The code assumes only one label language; alternatively, it assumes that value labels are not used across label languages. You certainly do not want to drop value labels that might still be attached to variables in a different label language. The suggested code will happily do that and you won't notice until you switch to the respective label language. Another, even more serious problem arises when a value label attached to one variable has the same name as another variable in the dataset. I have illustrated this situation and the consequences in #3 in this thread.

            A generic solution should handle all of these potential problems and, if it follows Stata's implicit rules, should leave the dataset unchanged when an error occurs, which none of the suggested codes does.
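            For illustration, here is a sketch of how the rename could cope with two of the problems above, shared value labels and undefined value labels. It loops once over all variables, skips labels that are attached but not defined, and drops each original label only once, after every variable has its own copy. It is not the generic solution described above: it assumes a single label language, does not guard against a value label that shares its name with a variable or against names too long for the Label_ prefix, and does not restore the data on error. The demo variables foreign2 and rep2 and the label name notyet are hypothetical; the data are Stata's auto example.

            ```stata
            * demo data (assumption): auto.dta plus two hypothetical variables
            sysuse auto, clear
            generate byte foreign2 = foreign
            label values foreign2 origin     // foreign and foreign2 now share "origin"
            generate byte rep2 = rep78
            label values rep2 notyet         // attached, but never defined

            * names of the value labels that are actually defined
            label dir
            local defined `r(names)'

            local old_labels
            foreach var of varlist _all {
                local lbl : value label `var'
                if "`lbl'" == "" continue               // no value label attached
                if !`: list lbl in defined' continue    // undefined: cannot be copied
                label copy `lbl' Label_`var', replace
                label values `var' Label_`var'
                local old_labels `old_labels' `lbl'
            }

            * drop each original label once, after every variable has its own copy
            local old_labels : list uniq old_labels
            foreach lbl of local old_labels {
                label drop `lbl'
            }
            ```

            Following the advice above, the final drop loop can also be left out, since unattached value labels are not saved by default.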



            • #7
              daniel klein Excellent commentary!



              • #8
                After reading numerous other comments on the initial post, I have to admit that I probably didn't fully understand the problem at first. So, thanks to daniel klein, I now see that the second part of #4 is complete nonsense.



                • #9
                  Originally posted by Clyde Schechter

                  Code:
                  ds, has(vallabel)
                  local labeled_vars `r(varlist)'
                  
                  foreach lv of local labeled_vars {
                      local lbl: val label `lv' // GET THE NAME OF THIS VARIABLE'S VALUE LABEL
                      label copy `lbl' Label_`lv'
                      label values `lv' Label_`lv'
                      label drop `lbl'
                  }
                  Thank you, Clyde. This would work if the situation were as I had understood it at the beginning, i.e. with the value label "labels0" for the first labeled variable, "labels1" for the second one, and so on. Instead, when a given label set is repeated (for example, "No=0; Yes=1"), so is the label name. That is my situation: some label names (for example, "labels7") are attached to more than one variable, so I get an error message if I try to drop the original labels as I go. That's why I decided to drop them in a separate step. Also, to avoid writing something complicated or checking the actual number of original labels, I decided to drop labels(k-1) for the k-th labeled variable, even though this means asking Stata to drop label names that do not exist when label sets are repeated. That's why I had to add
                  Code:
                  capture label drop labels`k'



                  • #10
                    Originally posted by daniel klein
                    I recommend not dropping any value labels, which StataCorp tech support has chosen to do implicitly.
                    Thanks for this clarification.

                    Originally posted by daniel klein
                    The code in #5 also has problems beyond invalid names. It will fail if two variables have the same value label attached. The value label will be copied once and then will be dropped. Because dropping the value label will not detach it from any other variables that might have it attached, the label copy command will fail the second time the value label name comes up.
                    Sorry, I've only just read this. I can confirm it.
                    Last edited by Federico Tedeschi; 23 Jul 2024, 07:05.
