  • Renaming variable labels with label values

    Colleagues,

    I'm interested in renaming variables using their variable labels, as discussed here on Statalist. The problem I'm facing is that names created this way won't be syntactically valid, because the variable labels contain spaces and other illegal characters. Is there a neat way of producing a syntactically valid variable name from a given string, broadly along the lines of R's make.names function, which shortens the string and removes illegal characters so that it can be used as a variable name? As an example, I have the following string: Aged 24 and under, claiming for over 12 months_Sep-85_Total. I'm not particularly fussy about the actual variable name as long as it contains the information that is crucial for me (age, period, date, total identifier). I know that I can address this by manipulating the strings in a loop, but ideally I would use a single function, as I will have to import numerous sets of variables and I am not keen to modify the code to specify how many characters to remove from which strings. It also occurs to me that Stata should have something along the lines of make.names, as it attempts to generate syntactically valid and meaningful variable names when importing data from foreign files.
    Kind regards,
    Konrad
    Version: Stata/IC 13.1

  • #2
    Konrad,

    see help strtoname().

    Also see lab2varn (part of labutil2 from SSC), possibly in combination with labvarch (part of labutil from SSC).

    Best
    Daniel
    Last edited by daniel klein; 02 Sep 2014, 04:09.
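
    As a minimal sketch of the strtoname() suggestion, here it is applied to the example label from post #1. The local macro name lbl is only illustrative, and the argument is passed as a quoted string.

    ```stata
    * The sample label from post #1, held in a local macro (name is illustrative)
    local lbl "Aged 24 and under, claiming for over 12 months_Sep-85_Total"

    * Pass the label as a quoted string; illegal characters become underscores
    display strtoname("`lbl'")

    * Show the length of the resulting name
    display strlen(strtoname("`lbl'"))
    ```

    Because strtoname() truncates its result to the maximum name length of 32 characters, long labels that differ only near the end can map to the same name, so it is worth checking the results for duplicates before renaming.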

    • #3
      Daniel,

      Thank you very much for the helpful answer, much appreciated.
      Kind regards,
      Konrad
      Version: Stata/IC 13.1

      • #4
        It appears that I'm doing something wrong:
        Code:
        // Variable labels to names
        foreach v of v* {
           local x : variable label `v'
           rename `v' `=strtoname(`x')'
           }
        invalid syntax
        r(198);
        I noticed that varlist was missing but it still doesn't work:
        Code:
        . // Variable labels to names
        . foreach v of varlist v* {
          2.    local x : variable label `v'
          3.    rename `v' `=strtoname(`x')'
          4. }
        claimingforover12months_Jul not found
        syntax error
            Syntax is
                rename  oldname    newname   [, renumber[(#)] addnumber[(#)] sort ...]
                rename (oldnames) (newnames) [, renumber[(#)] addnumber[(#)] sort ...]
                rename  oldnames              , {upper|lower|proper}
        r(198);
        
        end of do-file
        The name claimingforover12months_Jul is generated somewhere in the process; I have no such variable in the data set.
        Last edited by Konrad Zdeb; 02 Sep 2014, 04:51. Reason: Code.
        Kind regards,
        Konrad
        Version: Stata/IC 13.1

        • #5
          Your foreach syntax is wrong. See help foreach.

          It should be something along these lines:

          Code:
          foreach v of var v* {
              loc x : var l `v'
              if mi("`x'") continue
              ren `v' `= strtoname("`x'")'
          }
          You might want to create two lists and call rename only once.

          Code:
          foreach v of var v* {
              loc varl : var l `v'
              if mi("`varl'") continue
              loc varnames `varnames' `v'
              loc lblnames `lblnames' `= strtoname("`varl'")'
          }
          
          ren (`varnames') (`lblnames')
          Best
          Daniel
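
          One way to try the two-list approach without touching your own data is to run it on Stata's shipped auto dataset. This is only a sketch: the choice of price, mpg, and weight and the dryrun option are additions for illustration and are not part of the code above.

          ```stata
          * Load one of Stata's example datasets
          sysuse auto, clear

          * Start with empty lists of old and new names
          local varnames
          local lblnames

          foreach v of varlist price mpg weight {
              * Fetch the variable label; skip variables without one
              local varl : variable label `v'
              if missing("`varl'") continue
              * Collect the old name and the name derived from the label
              local varnames `varnames' `v'
              local lblnames `lblnames' `= strtoname("`varl'")'
          }

          * dryrun reports the planned renames without performing them
          rename (`varnames') (`lblnames'), dryrun
          ```

          Dropping the dryrun option performs the renames once the reported mapping looks right.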

          • #6
            The first problem arises because you are omitting the keyword varlist in foreach. The loop is never entered.

            If you study the help for strtoname(), you will see that all the examples enclose the argument in double quotes (" ").

            As the FAQ advises,

            If you can, reproduce the error with one of Stata's provided datasets, a small fragment of your dataset, or a simple concocted dataset that you include in your posting.
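
            A small reproduction along those lines, assuming the shipped auto dataset, in which mpg carries a label containing a space and parentheses. The first call mirrors the unquoted usage from post #4 and is expected to fail; the second passes the label as a string.

            ```stata
            * Load one of Stata's example datasets
            sysuse auto, clear

            * Store the variable label of mpg in a local macro
            local x : variable label mpg

            * Unquoted: the macro expands to bare words, not a string, so this errors
            * (capture noisily shows the error but lets the do-file continue)
            capture noisily display strtoname(`x')

            * Quoted: the whole label is passed as a single string
            display strtoname("`x'")
            ```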

            • #7
              Thanks for getting back to me, I managed to get it to work:

              Code:
              // Variable labels to names
              foreach v of varlist a* {
                 local x : variable label `v'
                 di "`x'"
                 rename `v' age24clover12mnths_`=strtoname(substr("`x'",-6,.))'
              }
              Kind regards,
              Konrad
              Version: Stata/IC 13.1

              • #8
                I need to make one more change. I have variables with names such as n_age24clover12mnths_oct_04 and n_age24clover12mnths_jun_04, and I would like to remove the last "_" to get, for example, n_age24clover12mnths_oct04. I tried renvars (SJ) with postsub(), but it doesn't appear to work:
                Code:
                . renvars n_age24clover12mnths_*, postsub(_ ) test
                no renames necessary
                Kind regards,
                Konrad
                Version: Stata/IC 13.1

                • #9
                  The help for renvars explains that postsub() is for trailing strings, i.e. strings that occur at the end of variable names. You don't have any variable names, of those supplied, that end in an underscore. So the message is correct; you are using the wrong option.

                  Why are you using renvars anyway? rename was rewritten in Stata 12 to handle multiple renames. If you are not using an up-to-date Stata, you should be flagging that in your posts.

                  • #10
                    I'm using the most recent Stata version; I just like renvars. I did it as follows, but I don't think it's the most efficient code one could write:
                    Code:
                    foreach var of varlist n_age24clover12mnths_* {
                        local oldname = substr("`var'",1,24)
                        local newname = substr("`var'",-2,.)
                        rename `var' `oldname'`newname' , dryrun
                        }
                    Kind regards,
                    Konrad
                    Version: Stata/IC 13.1

                    • #11
                      I need to make one more change. I have variables with names such as n_age24clover12mnths_oct_04 and n_age24clover12mnths_jun_04, and I would like to remove the last "_" to get, for example, n_age24clover12mnths_oct04.
                      In Stata 12 this should be as simple as

                      Code:
                      ren (*_*) (**)
                      Best
                      Daniel

                      • #12
                        Originally posted by daniel klein
                        Code:
                        ren (*_*) (**)
                        Thanks. May I ask why * and not ?
                        Kind regards,
                        Konrad
                        Version: Stata/IC 13.1

                        • #13
                          This question is kind of hard to answer.

                          First, I do not know exactly what your variable names look like, as you are not very specific about this. This is why I used the most general approach that matches your description. Second, and more importantly, which of the four asterisks do you mean by *, and which of them do you want to replace with question mark(s)?

                          In my syntax, *_* selects all variables whose names contain an underscore followed by zero or more characters, which matches your description. If the variables you want to select all end in an underscore followed by exactly two digits, *_(##) would probably be the better choice here.

                          The first asterisk in ** tells Stata to copy everything before the underscore. The second asterisk copies the matched characters after the underscore. Again, if you selected the variable names using (##) in oldnames, *(##) would be more explicit.

                          I do not see how a question mark could help here.


                          Edit:
                          If I further assume that your variable names all have the format something_mmm_##, where mmm are three letters indicating the month (jan, feb, mar, ..., dec), then explicit code could read

                          Code:
                          ren (*_???_(##)) (*_???(##))
                          Best
                          Daniel
                          Last edited by daniel klein; 02 Sep 2014, 08:37.
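
                          To try the explicit pattern safely, one option is a small concocted dataset that uses the two variable names from post #8. This is a sketch under that assumption; the dryrun option is an addition for illustration, and the rename command itself is the one given above.

                          ```stata
                          * Build a one-observation dataset with the two names from post #8
                          clear
                          set obs 1
                          generate n_age24clover12mnths_oct_04 = 1
                          generate n_age24clover12mnths_jun_04 = 2

                          * Preview the renames first; dryrun changes nothing
                          rename (*_???_(##)) (*_???(##)), dryrun

                          * Perform the renames and list the resulting variable names
                          rename (*_???_(##)) (*_???(##))
                          describe, simple
                          ```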

                          • #14
                            Hi everyone,

                            I am trying to rename a bunch of stub variables generated by reshape wide. I want to replace the codes of the j variable in the names of the new variables with the corresponding value labels. Any ideas?
