  • Trying to change variable names in the entire dataset to their labels, plus prefix

    Dear Statalisters,

    I am trying to rename every variable in my dataset to its variable label, adding the prefix s if the variable is a string and n if it is numeric.

    Could you tell me what is wrong with the following code? It gives an error related to brackets, among others.

    The problem is the variables start with a number, but the label is informative. I would like the variable to become the label but with a prefix depending on its format.

    Thank you for your help,

    May



    * Step 1: Store variable labels in local macros
    foreach var of varlist _all {
        local label`var' : variable label `var'
    }

    * Step 2: Clean and rename variables using their labels and add appropriate prefix
    foreach var of varlist _all {
        local newname = `"`label`var''"'

        * Clean the new name: replace spaces with underscores, remove special characters, and shorten if necessary
        local newname = subinstr(`"`newname'"', " ", "_", .) // Replace spaces with underscores
        local newname = subinstr(`"`newname'"', ".", "", .) // Remove periods
        local newname = subinstr(`"`newname'"', ",", "", .) // Remove commas
        local newname = substr("`newname'", 1, 32 - 2) // Truncate to fit with prefix (2 characters for prefix)

        * Determine the prefix based on variable type
        local prefix
        if strpos("`: type `var''", "str") {
            local prefix "s_"
        } else {
            local prefix "n_"
        }

        * Add prefix
        local newname = "`prefix'`newname'"

        if "`newname'" != "" {
            rename `var' `newname'
        }
    }

  • #2
    Hi May,
    I'm just too lazy to come up with sample data myself, which may not correspond to your data anyway. Hence the tip for next time: add sample data using dataex.

    At first glance your syntax looks correct, but you can't have any other syntax on the same line as a closing bracket }, which is the case here:
    Code:
    local prefix "s_"
    } else {
    local prefix "n_"
    }
    If there are also brackets in your variable labels, you have to remove them as well, otherwise the rename at the end will probably not work!

    Good luck



    • #3
      I agree with Benno Schoenberger that we need a reproducible example.

      Otherwise you don't need two loops. Consider this variation, which I have not tested. It can be made shorter, but sometimes short can be too short.

      Code:
      help strtoname()
      
      foreach var of varlist _all {
           local newname : variable label `var'
           local newname = strtoname(`"`newname'"')
           local newname = substr("`newname'", 1, 30)
      
           local prefix = cond(strpos("`: type `var''", "str"), "s_", "n_")
          
           capture rename `var' `prefix'`newname'
      }
      Last edited by Nick Cox; 18 Jul 2024, 03:04. Reason: Fixed extra quote: See #4.
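
For readers who want to try the loop from #3, here is a minimal sketch on toy data. The variable names v1 and v2 and their labels are made up for illustration and are not from the original poster's dataset. Because capture suppresses errors, it is worth running describe afterwards to see which variables were actually renamed.

```stata
* Toy data: variable names and labels are hypothetical
clear
set obs 3
generate v1 = _n
generate str5 v2 = "abc"
label variable v1 "Age (years)"
label variable v2 "Home town, 2024"

foreach var of varlist _all {
    local newname : variable label `var'
    * strtoname() turns the label into a legal Stata name
    local newname = strtoname(`"`newname'"')
    * Leave room for the two-character prefix (names hold at most 32 characters)
    local newname = substr("`newname'", 1, 30)
    * The type of a string variable starts with "str" (str1-str2045, strL)
    local prefix = cond(strpos("`: type `var''", "str"), "s_", "n_")
    capture rename `var' `prefix'`newname'
}

* Inspect the result; variables whose rename failed keep their old names
describe
```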



      • #4
        I really like this short solution, but would use
        Code:
        local newname = strtoname(`"`newname'"')
        instead!
        Thanks Nick Cox for pointing me to strtoname, wasn't aware of this.




        • #5
          Thanks much for the fix. Told you I hadn't tested it!



          • #6
            All that said, using variable labels for better variable names may work well and it may work badly. For example, variable labels can be 80 characters long and it may be that the important details that distinguish one variable from another clearly are in the later part of a label.

            We're all in favour of automating what can be automated.

            But for datasets with say 3, 10, or 30 variables that I am going to use more than trivially,

            * I may not care much about variable names insofar as I will not use them in a report or I can rely on variable labels showing up on a graph.

            * If I've inherited or imported a dataset with poor choices of variable names, I will usually spend a few minutes thinking up better names ad hoc (here translated as fit for purpose). When names are lousy choices such as v1 v2 v3 and so forth, this saves time as otherwise you need to keep checking what is which.

            I don't often deal with datasets with hundreds or thousands of variables, but when I do it's rare that they're all important to me.
            Last edited by Nick Cox; 18 Jul 2024, 07:13.
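
The concern in #6 about labels that differ only in their later part can be checked before renaming anything. The sketch below uses made-up data: the variable names a and b and their labels are hypothetical. It builds the proposed names first and lists any that collide after truncation to 30 characters. With capture rename, the second variable of a colliding pair would otherwise silently keep its old name.

```stata
* Toy data: variable names and labels are hypothetical
clear
set obs 1
generate a = 1
generate b = 2
label variable a "Household income before tax in 2020"
label variable b "Household income before tax in 2021"

* Build the list of proposed names without renaming anything yet
local proposed
foreach var of varlist _all {
    local lbl : variable label `var'
    local stub = substr(strtoname(`"`lbl'"'), 1, 30)
    local prefix = cond(strpos("`: type `var''", "str"), "s_", "n_")
    local proposed `proposed' `prefix'`stub'
}

* Names that occur more than once would make a later rename fail
local dups : list dups proposed
display "Colliding names: `dups'"
```

If the list is non-empty, those variables are candidates for names chosen by hand, as suggested above.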

