Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Renaming many variables for reshape wide from long

    Hello everyone!

    I have a large dataset, in long format, that I would like to reshape to wide format. However, some of my variable names are too long (but less than 32 characters long, so no errors when importing to Stata), and when I try the reshape command, I get an error that "thisvariableislongbutlessthan32_1" invalid variable name r(198);.

    Is there a way around this? Unfortunately, it is not possible to rename the variables before importing the dataset, I use REDCap and if I change the variable name there, I lose all data that was associated with that variable.

    Is there a way to change variable names for multiple variables at once? My dataset has 1423 variables and I found that 104 of them have 30 to 32 characters, so changing names one by one would consume a lot of time. It would be really helpful if I could do something like drop the last 4 or 5 characters of variables with more than 30 characters, for example (or the first 5 characters, anything really, as long as there is a pattern). Also, the variables names have some level of information, so I would like to avoid renaming them to "var1", "var2", "var3", etc., if possible (for example, one variable is "viral_infection_anatomopatholog": it would be fine as "viral_infection_anatomopat", and much more informative than "var97").

    Of course, if there is another way to solve this problem (without renaming variables), that would be very welcome too!

    Below, an artificial dataset to illustrate my problem:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str10 _j float(variablewithtoomanycharacters_01 variablewithtoomanycharacters_02 variablewithtoomanycharacters_03 variablewithtoomanycharacters_04 variablewithtoomanycharacters_05)
    1 "_1" 1 1 1 1 1
    1 "_2" 1 1 1 1 1
    1 "_3" 1 1 1 1 1
    1 "_4" 1 1 1 1 1
    2 "_1" 1 1 1 1 1
    2 "_2" 1 1 1 1 1
    2 "_3" 1 1 1 1 1
    3 "_1" 1 1 1 1 1
    3 "_2" 1 1 1 1 1
    3 "_3" 1 1 1 1 1
    3 "_4" 1 1 1 1 1
    3 "_5" 1 1 1 1 1
    end
    . reshape wide variablewithtoomanycharacters_01- variablewithtoomanycharacters_05, i(id) j(_j) string
    (note: j = _1 _2 _3 _4 _5)
    variablewithtoomanycharacters_01_1 invalid variable name
    r(198);

    Thank you!!

    Leo




  • #2
    Code:
    rename (*) (var#), addnumber(1)
    It helps if you have common prefixes/ suffixes. See

    Code:
    help rename group

    so I would like to avoid renaming them to "var1", "var2", "var3", etc., if possible (for example, one variable is "viral_infection_anatomopatholog": it would be fine as "viral_infection_anatomopat", and much more informative than "var97").
    Here is some technique:


    Code:
    qui ds
    local list
    foreach var in `r(varlist)'{
        local v= cond(length("`var'") >10, substr("`var'",1,5) + "_" + substr("`var'", -5, 5), "`var'")
        local list "`list' `v'"
    }
    rename (`r(varlist)') (`list')

    Your trimming rule in the code above can be represented as (e.g., if the variable has more than 20 characters, use the first 20)

    Code:
    local v= cond(length("`var'") >20, substr("`var'",1,20), "`var'")
    Last edited by Andrew Musau; 02 Mar 2021, 14:18.

    Comment


    • #3
      Dear Andrew, you are a genius, thanks!

      I just tried the commands with my dataset, and they worked perfectly! The only difference I made is that I am using the first 20 characters + the last 3, so there is no risk of having two or more variables with the same name.

      Again, thank you so much!!

      Leo

      Comment


      • #4
        Thanks, but it's just experience. Thinking about it, it would be more efficient to rename each variable separately. In this way, you avoid long lists.

        Code:
        foreach var of varlist *{
            rename `var' `=substr("`var'", 1, 20)'
        }

        Comment

        Working...
        X