  • Renaming 300 variables with meaningful names

    Hello, inexperienced user here. I have a dataset (provided as .dta files) of 5 waves and 300 variables sourced from a questionnaire, which I would like to combine into a longitudinal panel dataset. However, while the variables are the same in content, they are all named after the question number in their year's questionnaire, and questions are often asked in a different order. For example, every wave has a gender variable, but 2016's is named "q3" and 2017's is named "q11". As such, I need to rename each wave's variables before I can combine them. However, instead of just renaming them "var1, var2, var3" etc., I would like to rename them "age, gender, birth_city" and so on. This prevents me from using loops to generate new variable names, as I have seen in many of the other queries here about renaming.

    I am aware of the rename command's "rename group" feature. However, with 300 variables, the command looks like a solid wall of text. Is there any easier way to, for example, import variable names as a .csv and replace them? Or is my only option the 50-line rename?

    Would appreciate any advice!

  • #2
    Welcome to Statalist!

    "Or is my only option the 50-line rename?"
    Given that you describe yourself as an inexperienced user, yes, I'd agree that sitting down and writing out a renaming do-file is the way to go. At the very least, it provides a clear record of the renaming and is extremely straightforward.

    However, if you have a table recording each variable's name in every wave, like this:
    Variable  Wave 1  Wave 2  Wave 3  Wave 4  Wave 5
    Sex       q2      q7      q7      q11     q8
    Then it's possible to write a simple program to make the input less tedious.

    And regarding the title, I'd also suggest renaming and merging only the variables you'll need for the analysis. It's very rare for an analysis to use all 300 variables.
    Last edited by Ken Chui; 26 Nov 2022, 09:05.



    • #3
      Here is an example demonstrating the sort of program Ken Chui described in post #2.
      Code:
      // list of variable names
      frame create names
      cwf names
      input str16 (new v2016 v2017 v2018 v2019 v2020)
      gender q3 q11 q18 q19 q20
      age q4 q9 q9 q10 q11
      end
      cwf default
      
      // wave 2016 data
      input id q3 q4
      101 1 42
      102 2 33
      103 1 16
      end
      
      // rename using 2016 names
      cwf names
      forvalues v = 1/`c(N)' {
          local o = v2016[`v']
          local n = new[`v']
          frame default: rename `o' `n'
      }
      cwf default
      
      list
      Code:
      . list
      
           +--------------------+
           |  id   gender   age |
           |--------------------|
        1. | 101        1    42 |
        2. | 102        2    33 |
        3. | 103        1    16 |
           +--------------------+
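
      The same idea could be extended to all five waves, with the table of names read from a .csv as asked in post #1. What follows is only a sketch under stated assumptions, not something tested on the real data: the file names names.csv and wave2016.dta through wave2020.dta are hypothetical, the .csv is assumed to have the header row new,v2016,v2017,v2018,v2019,v2020, and every wave is assumed to carry an identifier called id, as in the example above.
      Code:
      // ASSUMPTION: names.csv has the header row new,v2016,v2017,v2018,v2019,v2020
      frame create names
      cwf names
      import delimited using "names.csv", varnames(1) stringcols(_all) clear
      cwf default

      tempfile panel
      local first = 1
      forvalues y = 2016/2020 {
          // ASSUMPTION: wave files are named wave2016.dta ... wave2020.dta
          use "wave`y'.dta", clear

          // rename using this wave's column of the table
          local keepers
          cwf names
          forvalues v = 1/`c(N)' {
              local o = v`y'[`v']
              local n = new[`v']
              // a blank cell means the question was not asked in this wave
              if "`o'" != "" {
                  frame default: rename `o' `n'
                  local keepers `keepers' `n'
              }
          }
          cwf default

          // keep only the renamed variables, so leftover q-names from
          // different waves are never stacked on top of each other
          // ASSUMPTION: the identifier is called id in every wave
          keep id `keepers'
          generate int wave = `y'

          // stack this wave under the waves processed so far
          if !`first' {
              append using `panel'
          }
          save `panel', replace
          local first = 0
      }

      sort id wave

      Two things to watch for: append will complain if a variable is stored as a string in one wave and as a number in another, and a new name in the table must not already be in use as a variable name in a wave file.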



      • #4
        Thank you both for the advice! I do have a table as described, so I will give William’s code a go.

