Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Renaming variables and generating new variables based on key

    Dear all,

    I am new to Stata and currently in the process of cleaning my data / coding variables.

    I have come across two issues and would appreciate your guidance:

    (1) Renaming variables according to a key
    I have ~300 variables which are to be renamed according to a key in an Excel file. The key specifies the current name of the variable (string), the new name and a label for the variable.
    Is there a way to automatically rename and label all variables according to the key (without having to manually enter the rename command and new name for each variable)?

    (2) Generating new variables according to key
    I have a variable called "Industry" which specifies industry segments on a very granular level and would like to generate a new variable for industry which subsumes industry segments into higher level categories. Again, I have a key that specifies how industry segments are subsumed. What would be the command to generate a new variable based on this key?

    Example:
    Current category New category
    Retail banking Financial services
    Investment banking Financial services
    Investment advisory Financial services

    Many thanks in advance for your help!

  • #2
    (1). This is a tad complicated. First, use -import excel- to create a Stata data file with the old names, new names, and labels. I'll assume that the variables in it are called old_name, new_name, and new_label. Let's say this new data set is called renaming.dta. Then you do something like this:

    Code:
    clear*
    use renaming
    count
    local nobs = r(N)
    forvalues i = 1/`nobs' {
        local oldname`i' = old_name[`i']
        local newname`i' = new_name[`i']
        local newlabel`i' = new_label[`i']
    }
    
    use the_original_data, clear
    forvalues i = 1/`nobs' {
        rename `oldname`i'' `newname`i''
        label var `newname`i'' `"`newlabel`i''"'
    }
    Note: Not tested, be ware of typos or other errors. Be aware also that this code (or any other code that attempts the same task) can only work if the contents of the Excel spreadsheet contains the correct existing variable names and the new names are all legal Stata variable names.

    (2) Assuming that your key is again a spreadsheet, import it into a Stata data set. Let's call this new data set category_crosswalk.dta. And let's assume the variables are called current_category and new_category respectively.
    Code:
    //  VERIFY EACH CURRENT CATEGORY IS MAPPED CONSISTENTLY
    //  TO ONLY ONE NEW CATEGORY
    isid current_category using category_crosswalk
    
    //  MERGE THE NEW CATEGORIES INTO THE ORIGINAL DATA
    use the_original_data, clear
    merge m:1 current_category using category_crosswalk
    Again, not tested.

    Comment


    • #3
      Many thanks, Clyde! It worked perfectly.

      Comment

      Working...
      X