  • Renaming variables if they exist

    I am merging many files with similar, but not identical variable names. For example, start_year, startYear, yearStart may all be used in different files. Is there a way to ask Stata to check whether a variable exists and then rename it if it does? Essentially, I want to do the following:

    rename (start_year startYear yearStart) (beginYear beginYear beginYear) without it returning any errors.

    Thank you,
    Krista

  • #2
    That's always going to produce errors as the three new names are the same. For many, many files, I might do it something like this. Code not tested.


    Code:
     
    local done 0 
    
    foreach v in start_year startYear yearStart { 
    
           capture confirm var `v' 
    
           if _rc == 0  & !`done' { 
                    rename `v' beginYear 
                    local done = 1 
                    local V `v' 
           }
           else if _rc == 0 & `done' { 
                   di as err "`v' exists as well as `V'" 
           } 
    
    }





    • #3
      This is working well. Is there a way to add a few lines that would allow me to rename the variables based on the first few characters of their old variable names? I was thinking of adding in the following lines. If I can add these sorts of if statements, I could rename startYear, endYear, plantID, plantCode, etc. all at once. I know tostring can't be used in this way. Is there another way to do this kind of thing?

      Code:
      tostring(`v'), generate(old_var)
      if substr(old_var,1,3)="sta"{
      rename `v' beginYear
      }

      Last edited by Krista Lane; 14 Dec 2015, 10:07.



      • #4
        That's a heck of a mess to try to understand. Please use CODE delimiters. Let's say that again: Please use CODE delimiters!

        With a guess at what you mean:

        Trying to write code like that is likely to cause more problems than it solves. Just double up: that's my advice.


        Code:
        local done 0 
        
        foreach v in start_year startYear yearStart { 
        
               capture confirm var `v' 
        
               if _rc == 0  & !`done' { 
                        rename `v' beginYear 
                        local done = 1 
                        local V `v' 
               }
               else if _rc == 0 & `done' { 
                       di as err "`v' exists as well as `V'" 
               } 
        
        }
        
        local done 0 
        
        foreach v in end_year endYear yearEnd { 
        
               capture confirm var `v' 
        
               if _rc == 0  & !`done' { 
                        rename `v' concludeYear 
                        local done = 1 
                        local V `v' 
               }
               else if _rc == 0 & `done' { 
                       di as err "`v' exists as well as `V'" 
               } 
        
        }
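
        For reference, the prefix test asked about in #3 can be written against the variable name itself rather than the data, since the name is already available as text in the loop macro. What follows is only a minimal sketch, not tested, under these assumptions: the prefix is "sta", the target name beginYear does not already exist, and exactly one variable should match. Note that a prefix test would catch start_year and startYear but miss yearStart, which is one reason listing the candidate names explicitly is safer.

        Code:
        * collect every variable whose name begins with "sta"
        local hits
        foreach v of varlist _all {
            if substr("`v'", 1, 3) == "sta" {
                local hits `hits' `v'
            }
        }

        * rename only if exactly one variable matched
        local nhits : word count `hits'
        if `nhits' == 1 {
            rename `hits' beginYear
        }
        else {
            di as err "expected exactly one match, found `nhits': `hits'"
        }

        Collecting the matches first and renaming afterwards avoids a failed second rename when two variables share the prefix.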




        • #5
          Sorry about that! And thanks for the help.



          • #6
            I encountered exactly the same problem recently. Though Nick's solution is elegant, I wonder whether there is a built-in or user-written command that does this job. If not, I will have to write a lot of loops in my code.



            • #7
              Exactly the same problem presumably can't be what is in #1.

              This may help. Suppose the good names are a b c and the existing names are a b h. So a b are good, but h should be c. Sounds simple, but this is a very dangerous tool!

              Code:
              clear
              set obs 1
              gen a = 1
              gen b = 1
              gen h = 1
              
              begood a b h, good(a b c)
              
                 oldname | newname
                ---------+---------
                       h | c
                -------------------

              Code:
               
              *! 1.0.0 NJC 25 April 2018 
              program begood
                  version 15 
                  syntax varlist, good(string) 
              
                  local nexist : word count `varlist' 
                  local ngood : word count `good' 
                  if `nexist' != `ngood' { 
                      local se = cond(`nexist' > 1, "s", "") 
                      local sg = cond(`ngood' > 1, "s", "") 
                      di as err "`nexist' variable`se', but `ngood' name`sg'" 
                      exit 498 
                  } 
              
                  foreach v of local varlist { 
                      gettoken this good : good 
                      if "`this'" != "`v'"  { 
                          local old `old' `v' 
                          local new `new' `this' 
                      } 
                  }
              
                  if "`old'" != "" { 
                      capture noisily rename (`old') (`new'), dryrun 
                      if _rc == 0  rename (`old') (`new') 
                  }
                  else { 
                      di "no news is good news" 
                      exit 0 
                  }
              end





              • #8
                Many thanks for the canned programme above. My original post was a little misleading: it should have said "quite similar" rather than "exactly the same". I want to do something like a <-- (a b c d), e <-- (e f g), h <-- (i j k) and so on, where a, e and h are the "good" or "chosen" names. So what I want is to put a canned programme (routine) in a loop that goes over all variables in the dataset.



                • #9
                  That's more difficult to do. My program may be a start for you but sorry, I can't be enthusiastic about generalizing it. It also sounds like a way to mess things up mightily if you get something wrong somewhere.

                  For example, I think you're saying

                  if there is an -a- that is fine

                  but if not if there is one variable that is -b- or -c- or -d- then that should be renamed -a-

                  But what if there are two or more of -a b c d-?
                  Last edited by Nick Cox; 26 Apr 2018, 02:24.



                  • #10
                    Originally posted by Nick Cox
                    That's more difficult to do. My program may be a start for you but sorry, I can't be enthusiastic about generalizing it. It also sounds like a way to mess things up mightily if you get something wrong somewhere.

                    For example, I think you're saying

                    if there is an -a- that is fine

                    but if not if there is one variable that is -b- or -c- or -d- then that should be renamed -a-

                    But what if there are two or more of -a b c d-?
                    In my case, (a b c d) form a group, as do (e f g) and so on. The reason is that the data are appended from many separately collected datasets, so the same object is named differently across them. I can choose a as the "standard" name for every variable in its group: a <-- (a b c d) means I will rename a variable to a if its name is one of a b c d. Written in an observation style, it would be something like this (pseudocode):
                    Code:
                          foreach j of varlist _all { 
                                 ren `j' "a" if inlist(`j',"b","c","d")
                          }
                    And there are many such groups.
                    Unfortunately, -rename- does not accept an -if- qualifier.



                    • #11
                      Even if rename supported an if qualifier, and even if your pseudo-code ran [we can make it run, see Edit below], it would not do what you want, for the reason implied by Nick's question. Say you have three variables: b, c, d. You would then

                      rename b a

                      Then you would (attempt to)

                      rename c a

                      but there is already a variable a, the first in your list, i.e., former b. You cannot have the same variable name twice. What would you want to do in this situation? Should the first variable be renamed and the latter be left as is? Should no variable be renamed if more than one in the list exist?

                      Nick's code can easily be modified to cope with such situations but you have to lay down the rules. Also, I agree with Nick that this seems to be a dangerous approach.

                      Edit:

                      Pseudo-code to working code (perhaps not as expected and almost certainly dangerous)

                      Code:
                      foreach v of varlist _all {
                          if (inlist("`v'", "b", "c", "d")) {
                              rename `v' a
                          }
                      }
                      Best
                      Daniel
                      Last edited by daniel klein; 26 Apr 2018, 07:36.



                      • #12
                        That's not going to work easily. For example, inlist() can't trap two or more such variables easily. Try this.

                        Code:
                        . clear
                        
                        . set obs 1
                        number of observations (_N) was 0, now 1
                        
                        . gen d = 1
                        
                        . shouldbe a b c d, shouldbe(d)
                        good news: d already present
                        
                        . shouldbe a b c d, shouldbe(a)
                        d renamed as a
                        
                        . rename a g
                        
                        . shouldbe a b c d, shouldbe(a)
                        none of a b c d found
                        r(498);
                        Code:
                        *! not much tested at all
                        *! 1.0.0 NJC 26 April 2018
                        program shouldbe
                            version 15
                            syntax namelist , shouldbe(name)
                            
                            foreach v of local namelist {  
                                capture confirm variable `v'
                                if _rc == 0 {
                                    local exists `exists' `v'
                                }
                            }
                            
                            if "`exists'" == "" {
                                di as err "none of `namelist' found"
                                exit 498
                            }
                            else if word("`exists'", 2) != "" {
                                di as err "different names of `namelist' found: `exists'"
                                exit 498
                            }
                            * get to here: only one of those variables in dataset
                            else if "`exists'" == "`shouldbe'" {
                                di "good news: `shouldbe' already present"
                                exit 0
                            }
                            else {
                                di "`exists' renamed as `shouldbe'"
                                rename `exists' `shouldbe'
                                exit 0
                            }
                        end

                        So the rules are:

                        1. You supply a list of legal names. Only one of those names is allowed to be a variable in the dataset.

                        2. You supply the preferred name.

                        3. If that preferred name is already in use, good and exit.

                        4. Otherwise if one of those other legal names exists as variable name, rename.

                        5. If two or more of those names are in use, you have a problem.

                        6. If none of those names are in use, you have a problem.

                        So, perhaps, you need a script like this

                        Code:
                        shouldbe a b c d, shouldbe(a)
                        shouldbe e f g, shouldbe(e)
                        shouldbe h i j k, shouldbe(h)
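
                        If there are many groups, the calls could be driven from a single loop. This is only a sketch, not tested, and it rests on two assumptions that are not part of the program itself: the program shouldbe above is already defined, and the preferred name is written first in each group. The macro names groups and first are made up for illustration.

                        Code:
                        * each quoted element is one group; the first word is the preferred name
                        local groups `" "a b c d" "e f g" "h i j k" "'

                        foreach g of local groups {
                            local first : word 1 of `g'
                            * capture noisily shows any error but lets the loop continue
                            capture noisily shouldbe `g', shouldbe(`first')
                        }

                        With capture noisily, a group that triggers rule 5 or rule 6 is reported but does not stop the remaining groups from being processed. Drop the prefix if you would rather stop at the first problem.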
