Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • renaming variables using values they hold

    Hey everyone,

    Is there a simple command (or few lines of code) to rename variables based on a specific string value they hold?

    In more detail, the situation is that I have an elections results data set in which every variable is a "party", and the values it holds represent the amount of votes for that party. the problem is that variables are named by the letters represent each party on the ballots. I would like to rename the variables so that they indicate the parties' names. For doing that, I have another small table that has the same unwanted names (the representing letters) as variable names, but under each of them there is a string value of the party's real name.

    Any ideas how to deal with this issue?

    Much appreciation,
    Ben

  • #2
    This is all abstract. Why not give a data example? The names of the parties will be problematic as variable names unless they are single words without punctuation, spaces or other punctuation. (They would also be problematic if the names were very long or began with numbers, which seem less likely.) For most Stata purposes names would be better as variable labels.

    Please read and act on https://www.statalist.org/forums/help#stata

    Comment


    • #3
      Hi Nick,
      I coudln't upload a file, but to make it short, my data looks like this:

      amt bcd
      ha'avoda yahadut_hatora
      2 0
      2 0
      1 0
      4 0

      I would like to rename "amt" to "ha'avoda", and "bcd" to "yahadut_hatora".

      (Note that the numbers are from the results data set I mentioned earlier, and the words are from the table. The example here is taken from my data set after appending them together).

      Comment


      • #4
        Alright I think I got it now:


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str16 אמת str14 ג str11 דעם
        "ha'avoda" "yahadut_hatora" "ra'am_balad"
        "2"        "0"              "31"         
        "2"        "0"              "69"         
        "1"        "0"              "75"         
        "4"        "0"              "36"         
        "7"        "0"              "69"         
        "5"        "0"              "26"         
        "5"        "1"              "53"         
        "4"        "1"              "38"         
        "9"        "0"              "79"         
        "1"        "0"              "32"         
        "0"        "0"              "35"         
        "9"        "0"              "59"         
        "1"        "0"              "77"         
        "5"        "0"              "31"         
        "2"        "0"              "32"         
        "7"        "0"              "65"         
        "2"        "0"              "23"         
        "3"        "1"              "21"         
        "3"        "1"              "54"         
        "4"        "0"              "62"         
        "7"        "0"              "73"         
        "6"        "4"              "17"         
        "10"       "0"              "22"         
        "1"        "0"              "56"         
        "0"        "0"              "2"          
        "3"        "0"              "125"        
        "0"        "0"              "118"        
        "1"        "0"              "110"        
        "4"        "0"              "80"         
        "0"        "0"              "82"         
        "6"        "0"              "69"         
        "1"        "0"              "58"         
        "3"        "0"              "103"        
        "4"        "0"              "70"         
        "2"        "0"              "78"         
        "8"        "0"              "139"        
        "0"        "0"              "99"         
        "1"        "1"              "108"        
        "0"        "2"              "186"        
        "0"        "0"              "267"        
        "0"        "0"              "135"        
        "0"        "0"              "89"         
        "1"        "0"              "123"        
        "0"        "0"              "183"        
        "0"        "0"              "172"        
        "4"        "0"              "20"         
        "1"        "0"              "41"         
        "2"        "0"              "146"        
        "35"       "0"              "0"          
        "44"       "1"              "0"          
        "1"        "2"              "0"          
        "49"       "1"              "0"          
        "38"       "0"              "1"          
        "39"       "0"              "0"          
        "6"        "2"              "0"          
        "39"       "0"              "0"          
        "30"       "0"              "0"          
        "29"       "2"              "1"          
        "19"       "8"              "0"          
        "30"       "9"              "0"          
        "35"       "0"              "0"          
        "56"       "0"              "0"          
        "66"       "2"              "0"          
        "23"       "0"              "0"          
        "42"       "1"              "0"          
        "50"       "0"              "0"          
        "73"       "0"              "0"          
        "26"       "2"              "0"          
        "56"       "0"              "1"          
        "42"       "0"              "0"          
        "64"       "4"              "0"          
        "42"       "0"              "0"          
        "15"       "1"              "0"          
        "19"       "1"              "0"          
        "3"        "18"             "0"          
        "2"        "18"             "0"          
        "1"        "0"              "0"          
        "1"        "5"              "0"          
        "0"        "5"              "0"          
        "2"        "3"              "0"          
        "4"        "1"              "0"          
        "0"        "3"              "0"          
        "23"       "0"              "0"          
        "13"       "5"              "0"          
        "25"       "0"              "0"          
        "34"       "0"              "0"          
        "5"        "0"              "0"          
        "12"       "0"              "0"          
        "1"        "0"              "70"         
        "1"        "0"              "16"         
        "1"        "0"              "70"         
        "1"        "0"              "64"         
        "1"        "0"              "141"        
        "0"        "0"              "32"         
        "0"        "0"              "15"         
        "0"        "0"              "22"         
        "0"        "0"              "77"         
        "0"        "0"              "51"         
        "0"        "0"              "10"         
        end

        Comment


        • #5
          The first observation contains metadata that are suitable variable labels.

          That's standard for say MS Excel, but not for Stata. String versions of numeric variables are no use.

          It's probably going to be easiest for you to go back and do the import correctly. I guess you have many other variables, e.g. place and/or time too. The help for import commands explains options to handle this.

          Comment


          • #6
            Hey Nick,
            it's not exactly correct.
            The first data set I've described is made only with numeric variables and with variable names that are in Hebrew (Parties' names). Elsewhere, I have another tiny data set (which I produced manually) that has the same variable names, each of the holding a string value with the corresponding English name of that party.

            What can be seen here is the two data sets already appended (and for that purpose I converted the first's values to strings).

            Is there a way to tell Stata "take the first row's values and use them as variable names"?

            Comment


            • #7
              You can, with:
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str16 אמת str14 ג str11 דעם
              "ha'avoda" "yahadut_hatora" "ra'am_balad"
              "2"        "0"              "31"        
              "2"        "0"              "69"        
              "1"        "0"              "75"        
              "4"        "0"              "36"        
              "7"        "0"              "69"            
              end
              foreach v of varlist *{
              label var `v' "`v'"
              local name = subinstr(`v'[1],"'","",.)
              ren `v' `name'
              }
              drop in 1
              Notice that there is a line that gets rid of the ', this is because Stata variable names cannot have special characters other than the _

              Nick's advice still holds though.
              Better solution with your data would be: import the dataset with the actual data, then use your other dataset or excelfile to create some code to apply variable labels (or rename).

              Comment


              • #8
                Hi, thanks!

                To learn for next times, is the following line

                Code:
                 
                 local name = subinstr(`v'[1],"'","",.)
                the one that indicates from which row it'll take the strin for the name? Specifically, the "[1]" part indicates the first row?



                Ben

                Comment


                • #9
                  Indeed. [1] indicates the first observation, which people who have spent too much time around spreadsheets often call the first row. That's not terminology used in Stata itself.

                  Comment


                  • #10
                    I guess you're right, Nick, I do spend a lot of time around them.

                    As for the code proposed by Jorrit Gosens, what if I have a variable whose first observation is empty (don't know the actual party's name)? The loop stops when it gets to that variable. How can I make it just skip it?

                    Comment


                    • #11
                      The loop can be modified to put capture in front of the rename. That would catch not only cases where the observation's value was an empty string, but also other problems such as invalid characters not removed.

                      Code:
                      foreach v of varlist * {    
                           label var `v' "`v'"    
                           local name = subinstr(`v'[1],"'","",.)    
                           capture ren `v' `name'
                      }

                      Comment


                      • #12
                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input str16 אמת str14 ג str11 דעם
                        "ha'avoda" "yahadut_hatora" ""
                        "2"        "0"              "31"        
                        "2"        "0"              "69"        
                        "1"        "0"              "75"        
                        "4"        "0"              "36"        
                        "7"        "0"              "69"            
                        end
                        foreach v of varlist *{
                        label var `v' "`v'"
                        local name = subinstr(`v'[1],"'","",.)
                        capture ren `v' `name'
                        }
                        drop in 1
                        But again, not a great idea to follow this route. Your variables are still all string variables afterwards, you would still have issues with other special characters in the items you which to use as variable names, and now you have a mix of variable names in hebrew and phonetic english. You are basically creating an issue and then spending time fixing it, and you could also try to prevent the issue in the first place.

                        Comment


                        • #13
                          Jorrit in #12 is I think making the point I made in #5. From what we see, repeating the import looks a better route.

                          (We've not yet touched on the need to destring.)

                          Comment


                          • #14
                            The "capture" solved it.

                            String variables don't impose too much of a problem - I dealt with them using "destring" later.
                            Importing from the beginning isn't an option as I have many files from many elections campaigns, each of them containing different variables (parties pop up and disappear occationally). So instead, I have a conclusive table in Excel in which real names in English are assigned to the Hebrew letters. And I'm able to add one row to this table whenever another elections campaing takes place, with the relevant parties that took part in it. Above all, I would like everything to be documented carefully, i.e., to have the original files in Excel, the table with the names, the final data sets in Stata, and of course a thoroughly documented process. For these reasons I preferred all the mass over just changing the variables names manually in Excel before the import.

                            Above all - thank you both very much! It's all working now.

                            Ben

                            Comment


                            • #15
                              Hello Statlist

                              I have an identifier variable that contains "-" and "A" or "D".
                              And I want, remove "-" and replace "A" by "01" and "D" by "04".
                              So the expected result is from "21-00629642062-A0581" to "2100629642062010581"
                              Tried with subinstr () command
                              but it does not give me the expected result quickly and directly.
                              Could somebody please guide me?
                              folio is a str20 and %20s format
                              Thanks.

                              P.D. In the dataset there are only A or D but in another data set I have more variety from A to Z, could we assign the number that corresponds to each letter, A = 01, B = 02 ,,, Z = 26?


                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input str20 folio
                              "21-00629642062-A0581"
                              "21-00629817100-A0161"
                              "21-00630303404-A0521"
                              "21-06031055934-A0091"
                              "21-06045510794-A0131"
                              "21-06080251183-A0181"
                              "21-06110088342-A0011"
                              "21-06111278168-A0021"
                              "21-06136982166-A0081"
                              "21-06141153867-A0111"
                              "21-06656603331-A0081"
                              "31-00526186056-A0161"
                              "31-05179714140-A0061"
                              "31-05187499372-A0241"
                              "31-05233004267-A0551"
                              "41-00475688171-A0171"
                              "62-03097089976-D0181"
                              "71-05275576380-A0031"
                              "81-07317594230-A0291"
                              "92-11254755387-D0201"
                              end


                              Comment

                              Working...
                              X