Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Use another dataset to rename variable based on condition

    Hello,
    I'm new to Stata (and new to this forum ) and I'm trying to do something probably quite basic but it doesn't want to work.
    Basically, I have 2 datasets, firstfile_new (my main dataset) and secondfile (that gives me additionnal information on my variables).

    I would like to rename the variables of the firstfile_new based on an information that I can find on the secondfile. More precisely, I would like to add to the variable names the first 2 characters of their type, so having "2 first characters of type" + "." + "variable name", type being a variable of the secondfile.
    Remark : most of the variables have 8 characters in their names and start all with gfdd. Only 4 are different and I wanted to thus start with all of those that start with gfdd (and change the other without a loop or anything).

    For example, the variable gfddai01 is of the type access (information that I have on the secondfile), and I want to rename it as "AC.gfddai01".

    I have started by creating a new variable on the secondfile that consists in the old variables names transformed the want I want it.

    Code:
    use secondfile3.dta, clear
    
    *Create this new variable
    replace Type = upper(Type)
    generate ty_var = substr(Type,1,2) + "." + Old_Variable
    replace Type = lower(Type)
    
    *Replace for the 4 last variables (otherwise other and other economic variables have the same starting name)
    replace Variable = "OE.ny_gdp_mktp_cd" if Variable == "OT.ny_gdp_mktp_cd"
    replace Variable = "OE.ny_gdp_pcap_kd" if Variable == "OT.ny_gdp_pcap_kd"
    replace Variable = "OE.ny_gnp_mktp_cd" if Variable == "OT.ny_gnp_mktp_cd"
    replace Variable = "OE.sp_pop_totl" if Variable == "OT.sp_pop_totl"
    
    *Reorder 
    order ty_var, b(Label)
    
    *Rename new_variable
    rename ty_var Variable
    
    *save secondfile
    save secondfile4
    Then, I do :

    Code:
    quietly levelsof Variable, local(Var_new_name)
    foreach y in local Var_new_name {
        local pref = substr("`y'",4,7)
        if "`pref'" == "gfdd" {
            local teest = "`y'"
        } 
    }
    
    *use firstfile
    use firstfile_new.dta, clear
    
    foreach x of local teest  {
        local old_var = substr("`x'",4,11)
        quietly ds, has(varlabel *"`old_var'")
        local varlist `r(varlist)'
        
        foreach var of local varlist {
        rename `var' `"`x'"'
        }
    }
    it doesn't give me error message, but it doesn't change my variable names. So I guess the problem comes from macros (that I still have a hard time using).

    Could you help please?

    Thanks
    Saloua

  • #2
    You cannot have periods in Stata variable names. Rather than showing us your code, you can use dataex to present a selection of the variable names in the two data sets and explain what you aim to achieve.

    Comment


    • #3
      Hello,
      First, thanks for your answer.
      Since its the first time I use dataex, don't hesitate to tell me if something is not clear or I've done something wrong/incomplete.

      My secondfile (dataset in which I have variable information) looks (partly) like this :

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str14(Type Old_Variable)
      "access"     "gfddai01"
      "access"     "gfddai02"
      "access"     "gfddai03"
      "access"     "gfddai04"
      My firstfile_new (main dataset) looks (partly) like this :

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str18 country int year str7(gfddai01 gfddai02) str4 gfddai03
      "Argentina" 2010 "673.256" "13.0147" "49.3"
      "Argentina" 2011 "726.028" "13.1224" "NA"  
      "Argentina" 2012 "806.48"  "13.2093" "NA"  
      "Argentina" 2013 "862.056" "13.2379" "NA"  
      "Argentina" 2014 "908.223" "13.223"  "NA"  
      "Argentina" 2015 "954.391" "13.2561" "NA"  
      "Argentina" 2016 "1066.76" "13.4477" "NA"  
      "Argentina" 2017 "1147.55" "13.5291" "42.4"
      "Australia" 2010 "NA"      "30.7473" "NA"  
      "Australia" 2011 "NA"      "30.4056" "NA"
      I simply want that the variables of the main dataset (firstfile_new) have the two first letters that indicate to me their type (in the secondfile, here it's access). So instead of having the variable name being gfddai01, it would be AC.gfddai01 (or any other allowed character between AC and the variable name).

      That is to say, I want my variable names to go from this :

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str14 Old_Variable
      "gfddai01"
      "gfddai02"
      "gfddai03"
      "gfddai04"
      "gfddai05"
      "gfddai06"
      "gfddai07"
      "gfddai08"
      "gfddai09"
      "gfddai10"
      "gfddai11"
      "gfddai12"
      "gfddai13"
      "gfddai14"
      "gfddai15"
      "gfddai16"
      "gfddai17"
      "gfddai18"
      "gfddai19"
      "gfddai20"
      "gfddai21"
      "gfddai22"
      "gfddai23"
      "gfddai24"
      "gfddai25"
      "gfddai26"
      "gfddai27"
      "gfddai28"
      "gfddai29"
      "gfddai30"
      "gfddai31"
      "gfddai32"
      "gfddai33"
      "gfddai34"
      "gfddai35"
      "gfddai36"
      "gfddam01"
      "gfddam02"
      "gfddam03"
      "gfddam04"
      "gfdddi01"
      "gfdddi02"
      "gfdddi03"
      "gfdddi04"
      "gfdddi05"
      "gfdddi06"
      "gfdddi07"
      "gfdddi08"
      "gfdddi09"
      "gfdddi10"
      "gfdddi11"
      "gfdddi12"
      "gfdddi13"
      "gfdddi14"
      "gfdddm01"
      "gfdddm02"
      "gfdddm03"
      "gfdddm04"
      "gfdddm05"
      "gfdddm06"
      "gfdddm07"
      "gfdddm08"
      "gfdddm09"
      "gfdddm10"
      "gfdddm11"
      "gfdddm12"
      "gfdddm13"
      "gfdddm14"
      "gfdddm15"
      "gfddei01"
      "gfddei02"
      "gfddei03"
      "gfddei04"
      "gfddei05"
      "gfddei06"
      "gfddei07"
      "gfddei08"
      "gfddei09"
      "gfddei10"
      "gfddem01"
      "gfddsi01"
      "gfddsi02"
      "gfddsi03"
      "gfddsi04"
      "gfddsi05"
      "gfddsi06"
      "gfddsi07"
      "gfddsm01"
      "gfddoi01"
      "gfddoi02"
      "gfddoi03"
      "gfddoi04"
      "gfddoi05"
      "gfddoi06"
      "gfddoi07"
      "gfddoi08"
      "gfddoi09"
      "gfddoi10"
      "gfddoi11"
      "gfddoi12"
      end
      To this :
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str17 Variable
      "AC.gfddai01"
      "AC.gfddai02"
      "AC.gfddai03"
      "AC.gfddai04"
      "AC.gfddai05"
      "AC.gfddai06"
      "AC.gfddai07"
      "AC.gfddai08"
      "AC.gfddai09"
      "AC.gfddai10"
      "AC.gfddai11"
      "AC.gfddai12"
      "AC.gfddai13"
      "AC.gfddai14"
      "AC.gfddai15"
      "AC.gfddai16"
      "AC.gfddai17"
      "AC.gfddai18"
      "AC.gfddai19"
      "AC.gfddai20"
      "AC.gfddai21"
      "AC.gfddai22"
      "AC.gfddai23"
      "AC.gfddai24"
      "AC.gfddai25"
      "AC.gfddai26"
      "AC.gfddai27"
      "AC.gfddai28"
      "AC.gfddai29"
      "AC.gfddai30"
      "AC.gfddai31"
      "AC.gfddai32"
      "AC.gfddai33"
      "AC.gfddai34"
      "AC.gfddai35"
      "AC.gfddai36"
      "AC.gfddam01"
      "AC.gfddam02"
      "AC.gfddam03"
      "AC.gfddam04"
      "DE.gfdddi01"
      "DE.gfdddi02"
      "DE.gfdddi03"
      "DE.gfdddi04"
      "DE.gfdddi05"
      "DE.gfdddi06"
      "DE.gfdddi07"
      "DE.gfdddi08"
      "DE.gfdddi09"
      "DE.gfdddi10"
      "DE.gfdddi11"
      "DE.gfdddi12"
      "DE.gfdddi13"
      "DE.gfdddi14"
      "DE.gfdddm01"
      "DE.gfdddm02"
      "DE.gfdddm03"
      "DE.gfdddm04"
      "DE.gfdddm05"
      "DE.gfdddm06"
      "DE.gfdddm07"
      "DE.gfdddm08"
      "DE.gfdddm09"
      "DE.gfdddm10"
      "DE.gfdddm11"
      "DE.gfdddm12"
      "DE.gfdddm13"
      "DE.gfdddm14"
      "DE.gfdddm15"
      "EF.gfddei01"
      "EF.gfddei02"
      "EF.gfddei03"
      "EF.gfddei04"
      "EF.gfddei05"
      "EF.gfddei06"
      "EF.gfddei07"
      "EF.gfddei08"
      "EF.gfddei09"
      "EF.gfddei10"
      "EF.gfddem01"
      "ST.gfddsi01"
      "ST.gfddsi02"
      "ST.gfddsi03"
      "ST.gfddsi04"
      "ST.gfddsi05"
      "ST.gfddsi06"
      "ST.gfddsi07"
      "ST.gfddsm01"
      "OT.gfddoi01"
      "OT.gfddoi02"
      "OT.gfddoi03"
      "OT.gfddoi04"
      "OT.gfddoi05"
      "OT.gfddoi06"
      "OT.gfddoi07"
      "OT.gfddoi08"
      "OT.gfddoi09"
      "OT.gfddoi10"
      "OT.gfddoi11"
      "OT.gfddoi12"
      end

      Comment


      • #4
        I could store the names in a local macro and then rename, but in this case I think that the simplest code is

        Code:
        rename gfddai* AC_gfddai*
        rename gfddam* AC_gfddam*
        and so on.

        Comment


        • #5
          The problem is that I don't know the first 2 characters of each variable, related to the type.
          Here it was AC because the type was access, but they are 5 or 6 other types (DE, ST, OT ...) so I don't think I can do it this way.

          Comment


          • #6
            OK, here is another way. Make sure that the number of observations in the first dataset is greater than the number of variables before running the code.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str18 country int year str7(gfddai01 gfddai02) str4 gfdddi01
            "Argentina" 2010 "673.256" "13.0147" "49.3"
            "Argentina" 2011 "726.028" "13.1224" "NA"  
            "Argentina" 2012 "806.48"  "13.2093" "NA"  
            "Argentina" 2013 "862.056" "13.2379" "NA"  
            "Argentina" 2014 "908.223" "13.223"  "NA"  
            "Argentina" 2015 "954.391" "13.2561" "NA"  
            "Argentina" 2016 "1066.76" "13.4477" "NA"  
            "Argentina" 2017 "1147.55" "13.5291" "42.4"
            "Australia" 2010 "NA"      "30.7473" "NA"  
            "Australia" 201
            end
            
            gen vars=""
            local i=1
            foreach var of varlist g*{
            replace vars= "`var'" in `i'
            local ++i
            }
            preserve
            
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str14(Type Old_Variable)
            "access"     "gfddai01"
            "access"     "gfddai02"
            "access"     "gfddai03"
            "access"     "gfddai04"
            "access"     "gfddai04"
            "de"         "gfdddi01"
            end
            
            tempfile type
            rename Old_Variable vars
            contract Type vars
            save `type'
            restore, preserve
            keep vars
            keep if !missing(vars)
            
            merge 1:1 vars using `type'
            drop if _merge==2
            drop _merge
            gen list= upper(substr(Type, 1,2))+"_"+vars
            forval i= 1/`=_N'{
            local allvars `allvars' `=list[`i']'
            }
            restore
            rename (g*) (`allvars')
            Res.:

            Code:
             l in 1/5, abb(20)
            
                 +-----------------------------------------------------------------------+
                 |   country   year   AC_gfddai01   AC_gfddai02   DE_gfdddi01       vars |
                 |-----------------------------------------------------------------------|
              1. | Argentina   2010       673.256       13.0147          49.3   gfddai01 |
              2. | Argentina   2011       726.028       13.1224            NA   gfddai02 |
              3. | Argentina   2012        806.48       13.2093            NA   gfdddi01 |
              4. | Argentina   2013       862.056       13.2379            NA            |
              5. | Argentina   2014       908.223        13.223            NA            |
                 +-----------------------------------------------------------------------+

            Comment


            • #7
              It indeed works
              However, could you explain to me the methodology you followed? And why my code is not working? I'm not sure I quite understand what you did.
              Also, does it mean that the variables that you put in vars have to be in the same order than on the main dataset?
              Thank you very much for your help.

              Saloua

              Comment


              • #8
                does it mean that the variables that you put in vars have to be in the same order than on the main dataset?
                Yes, rename group crucially relies on variable order, so any code with levelsof messes up the order unless you take steps to preserve the order. I have not carefully studied your code but from a first glance, it appears that you do not consider order. As far as mine goes, I extract the variable names in the original dataset and store them as observations in the order that they occur. Since each variable of interest begins with the letter "g", I have

                Code:
                gen vars=""
                local i=1
                foreach var of varlist g*{
                replace vars= "`var'" in `i'
                local ++i
                }
                Then I will merge these variable names with those from the second dataset, keeping matches and unmatched observations in the first dataset only. Then I specify your name condition, i.e., the first two letters of variable "Type", capitalized, should be inserted at the beginning of each variable name.

                Code:
                gen list= upper(substr(Type, 1,2))+"_"+vars
                Once I have this, I can now hold the renamed variables in a local macro, and I do this again preserving their order

                Code:
                forval i= 1/`=_N'{
                local allvars `allvars' `=list[`i']'
                }
                Finally I can rename the group of variables since the order of the first variables corresponds to the order of the renamed variables.

                Code:
                rename (g*) (`allvars')
                Last edited by Andrew Musau; 06 Feb 2020, 09:19.

                Comment


                • #9
                  Ok I see, it's much more clear with the explanation.
                  Thank you very much for your help!
                  Best
                  Saloua

                  Comment

                  Working...
                  X