Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Use list of variables in dta file to rename variable in other dataset

    Hi,

    In the current project I'm working on, I keep updating a list of variables. I would like to use the list of variables I create in 1 dta file to rename the variables in a second dta file. The number of variables changes as I progress in the project.

    Here is the list of variables I have in the 1st file:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 DATATYPE
    "400E"    
    "897E"    
    "BETA"    
    "DY"      
    "IBNOSH"  
    "MSCA"    
    "MTBV"    
    "MV"      
    "P"       
    "PI"      
    "PTBV"    
    "RI"      
    "VO"      
    "VOL"     
    "WC01001" 
    "WC01151" 
    "WC01551" 
    "WC02001" 
    "WC02005" 
    "WC02201" 
    "WC02999" 
    "WC03051" 
    "WC03051A"
    "WC03061" 
    "WC03062" 
    "WC03063" 
    "WC03101" 
    "WC03151" 
    "WC03251" 
    "WC03255" 
    "WC04049" 
    "WC04821" 
    "WC04828" 
    "WC04900" 
    "WC05301" 
    "WC05302" 
    "WC05476" 
    "WC05491" 
    "WC05508" 
    "WC06615" 
    "WC06699" 
    "WC08311" 
    "WC08326" 
    "WC08621" 
    "WC08806" 
    "WC09802" 
    "WC18226" 
    end
    I would like to rename the variables in a second file using the 1st file. Currently, I am doing it manually with a copy paste. I would like to automate the process. In the following example, I could not include the 47 variables data1-data47 because dataex would not let me, so I included up to data10:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long unit_id int time double(data1 data2 data3 data4 data5 data6 data7 data8 data9 data10)
    1 1 .2647  .9581 . 1.16 62.144 . 2.21 11821.59   190.23  772.5
    1 2 .2648 1.0914 . 1.16 62.199 .  2.2 11759.45   189.23  768.4
    1 3 .2509 1.0191 .  .97 62.501 . 2.64 14152.11   227.53    924
    1 4 .2686  1.081 .  .83 62.501 . 3.08 16487.64    263.8 1071.3
    1 5 .2672 1.0706 .  .79 62.501 . 3.27 17477.64 279.6399 1135.6
    end
    label values unit_id unit_id
    label def unit_id 1 "130062", modify
    in the 1st list, 1 variable (_400E) correspond to 1st "data" variable (data1) of the second dataset. I am currently running the following code, but this is manual copy/paste update:
    Code:
    ui ds
    loc lastvar: word `c(k)' of `r(varlist)'
    rename (data1-`lastvar') (_400E    _897E    BETA    DY    IBNOSH    MSCA    MTBV    MV    P    PI    PTBV    RI    VO    VOL    WC01001    WC01151    WC01551    WC02001    WC02005    WC02201    WC02999    WC03051    WC03051A    WC03061    WC03062    WC03063    WC03101    WC03151    WC03251    WC03255    WC04049    WC04821    WC04828    WC04900    WC05301    WC05302    WC05476    WC05491    WC05508    WC06615    WC06699    WC08311    WC08326    WC08621    WC08806    WC09802    WC18226 WC08621 WC08806 WC09802 WC18226)
    Can someone help? Thanks!







  • #2
    If you loop over the observations in the first data set, and add to each value to a macro, then input the second data set, you can have the variable list you need in the macro from the first data set. I would be a little concerned that you might end up misnaming some variables if you're not extremely careful.

    Code:
    local names
    des 
    local numob=r(N)
    set trace on
    forvalues i=1/`numob' {
        local name = DATATYPE[`i']
        local names `names' `name'
    }
    di "`names'"
    Read and the new data, and replace your list of names with this macro.

    Comment


    • #3
      Thanks a lot, it really helps! How would you adapt the code if some of the variables in the 1st file (for instance the 1st and 2nd variables in my previous example) start with a number? So far, I have manually added "_" when I rename then. Please help. Thanks, again!

      Comment


      • #4
        For instance, would it be possible to rename variables in the first file "if the first character is a number, then add "_" before the number". I'm not really sure how to code it in Stata. Any thought? Thanks.

        Comment


        • #5
          If you are creating the initial file, why not just create it in a way that the values in DATATYPE are legal variable names? Otherwise, you could use the following to capture any values of DATATYPE that start with a number to add the _.
          Code:
          replace DATATYPE = "_" + DATATYPE if regexm(DATATYPE,"[0-9]")

          Comment


          • #6
            Great idea, thanks Lance!

            And thanks a lot Phil for your help too, it works now!

            Comment


            • #7
              Lance,

              Actually, your code replaces all strings that contain some numbers in the name. I want to replace only those that start with a number. How can I modify your code to achieve this goal? Thanks.

              Comment


              • #8
                Sorry about that. I think this works.
                Code:
                replace DATATYPE = "_" + DATATYPE if regexm(DATATYPE,"^[0-9]")

                Comment

                Working...
                X