  • If argument for the name of the variable

    Hi,

    I have thousands of Stata datasets and I want to harmonize them.
    The first step for me is this: if the name of the first variable is "v1", I want to rename every variable to the value in its first observation. The data look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str32(v1 v2) str19 v3 str43 v4 str22 v5 str5 v6 str16 v7
    `""Personal identity number""' "Name" "0711 Theory 10.0 hp" "0712 Laboratory Work and Assignments 5.0 hp" "% for the whole course" "Grade" "Examination date"
    "1"                                  "A"    "VG"                  "G"                                           "93"                     "VG"    "2023-03-21"      
    "2"                                  "S"    "VG"                  "G"                                           "80"                     "VG"    "2023-03-21"      
    "3"                                  "D"    "G"                   "G"                                           "75,5"                   "G"     "2023-03-21"      
    "4"                                  "R"    "VG"                  "G"                                           "90,5"                   "VG"    "2023-03-21"      
    "5"                                  "B"    "VG"                  "G"                                           "80"                     "VG"    "2023-03-21"      
    "6"                                  "Y"    "VG"                  "G"                                           "82,5"                   "VG"    "2023-03-21"      
    "7"                                  "J"    "VG"                  "G"                                           "84,8"                   "VG"    "2023-03-21"      
    "8"                                  "N"    "VG"                  "G"                                           "97,8"                   "VG"    "2023-03-21"      
    "9"                                  "F"    "VG"                  "G"                                           "87,8"                   "VG"    "2023-03-21"      
    "10"                                 "T"    "VG"                  "G"                                           "85,8"                   "VG"    "2023-03-21"      
    
    end
    ------------------ copy up to and including the previous line ------------------



    It is probably easier than I think, but how can I do that?
    Last edited by Neg Kha; 26 Dec 2023, 09:08.

  • #2
    My guess is that there is no solution that will work for thousands of files unless they are all pretty similar to each other. But the following will handle most situations:
    Code:
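    * List all variables in their current order
    ds _all
    local all_vars `r(varlist)'
    * Proceed only if the first variable is named v1
    if "`:word 1 of `all_vars''" == "v1" {
        foreach v of varlist `all_vars' {
            * Turn the first observation into a legal name of at most 32 characters
            rename `v' `=substr(strtoname(`v'[1]), 1, 32)'
        }
        * The first observation held the names, not data
        drop in 1
    }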
    The problem that you are likely to run into is that the contents of the first observations are, in some situations, not going to be valid as variable names. In fact, in your own example, several of them are not legal variable names. The -substr()- and -strtoname()- functions as used in the above code will convert them to legal, if not particularly handy, variable names. But if you apply this to thousands of data sets, I suspect you will stumble over some where two different values, after being laundered by -substr(strtoname())-, end up as the same candidate name. This would arise, for example, if a data set has two variables where the entries in the first observation are identical for their first 32 characters. In that situation, -rename- will refuse to deal with the second one because you can't have two variables with the same name in a valid Stata data set, and the code will break there. But apart from that situation, this will handle things.
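
    If name collisions do turn up, one possible workaround is sketched below. It is not tested against your files and rests on assumptions: every variable is a string (as in your example), and a name that duplicates an earlier one can simply be truncated and given its column position as a suffix. The local names cand, new_names and i are made up for illustration. The candidate names are built first and the renaming is done in a single group -rename-, so nothing is renamed if any of the names cannot be used.

    ```stata
    ds _all
    local all_vars `r(varlist)'
    if "`: word 1 of `all_vars''" == "v1" {
        * Pass 1: collect candidate names without renaming anything yet
        local new_names
        local i = 0
        foreach v of local all_vars {
            local ++i
            * Assumption: `v' is a string variable, so strtoname() accepts it
            local cand = strtoname(`v'[1])
            * A repeated candidate is truncated and tagged with its position
            if `: list cand in new_names' {
                local cand = substr("`cand'", 1, 25) + "_`i'"
            }
            local new_names `new_names' `cand'
        }
        * Pass 2: rename the whole group in one command
        rename (`all_vars') (`new_names')
        drop in 1
    }
    ```

    A suffixed name could in principle still coincide with another candidate, in which case -rename- will stop with an error just as before, so it is worth running this under -capture noisily- and keeping a note of the files that fail. To apply it to many files, the list of datasets in a folder can be obtained with something like local files : dir "." files "*.dta" and looped over with -foreach-.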
