Hi all! I am importing a large dataset from Excel with many variable names that are over 32 characters. Obviously these are truncated to 32 characters upon import; however a large series of these share the first 32 characters in common. As such I am left with a number of variables whose names are the column number from excel, but whose label's contain the full name.
Example:
I've been trying to write code that selects variables with names greater than 32 characters, generates a variable name from the label, truncates that name to 30 characters then adds a sequential two digit number at the end to produce the following result:
Ideally these numbers would restart at 01 for each common 30 character stem.
I have limited coding experience in languages other than STATA so I've been struggling for several hours writing the following clunky code:
gen k = 0
foreach var of varlist _all {
local label : variable label `var'
local new_name = lower(substr(strtoname("`label'"), 1, 30))
if strlen(`new_name') = 30 {
replace k = k + 1
tostring k, replace
local num_name = `new_name' + k
rename `var' `num_name'
destring k, replace
}
else {
rename `var' `new_name'
replace k = 0
}
I also tried using addnumber but the way I've written it each variable gets a "1" added, instead of sequential numbers.
foreach var of varlist _all {
local label : variable label `var'
local new_name = lower(substr(strtoname("`label'"), 1, 30))
rename `var' `new_name'#, addnumber
}
I would love any advice on writing this code or tips on how others have handled a similar issue.
Thanks so much!
Mark
Example:
Name | Label |
PayorGovernmentHealthInsura | Payor - Government Health Insurance - A |
AJ | Payor - Government Health Insurance - B |
AK | Payor - Government Health Insurance - C |
AL | Payor - Government Health Insurance - D |
AM | Payor - Government Health Insurance - E |
AN | Payor - Government Health Insurance - F |
AO | Payor - Government Health Insurance - G |
AP | Payor - Government Health Insurance - H |
AQ | Payor - Government Health Insurance - I |
I've been trying to write code that selects variables with names greater than 32 characters, generates a variable name from the label, truncates that name to 30 characters then adds a sequential two digit number at the end to produce the following result:
Name | Label |
PayorGovernmentHealthInsu01 | Payor - Government Health Insurance - A |
PayorGovernmentHealthInsu02 | Payor - Government Health Insurance - B |
PayorGovernmentHealthInsu03 | Payor - Government Health Insurance - C |
PayorGovernmentHealthInsu04 | Payor - Government Health Insurance - D |
PayorGovernmentHealthInsu05 | Payor - Government Health Insurance - E |
PayorGovernmentHealthInsu06 | Payor - Government Health Insurance - F |
PayorGovernmentHealthInsu07 | Payor - Government Health Insurance - G |
PayorGovernmentHealthInsu08 | Payor - Government Health Insurance - H |
PayorGovernmentHealthInsu09 | Payor - Government Health Insurance - I |
Ideally these numbers would restart at 01 for each common 30 character stem.
I have limited coding experience in languages other than STATA so I've been struggling for several hours writing the following clunky code:
gen k = 0
foreach var of varlist _all {
local label : variable label `var'
local new_name = lower(substr(strtoname("`label'"), 1, 30))
if strlen(`new_name') = 30 {
replace k = k + 1
tostring k, replace
local num_name = `new_name' + k
rename `var' `num_name'
destring k, replace
}
else {
rename `var' `new_name'
replace k = 0
}
I also tried using addnumber but the way I've written it each variable gets a "1" added, instead of sequential numbers.
foreach var of varlist _all {
local label : variable label `var'
local new_name = lower(substr(strtoname("`label'"), 1, 30))
rename `var' `new_name'#, addnumber
}
I would love any advice on writing this code or tips on how others have handled a similar issue.
Thanks so much!
Mark
Comment