Help with changing variable format

Elizabeth Duthie

Join Date: Apr 2014

Posts: 5
#1

Help with changing variable format

15 Jul 2015, 12:54

Hello Stata users,
I'm a bit of a novice with data management, and am trying to solve a data management problem. My sense is that there could be an obvious answer, but I haven't been able to find it in the documentation or in my many searches in the archives. I would very much appreciate any advice you can offer.

I have a data set given to me in an Excel spreadsheet that I have imported into Stata (467 observations, 213 variables). Nearly all the variables contain numeric values, but they are all read by Stata as string variables because whoever entered the data into excel included "%" symbols for some values, and for others entered fractions. So, for example, var1 could have a value of "7.7%" for observation1 and "3 / 7" for observation2 (including the spaces on either side of the fraction sign). Each variable has a mix of %s and fractions, and each observation has a mix of %s and fractions. I understand that if the only problem was the "%" symbols, I could address this using --destring-- as in:
destring var1 , generate(var2) ignore("%")

But with the fractions thrown into the mix, I'm not sure what to do. I'm desperately hoping there is someway to handle this in Stata that I'm just not yet aware of (I'm afraid of the vast potential for introducing error if I attempt to address this myself in Excel). Is there perhaps a command that can tell Stata to read a fraction in a string variable as an expression and generate a new variable with the numeric value of the expression? Some other more creative solution I haven't thought of?

Thank you in advance for any advice you can offer,
Kindly,
Elizabeth
Tags: None

Jorge Eduardo Perez Perez

Join Date: Mar 2014
Posts: 430

15 Jul 2015, 13:11

Code:

clear
* Generate example data
input str7 varstr
"5/2"
"3.3%"
"20%"
"9/18"
end

* First get the clean values that are not fractions or % in var1
destring varstr, gen(var1) force

* Now get rid of the %, store in var2
gen var2=subinstr(varstr,"%","",.)
destring var2, replace force
replace var2=var2/100
* Now get rid of the fraction, store in var3
gen var3num=regexs(1) if regexm(varstr,"([0-9]+)(\/)([0-9]+)")
gen var3den=regexs(3) if regexm(varstr,"([0-9]+)(\/)([0-9]+)")
destring var3num, replace
destring var3den, replace
gen var3=var3num/var3den

* Combine in var
gen var=var1
replace var=var2 if mi(var)
replace var=var3 if mi(var)

Jorge Eduardo Pérez Pérez
www.jorgeperezperez.com

Comment

Elizabeth Duthie

Join Date: Apr 2014

Posts: 5
#3

15 Jul 2015, 13:58

Wow, it would have taken me years to figure that out on my own (not an exaggeration!) ~ Thanks so much for your help!
Comment

Announcement

Help with changing variable format

Comment

Comment