Hello Stata users,
I'm a bit of a novice with data management, and am trying to solve a data management problem. My sense is that there could be an obvious answer, but I haven't been able to find it in the documentation or in my many searches in the archives. I would very much appreciate any advice you can offer.
I have a data set given to me in an Excel spreadsheet that I have imported into Stata (467 observations, 213 variables). Nearly all the variables contain numeric values, but they are all read by Stata as string variables because whoever entered the data into Excel included "%" symbols for some values, and for others entered fractions. So, for example, var1 could have a value of "7.7%" for observation1 and "3 / 7" for observation2 (including the spaces on either side of the fraction sign). Each variable has a mix of %s and fractions, and each observation has a mix of %s and fractions. I understand that if the only problem were the "%" symbols, I could address this using --destring-- as in:
destring var1 , generate(var2) ignore("%")
But with the fractions thrown into the mix, I'm not sure what to do. I'm desperately hoping there is some way to handle this in Stata that I'm just not yet aware of (I'm afraid of the vast potential for introducing error if I attempt to address this myself in Excel). Is there perhaps a command that can tell Stata to read a fraction in a string variable as an expression and generate a new variable with the numeric value of the expression? Some other more creative solution I haven't thought of?
Thank you in advance for any advice you can offer,
Kindly,
Elizabeth
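For readers with the same problem, here is one possible sketch of the kind of approach being asked about, using only string functions rather than a dedicated command. It assumes every cell is either a plain number, a number followed by "%", or a single fraction of the form "a / b". The _num suffix is a made-up naming choice, and it will fail for variable names too long to take the suffix. Note that, like the destring call above, it leaves "7.7%" as 7.7 while "3 / 7" becomes roughly 0.4286, so whether the two kinds of values then need to be put on a common scale is a separate decision that depends on what the data mean.

```stata
* Sketch only: assumes cells look like "7.7%", "7.7", or "3 / 7".
* Anything else comes out as missing and should be checked by hand.
ds, has(type string)
foreach v in `r(varlist)' {
    tempvar s slash
    * drop the percent signs, then find the slash (0 if there is none)
    generate `s' = subinstr(`v', "%", "", .)
    generate `slash' = strpos(`s', "/")
    * hypothetical naming scheme: original name plus _num
    generate double `v'_num = real(trim(`s')) if `slash' == 0
    * fractions: numerator before the slash, denominator after it
    replace `v'_num = real(trim(substr(`s', 1, `slash' - 1))) / real(trim(substr(`s', `slash' + 1, .))) if `slash' > 0
    drop `s' `slash'
}
```

Afterwards, something like -list var1 var1_num if missing(var1_num) & var1 != ""- would show any cells that did not fit the assumed patterns.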