Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with changing variable format

    Hello Stata users,
    I'm a bit of a novice with data management, and am trying to solve a data management problem. My sense is that there could be an obvious answer, but I haven't been able to find it in the documentation or in my many searches in the archives. I would very much appreciate any advice you can offer.

    I have a data set given to me in an Excel spreadsheet that I have imported into Stata (467 observations, 213 variables). Nearly all the variables contain numeric values, but they are all read by Stata as string variables because whoever entered the data into excel included "%" symbols for some values, and for others entered fractions. So, for example, var1 could have a value of "7.7%" for observation1 and "3 / 7" for observation2 (including the spaces on either side of the fraction sign). Each variable has a mix of %s and fractions, and each observation has a mix of %s and fractions. I understand that if the only problem was the "%" symbols, I could address this using --destring-- as in:
    destring var1 , generate(var2) ignore("%")

    But with the fractions thrown into the mix, I'm not sure what to do. I'm desperately hoping there is someway to handle this in Stata that I'm just not yet aware of (I'm afraid of the vast potential for introducing error if I attempt to address this myself in Excel). Is there perhaps a command that can tell Stata to read a fraction in a string variable as an expression and generate a new variable with the numeric value of the expression? Some other more creative solution I haven't thought of?

    Thank you in advance for any advice you can offer,
    Kindly,
    Elizabeth

  • #2
    Code:
    clear
    * Generate example data
    input str7 varstr
    "5/2"
    "3.3%"
    "20%"
    "9/18"
    end
    
    * First get the clean values that are not fractions or % in var1
    destring varstr, gen(var1) force
    
    * Now get rid of the %, store in var2
    gen var2=subinstr(varstr,"%","",.)
    destring var2, replace force
    replace var2=var2/100
    * Now get rid of the fraction, store in var3
    gen var3num=regexs(1) if regexm(varstr,"([0-9]+)(\/)([0-9]+)")
    gen var3den=regexs(3) if regexm(varstr,"([0-9]+)(\/)([0-9]+)")
    destring var3num, replace
    destring var3den, replace
    gen var3=var3num/var3den
    
    * Combine in var
    gen var=var1
    replace var=var2 if mi(var)
    replace var=var3 if mi(var)
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com

    Comment


    • #3
      Wow, it would have taken me years to figure that out on my own (not an exaggeration!) ~ Thanks so much for your help!

      Comment

      Working...
      X