Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Transform string variable into numeric variable

    Hi, I have done this task before using "other variables transformation commands - convert variables from string to numeric". This time didn't work. I believe it's because my variable ESTADIAG contains some observations stored as numbers (only numbers) but others stored as letters (only letters). For example, I have thousands of obs stored as either "I", "B", "A" or "Z".

    I would like to transform all "I", "B", "A" or "Z" into 99 and then have this variable ESTADIAG stored as numeric so I can actually run my analysis!

    any insights?

    thanks very much!

  • #2
    Well, I direct and somewhat minimal solution to your question would be:
    Code:
    foreach x in I B A Z {
        replace ESTDIAG = "99" if ESTDIAG == "`x'"
    }
    destring ESTDIAG, replace
    But, this may not be the best thing to do. Are the numeric values that ESTDIAG takes on integers? And why do you want to use 99, in particular, to replace these non-numeric values? Are you using 99 as a numeric code for missing value--that's usually a bad idea in Stata. So if you could explain more about this problem, a more effective way to organize this data might be available.

    Comment


    • #3
      Sure Clyde! Thanks for answering. This is how my variable is presented.

      ESTADIAG | Freq. Percent Cum.
      ------------+-----------------------------------
      0 | 46,473 17.37 17.37
      1 | 28,822 10.78 28.15
      2 | 44,074 16.48 44.63
      3 | 55,968 20.92 65.55
      4 | 13,379 5.00 70.55
      8 | 19,273 7.21 77.76
      9 | 45,026 16.83 94.59
      A | 4 0.00 94.59
      B | 12 0.00 94.60
      I | 8 0.00 94.60
      Z | 14,440 5.40 100.00
      ------------+-----------------------------------
      Total | 267,479 100.00


      ESTADIAG str1 %9s

      I believe they are all stored as string.

      I would like to drop if ESTADIAG=0 (but this command didn't work)

      then ESTADIAG 1 should be = 0

      the ESTADIAG 2,3 and 4 should be = 1

      all 8,9 and the letters A, B, I and Z should be coded as missing values.

      Answering your questions:

      "And why do you want to use 99, in particular, to replace these non-numeric values?" yes that was my idea.

      "Are you using 99 as a numeric code for missing value--that's usually a bad idea in Stata." I know, but since I wanna run a descriptive and the analytical analysis of the missing values, comparing them with the rest of data in terms of age and location...I thought of using 99 for that instead of "."

      So if you could explain more about this problem, a more effective way to organize this data might be available. I hope I could get myself more clear
      Last edited by Mrs Pecego; 14 Dec 2019, 15:04.

      Comment


      • #4
        Sorry!! when I clicked "post" it lost the proper format.


        Comment


        • #5
          This example may do what you need.
          Code:
          cls
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str1 ED_s
          "0"
          "1"
          "2"
          "3"
          "4"
          "8"
          "9"
          "A"
          "B"
          "I"
          "Z"
          end
          
          destring ED_s, generate(ED_n) ignore(8 9 A I B Z)
          drop if ED_n==0
          recode ED_n (1=0) (2/4=1), generate(ED_r)
          list, clean
          Code:
          . destring ED_s, generate(ED_n) ignore(8 9 A I B Z)
          ED_s: characters 8 9 A I B Z removed; ED_n generated as byte
          (6 missing values generated)
          
          . drop if ED_n==0
          (1 observation deleted)
          
          . recode ED_n (1=0) (2/4=1), generate(ED_r)
          (4 differences between ED_n and ED_r)
          
          . list, clean 
          
                 ED_s   ED_n   ED_r  
            1.      1      1      0  
            2.      2      2      1  
            3.      3      3      1  
            4.      4      4      1  
            5.      8      .      .  
            6.      9      .      .  
            7.      A      .      .  
            8.      B      .      .  
            9.      I      .      .  
           10.      Z      .      .

          Comment


          • #6
            Thank you very much.
            it worked!
            So, the command ignore (8 9 A I B Z) transformed all of them into true missing values, right? Yep,just checked. Thank you , helped a lot.
            Last edited by Mrs Pecego; 14 Dec 2019, 15:52.

            Comment

            Working...
            X