
  • Fixing mismatched variable with string and numeric data - converting it into variable with only numeric data

    Dear all,

    I have a categorical variable in a dataset that contains both numeric and string data.
    The variable is bmi: its categories run from 1 to 3, and the fourth category is "NA". How can I recode "NA" to 4 so that I no longer get error messages? The recode and encode commands have not worked so far.

    Many thanks,
    Lena


  • #2
    Code:
    replace bmi = "4" if bmi=="NA"
    destring bmi, replace
    Whatever you do, encode is not the correct command to use.
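
    Before replacing anything, it can help to confirm that "NA" is the only non-numeric value. A minimal sketch, assuming bmi is still a string variable at this point:

    ```stata
    * Show the distinct values of bmi that real() cannot convert to a number
    tabulate bmi if missing(real(bmi))
    ```

    If only "NA" is listed, the replace and destring steps above should convert every value.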



    • #3
      Code:
      gen better = real(bmi)
      should work too. The "NA" values would get mapped to missing, which seems about right to me.



      • #4
        Thank you so much.

        William's solution with the command
        replace bmi = "4" if bmi=="NA"
        did the trick. Nick's command also worked, but it got rid of "NA" completely and turned it into missing.



        • #5
          Indeed. That was exactly the point of my proposal, and I explained it as a consequence.

          If you want to see a value label "NA" in results then go

          Code:
          replace better = .a if better == . 
          label def better .a "NA"
          label val better better
          Or start again:

          Code:
          gen better = cond(bmi == "NA", .a, real(bmi))
          and define and apply a value label as just done.
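
          Put together as a self-contained toy example (the four-observation dataset below is made up for illustration and is not the original data):

          ```stata
          * Hypothetical toy data: bmi stored as a string with "NA" as a category
          clear
          input str2 bmi
          "1"
          "2"
          "3"
          "NA"
          end

          * Map "NA" to the extended missing value .a; convert everything else to numeric
          generate better = cond(bmi == "NA", .a, real(bmi))

          * Attach a value label so that .a is displayed as "NA"
          label define better .a "NA"
          label values better better

          * The missing option keeps the .a category in the table
          tabulate better, missing
          ```

          Because .a is a missing value, commands that exclude missings should skip those observations, while tables that include missings still show them under the "NA" label.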



          • #6
            Thank you very much for clarifying this, Nick. I just tried it and it worked really well.
            Many thanks!



            • #7
              I was called away while writing post #2 and, for the benefit of anyone finding this topic later (perhaps searching for "NA"), want to elaborate.

              The encode command is designed for assigning numerical codes to non-numeric strings like "France", "Germany", "United States". The output of help encode instructs us:

              "Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring."

              Consider the following example.
              Code:
              . destring string, generate(right)
              string: all characters numeric; right generated as byte
              
              . encode string, generate(wrong)
              
              . // wrong looks correct
              . list, clean 
              
                     string   right   wrong  
                1.        1       1       1  
                2.        2       2       2  
                3.        3       3       3  
                4.       11      11      11  
                5.       12      12      12  
                6.       13      13      13  
              
              . // but the actual values encoded are incorrect
              . label list wrong
              wrong:
                         1 1
                         2 11
                         3 12
                         4 13
                         5 2
                         6 3
              
              . list, clean nolabel
              
                     string   right   wrong  
                1.        1       1       1  
                2.        2       2       5  
                3.        3       3       6  
                4.       11      11       2  
                5.       12      12       3  
                6.       13      13       4  
              
              .
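
              The log above does not show how the data were entered. A minimal sketch that should reproduce it, assuming the six values were typed in as a str2 variable named string:

              ```stata
              * Assumed setup: numbers stored as strings, as in the listing above
              clear
              input str2 string
              "1"
              "2"
              "3"
              "11"
              "12"
              "13"
              end

              * Correct: convert the numeric strings to numbers
              destring string, generate(right)

              * Incorrect for this purpose: assign integer codes to the distinct strings
              encode string, generate(wrong)

              * nolabel shows the underlying codes rather than the value labels
              list, clean nolabel
              ```

              encode numbers the distinct strings in alphabetical order, which is why "11" receives code 2 while "2" receives code 5.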

