  • Recode: 0 changes made

    I'm trying to recode a variable but each time I do it, 0 changes are made.

    The variable in question, w6867, was originally a string variable with str6 storage and %6s display. I first encoded and recast it:
    encode w6867, gen(w6867m1m)
    recast int w6867m1m

    Then, when I recode, zero changes are made. Here is the command:
    recode w6867m1m (149=101) (151.9=101) (153.9=101) (155.2=101) (157.9=101) (162.9=101) (164.1=101) (170.9=101) (174.9=101) (183=101) (185=101) (191.9=101) (199.1=101) (202.8=101) (208=101)

    Thanks in advance.

  • #2
    Please add an extract of your data using the dataex command.



    • #3
      I agree with Hemanshu Kumar, but also, you cannot use non-integer values in your recode command anyway. More information on what you are (trying) to do would also help. See, in addition to the FAQ:
      Code:
      h recode



      • #4
        Also, I get the feeling you are misunderstanding the purpose of -encode-. My guess is you actually want to do
        Code:
        gen w6867m1m = real(w6867)
        replace w6867m1m = 101 if inrange(w6867m1m, 149, 208)
        See also
        Code:
        help encode
        which includes the line
        Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
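A minimal sketch of the real() route, assuming that only the specific codes listed in #1 should become 101 (only three are shown here, and the variable name w6867num is hypothetical). Two details matter. First, unlike inrange(w6867m1m, 149, 208), which changes every value between 149 and 208, inlist() changes only the values you name. Second, the new variable is stored as double: with the default float type, a value such as 151.9 is held only approximately, so a comparison like w6867num == 151.9 can be false even though the data display 151.9. The sketch starts with -clear-, so run it in a fresh session.

```stata
clear
input str6 w6867
"149"
"151.9"
"153.9"
"250"
end

* Convert the numeric-looking strings to numbers; double avoids float rounding
generate double w6867num = real(w6867)

* Recode only the listed codes (a subset of those in #1); other values stay unchanged
replace w6867num = 101 if inlist(w6867num, 149, 151.9, 153.9)

list, noobs
```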
        Last edited by Hemanshu Kumar; 19 Jun 2023, 13:49.



        • #5
          Let me also elaborate what is actually happening in what you are doing.

          Say we start with the data
          Code:
          clear
          input str6 w6867
          "149"
          "151.9"
          "153.9"
          "155.2"
          "157.9"
          "162.9"
          end
          Now when you -encode- these strings, Stata creates a variable that holds only integers, starting from 1 for the first value in alphanumeric (string) sort order: here 1 for 149, 2 for 151.9, and so on. "149" is merely the value label attached to the number 1, "151.9" is the label for the number 2, and so on. The storage type might be long or something else, depending on your Stata settings. When you then -recast- this variable to int, the storage type changes, but that makes no practical difference: 149 is still stored as 1, 151.9 as 2, and so on.

          You can verify this as follows:

          Code:
          . encode w6867, gen(w6867m1m)
          . recast int w6867m1m
          
          . list, noobs nolabel sep(0)
          
            +------------------+
            | w6867   w6867m1m |
            |------------------|
            |   149          1 |
            | 151.9          2 |
            | 153.9          3 |
            | 155.2          4 |
            | 157.9          5 |
            | 162.9          6 |
            +------------------+
          where the -list- command has been asked to show the variables without their labels. So now you can see why your recode doesn't change any values!
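A small sketch to see the mapping for yourself, using made-up values. Because -encode- assigns codes in the sort order of the strings, not their numeric value, "1000" sorts before "149" as text and so gets the code 1. -label list- shows which integer stands for which original code.

```stata
clear
input str6 w6867
"149"
"1000"
"151.9"
end

encode w6867, generate(w6867m1m)

* Show the integer codes that encode assigned and the labels attached to them
label list w6867m1m
list w6867 w6867m1m, nolabel noobs
```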
          Last edited by Hemanshu Kumar; 19 Jun 2023, 14:05.



          • #6

            Hemanshu,

            I am beyond grateful for all your help, thank you so much. You were right, I didn't understand what encoding was actually doing and it wasn't what I needed to do. The destring command doesn't work with this variable, but I was FINALLY able to recode with generate newvar = real(varname).

            I'm doing dissertation research and am a complete beginner at Stata, so I appreciate you walking me through this issue.



            • #7
              The destring command doesn't work with this variable but I was FINALLY able to recode with generate newvar = real(varname).
              RED FLAG. You may be losing data. -destring- is, at bottom, a wrapper for -gen newvar = real(varname)-. The difference is that -destring- checks for values of the string variable that are not actually translatable to numbers, or can only be translated with loss of information. So you should go back and run:
              Code:
              list varname if missing(real(varname)) & !missing(varname)
              This will show you the values of varname that -destring- is rejecting. Inspect that list carefully. Some of them may be perfectly good-looking numbers that will just lose some precision (typically one decimal place) if converted. Those are not really a problem for you. But some of them may be malformed numbers, things like "3.4.5". Is that supposed to be 3.45 or 34.5? Some of them may contain non-numeric content altogether, like "2.5w". What is that supposed to be? Maybe it's a typo for 2.52 or 2.53, but who knows? You should do whatever you can to find out what the correct values really are for these and fix the data here as well.

              Wherever you find a malformed number or non-numeric content you need to figure out what it is supposed to actually be and fix the error in the data.

              After you have fixed those, run -list varname if missing(real(varname)) & !missing(varname)- again. Verify that what you see now consists entirely of values that you cannot fix into actual numbers (or, if you are lucky, that you see no results at all). At that point, -gen newvar = real(varname)- will only lose data that are invalid and irretrievable anyway.
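The same check can be kept as a flag variable, which is handy when the list is long. A minimal sketch with made-up values (the flag name unconvertible is hypothetical): an empty string is not flagged, because it is already missing before conversion.

```stata
clear
input str6 w6867
"149"
"E819.9"
""
end

* Flag nonmissing strings that real() cannot turn into numbers
generate byte unconvertible = missing(real(w6867)) & !missing(w6867)

list w6867 if unconvertible, noobs
count if unconvertible
```

Once the data are fixed, a line such as -assert !missing(real(w6867)) if !missing(w6867)- will stop a do-file if any unconvertible values remain, rather than letting them silently become missing.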
              Last edited by Clyde Schechter; 19 Jun 2023, 16:38.



              • #8
                Hi Clyde,

                Thanks so much for jumping in when you did! Your code was very useful and indeed caught some issues. These are the values the -destring- command is rejecting:

                        w6867
                1763.  E819.9
                7282.  E958.9
                7838.  E819.9
                8293.  E958.9
                8915.  E819.9
                9214.  E899

                These are ICD-9 death codes, so they are categorical. At your suggestion, I inspected my data and learned that when I initially ran -gen newvar = real(varname)-, these codes disappeared (became missing) in my newvar. So what I did was replace the "E" with a "1":

                replace w6867 = "1819.9" in 1763
                replace w6867 = "1899" in 9214
                replace w6867 = "1819.9" in 7838
                replace w6867 = "1819.9" in 8915
                replace w6867 = "1958.9" in 7282
                replace w6867 = "1958.9" in 8293
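Replacing by observation number (-in 1763-) depends on the current sort order: if the data are sorted or merged differently, those lines change the wrong observations without any warning. A rule-based version of the same "E" to "1" replacement, as a minimal sketch with made-up values:

```stata
clear
input str6 w6867
"E819.9"
"250"
"E899"
end

* Replace a leading "E" with "1" wherever it occurs, regardless of observation number
replace w6867 = "1" + substr(w6867, 2, .) if substr(w6867, 1, 1) == "E"

list, noobs
```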

                After doing this, I reran your code and nothing popped up. Then, the initial -gen newvar = real(varname)- ran perfectly with no missing values. Next, I was able to make actual changes with the recode command:

                recode w6867m1m (250 250.1 = 141) (263.9 = 144) (332 = 163) (344.81 = 115) (348.1 = 169) (410.9 414.9 425.4 427.5 428 428.9 429.9 436 437.3 441.9 442.9 444.9 746.9 746.89= 121) (415.1 458.9 459 = 129) (486 = 133) (492.8 518.89 = 139) (557 557.9 569.5 = 151) (571.5 573.8 = 152) (577.9 = 156) (586 = 153) (710 = 119) (781.9 789.1 790.2 799.8 = 997) (959.9 994.1 996.62 997.1 998 1899 1819.9 1910.9 1919.9 1958.9 = 194)

                w6867m1m: 104 changes made
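A side note on -recode-: its generate() option writes the result to a new variable and leaves the original untouched, so the raw codes remain available for checking. A minimal sketch with hypothetical variable names (icd9num, cause) and integer codes only; values not covered by any rule are copied over unchanged.

```stata
clear
input double icd9num
250
332
486
710
end

* generate() stores the recoded values in cause and leaves icd9num intact
recode icd9num (250 = 141) (332 = 163) (486 = 133), generate(cause)

list, noobs
```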

                Lastly, I applied the label for the death codes I'm using:
                label values w6867m1m w6867m2m w6867m3m w6867m4m CauseofDeath

                Again, thanks so much for your good eye and insight, Clyde. Life saver!

                Lauren





                • #9
                  This still seems confused and roundabout to me. What exactly do you want to do and why?

                  Your original string variable can be encoded directly so that obvious string codes like "E899" and the numeric-looking codes such as "151.9" alike are mapped to integers and the original codes become value labels. So, if you need a numeric version of the variable, that is a safe one-step mapping.

                  If you want to coarsen the classification, then recode of that encoded variable is one way to proceed, but the syntax will not involve any non-integers.
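One way to work with the encoded variable without typing any non-integers is Stata's "text":lblname expression syntax, which returns the integer code attached to a given label. Note that this sketch uses -generate-/-replace- rather than the -recode- route described above, and the names icd9 and cause, like the values, are hypothetical.

```stata
clear
input str6 w6867
"E819.9"
"250"
"E899"
end

* Map every distinct code to an integer; the original codes become value labels
encode w6867, generate(icd9)

* Refer to codes through their labels ("text":labelname), so no decimals are typed
generate cause = 194 if icd9 == "E819.9":icd9 | icd9 == "E899":icd9
replace cause = 141 if icd9 == "250":icd9

list, noobs
```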



                  • #10
                    Hi Nick,

                    Thanks for your input. To answer your questions: I needed to recode the ICD-9 codes of a merged variable to the death codes of my main dataset, so that I have uniform death codes and can fit Cox models later.



                    • #11
                      Thanks for the detail, which would have been invaluable in #1. If the problem is a string variable in one dataset and a numeric variable with value labels in another, I would have tried a decode in the second case.
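A minimal sketch of the -decode- step, assuming the main dataset holds a numeric variable with value labels. The label name deathlbl and its texts are hypothetical stand-ins for whatever labels the real dataset uses. -decode- creates a string variable holding each value's label text, which can then be compared with, or merged on, a string variable from the other dataset.

```stata
clear
input cause
121
194
end

* Hypothetical value label standing in for the main dataset's labels
label define deathlbl 121 "cause A" 194 "cause B"
label values cause deathlbl

* decode creates a string variable containing each value's label text
decode cause, generate(cause_str)

list, noobs
```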



                      • #12
                        Thanks Nick, that is very useful to know.
