Recode: 0 changes made

Lauren Hunter

Join Date: Jun 2023

Posts: 5
#1

19 Jun 2023, 13:26

I'm trying to recode a variable but each time I do it, 0 changes are made.

The variable in question, w6867, was originally a string variable with str6 storage and %6s display. I first encoded and recast it:
encode w6867, gen(w6867m1m)
recast int w6867m1m

Then, when I recode, zero changes are made. Here is the command:
recode w6867m1m (149=101) (151.9=101) (153.9=101) (155.2=101) (157.9=101) (162.9=101) (164.1=101) (170.9=101) (174.9=101) (183=101) (185=101) (191.9=101) (199.1=101) (202.8=101) (208=101)

Thanks in advance.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#2

19 Jun 2023, 13:35

Please add an extract of your data using the dataex command.
Rich Goldstein

Join Date: Mar 2014

Posts: 4491
#3

19 Jun 2023, 13:42

I agree with Hemanshu Kumar. In addition, you cannot use non-integer values in your recode command anyway. More information on what you are trying to do would also help. In addition to the FAQ, see:

Code:

h recode
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#4

19 Jun 2023, 13:44

Also, I get the feeling you are misunderstanding the purpose of -encode-. My guess is you actually want to do

Code:

gen w6867m1m = real(w6867)
replace w6867m1m = 101 if inrange(w6867m1m, 149, 208)

See also

Code:

help encode

which includes the line

Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.

Last edited by Hemanshu Kumar; 19 Jun 2023, 13:49.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#5

19 Jun 2023, 14:02

Let me also elaborate on what is actually happening in what you are doing.

Say we start with the data

Code:

clear
input str6 w6867
"149"
"151.9"
"153.9"
"155.2"
"157.9"
"162.9"
end

Now when you -encode- these strings, Stata actually creates a variable that is just integers -- starting from 1 for the lowest value in the data, here 149, 2 for 151.9, etc. 149 is merely a label attached to the number 1, 151.9 is the label for the number 2, and so on. The storage type might be long or something else, depending on your Stata settings. In the next step, when you -recast- this variable to int, it changes the data type, but makes no practical difference. 149 is still being stored as 1, 151.9 as 2, and so on.

You can verify this as follows:

Code:

. encode w6867, gen(w6867m1m)

. recast int w6867m1m

. list, noobs nolabel sep(0)

  +------------------+
  | w6867   w6867m1m |
  |------------------|
  |   149          1 |
  | 151.9          2 |
  | 153.9          3 |
  | 155.2          4 |
  | 157.9          5 |
  | 162.9          6 |
  +------------------+

where the -list- command has been asked to show the variables without their labels. So now you can see why your recode doesn't change any values!
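
To make the mismatch explicit, here is a minimal sketch on a three-row toy dataset (the variable name w6867num is only illustrative). It assumes -encode- assigns codes in sorted order of the strings, as in the listing above, and that the value label takes the name of the new variable, which is the default when label() is not specified.

```stata
clear
input str6 w6867
"149"
"151.9"
"153.9"
end

encode w6867, gen(w6867m1m)

* The stored values are 1, 2, 3; the original strings survive only as value labels
label list w6867m1m

* No observation holds the number 149, so a rule like (149=101) has nothing to match
count if w6867m1m == 149

* Converting the string itself yields the numbers the recode rules expect
gen double w6867num = real(w6867)
count if w6867num == 149
```

The first -count- finds no observations and the second finds one, which is why -recode- reported 0 changes on the encoded variable.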

Last edited by Hemanshu Kumar; 19 Jun 2023, 14:05.
Lauren Hunter

Join Date: Jun 2023

Posts: 5
#6

19 Jun 2023, 15:41

Hemanshu,

I am beyond grateful for all your help, thank you so much. You were right, I didn't understand what encoding was actually doing and it wasn't what I needed to do. The destring command doesn't work with this variable but I was FINALLY able to recode with generate newvar = real(varname).

I'm doing dissertation research and am a complete beginner at Stata, so I appreciate you walking me through this issue.
Clyde Schechter

Join Date: Apr 2014

Posts: 30168
#7

19 Jun 2023, 16:32

The destring command doesn't work with this variable but I was FINALLY able to recode with generate newvar = real(varname).

RED FLAG. You may be losing data. -destring- is, at bottom, a wrapper for -gen newvar = real(varname)-. The difference is that -destring- checks for values of the string variable that are not actually translatable to numbers, or can only be translated with loss of information. So you should go back and run:

Code:

list varname if missing(real(varname)) & !missing(varname)

This will show you the values of varname that -destring- is rejecting. Inspect that list carefully. Some of them may be perfectly good-looking numbers that will just lose some precision (typically one decimal place) if converted. Those are not really a problem for you. But some of them may be malformed numbers, things like "3.4.5". Is that supposed to be 3.45 or 34.5? Some of them may contain non-numeric content altogether, like "2.5w". What is that supposed to be? Maybe it's a typo for 2.52 or 2.53, but who knows? You should do whatever you can to find out what the correct values really are for these and fix the data here as well.

Wherever you find a malformed number or non-numeric content you need to figure out what it is supposed to actually be and fix the error in the data.

After you have fixed those, run -list varname if missing(real(varname)) & !missing(varname)- again. Verify that the results you see now consist entirely of things that you cannot fix into actual numbers (or, if you are lucky, you see no results of this at all.) At that point, -gen newvar = real(varname)- will only lose data that is invalid and irretrievable anyway.
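
As a rough sketch of that check on made-up data (the flag variable notnum is a hypothetical name, and the values are only illustrative), the same condition can be stored in a variable so the offending rows can be counted, listed, and re-checked after each fix:

```stata
clear
input str6 w6867
"149"
"151.9"
"E819.9"
"E899"
""
end

* Flag non-empty strings that real() cannot turn into a number
gen byte notnum = missing(real(w6867)) & !missing(w6867)

* How many values would be silently lost, and which ones are they?
count if notnum
list w6867 if notnum
```

Empty strings are deliberately not flagged, because they are already missing in the original variable.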

Last edited by Clyde Schechter; 19 Jun 2023, 16:38.
Lauren Hunter

Join Date: Jun 2023

Posts: 5
#8

19 Jun 2023, 20:37

Hi Clyde,

Thanks so much for jumping in when you did! Your code was very useful and indeed caught some issues. These are the values the -destring- command is rejecting:

w6867

1763. E819.9
7282. E958.9
7838. E819.9
8293. E958.9
8915. E819.9
9214. E899

These are cause-of-death codes from the ICD-9, so they are categorical. Following your suggestion, I inspected my data and learned that after I initially ran -gen newvar = real(varname)-, these codes had become missing in my newvar. So what I did was replace the "E" with a "1":

replace w6867 = "1819.9" in 1763
replace w6867 = "1899" in 9214
replace w6867 = "1819.9" in 7838
replace w6867 = "1819.9" in 8915
replace w6867 = "1958.9" in 7282
replace w6867 = "1958.9" in 8293

After doing this, I reran your code and nothing popped up. Then, the initial -gen newvar = real(varname)- ran perfectly with no missing values. Next, I was able to make actual changes with the recode command:

recode w6867m1m (250 250.1 = 141) (263.9 = 144) (332 = 163) (344.81 = 115) (348.1 = 169) (410.9 414.9 425.4 427.5 428 428.9 429.9 436 437.3 441.9 442.9 444.9 746.9 746.89 = 121) (415.1 458.9 459 = 129) (486 = 133) (492.8 518.89 = 139) (557 557.9 569.5 = 151) (571.5 573.8 = 152) (577.9 = 156) (586 = 153) (710 = 119) (781.9 789.1 790.2 799.8 = 997) (959.9 994.1 996.62 997.1 998 1899 1819.9 1910.9 1919.9 1958.9 = 194)

w6867m1m: 104 changes made

Lastly, I applied the label for the death codes I'm using:
label values w6867m1m w6867m2m w6867m3m w6867m4m CauseofDeath

Again, thanks so much for your good eye and insight, Clyde. Life saver!

Lauren
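
The six row-by-row replacements in this post could also be written as a single rule, which avoids depending on observation numbers. This is only a sketch on toy data: it assumes every problem value starts with a capital "E", and w6867_orig and w6867num are hypothetical names added here to keep an untouched copy and hold the numeric result.

```stata
clear
input str6 w6867
"149"
"E819.9"
"E958.9"
"E899"
end

* Keep an untouched copy of the original codes before editing them
gen str6 w6867_orig = w6867

* Swap a leading "E" for "1", mirroring the manual edits (E819.9 becomes 1819.9)
replace w6867 = "1" + substr(w6867, 2, .) if substr(w6867, 1, 1) == "E"

* Every value should now convert; assert stops with an error if any does not
gen double w6867num = real(w6867)
assert !missing(w6867num)
```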
Nick Cox

Join Date: Mar 2014

Posts: 35780
#9

20 Jun 2023, 01:34

This still seems confused and roundabout to me. What exactly do you want to do and why?

Your original string variable can be encoded directly so that obvious string codes like "E899" and the numeric-looking codes such as "151.9" alike are mapped to integers and the original codes become value labels. So, if you need a numeric version of the variable, that is a safe one-step mapping.

If you want to coarsen the classification, then recode of that encoded variable is one way to proceed, but the syntax will not involve any non-integers.
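
A minimal sketch of that route on toy data follows. The new variable names w6867code and cause are hypothetical, and the target groups 101 and 194 are borrowed from the recode rules posted earlier in the thread. The integer codes on the left-hand side come from inspecting -label list-, so they would need to be looked up again for the real data.

```stata
clear
input str6 w6867
"149"
"151.9"
"E899"
end

* One-step mapping: every distinct string, E-codes included, gets an integer code
encode w6867, gen(w6867code)

* Look up which integer stands for which original code
label list w6867code

* Coarsen using the integer codes, not the original non-integer values
recode w6867code (1 2 = 101) (3 = 194), gen(cause)
```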
Lauren Hunter

Join Date: Jun 2023

Posts: 5
#10

20 Jun 2023, 06:26

Hi Nick,

Thanks for your input. To answer your questions, I needed to recode the ICD-9 codes of a merged variable to the death codes of my main dataset. I need to do this so I have uniform death codes and can fit Cox models later.
Nick Cox

Join Date: Mar 2014

Posts: 35780
#11

20 Jun 2023, 06:43

Thanks for the detail, which would have been invaluable in #1. If the problem is a string variable in one dataset and a numeric variable with value labels in another, I would have tried a decode in the second case.
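
For illustration only, -decode- on a labelled numeric variable might look like the sketch below. The label name CauseofDeath appears earlier in the thread, but the variable names and the label text here are invented for the example.

```stata
clear
input byte cause
1
2
end

* Hypothetical label contents, standing in for the real CauseofDeath label
label define CauseofDeath 1 "Heart disease" 2 "Cancer"
label values cause CauseofDeath

* decode creates a string variable holding the label text for each value
decode cause, gen(cause_str)
list, noobs
```

Once both datasets hold the classification as strings, the two variables can be compared or harmonised directly.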
Lauren Hunter

Join Date: Jun 2023

Posts: 5
#12

20 Jun 2023, 08:57

Thanks Nick, that is very useful to know.
