Encode data to continuous variable/avoid losing data with generate?

Welch Suggs

Join Date: Apr 2015
Posts: 17


24 Jul 2024, 16:15

I have a Likert scale item imported from text in a .csv file. To wit:

Code:

       Indicate your agreement with the |
 following statement: I feel burned out |
                               at work. |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                                  Agree |      1,524       41.91       41.91
                               Disagree |        403       11.08       53.00
                         Somewhat agree |      1,342       36.91       89.91
                      Somewhat disagree |        367       10.09      100.00
----------------------------------------+-----------------------------------
                                  Total |      3,636      100.00

I want to run analysis on it as a continuous variable, i.e. correlations etc. If I encode it, then it assigns the value 1 to Agree, 2 to Disagree, etc., when I want it to go from 1 being Disagree to 4 being Agree. So I tried generating a new variable and setting labels:

Code:

gen burn=.
la var burn "Indicate your agreement with the following statement: I feel burned out at work."
replace burn=1 if burnedout=="Disagree"
replace burn=2 if burnedout=="Somewhat disagree"
replace burn=3 if burnedout=="Somewhat agree"
replace burn=4 if fulfilling=="Agree"
la def burn 1 "Disagree" 2 "Somewhat disagree" 3 "Somewhat agree" 4 "Agree"
la val burn burn

But when I do that, it loses a lot of data:

Code:

    Indicate your |
   agreement with |
    the following |
statement: I feel |
    burned out at |
            work. |      Freq.     Percent        Cum.
------------------+-----------------------------------
         Disagree |         41        1.73        1.73
Somewhat disagree |        106        4.47        6.20
   Somewhat agree |        721       30.42       36.62
            Agree |      1,502       63.38      100.00
------------------+-----------------------------------
            Total |      2,370      100.00

There do not appear to be any weird leading spaces or extra characters in my original data. Does anyone have any ideas for troubleshooting the input or redoing the encode language to ensure that the encoded data are assigned the correct values? Thanks in advance!

Last edited by Welch Suggs; 24 Jul 2024, 16:18.


Welch Suggs

Join Date: Apr 2015

Posts: 17
#2

24 Jul 2024, 16:34

Never mind: if you define the label prior to encode, it works right. Sorry, couldn't delete!
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#3

24 Jul 2024, 16:37

I'm not entirely sure what's going wrong here. I do see one obvious problem: -replace burn=4 if fulfilling=="Agree"- refers to a variable, fulfilling, that doesn't belong here and will certainly produce strange results. It also makes me wonder whether the variable you tabulated in your first table of output is actually called burnout, or if it's something else.
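
If it helps to confirm where the observations went, here is a minimal diagnostic sketch. It assumes the source string variable is named burnedout and the new numeric variable is burn, as in the code in #1; adjust the names if yours differ.

Code:

* Count observations that have a response but were never assigned a code
count if missing(burn) & !missing(burnedout)
* Show which response strings were left uncoded
tabulate burnedout if missing(burn), missing

Any category that shows up in the second table was not matched by its -replace- condition.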

Anyway, there's a better way to go about this. I'm assuming that the name of the original variable is actually burnout. If not, then in the code below replace all references to burnout by the correct variable name.

Code:

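* Define the value label first; its text must match the strings in the data exactly (case-sensitive)
label define likert4 1 "Disagree" 2 "Somewhat disagree" 3 "Somewhat agree" 4 "Agree"
encode burnout, label(likert4) gen(_burnout)
drop burnout
rename _burnout burnout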
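
To check the result, something like the following may be useful. This is only a sketch under the same assumption that the variable is named burnout; it shows the underlying numeric codes and the label mapping so you can confirm that only the values 1 through 4 are in use.

Code:

* Show the numeric codes rather than the labels
tabulate burnout, nolabel
* Show the value-to-text mapping stored in the label
label list likert4

If codes above 4 appear, some strings in the data did not exactly match the label text, and -encode- added new values for them. Specifying the noextend option on -encode- makes it refuse to encode in that situation instead.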

Added: Crossed with #2. Moreover, it's good that you could not delete your original question. You are not the first person to encounter this difficulty, and you won't be the last. The presence of your post here, along with your explanation of how you solved it, will enable others who go down this road in the future to learn from your experience. That's the whole point of Statalist. It's not like a help line. It's a learning community.

Last edited by Clyde Schechter; 24 Jul 2024, 16:49.