Dummy variable

Marleen Yaramis

Join Date: Mar 2021

Posts: 22
#1

19 Mar 2021, 14:44

In my dataset I have the question 'I am satisfied with the salary I receive from my work'. There are 4 answer options: strongly disagree, disagree, agree and strongly agree. I would like to make this a dummy variable where the categories strongly disagree and disagree represent 0 and the categories agree and strongly agree 1. I tried a few things via the generate command, but this didn't work. I also keep getting the message "agree is not found". And I also get this with the other 3 answer options. Does anyone have an idea how I can fix this?
Rich Goldstein

Join Date: Mar 2014

Posts: 4548
#2

19 Mar 2021, 14:57

Code:

help recode

also, it sounds like you have numeric data with labels; you can refer to the labels in your command, but it is often much easier to use the underlying numbers (and I think that the recode command only allows the numbers) so you can use

Code:

label list

to get those numbers; if, on the other hand, your variable is a string variable, you will need something a little more complicated probably using the "inlist" function within a -generate- command;

if you supply a dataex example (see the FAQ), no one would have to guess and exact code would be easy to supply
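A minimal sketch of both cases mentioned above, on made-up data. The names answer_num, answer_str, d_num, d_str and the label agree4 are assumptions for illustration, not names from the original poster's dataset. It also shows the likely source of the "not found" message: a bare word such as agree inside an expression is read by Stata as a variable name.

```stata
clear
set obs 4
generate byte answer_num = _n          // hypothetical numeric item coded 1-4
label define agree4 1 "strongly disagree" 2 "disagree" 3 "agree" 4 "strongly agree"
label values answer_num agree4
label list agree4                      // shows the number behind each label

* generate d = (answer_num == agree) would fail here:
* Stata would look for a variable called agree and not find one

* numeric variable with value labels: compare against the underlying numbers
generate byte d_num = inlist(answer_num, 3, 4) if !missing(answer_num)

* string variable: compare against quoted text
decode answer_num, generate(answer_str)   // builds a string copy for the demo
generate byte d_str = inlist(answer_str, "agree", "strongly agree") if answer_str != ""

list
```

Which of the two forms applies depends on the storage type that -describe- reports for the variable.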

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17854
#3

20 Mar 2021, 04:08

Marleen:
as an aside to Rich's helpful recommendation, what follows might be useful:

Code:

. set obs 4
number of observations (_N) was 0, now 4

. g id=_n

. g wanted=_n-1

. label define wanted 0 "strongly disagree" 1 "disagree" 2 "agree" 3 "strongly agree"

. label val wanted wanted

. list

     +------------------------+
     | id              wanted |
     |------------------------|
  1. |  1   strongly disagree |
  2. |  2            disagree |
  3. |  3               agree |
  4. |  4      strongly agree |
     +------------------------+

. recode wanted (0 1=0) (2 3=1)
(wanted: 3 changes made)

. label define wanted 0 "strongly disagree/disagree" 1 "agree/strongly agree", modify

. list

     +---------------------------------+
     | id                       wanted |
     |---------------------------------|
  1. |  1   strongly disagree/disagree |
  2. |  2   strongly disagree/disagree |
  3. |  3         agree/strongly agree |
  4. |  4         agree/strongly agree |
     +---------------------------------+

.

Kind regards,
Carlo
(Stata 19.0)


Marleen Yaramis

Join Date: Mar 2021

Posts: 22
#4

20 Mar 2021, 12:57

Hi Rich. First of all, thank you for answering me. With the code 'dataex tc3g45a' I have found the following:

label values tc3g45a ValueScheme8
label def ValueScheme8 1 "Strongly disagree", modify
label def ValueScheme8 2 "Disagree", modify
label def ValueScheme8 3 "Agree", modify
label def ValueScheme8 4 "Strongly agree", modify

So indeed, this is numeric data with labels. Values 1 and 2 should become 0 in the dummy variable, and values 3 and 4 should become 1. I have read that I should be using the tabulate and generate commands, but this doesn't seem to work out. Do you have an idea on how to make this a dummy variable?

PS: tc3g45a is the name of the question that had to be answered.
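On the tabulate and generate route mentioned above: -tabulate- with the generate() option creates one indicator per answer category, so four variables here rather than a single 0/1 dummy. A minimal sketch on made-up data, assuming all four codes occur in the data; the stub sal_ and the name sal_agree are hypothetical.

```stata
clear
set obs 4
generate byte tc3g45a = _n                 // toy stand-in for the real item, coded 1-4
tabulate tc3g45a, generate(sal_)           // creates sal_1 ... sal_4, one per answer
generate byte sal_agree = sal_3 + sal_4    // 1 if agree or strongly agree, else 0
list
```

The indicators are numbered by the sorted distinct values, so sal_3 and sal_4 correspond to codes 3 and 4 only when all four codes are present.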
Rich Goldstein

Join Date: Mar 2014

Posts: 4548
#5

20 Mar 2021, 13:48

try

Code:

recode tc3g45a (1 2 = 0) (3 4 = 1), gen(tc3g45a_2)

I strongly recommend using variable names that are easy to read, easy to type and meaningful in the context of your project
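For comparison, a sketch of the same dummy built with -generate- instead of -recode-, on made-up data with one missing answer. The name satisfied_pay is only an illustration of a more readable variable name, not something from the original data.

```stata
clear
set obs 5
generate byte tc3g45a = _n if _n <= 4      // fifth observation left missing on purpose

* the recode approach from this post
recode tc3g45a (1 2 = 0) (3 4 = 1), gen(tc3g45a_2)

* an equivalent generate, keeping missing answers missing
generate byte satisfied_pay = inrange(tc3g45a, 3, 4) if !missing(tc3g45a)

* cross-check the new variable against the original codes
tabulate tc3g45a tc3g45a_2, missing
```

The if !missing() qualifier matters: without it, inrange() would return 0 for a missing answer and count it as disagreement.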
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#6

20 Mar 2021, 13:51

Code:

recode tc3g45a (1/2 = 0 "Disagreement") (3/4 = 1 "Agreement"), prefix(d_)

will create a new variable, d_tc3g45a, with the properties you want. See -help recode- for more information about other capabilities and uses for this command.

By the way, is there a good reason for doing this? You are throwing away information. Sometimes this is nevertheless helpful, for example, if the extreme categories are rarely chosen and those small categories interfere with analysis. But when a Likert-like item produces that kind of response distribution, it may mean that the prompt was badly worded for, or inappropriately addressed to, the study sample.

In general, you are handicapping your data when you aggregate responses this way.
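A sketch of one way to keep all four categories, assuming the item is used as a predictor in a linear regression (the thread does not say which model is intended). The data below are simulated purely for illustration; only the variable names are borrowed from the thread.

```stata
clear
set seed 12345
set obs 200
generate byte tc3g45a = ceil(4 * runiform())         // simulated answers 1-4
generate t3pjobsa = 10 + 0.5 * tc3g45a + rnormal()   // simulated outcome

* collapsed version, as in the recode command above
recode tc3g45a (1/2 = 0 "Disagreement") (3/4 = 1 "Agreement"), prefix(d_)

regress t3pjobsa i.tc3g45a      // factor-variable notation: all four categories kept
regress t3pjobsa i.d_tc3g45a    // two categories only
```

With i.tc3g45a, Stata reports one coefficient per category relative to the base category, so nothing is aggregated away.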

Last edited by Clyde Schechter; 20 Mar 2021, 13:52. Reason: Crossed with #5
Marleen Yaramis

Join Date: Mar 2021

Posts: 22
#7

20 Mar 2021, 14:51

In reply to Rich Goldstein (#5):

Thank you! This worked
Marleen Yaramis

Join Date: Mar 2021

Posts: 22
#8

20 Mar 2021, 14:52

Thank you for your help, Clyde!

I'm doing this because the 4 answer options need to be bundled into two categories (0 and 1) to make a dummy variable, I think?
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#9

20 Mar 2021, 15:00

Well, yes, to make a dummy variable, you need to reduce to two categories. But the thrust of my question is why do you want to make a dummy variable? That's usually not a good idea with this kind of data.
Marleen Yaramis

Join Date: Mar 2021

Posts: 22
#10

20 Mar 2021, 16:01

Below you will find my output. My dependent variable is job satisfaction (t3pjobsa) and my independent variable is salary. I made a dummy of this in order to conclude that 50.89% strongly agreed or agreed and that 49.11% strongly disagreed or disagreed. Without a dummy I don't know how to interpret the coefficient of 50.89%, because then there are 4 answer options instead of 2. I hope this is clear.

    t3pjobsa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     tc3g45a |   .5089994   .0247461    20.57   0.000     .4604913    .5575076
       _cons |   12.19015    .063583   191.72   0.000     12.06551    12.31479

.

Last edited by Marleen Yaramis; 20 Mar 2021, 16:13.
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#11

20 Mar 2021, 17:16

Well, the interpretation of that coefficient depends on the kind of regression that you ran. You show neither the command itself nor enough of the output to glean that from your post. But I cannot conceive of any regression model in which the interpretation would be what you have stated in #10.
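To illustrate the distinction, here is a sketch on simulated data, assuming an ordinary linear regression via -regress- (the actual command was not shown in #10). The share of respondents in each group comes from -tabulate- or from the mean of the 0/1 variable; the coefficient on a 0/1 regressor in such a model is the difference in the mean outcome between the two groups.

```stata
clear
set seed 2021
set obs 200
generate byte d_tc3g45a = runiform() < 0.5             // simulated 0/1 dummy
generate t3pjobsa = 12 + 0.5 * d_tc3g45a + rnormal()   // simulated outcome

tabulate d_tc3g45a            // percentage of respondents in each group
summarize d_tc3g45a           // mean of a 0/1 variable = share of 1s

regress t3pjobsa d_tc3g45a    // coefficient = difference in mean t3pjobsa
tabstat t3pjobsa, by(d_tc3g45a) statistics(mean)   // the two group means
```

Subtracting the two group means reported by -tabstat- reproduces the coefficient on d_tc3g45a.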

Also, if tc3g45a is a variable about salary, it's hard to see how it might be answered with a scale running from Strongly Disagree to Strongly Agree. Perhaps it was a question about satisfaction with salary? Or something like that.

In any case, I think you are on the wrong track. If you would like more specific advice about how to approach this, please post back explaining your specific research question, a clear and complete explanation of the variables in your data set and what they measure, and an example of your data using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
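A minimal sketch of a -dataex- call, using the auto dataset that ships with Stata as a stand-in for your own data. The choice of variables and the range of observations are arbitrary here.

```stata
sysuse auto, clear                    // example dataset, stands in for your own data
dataex price mpg foreign in 1/10      // prints a copy-and-paste block for the first 10 observations
```

For this thread, the analogous call would list t3pjobsa and tc3g45a in place of the auto variables.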