
  • Dummy variable

    In my dataset I have the question 'I am satisfied with the salary I receive from my work'. There are 4 answer options: strongly disagree, disagree, agree and strongly agree. I would like to make this a dummy variable where strongly disagree and disagree are coded 0 and agree and strongly agree are coded 1. I tried a few things with the generate command, but they didn't work. I also keep getting the message "agree is not found", and the same happens with the other 3 answer options. Does anyone have an idea how I can fix this?

  • #2
    Code:
    help recode
    Also, it sounds like you have numeric data with value labels; you can refer to the labels in your command, but it is often much easier to use the underlying numbers (and I think the recode command only accepts the numbers), so you can use
    Code:
    label list
    to get those numbers. If, on the other hand, your variable is a string variable, you will need something a little more complicated, probably using the inlist() function within a -generate- command.

    If you supply a -dataex- example (see the FAQ), no one will have to guess, and exact code will be easy to supply.
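    For the string-variable case, a minimal sketch of the -generate-/inlist() approach might look like this. It assumes the answers are stored as text in a variable called salary_sat_str; that name, the dummy name salary_dummy, and the example data are all made up for illustration. Text matching none of the four answers is left missing rather than silently coded 0.

    ```stata
    * Hypothetical example data: answers stored as text (not from the thread)
    clear
    input str20 salary_sat_str
    "Strongly disagree"
    "disagree"
    " Agree"
    "strongly agree"
    "n/a"
    end

    * Normalize case and surrounding spaces so "Agree" and " agree" both match
    generate answer = lower(strtrim(salary_sat_str))

    * Start as missing, then fill in 0/1 only for recognized answers
    generate byte salary_dummy = .
    replace salary_dummy = 0 if inlist(answer, "strongly disagree", "disagree")
    replace salary_dummy = 1 if inlist(answer, "agree", "strongly agree")
    list
    ```

    inlist() compares exact strings, which is why the text is normalized first; any unexpected spelling (like "n/a" here) stays missing and can be inspected afterwards.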



    • #3
      Marleen:
      as an aside to Rich's helpful recommendation, what follows might be useful:
      Code:
      . set obs 4
      number of observations (_N) was 0, now 4
      
      . g id=_n
      
      . g wanted=_n-1
      
      . label define wanted 0 "strongly disagree" 1 "disagree" 2 "agree" 3 "strongly agree"
      
      . label val wanted wanted
      
      . list
      
           +------------------------+
           | id              wanted |
           |------------------------|
        1. |  1   strongly disagree |
        2. |  2            disagree |
        3. |  3               agree |
        4. |  4      strongly agree |
           +------------------------+
      
      . recode wanted (0 1=0) (2 3=1)
      (wanted: 3 changes made)
      
      . label define wanted 0 "strongly disagree/disagree" 1 "agree/strongly agree", modify
      
      . list
      
           +---------------------------------+
           | id                       wanted |
           |---------------------------------|
        1. |  1   strongly disagree/disagree |
        2. |  2   strongly disagree/disagree |
        3. |  3         agree/strongly agree |
        4. |  4         agree/strongly agree |
           +---------------------------------+
      
      .
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Originally posted by Rich Goldstein
        Hi Rich. First of all, thank you for answering me. Running 'dataex tc3g45a', I found the following:

        label values tc3g45a ValueScheme8
        label def ValueScheme8 1 "Strongly disagree", modify
        label def ValueScheme8 2 "Disagree", modify
        label def ValueScheme8 3 "Agree", modify
        label def ValueScheme8 4 "Strongly agree", modify

        So this is indeed numeric data with value labels. Codes 1 and 2 should become 0 in the dummy variable, and codes 3 and 4 should become 1. I have read that I should use the tabulate command with the generate() option, but this doesn't seem to work. Do you have an idea how to make this a dummy variable?

        PS: tc3g45a is the variable name of this survey question.
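        As background on the tabulate route: -tabulate tc3g45a, generate(sat_)- creates one 0/1 indicator per answer category (sat_1 to sat_4 here), not a single agree/disagree dummy, which may be why it did not give the expected result. A single dummy can instead be built with -generate- directly from the codes 1 to 4 listed above. A minimal sketch with made-up data; sat_ and salary_dummy are illustrative names, not from the thread:

        ```stata
        * Made-up example data using the codes and labels shown above
        clear
        input byte tc3g45a
        1
        2
        3
        4
        .
        end
        label define ValueScheme8 1 "Strongly disagree" 2 "Disagree" 3 "Agree" 4 "Strongly agree"
        label values tc3g45a ValueScheme8

        * One indicator per category: sat_1, sat_2, sat_3, sat_4
        tabulate tc3g45a, generate(sat_)

        * A single dummy: 1 for codes 3/4, 0 for codes 1/2, missing stays missing
        generate byte salary_dummy = inlist(tc3g45a, 3, 4) if !missing(tc3g45a)
        list
        ```

        The if !missing() qualifier matters: without it, inlist() would return 0 for a missing answer and quietly count it as disagreement.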



        • #5
          try
          Code:
          recode tc3g45a (1 2 = 0) (3 4 = 1), gen(tc3g45a_2)
          I strongly recommend using variable names that are easy to read, easy to type, and meaningful in the context of your project.



          • #6
            Code:
            recode tc3g45a (1/2 = 0 "Disagreement") (3/4 = 1 "Agreement"), prefix(d_)
            will create a new variable, d_tc3g45a, with the properties you want. See -help recode- for more information about other capabilities and uses for this command.

            By the way, is there a good reason for doing this? You are throwing away information. Sometimes this is nevertheless helpful, for example, if the extreme categories are rarely chosen and those small categories interfere with analysis. But when a Likert-like item produces that kind of response distribution, it may mean that the prompt was badly worded for, or inappropriately addressed to, the study sample.

            In general, you are handicapping your data when you aggregate responses this way.
            Last edited by Clyde Schechter; 20 Mar 2021, 13:52. Reason: Crossed with #5
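            To check that a recode did what was intended, one option is to cross-tabulate the original and new variables. A minimal sketch with made-up data, reusing the -recode- command from this post; values not covered by any rule, such as missing (.), are copied over unchanged.

            ```stata
            * Made-up example data (not from the thread)
            clear
            input byte tc3g45a
            1
            2
            3
            4
            2
            .
            end

            * Collapse 1-2 and 3-4 into a new labeled variable, d_tc3g45a
            recode tc3g45a (1/2 = 0 "Disagreement") (3/4 = 1 "Agreement"), prefix(d_)

            * Each original code should fall in exactly one column; -missing- also shows .
            tabulate tc3g45a d_tc3g45a, missing
            ```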



            • #7
              Originally posted by Rich Goldstein
              Thank you! This worked.



              • #8
                Originally posted by Clyde Schechter
                Thank you for your help, Clyde!

                I'm doing this because the 4 answer options need to be bundled into two categories (0 and 1) to make a dummy variable, I think?



                • #9
                  Well, yes, to make a dummy variable, you need to reduce to two categories. But the thrust of my question is why do you want to make a dummy variable? That's usually not a good idea with this kind of data.



                  • #10
                    Originally posted by Clyde Schechter
                    Below you will find my output. My dependent variable is job satisfaction (t3pjobsa) and my independent variable is salary. I made a dummy of the salary item so that I can conclude that 50.89% strongly agree or agree and 49.11% strongly disagree or disagree. Without a dummy, I don't know how you could interpret the coefficient of 50.89%, because there would be 4 answer options instead of 2. I hope I am making myself clear.


                        t3pjobsa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         tc3g45a |   .5089994   .0247461    20.57   0.000     .4604913    .5575076
                           _cons |   12.19015    .063583   191.72   0.000     12.06551    12.31479


                    Last edited by Marleen Yaramis; 20 Mar 2021, 16:13.



                    • #11
                      Well, the interpretation of that coefficient depends on the kind of regression that you ran. You show neither the command itself nor enough of the output to glean that from your post. But I cannot conceive of any regression model in which the interpretation would be what you have stated in #10.

                      Also, if tc3g45a is a variable about salary, it's hard to see how it could be answered on a scale running from Strongly Disagree to Strongly Agree. Perhaps it was a question about satisfaction with salary, or something like that?

                      In any case, I think you are on the wrong track. If you would like more specific advice about how to approach this, please post back explaining your specific research question, a clear and complete explanation of the variables in your data set and what they measure, and an example of your data using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
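                      To illustrate the distinction behind #10 and #11, here is a minimal sketch with simulated data (every number is made up, and a linear regression via -regress- is assumed; other models are interpreted differently). The share of respondents who agree is simply the mean of the 0/1 dummy, while in a linear regression of an outcome on that dummy the coefficient is the difference in mean outcome between the two groups, not a percentage of respondents. The last command shows the alternative of keeping all four answer categories as a factor variable. salary_dummy and the simulated t3pjobsa values are illustrative only.

                      ```stata
                      * Simulated data, not the thread's; runiformint() needs Stata 14 or later
                      clear
                      set seed 12345
                      set obs 500
                      generate byte tc3g45a = runiformint(1, 4)
                      generate t3pjobsa = 12 + 0.5*(tc3g45a >= 3) + rnormal()
                      generate byte salary_dummy = tc3g45a >= 3

                      * Share who agree or strongly agree: the mean of a 0/1 variable
                      summarize salary_dummy

                      * With -regress-, the dummy's coefficient equals the difference in group means
                      regress t3pjobsa salary_dummy
                      mean t3pjobsa, over(salary_dummy)

                      * Alternative: keep all four categories as a factor variable
                      regress t3pjobsa i.tc3g45a
                      ```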
