  • [Help] Converting to binary

    Hi,


    I am trying to convert a variable that has 5 options into a binary variable where every value >= 1 equals 1 and every value of 0 equals 0. This is what I am currently entering, but I seem unable to group the 0 dummy correctly.

    In the attached picture I have written replace GRB=1 if GradeRepeat >=2, but entering replace GRB=1 if GradeRepeat >=1 produces the same outcome: the grouping is incorrect, and it uses the total observations rather than only the observations from GradeRepeat (8,984 instead of the desired 5,943).

    Thanks in advance for any help, I am a Stata beginner.


    EDIT: I have sent a message to the Mods/Admins to correct my username.
    Last edited by Minesh Patel; 28 Nov 2014, 03:50.

  • #2
    Your code could be presented like this using CODE delimiters

    Code:
    gen GRB = 0
    replace GRB = 1 if GradeRepeat >= 2
    tab GradeRepeat
    It is then much easier to read, to copy and paste, etc.

    The first two lines can be abbreviated like this

    Code:
    gen GRB = GradeRepeat >= 2
    but that's secondary. The major point is that creating GRB will have no effect on GradeRepeat. To check what you did, you need

    Code:
    tab GradeRepeat GRB
    Note: At various points you say that you want (variously) GradeRepeat >= 2 and GradeRepeat >= 1. That decision is up to you, but manifestly the results will differ so long as there are instances of 1.
    Last edited by Nick Cox; 28 Nov 2014, 04:37.


    • #3
      Thank you for the advice. I've entered the following:

      Code:
      gen GRB = GradeRepeat >=1
      tab GradeRepeat GRB
      and it appears to have worked perfectly.
      [Attached image: JXcxRDC.png]

      Am I correct in assuming that GRB will now be used by Stata as a binary variable?


      • #4
        There are only two distinct non-missing values, so yes. For completeness, use the missing option of tabulate to check whether missing values on GradeRepeat have been coded 1 on GRB.
        Last edited by Nick Cox; 28 Nov 2014, 06:59.


        • #5
          Again, I am not sure what to enter to check whether missing values have been coded 1 on GRB. I have entered:

          Code:
           misstable summarize
          and it appears to all be correct.


          • #6

            tab GradeRepeat GRB, miss would be my first choice. GRB is correct if GRB is missing when GradeRepeat is also missing.
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------


            • #7
              Originally posted by Maarten Buis
              tab GradeRepeat GRB, miss would be my first choice. GRB is correct if GRB is missing when GradeRepeat is also missing.
              Seems to produce the same outcome. Pretty confident that missing values are not being counted.


              • #8
                There is no need for uncertainty. The code suggested by Maarten and earlier by myself will show missings if there are any.


                • #9
                  Just used Maarten's code and it displays this information;

                  Code:
                   tab GradeRepeat GRB, miss


                  I am not sure how to interpret the tables: is Stata counting the missing values as 1 when it shouldn't?


                  • #10
                    your code (#3 above) sets GRB to 1 whenever GradeRepeat is >=1; therefore, since Stata interprets missing as the highest possible value, GRB is set to 1 when GradeRepeat is missing; the easiest step, at this point, is "replace GRB=. if GradeRepeat==1"


                    • #11
                      Originally posted by Rich Goldstein
                      your code (#3 above) sets GRB to 1 whenever GradeRepeat is >=1; therefore, since Stata interprets missing as the highest possible value, GRB is set to 1 when GradeRepeat is missing; the easiest step, at this point, is "replace GRB=. if GradeRepeat==1"

                      This doesn't seem to be right: Stata is still counting missing values as 1 when it should not, and it has now classed the value 1 of GradeRepeat as missing in GRB.



                      Just to clarify, I want GRB to be a dummy/binary variable where 0 = the participant has not repeated a grade (value 0 in GradeRepeat) and 1 = the participant has repeated a grade once or more (values 1, 2, 3 and 4 in GradeRepeat).


                      • #12
                        my bet is that you did not enter what I said you should enter; however, since you don't show us what you entered, I, of course, cannot tell


                        • #13
                          Originally posted by Rich Goldstein
                          my bet is that you did not enter what I said you should enter; however, since you don't show us what you entered, I, of course, cannot tell
                          I did; it is shown at the top of the attached picture. Here is the static link: http://i.gyazo.com/e3cbd462dea6e257be0643286badba18.png

                          EDIT:

                          I will be running a regression, so I believe Stata will not take the missing values into account, which is the same as actually deleting them. Is that correct?
                          Last edited by Minesh Patel; 30 Nov 2014, 13:08. Reason: regression


                          • #14
                            The link to a picture in your last post does not connect, at least not for me. In any case, picture attachments on this forum are usually difficult to read, at best. The best way to show us what you typed and what Stata gave you is to copy both the command and the output from the Results window and paste them into a code block in the Forum's advanced editor. It always comes out readable that way.

                            In any case, I'm coming very late to this particular party. But just perusing the recent posts, it appears that you created a variable using the following code:
                            Code:
                            gen GRB = GradeRepeat >=1
                            but you want GRB to be missing if GradeRepeat is missing. So that code is incorrect for that purpose. As was explained earlier by others, Stata represents missing values of numeric variables internally by numbers that are larger than all non-missing numerical values. So any missing value of GradeRepeat will satisfy the condition GradeRepeat >= 1, and the corresponding value of GRB will be 1, not missing. So you need to change the code to:
                            Code:
                            gen GRB = GradeRepeat >= 1 if !missing(GradeRepeat)
                            The use of the -if- qualifier will cause Stata to leave GRB missing when GradeRepeat is missing. If you run that and then tab -GRB GradeRepeat, miss-, you will see that it all comes out the way you want.
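
                            As a minimal sketch of the point above, the two versions can be compared on a small made-up dataset. The four values of GradeRepeat below are invented for illustration and are not the original data, and GRB_wrong is a hypothetical name used only here.

                            ```stata
                            * Toy data, invented for illustration only
                            clear
                            input GradeRepeat
                            0
                            1
                            2
                            .
                            end

                            * Without the qualifier: missing counts as larger than 1, so it is coded 1
                            gen GRB_wrong = GradeRepeat >= 1

                            * With the qualifier: observations with missing GradeRepeat are left missing
                            gen GRB = GradeRepeat >= 1 if !missing(GradeRepeat)

                            * Cross-tabulate, showing missing values as their own row and column
                            tab GradeRepeat GRB, miss
                            list
                            ```

                            In the listing, the last observation has GRB_wrong equal to 1 and GRB missing, which is the difference between the two commands.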


                            • #15
                              1. your attachment is not visible to me - please read the FAQ

                              2. if GradeRepeat is one of the variables in your regression, you are right - but, again, you don't give us the needed information to help you
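
                              A rough sketch of why point 2 depends on which variable enters the regression, assuming GRB was created without the -if- qualifier. The data and the outcome variable y are invented for illustration.

                              ```stata
                              * Toy data, invented for illustration; y is a hypothetical outcome
                              clear
                              input y GradeRepeat
                              10 0
                              12 1
                              9 3
                              11 .
                              end

                              * Miscoded version: the missing GradeRepeat becomes GRB == 1
                              gen GRB = GradeRepeat >= 1

                              * Observations that are missing on GradeRepeat but not on GRB
                              count if missing(GradeRepeat) & !missing(GRB)

                              * regress drops observations with a missing value in any model variable
                              * This model uses the 3 observations with non-missing GradeRepeat
                              regress y GradeRepeat

                              * This model uses all 4 observations, because GRB is never missing here
                              regress y GRB
                              ```

                              So the missing values are only left out automatically if the variable in the model is itself missing for those observations.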
