  • Which is the correct approach in coding a dummy variable

    Hi Statalist.

    I want to generate a dummy variable from a categorical variable with values ranging from 0 to 10. The range 0-2 is nil to low and 3-10 is mid to high. Note that I have two categorical variables: one holds responses by husbands and the other by wives (relimp1 - importance for husband, relimp2 - importance for wife):
    Code:
    gen byte imp2 = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) & relimp1 < . & relimp2 < .
    However as you can see below, "0" was given when relimp1 or relimp2 were 'missing', so I tried:
    Code:
    gen byte imp4 = 1 if inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) & relimp1 < . & relimp2 < .
    replace imp4 = 0 if (relimp12 == 1 & relimp22 == 1) | (relimp12 == 1 & inlist(relimp22, 2, 3)) | (inlist(relimp12, 2, 3) & relimp22 == 1)
    which provided "1" when true, "0" when false, and "." when missing - which is what I thought I should get. Based on my reading of https://www.stata.com/support/faqs/d...rue-and-false/ I thought the first piece of code would have given me this outcome.

    Given that the first piece of code produces considerably more "0" values than the second, I believe I should go with the second piece of code (imp4). Am I reading too much into this? Help is appreciated.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(id p_id) byte(wave relimp1 relimp2 imp2 imp4)
    106 1002 10  .  . 0 .
    106 1002 11  .  . 0 .
    106 1002 12  .  . 0 .
    106 1002 13  .  . 0 .
    106 1002 14  0  0 0 0
    106 1002 15  .  . 0 .
    106 1002 16  .  . 0 .
    106 1002 17  .  . 0 .
    106 1002 18  0  0 0 0
    108  109  1  .  . 0 .
    108  109  2  .  . 0 .
    108  109  3  .  . 0 .
    108  109  4  5  6 1 1
    108  109  5  .  . 0 .
    108  109  6  .  . 0 .
    108  109  7  .  5 0 .
    103  104  1  .  . 0 .
    103  104  2  .  . 0 .
    103  104  3  .  . 0 .
    103  104  4 10 10 1 1
    103  104  5  .  . 0 .
    103  104  6  .  . 0 .
    103  104  7 10 10 1 1
    103  104  8  .  . 0 .
    103  104  9  .  . 0 .
    103  104 10 10 10 1 1
    103  104 11  .  . 0 .
    103  104 12  .  . 0 .
    103  104 13  .  . 0 .
    103  104 14 10 10 1 1
    103  104 15  .  . 0 .
    103  104 16  .  . 0 .
    103  104 17  .  . 0 .
    103  104 18 10 10 1 1
    end
    Am I correct in my understanding that
    Code:
    ! missing(relimp1, relimp2)    is the same as   
    relimp1 < . & relimp2 < .
    Stata 15.1

    Note this was originally posted at https://www.statalist.org/forums/for...=1601514760045 though reposted as the nature of the question differs from that thread.

  • #2
    The first code is incorrect because whenever you encounter a missing value in relimp1 or relimp2, the corresponding -inrange()- term will be false, and since all the terms are connected by &, the entire expression is false, so you get zero.

    I don't get your second code at all because it refers to variables like relimp12 and relimp22 that do not exist in your data. So that will get you nothing but error messages. It's not entirely clear what you actually want this new variable to do. My best guess is that you want imp4 to be 1 if both relimp1 and relimp2 are between 3 and 10 (inclusive), missing if either relimp1 or relimp2 is missing, and 0 otherwise. The simplest way to get that is:

    Code:
    gen byte imp4 = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) if !missing(relimp1, relimp2)
    And yes, -!missing(relimp1, relimp2)- is the same as -relimp1 < . & relimp2 < .-.
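
    A minimal sketch of the difference, using a few made-up rows rather than the poster's data (the names imp_noif and imp_if are hypothetical, chosen only for this illustration):

    ```stata
    * Toy data: hypothetical values, not the poster's dataset
    clear
    input byte(relimp1 relimp2)
     .  .
     0  0
     5  6
     .  5
    10  2
    end

    * No -if- qualifier: a missing input makes -inrange()- return 0,
    * so the whole expression evaluates to 0
    gen byte imp_noif = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10)

    * With the -if- qualifier: observations with any missing input are
    * excluded from the calculation and so remain missing
    gen byte imp_if = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) if !missing(relimp1, relimp2)

    list, clean noobs
    ```

    In the listing, the two variables should agree wherever both inputs are observed and differ only on the rows with a missing input, where imp_noif is 0 and imp_if is missing.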


    • #3
      Thank you, Clyde Schechter. That is exactly what I was after. I see what I was doing wrong in the first line of code now. If I had changed the last '&' before the missing-value checks to 'if', it would have been OK.
      Code:
      gen byte imp2 = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) if relimp1 < . & relimp2 < .
      Anyway, I'll follow your approach in future; -!missing()- is tidier, I think. Thank you again. Kind regards, Chris
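
      One way to confirm that the two -if- qualifiers give the same result is to generate both versions and compare them. This is only a sketch on made-up rows; the names imp_lt and imp_miss are hypothetical, and it assumes relimp1 and relimp2 are numeric, as in the -dataex- example:

      ```stata
      * Toy data: hypothetical values, not the original dataset
      clear
      input byte(relimp1 relimp2)
       .  .
       0  0
       5  6
       .  5
      10 10
      end

      * Version using explicit comparisons against missing
      gen byte imp_lt = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) if relimp1 < . & relimp2 < .

      * Version using -missing()-
      gen byte imp_miss = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) if !missing(relimp1, relimp2)

      * -assert- stops with an error if the two variables differ anywhere
      assert imp_lt == imp_miss

      * Tabulate with missing values shown as their own category
      tabulate imp_miss, missing
      ```

      If the -assert- line runs silently, the two versions match on every observation, including the ones left missing.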
