  • Generating a new variable (0/1) based on multiple variables and excluding missing values

    Hi, I am having some trouble. I want to create a variable, let's call it Var_1, that takes the value 1, 0, or missing depending on whether someone is over or under a certain threshold on several other variables.

    I have tried the following code but it does not work:

    Code:
    gen Var_1 = Var_66<=2 & Var_66<. & Var_67 >=1 & Vat_67 <. & Var_68 >=1 & Var_68 <. & Var_69 >=1 & Var_69<. & Var_70<=3 & Var_70 <.

    So if someone scores 2 or less on Var_66, they should get a 1 for Var_1 regardless of how they score on the other variables. I want it to ignore missing values and if someone has a missing value for all variables then they will have a missing value in Var_1. Otherwise I want it to be 0 if none of the values meets the specified thresholds.

  • #2
    Your code makes the variable equal 1 only if all of those conditions are satisfied, which is probably not what you're after if I've understood correctly. Also, I'm not sure what you mean when you say it doesn't work.

    The below might get you closer.

    Code:
    gen Var_1 = 0
    
    replace Var_1 = 1 if Var_66<=2 & Var_66 != . & Var_67 >=1 & Var_67 !=. & Var_68 >=1 & Var_68 !=. & Var_69 >=1 & Var_69 !=. & Var_70<=3 & Var_70 !=.



    • #3
      I am not sure I follow this.

      You have a typo in there: Vat_67.

      Otherwise you seem to be writing & often where | is needed. Where you write

      I want it to ignore missing values and if someone has a missing value for all variables then they will have a missing value in Var_1
      I think you mean

      I want it to ignore missing values but if someone has a missing value for all variables then they will have a missing value in Var_1
      How about this? Note how inrange() simplifies code.

      Code:
      * Rule 1: 1 if #66 2 or less, otherwise 0 if #66 is not missing, otherwise missing if it's missing 
      gen Var_1 = Var_66 <= 2 if Var_66 < . 
      
      * but Rule 2: 1 if any of the other variables is in specified range and not missing 
      replace Var_1 = inrange(Var_67, 1, .) | inrange(Var_68, 1, .) | inrange(Var_69, 1, .) | inrange(Var_70, 3, .)  
      
      * but Rule 3: missing if all of the other variables are missing 
      replace Var_1 = . if mi(Var_67) & mi(Var_68) & mi(Var_69) & mi(Var_70)
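
      As a small aside on the inrange() calls above, here is a sketch of how inrange() with a missing upper bound treats missing values, compared with a plain >= comparison. The variable x and its values are made up for illustration and are not from the poster's data.

      ```stata
      * Hypothetical toy data: x is not one of the thread's variables
      clear
      input x
      0
      1
      5
      .
      end

      * Plain comparison: numeric missing counts as larger than any number,
      * so x >= 1 is true when x is missing
      gen byte plain = x >= 1

      * inrange() with a missing upper bound means "at least 1",
      * and it returns 0 when x itself is missing
      gen byte ranged = inrange(x, 1, .)

      list

      * The fourth observation is the missing one
      assert plain == 1 in 4
      assert ranged == 0 in 4
      ```

      This is why inrange(Var_67, 1, .) can stand in for Var_67 >= 1 & Var_67 < . in a single call.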



      • #4
        Thank you for your help. Apologies, I have corrected the typo.
        Nick, you are correct about what I meant: it should be "but".

        Gah, it still has not worked! Nick, when I enter your code exactly as you wrote it, people with missing values on all variables get a missing value in Var_1, as I wanted. However, everyone else has been given a 1. There are no 0s, even though some people do not meet the conditions.
        When I try the code written by Daniel, I get 0s in Var_1 where there should be missing values. I also have people with 0s who should have a 1; for example, someone who scored 2 on Var_66 has been given a 0 in Var_1.



        • #5
          My last line should be

          Code:
          replace Var_1 = . if mi(Var_67) & mi(Var_68) & mi(Var_69) & mi(Var_70) & mi(Var_66)
          Otherwise, the key point is whether I have your rules correct. Perhaps you should focus on that. By the way, these variable names are dopey: it's much better to use informative, evocative names.



          • #6
            Thanks. I'm sorry, I couldn't think of other names.
            It still gives a 1 for Var_1 to anyone who has data. Is it because for some variables the score needs to be above a number and for others it needs to be below?



            • #7
              We can't see your data!

              We can only try to understand your rules, which I take to be

              Rule 1: 1 if #66 2 or less, otherwise 0 if #66 is not missing

              but Rule 2: 1 if any of the other variables is in specified range and not missing

              but Rule 3: missing if all of the variables are missing



              • #8
                I'm really sorry, I don't think I've explained this very well. Here is the correct info:

                Code:
                Var_1 = 1 IF
                    Var_66 <= 2
                    OR Var_67 >= 1
                    OR Var_68 >= 1
                    OR Var_69 >= 1
                    OR Var_70 <= 3
                Code:
                Var_1 = 0 IF ALL values in the above variables are outside the specified ranges
                Code:
                Var_1 = missing IF ALL values in the above variables are missing
                If there is data in some variables and missing values in others, ignore the partial missing data.
                Last edited by Joe Tuckles; 08 Aug 2019, 07:20.



                • #9
                  My code was wrong for #70. Another try:


                  Code:
                  gen Var_1 = (Var_66 <= 2) | inrange(Var_67, 1, .) | inrange(Var_68, 1, .) | inrange(Var_69, 1, .) | (Var_70 <= 3) 
                  
                  replace Var_1 = . if mi(Var_66) & mi(Var_67) & mi(Var_68) & mi(Var_69) & mi(Var_70)
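
                  One way to check that these two lines match the rules in #8 is to run them on a tiny made-up dataset. The six observations below are hypothetical, chosen to cover each case; they are not the poster's data.

                  ```stata
                  * Hypothetical test cases, one per rule
                  clear
                  input Var_66 Var_67 Var_68 Var_69 Var_70
                  1 0 0 0 5
                  5 0 0 0 5
                  . . . . .
                  . 0 . . .
                  . . . . 2
                  5 . 2 . .
                  end

                  * 1 if any variable is within its range; missing values never qualify
                  gen Var_1 = (Var_66 <= 2) | inrange(Var_67, 1, .) | inrange(Var_68, 1, .) | inrange(Var_69, 1, .) | (Var_70 <= 3)

                  * Missing only if every input variable is missing
                  replace Var_1 = . if mi(Var_66) & mi(Var_67) & mi(Var_68) & mi(Var_69) & mi(Var_70)

                  list

                  * Observations 1, 5, 6 meet at least one threshold
                  assert Var_1 == 1 if inlist(_n, 1, 5, 6)
                  * Observations 2 and 4 have data but meet no threshold
                  assert Var_1 == 0 if inlist(_n, 2, 4)
                  * Observation 3 is missing on everything
                  assert mi(Var_1) in 3
                  ```

                  Observation 4 is the partial-missing case: it has only Var_67 = 0 observed, so it gets a 0 rather than a missing value.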



                  • #10
                    Got it! Thanks so much
