  • Changing values of yes/no binary variables from 1/2 to 1/0 using loop

    Hello,

    I believe this is a common problem but I have not found an answer online. I have a large dataset with multiple numeric binary variables with "no" response options coded as 2 instead of 0. Additionally, there are no value labels in the dataset. Fortunately, when a variable has values equal to either 1 or 2, it is always a yes/no question. Therefore, I would like to create a loop that changes the value 2 to 0 only for binary variables with response options equal to 1 or 2 (or have a max value of 2). I would also like to create value labels. I provided the code I have so far and an example dataset below. In the example dataset, all the variables are binary except for 'numkids', so I would not want to change the values of that variable. Could you help me fix the blue part of my code?

    Thank you,
    Tom

    local imputedvars *_i
    foreach var of `imputedvars’ {
    recode `var' (2=0) if `var' has values equal to 1, 2, or .
    label define yesno 0 “No” 1 “Yes”
    label values `var’ yesno if `var' has values equal to 0, 1, or .
    }


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id male white older21 numkids married)
    1 1 2 1 0 1
    2 2 1 1 3 2
    3 1 2 2 2 1
    4 2 2 2 0 1
    5 1 1 2 1 2
    6 2 2 1 1 1
    7 1 1 2 2 2
    8 2 1 1 0 1
    end


  • #2
    findname from the Stata Journal can help here. I note that you want to exclude the identifier too and that you should define the value labels just once, outside the loop (indeed, you'll get an error trying to redefine it otherwise).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id male white older21 numkids married)
    1 1 2 1 0 1
    2 2 1 1 3 2
    3 1 2 2 2 1
    4 2 2 2 0 1
    5 1 1 2 1 2
    6 2 2 1 1 1
    7 1 1 2 2 2
    8 2 1 1 0 1
    end
    
    findname, all(inlist(@, 1, 2)) local(wanted) 
    label define yesno 0 "No" 1 "Yes" 
    
    foreach v of local wanted { 
        replace `v' = `v' == 1 
        label val `v' yesno 
    }



    • #3
      Hi Nick,

      Thank you very much. I will have to remember that great command. I tried your code but the number of cases in the yes and no conditions was wrong. It was my fault for not including missing values in my example dataset. I used the following code and it seems to have worked. Thanks again!

      findname, all(inlist(@, ., 1, 2)) local(wanted)
      label define yesno 0 "No" 1 "Yes"

      foreach v of local wanted {
          replace `v' = 0 if `v' == 2
          label val `v' yesno
      }



      • #4
        Indeed: missing values would oblige a change in the recipe.
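
A minimal sketch of why missing values change the recipe, using a made-up variable x that is not part of the thread's data. In Stata, the expression x == 1 evaluates to 0 when x is missing, so an unqualified assignment like the one in #2 would turn missing values into 0 ("No"). Restricting the assignment with if !missing(x) is one way, among others, to leave them missing.

```stata
* Hypothetical three-row example: one yes (1), one no (2), one missing
clear
input byte x
1
2
.
end

* Unqualified logical expression: missing x yields 0, not missing
generate byte bad = x == 1

* Qualified version: observations with missing x are left missing
generate byte ok = x == 1 if !missing(x)

list, sep(0)
```

The variable names x, bad and ok are illustrative only. The same idea carries over to a loop by replacing x with the loop macro.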



        • #5
          Originally posted by Nick Cox
          Indeed: missing values would oblige a change in the recipe.
          Could you please show us how to solve this problem when missing values take place?
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte(id male white older21 numkids married)
          1 1 2 1 0 1
          2 2 1 . 3 2
          3 1 2 2 . 1
          4 . . . 0 1
          5 1 1 2 1 2
          6 2 2 1 . 1
          7 . 1 2 . 2
          8 2 . . 0 1
          end

          Thank you!
          Last edited by smith Jason; 15 Jul 2023, 22:24.



          • #6
            Originally posted by smith Jason
            Could you please show us how to solve this problem when missing values take place?
            I believe that Nick was alluding to the approach shown immediately before, in #3. Have you tried it?



            • #7
              The mapping for whatever 2 to 0, 1 to 1 is accomplished by 2 - whatever. Notice that this works for missings too as 2 - . is returned as . (missing).

              Here is some other technique too. See also https://journals.sagepub.com/doi/pdf...36867X19830921

              Clearly one technique that works and is congenial is enough.

              Code:
              clear
              input byte(id male white older21 numkids married)
              1 1 2 1 0 1
              2 2 1 . 3 2
              3 1 2 2 . 1
              4 . . . 0 1
              5 1 1 2 1 2
              6 2 2 1 . 1
              7 . 1 2 . 2
              8 2 . . 0 1
              end
              
              findname , all(inlist(@, 1, 2, .)) local(tofix) 
              
              foreach v of local tofix { 
                  gen `v'_1 = 2 - `v' 
                  gen `v'_2 = cond(missing(`v'), ., `v' == 1)
                  assert `v'_1 == `v'_2 
              }
              
              label def yesno 1 "yes" 0 "no"
              label val *_? yesno 
              
              list *_1 , sep(0)
              
                   +----------------------------------------+
                   | male_1   white_1   older~_1   marrie~1 |
                   |----------------------------------------|
                1. |    yes        no        yes        yes |
                2. |     no       yes          .         no |
                3. |    yes        no         no        yes |
                4. |      .         .          .        yes |
                5. |    yes       yes         no         no |
                6. |     no        no        yes        yes |
                7. |      .       yes         no         no |
                8. |     no         .          .        yes |
                   +----------------------------------------+



              • #8
                Thank you very much!



                • #9
                  Originally posted by Nick Cox
                  The mapping for whatever 2 to 0, 1 to 1 is accomplished by 2 - whatever. Notice that this works for missings too as 2 - . is returned as . (missing).

                  Here is some other technique too. See also https://journals.sagepub.com/doi/pdf...36867X19830921

                  Clearly one technique that works and is congenial is enough.

                  Note that the command
                  findname , all(inlist(@, 1, 2, .)) local(tofix)
                  will also select variables that contain only 1s and missing values, or only 2s and missing values (varA and varB in the faked dataset below).
                  So I revised the code chunk as follows, keeping only variables in which both 1 and 2 actually occur.


                  clear
                  input byte(male white older21 numkids married varA varB theta phi psi omega)
                  1 2 1 0 1 1 2 1 1 1 1
                  2 1 . 3 2 1 2 2 . . .
                  1 2 2 . 1 1 2 3 2 2 4
                  . . . 0 1 1 2 . . . .
                  1 1 2 1 2 1 2 4 1 1 5
                  2 2 1 . 1 1 2 . 2 . 2
                  . 1 2 . 2 1 2 5 . . 3
                  2 . . 0 1 1 2 . . . .
                  end

                  loc tofix
                  foreach v of varlist _all {
                      cap assert inlist(`v',1,2,.)
                      if !_rc {
                          summarize `v', meanonly
                          if r(min)==1 & r(max)==2 {
                              loc tofix `tofix' `v'
                          }
                      }
                  }

                  lab def yesno 0 "No" 1 "Yes"
                  foreach v of loc tofix {
                      gen `v'_rec = cond(missing(`v'), ., `v'==1)
                      lab values `v'_rec yesno
                  }
                  di "`tofix'"



                  • #10
                    Originally posted by smith Jason
                    For the above faked dataset, . . . the code chunk was revised as follows.

                    loc tofix
                    foreach v of varlist _all {
                    cap assert inlist(`v',1,2,.)
                    if !_rc {
                    summarize `v', meanonly
                    if r(min)==1 & r(max)==2 {
                    loc tofix `tofix' `v'
                    }
                    }
                    }

                    lab def yesno 0 "No" 1 "Yes"
                    foreach v of loc tofix {
                    gen `v'_rec = cond(missing(`v'), ., `v'==1)
                    lab values `v'_rec yesno
                    }
                    You might be able to tighten that code chunk up a little:
                    Code:
                    label define NY 0 No 1 Yes
                    
                    quietly foreach var of varlist _all {
                        tabulate `var', matrow(M)
                        if rowsof(M) == 2 & M[1, 1] == 1 & M[2, 1] == 2 {
                            generate byte `var'_rec = cond(`var' == 2, 0, `var')
                            label values `var'_rec NY
                        }
                    }
                    I'm not sure that your capture assert inlist(. . .) requirement actually contributes much: inasmuch as your variables' type is byte, summarize followed by the testing of the minimum and maximum alone ought to cover things.
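
The remark about relying on summarize alone might be sketched as follows. This is only an illustration on a small made-up dataset (the variable names echo the thread, but the values are invented), and it assumes every variable is stored as an integer type such as byte, so that a minimum of 1 and a maximum of 2 imply that only 1 and 2 occur.

```stata
* Hypothetical data: male and phi use 1/2 coding; the others should be skipped
clear
input byte(male numkids varA varB phi)
1 0 1 2 1
2 3 1 2 .
. . 1 2 2
end

* Collect variables whose nonmissing minimum is 1 and maximum is 2.
* Assumes integer storage types; noninteger values between 1 and 2
* would need an extra check.
local tofix
foreach v of varlist _all {
    summarize `v', meanonly
    if r(min) == 1 & r(max) == 2 {
        local tofix `tofix' `v'
    }
}

display "`tofix'"
```

A variable that is entirely missing is skipped too, because r(min) is then missing and the comparison with 1 is false.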



                    • #11
                      Following the description in #1, all suggestions here assume that 1 corresponds to "yes" and 2 to "no" for every 1/2 variable. That's a pretty strong assumption. If any variable is coded such that 1 maps to "no", 2 to "yes", the proposed transformations silently reverse its meaning. That's a sure-fire way to misinterpret results downstream. All this is to say: readers should be careful not to treat the suggested code snippets as a general recipe.
                      Last edited by daniel klein; 12 Mar 2026, 03:22.



                      • #12
                        #11 daniel klein

                        Indeed; there is a danger that people won't read the thread thoroughly and carefully.

                        Being careful might include (among other possibilities)

                        Code:
                        findname, all(inlist(@, ., 1, 2)) local(wanted)
                        describe `wanted'
                        which would tell you the value labels associated with each variable with distinct values only from 1 and 2 and system missing.
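
One way to automate a check of that kind is sketched below, assuming the variables carry value labels at all (the dataset in #1 has none, in which case the codebook is the only guide). The variables a, b, c and the labels yn and ny are invented for illustration. The loop collects the distinct value label names attached to the candidate variables, so that more than one name signals a coding that needs a closer look before any recoding.

```stata
* Hypothetical data: a and b share one label, c uses the reverse coding
clear
input byte(a b c)
1 2 1
2 1 2
1 . 2
end
label define yn 1 "Yes" 2 "No"
label define ny 1 "No" 2 "Yes"
label values a b yn
label values c ny

* Gather the distinct value label names across the candidate variables
local labs
foreach v of varlist a b c {
    local lab : value label `v'
    local labs : list labs | lab
}
local nlabs : list sizeof labs

display "distinct value labels: `labs' (`nlabs')"
```

In practice the hard-coded varlist a b c could be replaced by a list built earlier, such as the local macro wanted returned by findname.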



                        • #13
                          As another suggestion for future readers, I'd mention that the capacity of -recode- to recode a list of variables without an explicit loop was bypassed here. That would not save code in the current instance, and doesn't create the list to be recoded, but I think the "recode a list" construction deserves recognition.

                          Code:
                          label define NY 0 No 1 Yes
                          local list12 "....." // a list of 1/2 variables to be recoded, constructed however
                          recode `list12' (1=1) (2=0), prefix(new_)  
                          label values new_* NY
                          (I'd note that -recode- is on my list of neglected Stata commands.)



                          • #14
                            Originally posted by Mike Lacy
                            As another suggestion for future readers, I'd mention that the capacity of -recode- to recode a list of variables without an explicit loop was bypassed here. That would not save code in the current instance, and doesn't create the list to be recoded, but I think the "recode a list" construction deserves recognition.

                            Code:
                            label define NY 0 No 1 Yes
                            local list12 "....." // a list of 1/2 variables to be recoded, constructed however
                            recode `list12' (1=1) (2=0), prefix(new_)
                            label values new_* NY
                            (I'd note that -recode- is on my list of neglected Stata commands.)

                            label define NY 0 "No" 1 "Yes"

                            local list12
                            quietly foreach var of varlist _all {
                                tabulate `var', matrow(M)
                                if rowsof(M)==2 & M[1,1]==1 & M[2,1]==2 {
                                    local list12 `list12' `var'
                                }
                            }

                            recode `list12' (1=1) (2=0), prefix(new_)
                            label values new_* NY

                            display "`list12'"



                            • #15
                              The code in #14 does not meet the problem raised in #11 which is that if variables with non-missing values just 1 and 2 have variously different interpretations of 1 and 2 then you would be messing up the data.

                              It is thus also necessary to check that such variables have the same coding, most obviously by being associated with the same value label.

                              If you know somehow, and ideally have checked, that all such variables have the same interpretation then solutions mentioned earlier in the thread should be fine.

                              I don't mind one bit if people are unable or unwilling to use findname (Stata Journal) but at the same time the code in #14 could be shortened mightily by using findname.

