  • replacing values of variables in varlist with -99 if all of them take on value 0

    hi there,

    i hope that somebody can help me with the following problem:

    in a questionnaire we have sets of questions where multiple answers can be chosen. they are coded 0 for "not ticked" and 1 if "ticked"
    (in addition, some of them are coded -98 or -97 if answer is not valid/not applicable).

    we now want to recode all of the variables in the set of variables with -99 (that's our third value for "missing") if all of them are 0 and/or -98 or -97

    i found a solution, but it seems rather complicated:
    Code:
    gen var1=1 if hh1 == 0 & hh2 == 0 & hh3 == 0 & (hh1_num == -97 | hh1_num == -98) & (hh2_num == -97 | hh2_num == -98) & (hh3_num == -97 | hh3_num == -98)
    replace hh1 = -99 if var1 == 1
    replace hh2 = -99 if var1 == 1
    replace hh3 = -99 if var1 == 1
    is there any possibility using
    Code:
    foreach
    and
    Code:
    recode
    ?
    i have found a few things, but i am not sure how to do it properly.

    thanks a lot for your help.
    hanne
    Last edited by hanne brandt; 30 May 2017, 07:05.

  • #2
    Hanne:
    maybe what follows can do the trick:
    Code:
    egen flag=rowmean(hh1 hh2 hh3)
    foreach var of varlist hh1-hh3 {
        replace `var'=-99 if flag==0|flag==-97|flag==-98
    }
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3

      I am interpreting this as hh1, hh2, hh3 can and must be any mix of 0, -97, -98 to be recoded as -99. That doesn't seem to be what your code is doing. It seems inconsistent even at first sight as an operator seems missing from
      Code:
        
      (hh1_num == -97 | hh2 == -98) (hh2_num == -97 | hh2 == -98)
      but it's puzzling beyond that. If the three variables are all zero

      Code:
      hh1 == 0 & hh2 == 0 & hh3 == 0
      then there seems no scope for any other conditions to be added on the same variables with &, so the rest of the code messes up the condition. Also, it is not clear why you also refer to

      Code:
      hh1_num, hh2_num, hh3_num
      which may be the same variables (or something else). That aside, my interpretation leads to

      Code:
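      local vals 0, -97, -98
      local cond inlist(hh1, `vals') & inlist(hh2, `vals') & inlist(hh3, `vals')
      * evaluate the condition once, before any of the variables it refers to is changed
      gen byte tochange = `cond'
      forval j = 1/3 {
           replace hh`j' = -99 if tochange
      }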
      Top tip: it is common here for people to write & when they mean |. Their thinking is possibly something like "and I also want observations with these values" but Stata sees only "and this must also be true of the same observation". So,

      Code:
      if myvar == 42 & myvar == 666
      does not mean "observations with values 42 or 666"!

      There will be no observations satisfying such a condition, but Stata won't tell you that you did not mean what you said.
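
      A minimal sketch of the difference on made-up data (myvar and its values are purely illustrative, and clear drops whatever is in memory, so try this in a fresh session):

      ```stata
      * Toy data: myvar is a hypothetical variable used only for illustration
      clear
      input myvar
      42
      666
      7
      end

      * & requires both comparisons to hold in the same observation: never true here
      count if myvar == 42 & myvar == 666

      * | selects observations with either value
      count if myvar == 42 | myvar == 666

      * inlist() is a more compact way to write the same "or" condition
      count if inlist(myvar, 42, 666)
      ```

      The first count reports 0 observations; the other two report 2.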

      EDIT: Carlo has a different interpretation, that

      all of the variables must be 0

      or

      -97

      or

      -98
      Last edited by Nick Cox; 30 May 2017, 07:27.


      • #4
        Nick is correct (maybe I've tried to oversimplify my life!).
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          I can't be sure which interpretation Hanne intends, but a choice of solutions is there.


          • #6
            I agree with the technical solutions proposed above (depending on one's interpretation). But I would like to suggest that Hanne would be better off not doing this.

            One of Stata's best features, in my opinion, is extended missing values. Encoding not valid/not applicable as -98/-97 is just an accident waiting to happen. At some point you will do some calculation with these variables, and Stata will treat -98 and -97 as bona fide numerical values of that variable. With luck you'll notice the problem quickly and fix it. With bad luck, you won't notice it and somebody else will realize that something must be wrong with your results and confront you with it later!

            Using Stata's extended missing values (see -help missing-), you can have a separate code for not valid, not applicable, and nothing ticked, but Stata will always recognize these as missing values that must not be used in calculations. So, I would do this:

            Code:
            mvdecode hh1 hh2 hh3, mv(-97 = .v \ -98 = .a)
            egen any_ones = rowmax(hh1 hh2 hh3)
            forvalues i = 1/3 {
                replace hh`i' = .m if !any_ones
            }
            If you want to, you can also make these extended missing values easier to spot in listings of your data by including them in a value label:

            Code:
            label define hh  .v "Not Valid"  .a "Not Applicable"  .m "Nothing Ticked"
            label values hh1 hh2 hh3 hh
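
            To make the risk concrete, here is a small sketch on made-up data (four invented observations of hh1; clear drops whatever is in memory). It only illustrates how summarize treats -97 before and after conversion, under the -97 = .v mapping used above:

            ```stata
            * Toy data: four invented observations, one of them coded -97
            clear
            input hh1
            1
            0
            1
            -97
            end

            * While -97 is an ordinary number it enters the mean: (1 + 0 + 1 - 97)/4 = -23.75
            summarize hh1
            assert reldif(r(mean), -23.75) < 1e-12

            * Once converted to an extended missing value, the observation is left out
            mvdecode hh1, mv(-97 = .v)
            summarize hh1
            assert r(N) == 3
            assert reldif(r(mean), 2/3) < 1e-12
            ```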


            • #7
              Good take, Clyde!
              Kind regards,
              Carlo
              (Stata 19.0)


              • #8
                I agree with Clyde and Carlo. I do try to think "Is this a good idea?" too but on this occasion was focused solely on the Stata question being asked.


                • #9
                  Thank you Nick, Carlo and Clyde for your very quick answers and I am sorry I didn't make my intention absolutely clear.
                  (Concerning the extended missing value option: I know about it and will use it myself, but in our team we work with people using SPSS and they asked for -99/-98/-97)

                  Nonetheless, I will try to explain better what I meant above. We have six variables:

                  The first three (hh1, hh2, hh3) can take on the values 0 (not ticked) 1 (ticked)
                  The other three variables (hh1_num, hh2_num, hh3_num) are variables where people have to insert a number and otherwise they can take on the values -99, -98, -97

                  The intention is:
                  If hh1 & hh2 & hh3 take on the value 0 (all of them) and hh1_num & hh2_num & hh3_num (all of them) take on either -99 or -98 or -97
                  then all of the above variables should be replaced by -99 (missing).

                  Is that what Carlo suggested?

                  Sorry, it is not easy for me to explain it well. Did the intention become clearer now?

                  Thank you for your time, I really appreciate it.
                  Hanne
                  Last edited by hanne brandt; 30 May 2017, 12:30.
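
                  On the SPSS point in this post: one possible way to reconcile the two conventions is to work with extended missing values inside Stata and convert back to numeric codes just before handing the data over. A minimal sketch on made-up data, assuming the .v/.a/.m codes from #6 and the mapping .v = -97, .a = -98, .m = -99 (that mapping is an assumption, so adjust it to the team's codebook):

                  ```stata
                  * Toy data using the extended missing codes from #6 (clear drops data in memory)
                  clear
                  input hh1 hh2 hh3
                  1 0 .v
                  .m .m .m
                  0 .a 1
                  end

                  * Map each extended missing value back to the numeric code the SPSS users asked for
                  mvencode hh1 hh2 hh3, mv(.v=-97 \ .a=-98 \ .m=-99)

                  list
                  ```

                  mvencode refuses to run if one of the target numbers already occurs in a variable, which is a useful safeguard against mixing real values and missing codes.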


                  • #10
                    Code:
                    local vals -99, -97
                    local cond hh1 == 0 & hh2 == 0 & hh3 == 0
                    local cond `cond' & inrange(hh1_num, `vals') & inrange(hh2_num, `vals') & inrange(hh3_num, `vals')
                    * evaluate the condition once, before any of the variables it refers to is changed
                    gen byte tochange = `cond'
                    foreach k = 1/3 {
                         replace hh`j' = -99 if tochange
                         replace hh`j'_num = -99 if tochange
                    }


                    • #11
                      thank you very much, nick!

                      do i have to adjust the code in any way (replace j or k by anything)?
                      i just copied it into stata and it told me: invalid syntax r(198)
                      (i hope it's not because i am using stata 12. if so, i will try it with a newer version of stata at work tomorrow.)

                      kind regards,
                      hanne


                      • #12
                        Sorry;

                        Code:
                         
                         foreach k = 1/3 {
                        is doubly wrong and should be

                        Code:
                        forval j = 1/3 {
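
                        Putting #10 and this correction together, a complete sketch on made-up data might look as follows. One thing to watch: the condition refers to the very variables being replaced, so once hh1 has been set to -99 the test hh1 == 0 is no longer true for those observations. Storing the condition in an indicator first avoids that (tochange is a hypothetical name, and the three observations are invented; clear drops whatever is in memory):

                        ```stata
                        * Toy data: only the first observation meets the condition from #9
                        clear
                        input hh1 hh2 hh3 hh1_num hh2_num hh3_num
                        0 0 0 -97 -98 -99
                        1 0 0   2 -98 -97
                        0 0 0   3 -97 -97
                        end

                        local cond hh1 == 0 & hh2 == 0 & hh3 == 0
                        local cond `cond' & inrange(hh1_num, -99, -97) & inrange(hh2_num, -99, -97) & inrange(hh3_num, -99, -97)

                        * Evaluate the condition once, before any variable is changed
                        generate byte tochange = `cond'

                        forval j = 1/3 {
                            replace hh`j' = -99 if tochange
                            replace hh`j'_num = -99 if tochange
                        }

                        list
                        ```

                        forval is available in Stata 12, so the version should not matter here.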
