
  • Loop to include only observations that meet a single criterion and nothing else

    I have a dataset with 6 variables containing specific phrases. There is one phrase, "Germ stat", that I would like to code out to a new variable if it exists in any of the 6 variables.
    For example, if variable 1 has the phrase "Germ stat" and variables 2 - 6 are empty, replace new_var to be 1 BUT if variable 1 has the phrase "Germ stat" and variable 2 has another phrase "Cross bred" or any of the other 5 has something else, do not replace new_var.

    I am using the code below but it only codes out new_var if the phrase is in any of the 6 variables without regard for other phrases. How do I fix my code?

    gen new_var=.
    foreach var of varlist var1-var6 {
    replace new_var=1 if ustrpos(`var', "Germ stat")
    }

  • #2
    Perhaps you mean to do the following.
    Code:
    gen new_var=0
    foreach var of varlist var1-var6 {
    replace new_var=1 if ustrpos(`var', "Germ stat")  
    }



    • #3
      also
      Code:
      gen byte flag2 = inlist("Germ stat", v1, v2, v3, v4, v5, v6) // max 10 args
       
      gen byte flag3 = strpos(v1+v2+v3+v4+v5+v6, "Germ stat")



      • #4
        Actually I'm not sure either #2 or #3 give you what you want. Here is an alternative, with comparisons to what the previous two posts would generate:

        Code:
        clear
        input str10(var1 var2 var3 var4 var5 var6)
        "Germ stat" "" "" "" "" ""
        "" "Germ stat" "Cross bred" "" "" ""
        "blah" "" "Germ stat" "blah" "" "blah"
        "" "" "" "" "Germ stat" " "
        end
        
        gen new_var=0
        foreach var of varlist var1-var6 {
            replace new_var=1 if ustrpos(`var', "Germ stat")  
        }
        
        gen byte flag2 = inlist("Germ stat",var1,var2,var3,var4,var5,var6)
        gen byte flag3 = strpos(var1+var2+var3+var4+var5+var6,"Germ stat")
        gen byte wanted = trim(var1+var2+var3+var4+var5+var6) == "Germ stat"
        which produces:
        Code:
        . li, noobs
        
          +-------------------------------------------------------------------------------------------------+
          |      var1        var2         var3   var4        var5   var6   new_var   flag2   flag3   wanted |
          |-------------------------------------------------------------------------------------------------|
          | Germ stat                                                            1       1       1        1 |
          |             Germ stat   Cross bred                                   1       1       1        0 |
          |      blah                Germ stat   blah               blah         1       1       5        0 |
          |                                             Germ stat                1       1       1        1 |
          +-------------------------------------------------------------------------------------------------+
        where the last line of the code (the one that generates wanted) is all you need for the variable I think you want. Even in that line, the use of trim() is optional; it allows for strings that are composed only of spaces, as in the last observation of my toy data. Remove it if you want to disallow that.
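To see what trim() changes, here is a minimal sketch on a single toy observation like the last one above. The names with_trim and no_trim are made up for illustration, and the single space in var6 is set explicitly with replace so the example does not depend on how input treats a quoted blank.

```stata
clear
input str10(var1 var2 var3 var4 var5 var6)
"" "" "" "" "Germ stat" ""
end

* put exactly one space (not an empty string) into var6
replace var6 = " "

* with trim(): the trailing space contributed by var6 is stripped before comparing
gen byte with_trim = trim(var1+var2+var3+var4+var5+var6) == "Germ stat"

* without trim(): the concatenation is "Germ stat " and so is not an exact match
gen byte no_trim = var1+var2+var3+var4+var5+var6 == "Germ stat"

list with_trim no_trim, noobs
```

Under these assumptions with_trim should be 1 and no_trim should be 0, which is the difference described above.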
        Last edited by Hemanshu Kumar; 18 Dec 2022, 13:22.



        • #5
          By way of explanation of the code in post #2, the problem lies in how we interpret:

          For example, if variable 1 has the phrase "Germ stat" and variables 2 - 6 are empty, replace new_var to be 1 BUT if variable 1 has the phrase "Germ stat" and variable 2 has another phrase "Cross bred" or any of the other 5 has something else, do not replace new_var.
          I guessed that the user wants an indicator for whether the phrase appears in at least one of the six variables.

          I suspected that the user started with
          Code:
          gen new_var=.
          foreach var of varlist var1-var6 {
          replace new_var = ustrpos(`var', "Germ stat")  
          }
          but found that this overwrites new_var = 1 with 0 unless the phrase appears in the sixth variable.

          So then the user tried
          Code:
          gen new_var=.
          foreach var of varlist var1-var6 {
          replace new_var=1 if ustrpos(`var', "Germ stat")
          }
          but this leaves new_var missing ("it only codes out new_var if the phrase is in any of the 6 variables without regard for other phrases") if the phrase is not in any of the variables.

          So I suggested in post #2 initializing new_var to 0 and replacing it with 1 if there are one or more matches to the phrase.
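The three loop variants discussed here can be compared side by side. This is only a sketch on made-up toy data; the variable names last_only, miss_start and zero_start are illustrative and do not come from the thread.

```stata
clear
input str10(var1 var2 var3 var4 var5 var6)
"Germ stat" "" "" "" "" ""
"" "" "" "" "" "Germ stat"
"" "" "" "" "" ""
end

* unconditional replace: every pass overwrites the last, so only var6 counts
gen last_only = .
foreach var of varlist var1-var6 {
    replace last_only = ustrpos(`var', "Germ stat")
}

* conditional replace starting from missing: stays missing when nothing matches
gen miss_start = .
foreach var of varlist var1-var6 {
    replace miss_start = 1 if ustrpos(`var', "Germ stat")
}

* conditional replace starting from 0: a clean 0/1 indicator
gen zero_start = 0
foreach var of varlist var1-var6 {
    replace zero_start = 1 if ustrpos(`var', "Germ stat")
}

list last_only miss_start zero_start, noobs
```

In the first observation last_only ends up 0 although var1 matches, and in the third observation miss_start is missing while zero_start is 0.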

          With that said, post #3 avoids the unnecessary looping in creating flag3. I'd adapt that to my interpretation of the desired result with
          Code:
          gen byte flag4 = strpos(var1+var2+var3+var4+var5+var6,"Germ stat")>0



          • #6
            Perhaps the OP can clarify what it is that is actually needed.

            Another case to clarify: what if multiple variables have the phrase "Germ stat" in the same observation? Is this situation possible in the data?

            The code I provided in #4 would return a 0 in such a case. Would the OP like that to be 1? In that case, a different solution is needed.
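If repeated occurrences of the phrase should still give 1, one possible approach is to count matches and non-matches separately instead of comparing the concatenated string. The sketch below assumes that "something else" means any entry that is neither blank nor exactly "Germ stat" after trimming; the names n_match, n_other and wanted2 and the toy data are illustrative only.

```stata
clear
input str10(var1 var2 var3 var4 var5 var6)
"Germ stat" "" "" "" "" ""
"Germ stat" "Germ stat" "" "" "" ""
"" "Germ stat" "Cross bred" "" "" ""
"" "" "" "" "" ""
end

* n_match: number of variables holding exactly the phrase
* n_other: number of variables holding any other non-blank text
gen byte n_match = 0
gen byte n_other = 0
foreach var of varlist var1-var6 {
    replace n_match = n_match + (trim(`var') == "Germ stat")
    replace n_other = n_other + (trim(`var') != "Germ stat" & trim(`var') != "")
}

* flag observations with at least one match and nothing else
gen byte wanted2 = n_match >= 1 & n_other == 0

list, noobs
```

Here the second observation, with the phrase in two variables, gets wanted2 equal to 1, whereas the concatenation test from #4 would give 0 for it.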



            • #7
              Hi Hemanshu, Sorry for the late response. Your example worked great for what I needed. Thank you.

