Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Find string values which appear in two or three variables

    Hi all

    I've searched on the forum and found some things that are helpful but can't seem to adapt them to my needs.

    I have three string variables var1 var2 var3 and I would like to assign group1=caseid to any cases that have values that appear in var1 and var2. group2=caseid to any cases that have values that appear in both var1 and var3 and group3=caseid to any cases that have values that appear in both var2 and var3.

    Then I would like to assign group4=caseid to any cases that have values that appear in all var1, var2 and var3.

    I think the loop function would be useful but I've tried this and I think I'm on the wrong track as it assigns caseid to all cases

    gen group1 = .
    forvalues caseid=1/34387 {
    foreach var of varlist var1-var2 {
    replace group1 = caseid if var1==var2
    }
    }

    many thanks in advance
    Last edited by anna sinead; 17 Jun 2019, 13:44.

  • #2
    your attempted code does not seem consistent with what you say (which is why a data example, using -dataex- (see the FAQ)) is always a good idea; my guess is:
    Code:
    gen numthere=rownonmiss(var1, var2, var3), stroke
    this will give you a count of the non-missing values in these 3 variables for each observation in your data; this could then be followed by
    Code:
    gen group1=caseid if numthere>=2

    Comment


    • #3
      Yes sorry I'll be more clear. I've given an example below. Most of the values are missing. For caseid=10 var1="AA100005916" and for caseid=14, var2=""AA100005916" therefore I want both cases to have the value group1=caseid (or any value). I hope that is clearer.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float caseid str13 var1 str12 var2 str13 var3
      10 "AA100005916" "" ""
      11 "AA100005916" "" ""
      12 "AA103059310" "" ""
      13 "AA104027203" "" ""
      14 "AA104027203" "AA100005916" ""
      15 "AA112068403" "" ""
      16 "" "" ""
      17 "" "" ""
      18 "" "" ""
      19 "" "" ""
      20 "AA208038915" "" "AA103059310"
      end

      Comment


      • #4
        well, other than the fact that I made a couple of typo's, this is still a way to go; instead of the first command above, use
        Code:
        egen byte numthere=rownonmiss(var1 var2 var3), strok
        however, you appear to have changed your criterion so there may be an easier way:
        Code:
        gen group1=caseid if var1~="" | var2!="" | var3!=""

        Comment


        • #5
          I'm still not being clear I think this is why I found it difficult to find a solution when I was searching in previous posts

          I understand that your code assigns caseid as long as the value is not missing in var1, var2 or var3.

          Whereas I want to only assign group1=caseid when the case has a value which can be found in any case for var2


          Your code gives me:
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float caseid str13(var1 var2) str12 var3 float group1
           8 ""            "" ""  .
           9 "0A101068603" "" ""  9
          10 "AA100005916" "" "" 10
          11 "AA100005916" "" "" 11
          12 "AA103059310" "" "" 12
          13 "AA104027203" "" "" 13
          14 "AA104027203" "AA100005916" "" 14
          15 "AA112068403" "" "" 15
          16 "AA120118511" "" "" 16
          17 "AA121089103" "" "" 17
          18 "AA124019103" "" "" 18
          19 "AA207059103" "" "" 19
          20 "AA208038915" "" "AA103059310" 20
          end


          I want:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float caseid str13(var1 var2) str12 var3 float group1
           8 ""            "" ""  .
           9 "0A101068603" "" ""  
          10 "AA100005916" "" "" 10
          11 "AA100005916" "" ""
          12 "AA103059310" "" ""
          13 "AA104027203" "" ""
          14 "AA104027203" "AA100005916" ""
          15 "AA112068403" "" ""
          16 "AA120118511" "" ""
          17 "AA121089103" "" ""
          18 "AA124019103" "" ""
          19 "AA207059103" "" ""
          20 "AA208038915" "" "AA103059310"
          end

          ***edited because I made a mistake in the second code

          I hope that is clearer
          Last edited by anna sinead; 17 Jun 2019, 14:47.

          Comment


          • #6
            I found this way to do it that is not the prettiest but seems to work. I seperated the data into three files (caseid + var1, caseid + var2, caseid + var3), dropped the duplicates in each var then merged each pair of variables then all three to mark the common values with _merge==3.

            This code:
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            
            use "var1data", clear
            duplicates drop var1, force
            tempfile file1
            save `file1'
            
            use "var2data", clear
            rename var2 var1
            duplicates drop var1, force
            merge 1:1 var1 using `file1'
            generate start = substr(var1,1,11)
            list start _merge, clean

            Gives me:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float caseid str13 var1 byte _merge
            20 "AJ115036711" 1
            21 "AJ205029015" 1
            22 "AL130078615" 3
            23 "AL215067109" 1
            24 "AM101017615" 3
            25 "AM102048715" 3
            26 "AM102078811" 1
            27 "AM104077015" 1
            28 "AM116038610" 1
            29 "AM124039311" 1
            30 "AM126068603" 1
            end
            label values _merge _merge
            label def _merge 1 "master only (1)", modify
            label def _merge 3 "matched (3)", modify

            Comment

            Working...
            X