  • Identifying values that differ for a given group

    I have a dataset that looks something like this:

    ID var1 flag_same
    1 3 0
    1 3 0
    1 2 0
    1 2 0
    1 2 0
    2 1 0
    2 2 0
    3 2 1
    3 2 1

    I'd like to create a variable that indicates when the values of var1 for a given value of ID are completely the same - i.e., generate the variable flag_same. Any ideas about how to do this would be so helpful. It would be great if this could work for both numeric and string variables.

    Best,
    Erika

  • #2
    It's not clear what 'flag_same' should contain when the condition holds. The following command creates 'flag_same' equal to 1 for the first observation of each distinct combination of ID and var1, and 0 for the remaining observations with that combination.

    Code:
    egen flag_same = tag(ID var1)
    lis, clean noobs
    
     ID   var1   flag_same  
         1      3          1  
         1      3          0  
         1      2          1  
         1      2          0  
         1      2          0  
         2      1          1  
         2      2          1  
         3      2          1  
         3      2          0
    Roman



    • #3
      In the fake dataset above, ID 3 appears twice in the dataset. The two values of var1 for ID 3 are the same, so I'd like to create a variable, flag_same, which is equal to 1 for both of those records. For the other two IDs, 1 and 2, the values of var1 are different, so I'd like the variable flag_same to be equal to 0 for all of those records. Hopefully that makes sense.



      • #4
        The following code assumes that your ID ranges from 1 to 3. Replace the values according to your data:


        Code:
        gen flag = .
        
        forval i = 1/3 { // Change the range here to match your ID values
            su var1 if ID == `i'
            // Flag the group only when all its values agree, i.e. min equals max
            replace flag = 1 if ID == `i' & r(min) == r(max) & !missing(var1)
        }
        // Set the rest to 0, but leave flag missing where var1 is missing
        replace flag = 0 if flag == . & !missing(var1)
        
        lis, clean noobs
        
           ID   var1   flag  
             1      3      0  
             1      3      0  
             1      2      0  
             1      2      0  
             1      2      0  
             2      1      0  
             2      2      0  
             3      2      1  
             3      2      1
        PS: There may be easier ways of doing it, but this is what comes to mind at the moment.
        Last edited by Roman Mostazir; 09 Jun 2016, 14:04. Reason: added PS
        Roman



        • #5
          This is an FAQ: http://www.stata.com/support/faqs/da...ions-in-group/

          There is a one-line solution, but anything shorter would appeal too.

          Code:
           
          bysort ID (var1): gen flag_same = var1[1] == var1[_N]
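
          As a rough sketch of why this works, and of how the same pattern might carry over to string variables: once var1 is sorted within each ID, the first and last values in a group are its smallest and largest, so they coincide only when every value in the group is the same. The variables svar1 and sflag_same below are hypothetical names introduced only for illustration; the data are the example from the first post.

          Code:
          clear
          input ID var1
          1 3
          1 3
          1 2
          1 2
          1 2
          2 1
          2 2
          3 2
          3 2
          end

          * Sort var1 within ID; the group is constant exactly when its
          * first (smallest) and last (largest) values are equal
          bysort ID (var1): gen flag_same = var1[1] == var1[_N]

          * Hypothetical string copy of var1, to try the same pattern on strings
          gen svar1 = string(var1)
          bysort ID (svar1): gen sflag_same = svar1[1] == svar1[_N]

          list, clean noobs

          Note that numeric missing values sort last, so a group that mixes missing and nonmissing values of var1 would be flagged 0 under this sketch.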



          • #6
            Thank you, Nick and Roman.

            Nick - your solution works perfectly. Thanks!
