Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Flagging Values of A Variable That Exist In Another Variable

    Hi there,

    I have a dataset similar to the following:

    Code:
    clear
    input byte rownumber int id int id_parent
    1 1234 7869
    2 1111 1234
    3 9999 8888
    4 8888 .
    5 7777 8888
    6 2222 .
    end
    I'm having trouble figuring out how to flag records where the value in the id_parent field exists in the id field. That is, using the example dataset above, I'd like to create new variable, flag_parentidinidfield, that flags rows 2, 3, and 5 as having a value of id_parent that exists in id - see below:

    Code:
    clear
    input byte rownumber int id int id_parent byte flag_parentidinidfield
    1 1234 7869 0
    2 1111 1234 1
    3 9999 8888 1
    4 8888 . 0
    5 7777 8888 1
    6 2222 . 0
    end
    Any advice or assistance about how to do this would be greatly appreciated.

    Thanks,
    Erika

  • #2
    Here's one way to do it. It hinges on knowing that _stack flags the two variables stacked with 1 and 2; hence any identifier in both lists will be flagged by both 1 and 2.

    Code:
    clear
    input byte rownumber int id int id_parent
    1 1234 7869
    2 1111 1234
    3 9999 8888
    4 8888 .
    5 7777 8888
    6 2222 .
    end
    
    save safecopyofthis
    stack id id_parent, into(id) clear 
    bysort id (_stack) : gen both = _stack[1] == 1 & _stack[_N] == 2 
    keep if _stack == 1 
    merge 1:1 id using safecopyofthis 
    drop _*
    
    list, sep(0)  
    
         +-----------------------------------+
         |   id   both   rownum~r   id_par~t |
         |-----------------------------------|
      1. | 1111      0          2       1234 |
      2. | 1234      1          1       7869 |
      3. | 2222      0          6          . |
      4. | 7777      0          5       8888 |
      5. | 8888      1          4          . |
      6. | 9999      0          3       8888 |
         +-----------------------------------+

    Comment


    • #3
      Here's another way:

      Code:
      clear
      input byte rownumber int id int id_parent
      1 1234 7869
      2 1111 1234
      3 9999 8888
      4 8888 .
      5 7777 8888
      6 2222 .
      end
      
      preserve
      keep id
      drop if missing(id)
      duplicates drop
      rename id id_parent
      tempfile id_list
      save `id_list'
      restore
      
      merge m:1 id_parent using `id_list', keep(master match)
      gen byte flag = (_merge == 3)
      drop _merge
      And here's yet another:
      Code:
      clear
      input byte rownumber int id int id_parent
      1 1234 7869
      2 1111 1234
      3 9999 8888
      4 8888 .
      5 7777 8888
      6 2222 .
      end
      
      capture program drop match_count
      program define match_count
          count if id == rr_id_parent
          gen flag = !!r(N)
          exit
      end
      
      rangerun match_count, interval(rownumber . .) sprefix(rr_)
      This one requires the -rangerun- command, by Robert Picard, from SSC. And -rangerun- itself requires -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also from SSC. It also requires writing that little match_count program. It does feel a bit like using a sledge hammer to swat flies.

      Just shows there is more than one way to "skin the cat."

      Comment


      • #4
        This can also be done with rangestat directly. The following is a variant of the "Looking up the education of a child's mother within a household" example in the help file. Since id_parent has missing values, the example uses invalid interval bounds ([1,0]) for those observations.

        Code:
        clear
        input byte rownumber int id int id_parent
        1 1234 7869
        2 1111 1234
        3 9999 8888
        4 8888 .
        5 7777 8888
        6 2222 .
        end
        
        gen long low = cond(mi(id_parent),1, id_parent)
        gen long high = cond(mi(id_parent),0, id_parent)
        rangestat (count) flag=id, interval(id low high)
        replace flag = 0 if mi(flag) & !mi(id_parent)
        
        list
        and the results
        Code:
        . list
        
             +-------------------------------------------------+
             | rownum~r     id   id_par~t    low   high   flag |
             |-------------------------------------------------|
          1. |        1   1234       7869   7869   7869      0 |
          2. |        2   1111       1234   1234   1234      1 |
          3. |        3   9999       8888   8888   8888      1 |
          4. |        4   8888          .      1      0      . |
          5. |        5   7777       8888   8888   8888      1 |
             |-------------------------------------------------|
          6. |        6   2222          .      1      0      . |
             +-------------------------------------------------+
        
        .

        Comment

        Working...
        X