Identifying if an observation in one variable appears in another variable

Hanna Loyland

Join Date: Apr 2017

Posts: 2
#1

25 Apr 2017, 02:24

Dear all,
I am working on a project and have run into a problem with my dataset.
I have a dataset with two variables labeled ID1 and ID2. I want to generate a dummy variable that is equal to 1 if the value of ID1 in an observation is found anywhere among the values of ID2, and 0 otherwise. Do you have any advice on how to do this?

Nick Cox

Join Date: Mar 2014
Posts: 35724
#2

25 Apr 2017, 03:36

countmatch (SSC) counts matches. Here is a toy example:

Code:

clear
input id1 id2
1    5
2    6
3    7
4    8
5    9
end

countmatch id1 id2, gen(count)

list

     +-------------------+
     | id1   id2   count |
     |-------------------|
  1. |   1     5       0 |
  2. |   2     6       0 |
  3. |   3     7       0 |
  4. |   4     8       0 |
  5. |   5     9       1 |
     +-------------------+

In this case by construction there is just one match and the count could be used as an indicator variable. In other situations, something like

Code:

gen indicator = count > 0

will be needed.
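For readers who prefer to stay with official commands, here is a minimal sketch of the same indicator built with merge. It is not part of countmatch. The tempfile name lookup is made up for illustration, and the sketch assumes that id1 and id2 have the same storage type.

```stata
* Toy data from the example above
clear
input id1 id2
1 5
2 6
3 7
4 8
5 9
end

* Save the distinct nonmissing values of id2 as a lookup keyed on id1
preserve
keep id2
drop if missing(id2)
duplicates drop
rename id2 id1
tempfile lookup
save `lookup'
restore

* Keep every original observation; flag those whose id1 is in the lookup
merge m:1 id1 using `lookup', keep(master match)
generate byte indicator = _merge == 3
drop _merge
sort id1
list
```

Because the lookup holds each value of id2 only once, this yields a 0/1 indicator directly, not a count of matches.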


Nick Cox

Join Date: Mar 2014

Posts: 35724
#3

25 Apr 2017, 06:15

In #2 anyone wishing to repeat the code should please note this revision.

Code:

input id1 id2
1 5
2 6
3 7
4 8
5 9
end

Hanna Loyland

Join Date: Apr 2017

Posts: 2
#4

25 Apr 2017, 07:17

Thank you! This was exactly what I wanted to do and really helpful.

Robert Picard

Join Date: Mar 2014

Posts: 1536
#5

25 Apr 2017, 09:11

If the variables are numeric, you can also do this with rangestat (from SSC). In this case, you define an interval to count the number of observations where the value of id2 is the same as id1 for the current observation:

Code:

clear
input id1 id2
1 5
2 6
3 7
4 8
5 9
end

rangestat (count) n=id2, interval(id2 id1 id1)
gen wanted = cond(mi(n), 0, 1)
list

Nick Cox

Join Date: Mar 2014

Posts: 35724
#6

25 Apr 2017, 10:02

Yes; rangestat (Picard, Cox, Ferrer) makes countmatch (Cox) pretty much redundant.

JJ Kovach

Join Date: Feb 2018

Posts: 29
#7

15 Nov 2018, 08:20

Robert or Nick,
I found this post as a potentially simple solution to a very similar problem. I have the observation numbers of matched firms (from teffects nnmatch) stored in a variable, and I need to flag the observations listed there as being part of a control group. I have created a running index variable (gen obs = _n) to compare with the variable containing the observation numbers of matched firms. This post appears to do what I want, but it is not working as desired. rangestat works as described for the id1 and id2 example posted, but not with data like the following, which adds a variable containing missing values:
Code:

clear
input id1 id2 id3
1 5 2
2 6 .
3 7 .
4 8 3
5 9 3
end

rangestat (count) n2=id2, interval(id2 id1 id1)
rangestat (count) n3=id3, interval(id3 id1 id1)

n2 shows a 1 for the fifth observation, which indicates that there was exactly one instance of the number 5 in id2. However, n3 is empty. I was expecting to see a 1 for the second observation and a 2 for the third.

Is there something I can do to get rangestat to perform in this manner?
Thank you in advance!

Nick Cox

Join Date: Mar 2014
Posts: 35724
#8

15 Nov 2018, 08:27

In general in Stata, missings will be ignored -- so

Code:

summarize x

ignores missings --- unless they are the focus of attention -- so

Code:

count if missing(x)

does what it is instructed to do.
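A small sketch of that contrast, using a made-up variable x with one missing value (the data are purely illustrative):

```stata
clear
input x
1
.
3
end

* summarize silently skips the missing value: r(N) is 2, not 3
summarize x
display r(N)

* count does exactly what the condition asks: 1 observation is missing
count if missing(x)
display r(N)
```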

Does this do what you want?

Code:

clear
input id1 id2 id3
1    5    2
2    6    .
3    7    .
4    8    3
5    9    3
end

mvencode id?, mv(0)

rangestat (count) n2=id2, interval(id2 id1 id1)
rangestat (count) n3=id3, interval(id3 id1 id1)

list

     +---------------------------+
     | id1   id2   id3   n2   n3 |
     |---------------------------|
  1. |   1     5     2    .    . |
  2. |   2     6     0    .    1 |
  3. |   3     7     0    .    2 |
  4. |   4     8     3    .    . |
  5. |   5     9     3    1    . |
     +---------------------------+
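
As a cross-check that needs no community-contributed command and no recoding of missings, the same counts can be sketched with an explicit loop over observations. The names n2_check and n3_check are made up for illustration. The loop makes one pass over the data per observation, so it is only sensible for small datasets.

```stata
clear
input id1 id2 id3
1 5 2
2 6 .
3 7 .
4 8 3
5 9 3
end

* For each observation, count how many values of id2 (id3) equal its id1.
* Missing values never match, and the result is 0 rather than missing.
generate long n2_check = 0
generate long n3_check = 0
forvalues i = 1/`=_N' {
    quietly count if id2 == id1[`i'] & !missing(id2)
    quietly replace n2_check = r(N) in `i'
    quietly count if id3 == id1[`i'] & !missing(id3)
    quietly replace n3_check = r(N) in `i'
}
list
```

The nonzero counts should agree with n2 and n3 in the listing above. Where rangestat leaves a missing value, the loop records 0.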


JJ Kovach

Join Date: Feb 2018

Posts: 29
#9

16 Nov 2018, 08:22

Nick,
Thank you for the mvencode refinement. All working now as desired.