Assigne variable X's value in row A to its instance in row B if a certain variable in row B has the same value as an other one in row A

Andreas Fabian Fischer

Join Date: Sep 2023

Posts: 9
#1

Assigne variable X's value in row A to its instance in row B if a certain variable in row B has the same value as an other one in row A

12 Sep 2023, 09:56

I have three variables: v1, v2, v3:

v1 contains a numeric ID. It indicates about which person information is available.

v2 also contains numeric IDs that identify persons. v2, however, does not contain the ID of the person about whom information is available, but the ID of the person who provided that information. Thus, in the case where a person A testified about herself, v1 contains the same ID as v2. In the case where a person B testified about a person A, v1 contains the ID of A, v2 contains the ID of B.

v3 contains a response code. It can have characteristics like "completed", "not encountered" or "refused".

The response code is only present in those lines in which v1 == v2 (i.e. lines in which a person has provided information about him/herself).
In the lines where v1 != v2, the response code is missing.
For all lines where the response code is missing, I want to check if the person giving information (their ID is in v2) occurs in another line in v1 and if so, I want to assign their response code in the line where the person is giving information about another person.

Minimal example:

Code:

set obs 10 egen v1 = seq(), from(1) to(10) egen v2 = seq(), from(1) to(5) gen v3 = "completed" replace v3 = "." if v1 > 5 replace v3 = "refused" if v1 == 2 replace v3 = "not encountered" if v1 == 4

The result is:
v1 v2 v3

1 1 completed

2 2 refused

3 3 completed

4 4 not encountered

5 5 completed

6 1 .

7 2 .

8 3 .

9 4 .

10 5 .

I am looking for code like:
if (v3 == "."){
a = value of v2.
Search in v1 till you find value a.
Take value of v3 in that row and assign to v3 in initial row.
}

It should generate:
v1 v2 v3

1 1 completed

2 2 refused

3 3 completed

4 4 not encountered

5 5 completed

6 1 completed

7 2 refused

8 3 completed

9 4 not encountered

10 5 completed

Thank you very much in advance!
Tags: assign, data, syntax
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#2

12 Sep 2023, 10:34

I have made several changes to your example data because I find it confusing to remember what v1, v2, and v3 are, specially which is which as between v1 and v2. So I have given them informative names. Also, I have changed "." to "" because "" is the way Stata recognizes missing values. With those changes:

Code:

clear* set obs 10 egen v1 = seq(), from(1) to(10) egen v2 = seq(), from(1) to(5) gen v3 = "completed" replace v3 = "" if v1 > 5 // NOTE "", NOT "." replace v3 = "refused" if v1 == 2 replace v3 = "not encountered" if v1 == 4 rename v1 subject rename v2 informant rename v3 response gen byte self_report = (informant == subject) by informant (self_report), sort: replace response = response[_N] /// if missing(response) sort subject

does what you ask.

Note that this code reflects an ambiguity in your description of what you want. If an informant provides more than one self report, then it is unclear which of those responses should be used. In this code, such ties are broken randomly and irreproducibly. Perhaps in your full data set there is never more than one self-report by a respondent, so this issue never arises. But if it does, and if a random selection is not desired, then please post back explaining how you would prefer to resolve such problems.
1 like
Comment
Andreas Fabian Fischer

Join Date: Sep 2023

Posts: 9
#3

14 Sep 2023, 06:59

Originally posted by Clyde Schechter View Post

I have made several changes to your example data because I find it confusing to remember what v1, v2, and v3 are, specially which is which as between v1 and v2. So I have given them informative names. Also, I have changed "." to "" because "" is the way Stata recognizes missing values. With those changes:

Code:

clear* set obs 10 egen v1 = seq(), from(1) to(10) egen v2 = seq(), from(1) to(5) gen v3 = "completed" replace v3 = "" if v1 > 5 // NOTE "", NOT "." replace v3 = "refused" if v1 == 2 replace v3 = "not encountered" if v1 == 4 rename v1 subject rename v2 informant rename v3 response gen byte self_report = (informant == subject) by informant (self_report), sort: replace response = response[_N] /// if missing(response) sort subject

does what you ask.

Note that this code reflects an ambiguity in your description of what you want. If an informant provides more than one self report, then it is unclear which of those responses should be used. In this code, such ties are broken randomly and irreproducibly. Perhaps in your full data set there is never more than one self-report by a respondent, so this issue never arises. But if it does, and if a random selection is not desired, then please post back explaining how you would prefer to resolve such problems.

Thank you very much for your solution and for your additional comments! That helped me solve my problem and understand Stata a bit better.
Note 1: In my data there is actually never more than one self-report by a respondent.
Note 2: Is there a way to correct the typo in the headline of my post? ("Edit" only let's me change the content under the headline)
Comment

v1	v2	v3
1	1	completed
2	2	refused
3	3	completed
4	4	not encountered
5	5	completed
6	1	.
7	2	.
8	3	.
9	4	.
10	5	.

v1	v2	v3
1	1	completed
2	2	refused
3	3	completed
4	4	not encountered
5	5	completed
6	1	completed
7	2	refused
8	3	completed
9	4	not encountered
10	5	completed

Announcement

Assigne variable X's value in row A to its instance in row B if a certain variable in row B has the same value as an other one in row A

Comment

Comment