Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Assigne variable X's value in row A to its instance in row B if a certain variable in row B has the same value as an other one in row A

    I have three variables: v1, v2, v3:

    v1 contains a numeric ID. It indicates about which person information is available.

    v2 also contains numeric IDs that identify persons. v2, however, does not contain the ID of the person about whom information is available, but the ID of the person who provided that information. Thus, in the case where a person A testified about herself, v1 contains the same ID as v2. In the case where a person B testified about a person A, v1 contains the ID of A, v2 contains the ID of B.

    v3 contains a response code. It can have characteristics like "completed", "not encountered" or "refused".

    The response code is only present in those lines in which v1 == v2 (i.e. lines in which a person has provided information about him/herself).
    In the lines where v1 != v2, the response code is missing.
    For all lines where the response code is missing, I want to check if the person giving information (their ID is in v2) occurs in another line in v1 and if so, I want to assign their response code in the line where the person is giving information about another person.

    Minimal example:

    Code:
    set obs 10
    egen v1 = seq(), from(1) to(10)
    egen v2 = seq(), from(1) to(5)
    gen v3 = "completed"
    replace v3 = "." if v1 > 5
    replace v3 = "refused" if v1 == 2
    replace v3 = "not encountered" if v1 == 4
    The result is:
    v1 v2 v3
    1 1 completed
    2 2 refused
    3 3 completed
    4 4 not encountered
    5 5 completed
    6 1 .
    7 2 .
    8 3 .
    9 4 .
    10 5 .

    I am looking for code like:
    if (v3 == "."){
    a = value of v2.
    Search in v1 till you find value a.
    Take value of v3 in that row and assign to v3 in initial row.
    }

    It should generate:
    v1 v2 v3
    1 1 completed
    2 2 refused
    3 3 completed
    4 4 not encountered
    5 5 completed
    6 1 completed
    7 2 refused
    8 3 completed
    9 4 not encountered
    10 5 completed

    Thank you very much in advance!

  • #2
    I have made several changes to your example data because I find it confusing to remember what v1, v2, and v3 are, specially which is which as between v1 and v2. So I have given them informative names. Also, I have changed "." to "" because "" is the way Stata recognizes missing values. With those changes:
    Code:
    clear*
    set obs 10
    egen v1 = seq(), from(1) to(10)
    egen v2 = seq(), from(1) to(5)
    gen v3 = "completed"
    replace v3 = "" if v1 > 5 // NOTE "", NOT "."
    replace v3 = "refused" if v1 == 2
    replace v3 = "not encountered" if v1 == 4
    
    rename v1 subject
    rename v2 informant
    rename v3 response
    
    gen byte self_report = (informant == subject)
    by informant (self_report), sort: replace response = response[_N] ///
        if missing(response)
    sort subject
    does what you ask.

    Note that this code reflects an ambiguity in your description of what you want. If an informant provides more than one self report, then it is unclear which of those responses should be used. In this code, such ties are broken randomly and irreproducibly. Perhaps in your full data set there is never more than one self-report by a respondent, so this issue never arises. But if it does, and if a random selection is not desired, then please post back explaining how you would prefer to resolve such problems.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      I have made several changes to your example data because I find it confusing to remember what v1, v2, and v3 are, specially which is which as between v1 and v2. So I have given them informative names. Also, I have changed "." to "" because "" is the way Stata recognizes missing values. With those changes:
      Code:
      clear*
      set obs 10
      egen v1 = seq(), from(1) to(10)
      egen v2 = seq(), from(1) to(5)
      gen v3 = "completed"
      replace v3 = "" if v1 > 5 // NOTE "", NOT "."
      replace v3 = "refused" if v1 == 2
      replace v3 = "not encountered" if v1 == 4
      
      rename v1 subject
      rename v2 informant
      rename v3 response
      
      gen byte self_report = (informant == subject)
      by informant (self_report), sort: replace response = response[_N] ///
      if missing(response)
      sort subject
      does what you ask.

      Note that this code reflects an ambiguity in your description of what you want. If an informant provides more than one self report, then it is unclear which of those responses should be used. In this code, such ties are broken randomly and irreproducibly. Perhaps in your full data set there is never more than one self-report by a respondent, so this issue never arises. But if it does, and if a random selection is not desired, then please post back explaining how you would prefer to resolve such problems.
      Thank you very much for your solution and for your additional comments! That helped me solve my problem and understand Stata a bit better.
      Note 1: In my data there is actually never more than one self-report by a respondent.
      Note 2: Is there a way to correct the typo in the headline of my post? ("Edit" only let's me change the content under the headline)

      Comment

      Working...
      X