Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace variables within group and set some to missing.

    I've got a variable with that is sometimes present, sometimes not. For example,

    Code:
    group_id  target_var
    1                 NA
    1                 C
    1                 NA
    2                 B
    2                 B
    3                 A
    3                 A
    3                 NA
    3                 B
    3                 NA
    I would like to have a new variable that replaces all missing (i.e. target_var=="NA") within a particular group with the target var of the other observations in that group but if there are multiple values within that group, I would like them all to remain missing. So ideally the final dataset would look something like

    Code:
    group_id  target_var     final_var
    1                 NA       C
    1                  C       C
    1                 NA       C
    2                  B       B
    2                  B       B
    3                  A       A
    3                  A       A
    3                  NA      NA
    3                  B       B
    3                  NA      NA
    sort group_id
    So far, I've got a very inefficient method

    Code:
    gen final_var = target_var if target_var!="NA"
    replace final_var = final_var[_n-1] if final_var[_n] == "" & group_id[_n-1] == group_id[_n]
    replace final_var = final_var[_n+1] if final_var[_n]=="" & group_id[_n+1] == group_id[_n]
    but this does not help me get some of the observations back to missing as would be the case for group_id==3 above. Another way of phrasing this question would be how can I set final_var from missing to an existing value in target_var if there is only one non-missing value? Once I have this, I can replace the remainder with a more simple command.
    Last edited by Jack Reimer; 23 Oct 2018, 10:15.

  • #2
    One thing that is immediate is that working in groups usually benefits from exploiting by:.Then there is no need to check whether observations are in the same group or not.

    This solution takes your example quite literally and hinges on noticing that "A" "B" "C" all sort before "NA". Then the idea is that a group is OK for change if (and only if) all values in that group are equal to the first value after sorting or are "NA". We note that all values being "NA" suits both criteria, which is not a problem: it doesn't matter if all "NA" are replaced by "NA" in that circumstance.

    Code:
    clear 
    input group_id  str2 target_var
    1                 NA
    1                 C
    1                 NA
    2                 B
    2                 B
    3                 A
    3                 A
    3                 NA
    3                 B
    3                 NA
    end 
    
    gen long obs = _n 
    bysort group_id (target_var) : gen candidate = target_var[1] 
    gen OK = inlist(target, candidate, "NA") 
    bysort group_id (OK) : gen wanted = cond(OK[1], candidate, target) 
    sort obs 
    drop obs 
    list, sepby(group_id) 
    
         +----------------------------------------------+
         | group_id   target~r   candid~e   OK   wanted |
         |----------------------------------------------|
      1. |        1         NA          C    1        C |
      2. |        1          C          C    1        C |
      3. |        1         NA          C    1        C |
         |----------------------------------------------|
      4. |        2          B          B    1        B |
      5. |        2          B          B    1        B |
         |----------------------------------------------|
      6. |        3          A          A    1        A |
      7. |        3          A          A    1        A |
      8. |        3         NA          A    1       NA |
      9. |        3          B          A    0        B |
     10. |        3         NA          A    1       NA |
         +----------------------------------------------+
    If your real example doesn't match that, then sing out, as a fix is an easy twist on the code above.

    Comment

    Working...
    X