I've got a variable that is sometimes present and sometimes not. For example, see the first data listing below.
I would like to create a new variable that replaces every missing value (i.e. target_var=="NA") within a group with the target_var value of the other observations in that group. However, if the group contains more than one distinct non-missing value, I would like the missing observations to remain missing. So ideally the final dataset would look something like the second listing below.
So far, I've got a very inefficient method (the last listing below, run after sort group_id), but it does not leave any observations missing, as should be the case for group_id==3 in the desired result. Another way of phrasing this question: how can I set final_var from missing to the existing value of target_var only when the group has exactly one distinct non-missing value? Once I have this, I can replace the remainder with a simpler command.
Code:
group_id  target_var
1         NA
1         C
1         NA
2         B
2         B
3         A
3         A
3         NA
3         B
3         NA
Code:
group_id  target_var  final_var
1         NA          C
1         C           C
1         NA          C
2         B           B
2         B           B
3         A           A
3         A           A
3         NA          NA
3         B           B
3         NA          NA
Code:
gen final_var = target_var if target_var!="NA"
replace final_var = final_var[_n-1] if final_var[_n] == "" & group_id[_n-1] == group_id[_n]
replace final_var = final_var[_n+1] if final_var[_n]=="" & group_id[_n+1] == group_id[_n]
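For what it's worth, here is a minimal sketch of one way the fill rule described above might be written. It assumes that target_var is a string variable and that the literal text "NA" is the only marker of a missing value. The helper variables obs_order, has_val and single are hypothetical names introduced for illustration. The idea is to check, within each group, whether the smallest and largest non-missing values coincide, and to copy that value to the whole group only when they do.

```stata
* Sketch only. Assumes target_var is a string variable in which the literal
* text "NA" marks a missing value. obs_order, has_val and single are
* hypothetical helper names, not part of the original data.

* Remember the original row order, and flag observations that have a value.
gen long obs_order = _n
gen byte has_val = target_var != "NA"

* Among the non-missing observations of each group, sorted by value,
* first == last means the group holds exactly one distinct value.
bysort group_id has_val (target_var): gen byte single = target_var[1] == target_var[_N] if has_val

* Put a non-missing observation (if any) last in each group and copy its
* value to the whole group, but only when that value is the only one.
bysort group_id (has_val target_var): gen final_var = target_var[_N] if single[_N] == 1

* Groups with several distinct values, or with none, keep their input as is.
replace final_var = target_var if final_var == ""

* Restore the original order and drop the helpers.
sort obs_order
drop obs_order has_val single
```

On the example data this should fill both NA rows in group 1 with C, leave group 2 as B, and leave group 3 unchanged (A, A, NA, B, NA), because group 3 holds two distinct values.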