Loop over multiple observations

Adriana Peci

Join Date: Nov 2014

Posts: 32
#1

Loop over multiple observations

19 Mar 2015, 10:38

I have multiple unique test-level observations per patient. I want to create a patient-level variable that identifies all patients who had test X performed. I tried the code below, but instead of getting the result at the patient level I still get it at the test level. Can anyone help me with this? Thank you.

bysort id: gen had_testx = "."
foreach v of var analysis {
    by id, sort: replace had_testx = "Yes" if `v' == "X"
}

                     What I want   What I get
patient   analysis   had_testx     had_testx
1         X          Yes           Yes
1         Y          Yes
2         X          Yes           Yes
3         Y          Yes
4         X          Yes           Yes
4         Y          Yes
5         Y
5         Y
6         X          Yes           Yes
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

19 Mar 2015, 10:57

If you made "had_testx" a numeric variable, you could use -egen- with the max() function (don't forget the "by" option). So why not switch to a numeric variable, which you can then label as yes or no depending on its value?
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#3

19 Mar 2015, 11:29

Rich Goldstein's advice to use numeric variables is good. But apart from that, your code is rather confused. You have a loop over a varlist that contains only a single variable, so it's not really a loop at all. Your code actually tells Stata to do exactly what it's doing: it sets had_testx to "Yes" in exactly those observations where analysis == "X". Based on your example of what you want, I think you can do it this way:

Code:

by id, sort: egen had_testx = max(analysis == "X")
label define Boolean 0 "No" 1 "Yes"
label values had_testx Boolean

This isn't exactly what you asked for, because it will set had_testx equal to 1 ("Yes") in all observations for the id's who have an observation with analysis X, whereas you appear to want it in the first only. So you can clean that up with:

Code:

by id: replace had_testx = . if _n > 1

Now, be cautious about that. The sort order within id in your data is not specified, so which observation ends up being first for a given id is arbitrary, and is not guaranteed to be reproducible from one run of the code to the next. So if you subsequently do some kind of calculations conditioned on -if had_testx == 1-, you may get different results each time you run it.

If there is some natural ordering within id (e.g. if there is a date variable you haven't mentioned), then you can overcome this problem by using -by id (date), sort:- instead of -by id:-. If there is no natural ordering of the observations within id, then it probably makes more sense to leave had_testx alone in the first place rather than blanking out later values. Having a variable that applies to a group of observations appear only once, in the first, is nice in spreadsheets, where the main purpose is appeal to the human eye. But it often leads to problems in statistical packages.
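The ordering point above can be sketched as follows. This is only an illustration under stated assumptions: id and analysis follow the example in #1, while rawdate, testdate, and every date value are invented here, since no date variable was mentioned in the question.

```stata
* Hypothetical data: id and analysis follow the example in #1;
* rawdate/testdate and all date values are made up for illustration.
clear
input id str1 analysis str9 rawdate
1 "X" "02jan2015"
1 "Y" "05jan2015"
2 "X" "03jan2015"
3 "Y" "04jan2015"
4 "X" "06jan2015"
4 "Y" "09jan2015"
5 "Y" "07jan2015"
5 "Y" "08jan2015"
6 "X" "10jan2015"
end

* Convert the string dates to a Stata daily date
gen testdate = date(rawdate, "DMY")
format testdate %td
drop rawdate

* Patient-level indicator: 1 if any row for that id has analysis "X"
by id, sort: egen had_testx = max(analysis == "X")
label define Boolean 0 "No" 1 "Yes"
label values had_testx Boolean

* Keep the value only on each patient's earliest test;
* sorting on testdate within id makes "first" well defined
by id (testdate), sort: replace had_testx = . if _n > 1

list, sepby(id)
```

If two observations for the same id could share the same testdate, the order among those ties would still be arbitrary, so a further variable would need to go inside the parentheses to break them.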

If your concern is that you later want to count the number of id's for which had_testx is 1, then you can just tag an arbitrary observation within each id group:

Code:

egen flag = tag(id)
....
tab had_testx if flag

On another note, you have posted several questions on this forum and gotten answers. You should be aware of the preference for using real first and last names by now. At this point, please repay the courtesy that others have shown you by showing us the courtesy of adhering to our cultural norm of using real names. (Click on Contact Us to arrange to have your user name changed.) Thank you.
Adriana Peci

Join Date: Nov 2014

Posts: 32
#4

19 Mar 2015, 15:15

Thank you so much guys. I really appreciate your help.
P.S. I don't mind sharing my contact information; I just didn't know how to change my user name. I have requested help to do that.

Thanks again,

Best regards,

Adriana Peci