Matching Observations in Two Rows

Michael Miller

Join Date: Nov 2014

Posts: 9
#1

Matching Observations in Two Rows

23 Feb 2015, 12:51

Hello,

I'm sorry that this question is so simple, but I have tried and tried and do not know what to do:

In my dataset, I have a mix of information from two sources, marked by variable "source" (source== 1 | 2) about hospital admissions. Many of the observations talk about the same patient (as given by the variable MRN) and some talk about the same hospitalization (encounter). While MRN denotes the row referring to the same patient, another variable, "encounter", is unique to each separate hospitalization of each patient.

I want to do the following:
generate a variable (encounter_match) that equals "1" each time that there is a duplicate value of "encounter" (where one of the instances is in a row where source==1 and the other instance is in a row where source==2)

Can you help me do this?

Thanks,

Michael
Tags: None
Michael Miller

Join Date: Nov 2014

Posts: 9
#2

23 Feb 2015, 12:58

I don't know if the above text was entirely clear. Here is an example:

source MRN encounter
1 1195660 365387844
2 1195660 364775932
2 1195660 365387844

In this case, observation 1 and 3 are referring to the same hospitalization of the same patient. I want a variable (encounter_match) that displays a 1 in the first and third rows to signal that those rows provide data about the same hospitalization.
Comment
Vincent La

Join Date: Jan 2015

Posts: 33
#3

23 Feb 2015, 13:01

I think what you might want is

Code:

duplicates tag encounter, generate(encounter_match) replace encounter_match = (encounter_match > 0)

encounter_match should now be a dummy variable equal to 1 each time there is a duplicate value of encounter. Let me know if that works for you!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#4

23 Feb 2015, 13:01

Check out the duplicates command. But this yields to first principles:

Code:

bysort encounter (source) : gen encounter_match = _N == 2 & source[1] == 1 & source[2] == 2

For explanation, read up the by: prefix in the manuals and/or http://www.stata-journal.com/article...article=pr0004
Comment
Vincent La

Join Date: Jan 2015

Posts: 33
#5

23 Feb 2015, 13:41

The duplicates command might be safer. I believe Nick's solution will only work if there is at most two possible values of source given MRN encounter. For example, suppose that for a given encounter there existed 5 observations, three where source == 1 and two where source == 2.

Then I believe it will turn out that source[1] == 1 & source[2] == 2 will not work for you.

Last edited by Vincent La; 23 Feb 2015, 13:44.
Comment

Vincent La

Join Date: Jan 2015
Posts: 33

23 Feb 2015, 13:44

Also, Michael, it's a good idea to generally provide a reproducible example with your question in code format. In other words, try formatting your post like this:

Code:

clear
input source MRN encounter
1 1195660 365387844
2 1195660 364775932
2 1195660 365387844
end

list


     +-----------------------------+
     | source       MRN   encoun~r |
     |-----------------------------|
  1. |      1   1195660   3.65e+08 |
  2. |      2   1195660   3.65e+08 |
  3. |      2   1195660   3.65e+08 |
     +-----------------------------+

That way the data set is ready to be implemented in Stata and other people on this board will find it easier to help you.

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35724
#7

23 Feb 2015, 13:46

Safer in what respect? What's the risk here? The risk is that the OP did not tell us everything we need to know.

I am pleased that you commend the duplicates command, which I regard fondly, but it's not quite right for the stated problem, which is about (1) duplicates on encounter (check) but also about (2) values of source being 1 and 2, i.e. different.

I think you missed the essential restriction _N == 2 in my code. If there were 5 observations with the same value of encounter that condition would not be satisfied.

Naturally it's a good idea to check that there are at most 2 occurrences of each distinct value of encounter and duplicates is one way of doing that. There are others, e.g.

Code:

bysort encounter: gen freq = _N tab encounter if freq > 2

Last edited by Nick Cox; 23 Feb 2015, 14:20.
Comment
Michael Miller

Join Date: Nov 2014

Posts: 9
#8

23 Feb 2015, 15:03

Hello Mr. Cox and Mr. La,

As far as the two sets of code went, both seemed to more-or-less work. Mr. Cox's code worked in every scenario. Mr. La's code worked in every scenario save 3 observations that had weird issues (false positives generated) due to situations like a patient having more than one MRN (and therefore encounters).

The duplicates command is great to know for the future.

I will make sure to enter all sample data in the right "Stata-ready" format next time; thank you, Mr. La.

Thanks for your help,

Michael
Comment
Vincent La

Join Date: Jan 2015

Posts: 33
#9

24 Feb 2015, 10:08

Re: Nick,

Yes, I did miss _N == 2. Sorry about that.

Re: Michael, glad everything worked out!

Vincent
Comment

Michael Miller

Join Date: Nov 2014
Posts: 9

#10

26 Feb 2015, 10:38

Good morning,

I have one further question regarding matching observations between rows in hospital data:

We have a common group identifier, mrn (each patient has one mrn). Now I want to say:

by mrn ( id3 date) : drop the observation if id3==0 & dcdate(of id3==0 observation) is NOT inrange( of minus3 to plus30 dates of an observation (OF THE SAME mrn) for which id3==1)

So, basically, each encounter in which id3==1 marks a special hospitalization. minus3 and plus30 draw a time window of 3 days prior to admission to 30 days after discharge. We need to know which observations (when id3==0) fit in these time windows (defined by minus3 and plus30 when id3==1, within only observations of the same mrn) and which do not.

My real problem, I think, is that I do not understand how to use _n and _N when groups/ positioning aren't standard. For example, I don't know how to adjust when sizes change from group to group (for instance if the first group (mrn 1) has 50 observations and the next (mrn 2) has 40). Or, if a group has three instances of id3==1 and the next group has just 1 instance of id3==1 , etc...

Note: this data has been edited and does not reflect real patient data

code:

Code:

clear
input mrn    date    dcdate    minus3    plus30    id3
1633188    19200    19205    19197    19235    1
1633188    19308    19311    19305    19341    1
1633188    19378    19381    19375    19411    1
1633188    19423    19424    19420    19454    1
1633188    19289    19297    .    .    0
1633188    19434    19435    19431    19465    0
1633188    19323    19323    .    .    0
1633188    19361    19361    .    .    0
1633188    19246    19246    .    .    0
1633188    19253    19256    .    .    0
1633188    19324    19325    19321    19355    0
1633188    19348    19354    .    .    0
1633188    19375    19376    .    .    0
1635367    19390    19393    19387    19423    1
1635367    19432    19434    19429    19464    1
1635367    19466    19470    19463    19500    1
1635367    19522    19522    .    .    0
1635367    19430    19431    .    .    0
1635367    19551    19551    .    .    0
1635367    19604    19604    .    .    0
1635367    19229    19233    .    .    0
1635367    19185    19185    .    .    0
1635367    19200    19200    .    .    0
1635367    19585    19589    .    .    0
1635367    19228    19228    .    .    0
1635367    19633    19633    .    .    0
1635367    19515    19515    .    .    0
1635367    19592    19592    .    .    0
1635367    19257    19257    .    .    0
1635367    19618    19618    .    .    0
1635367    19208    19208    .    .    0
1635367    19508    19508    .    .    0
1635367    19218    19218    .    .    0
1635367    19458    19458    .    .    0
1635367    19253    19253    .    .    0
1635367    19479    19479    .    .    0
1635367    19647    19650    .    .    0
1635367    19640    19640    .    .    0
1635367    19577    19577    .    .    0
1635367    19222    19222    .    .    0
1635367    19578    19578    .    .    0
1635367    19528    19528    .    .    0
1635367    19638    19638    .    .    0
1635367    19600    19600    .    .    0
1635367    19250    19250    .    .    0
1635367    19590    19590    .    .    0
1635367    19561    19561    .    .    0
1635367    19688    19688    .    .    0
1635367    19624    19624    .    .    0
1635367    19204    19204    .    .    0
1635367    19269    19269    .    .    0
1635367    19613    19613    .    .    0
1635367    19542    19542    .    .    0
1635367    19519    19519    .    .    0
1635367    19526    19526    .    .    0
1635367    19556    19556    .    .    0
1635367    19673    19673    .    .    0
1635367    19424    19424    .    .    0
1635367    19564    19564    .    .    0
1635367    19463    19463    .    .    0
1635367    19631    19631    .    .    0
1635367    19417    19417    .    .    0
1635367    19682    19682    .    .    0
1635367    19675    19675    .    .    0
1635367    19500    19500    .    .    0
1635367    19625    19625    .    .    0
1635367    19485    19485    .    .    0
1635367    19192    19192    .    .    0
1635367    19215    19215    .    .    0
1635367    19662    19662    .    .    0
1635367    19501    19505    .    .    0
1635367    19493    19493    .    .    0
1642544    20063    20075    20060    20105    1
1642544    19634    19634    .    .    0
1642544    19837    19837    .    .    0
1642544    19830    19830    .    .    0
1642544    20003    20003    .    .    0
1642544    19855    19855    .    .    0
1642544    20124    20124    .    .    0
1642544    19697    19697    .    .    0
1642544    19998    19998    .    .    0
1642544    19816    19816    .    .    0
1642544    19642    19642    .    .    0
1642544    19828    19828    .    .    0
1642544    19824    19824    .    .    0
1642544    19640    19640    .    .    0
1642544    19813    19813    .    .    0
1642544    19823    19823    .    .    0
1642544    19607    19612    .    .    0
1642544    20105    20110    .    .    0
1642544    20013    20013    .    .    0
1642544    20111    20111    .    .    0
1642544    19830    19830    .    .    0
1642544    19977    19977    .    .    0
1642544    19823    19823    .    .    0
1642544    20100    20103    .    .    0
1642544    19760    19760    .    .    0
1642544    20045    20045    .    .    0
1642544    19824    19824    .    .    0
1642544    19797    19802    .    .    0
1642544    19893    19893    .    .    0
end

That is a sample of what the data might look like

to format:

Code:

destring, replace
format date %td
label variable date "Date of admission"
format dcdate %td
label variable dcdate "Date of discharge"
format minus3 %td
label variable minus3 "Three days less date of admission"
format plus30 %td
label variable plus30 "Thirty days after date of discharge"

Comment

Scott Merryman

Join Date: Mar 2014
Posts: 895

#11

26 Feb 2015, 11:15

I not sure fully understand but are these the results you are looking for:

Code:

     +--------------------------------------------------+
     |     mrn    date   dcdate   id3   minus3   plus30 |
     |--------------------------------------------------|
  1. | 1633188   19434    19435     0    19420    19454 |
  2. | 1633188   19323    19323     0    19305    19341 |
  3. | 1633188   19324    19325     0    19305    19341 |
  4. | 1633188   19375    19376     0    19375    19411 |
     |--------------------------------------------------|
  5. | 1635367   19430    19431     0    19429    19464 |
  6. | 1635367   19458    19458     0    19429    19464 |
  7. | 1635367   19479    19479     0    19463    19500 |
  8. | 1635367   19463    19463     0    19429    19464 |
  9. | 1635367   19417    19417     0    19387    19423 |
 10. | 1635367   19485    19485     0    19463    19500 |
 11. | 1635367   19493    19493     0    19463    19500 |
     |--------------------------------------------------|
 12. | 1642544   20100    20103     0    20060    20105 |
     +--------------------------------------------------+

The code used:

Code:

clear
qui {
input mrn    date    dcdate    minus3    plus30    id3
1633188    19200    19205    19197    19235    1
1633188    19308    19311    19305    19341    1
1633188    19378    19381    19375    19411    1
1633188    19423    19424    19420    19454    1
1633188    19289    19297    .    .    0
1633188    19434    19435    19431    19465    0
1633188    19323    19323    .    .    0
1633188    19361    19361    .    .    0
1633188    19246    19246    .    .    0
1633188    19253    19256    .    .    0
1633188    19324    19325    19321    19355    0
1633188    19348    19354    .    .    0
1633188    19375    19376    .    .    0
1635367    19390    19393    19387    19423    1
1635367    19432    19434    19429    19464    1
1635367    19466    19470    19463    19500    1
1635367    19522    19522    .    .    0
1635367    19430    19431    .    .    0
1635367    19551    19551    .    .    0
1635367    19604    19604    .    .    0
1635367    19229    19233    .    .    0
1635367    19185    19185    .    .    0
1635367    19200    19200    .    .    0
1635367    19585    19589    .    .    0
1635367    19228    19228    .    .    0
1635367    19633    19633    .    .    0
1635367    19515    19515    .    .    0
1635367    19592    19592    .    .    0
1635367    19257    19257    .    .    0
1635367    19618    19618    .    .    0
1635367    19208    19208    .    .    0
1635367    19508    19508    .    .    0
1635367    19218    19218    .    .    0
1635367    19458    19458    .    .    0
1635367    19253    19253    .    .    0
1635367    19479    19479    .    .    0
1635367    19647    19650    .    .    0
1635367    19640    19640    .    .    0
1635367    19577    19577    .    .    0
1635367    19222    19222    .    .    0
1635367    19578    19578    .    .    0
1635367    19528    19528    .    .    0
1635367    19638    19638    .    .    0
1635367    19600    19600    .    .    0
1635367    19250    19250    .    .    0
1635367    19590    19590    .    .    0
1635367    19561    19561    .    .    0
1635367    19688    19688    .    .    0
1635367    19624    19624    .    .    0
1635367    19204    19204    .    .    0
1635367    19269    19269    .    .    0
1635367    19613    19613    .    .    0
1635367    19542    19542    .    .    0
1635367    19519    19519    .    .    0
1635367    19526    19526    .    .    0
1635367    19556    19556    .    .    0
1635367    19673    19673    .    .    0
1635367    19424    19424    .    .    0
1635367    19564    19564    .    .    0
1635367    19463    19463    .    .    0
1635367    19631    19631    .    .    0
1635367    19417    19417    .    .    0
1635367    19682    19682    .    .    0
1635367    19675    19675    .    .    0
1635367    19500    19500    .    .    0
1635367    19625    19625    .    .    0
1635367    19485    19485    .    .    0
1635367    19192    19192    .    .    0
1635367    19215    19215    .    .    0
1635367    19662    19662    .    .    0
1635367    19501    19505    .    .    0
1635367    19493    19493    .    .    0
1642544    20063    20075    20060    20105    1
1642544    19634    19634    .    .    0
1642544    19837    19837    .    .    0
1642544    19830    19830    .    .    0
1642544    20003    20003    .    .    0
1642544    19855    19855    .    .    0
1642544    20124    20124    .    .    0
1642544    19697    19697    .    .    0
1642544    19998    19998    .    .    0
1642544    19816    19816    .    .    0
1642544    19642    19642    .    .    0
1642544    19828    19828    .    .    0
1642544    19824    19824    .    .    0
1642544    19640    19640    .    .    0
1642544    19813    19813    .    .    0
1642544    19823    19823    .    .    0
1642544    19607    19612    .    .    0
1642544    20105    20110    .    .    0
1642544    20013    20013    .    .    0
1642544    20111    20111    .    .    0
1642544    19830    19830    .    .    0
1642544    19977    19977    .    .    0
1642544    19823    19823    .    .    0
1642544    20100    20103    .    .    0
1642544    19760    19760    .    .    0
1642544    20045    20045    .    .    0
1642544    19824    19824    .    .    0
1642544    19797    19802    .    .    0
1642544    19893    19893    .    .    0
end

tempfile tmp1
preserve
keep if id3 ==1
save `tmp1', replace
restore
drop minus plus
keep if id3 == 0
joinby mrn using `tmp1'
gen d1 = dc- min
gen d2 =  plu-dc
keep if d1>0 & d2 >0
drop d1 d2
}
l ,sepby(mrn)

Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

#12

26 Feb 2015, 12:52

If I understand correctly, this is another variation of the overlapping hospital stay problem, which I discussed recently here. Nick has a good article on how to approach this type of problem.

SJ-13-1 dm0068 . . . . . Stata tip 114: Expand paired dates to pairs of dates
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q1/13 SJ 13(1):217--219 (no commands)
tip on using expand to deal with paired dates

http://www.stata-journal.com/article...article=dm0068

In this case, the in and out dates are adjusted for the special hospitalization observations to datein - 3 and dateout + 30. Here's an implementation using the data provided

Code:

clear
input mrn    datein    dateout    id3
1633188    19200    19205    1
1633188    19308    19311    1
1633188    19378    19381    1
1633188    19423    19424    1
1633188    19289    19297    0
1633188    19434    19435    0
1633188    19323    19323    0
1633188    19361    19361    0
1633188    19246    19246    0
1633188    19253    19256    0
1633188    19324    19325    0
1633188    19348    19354    0
1633188    19375    19376    0
1635367    19390    19393    1
1635367    19432    19434    1
1635367    19466    19470    1
1635367    19522    19522    0
1635367    19430    19431    0
1635367    19551    19551    0
1635367    19604    19604    0
1635367    19229    19233    0
1635367    19185    19185    0
1635367    19200    19200    0
1635367    19585    19589    0
1635367    19228    19228    0
1635367    19633    19633    0
1635367    19515    19515    0
1635367    19592    19592    0
1635367    19257    19257    0
1635367    19618    19618    0
1635367    19208    19208    0
1635367    19508    19508    0
1635367    19218    19218    0
1635367    19458    19458    0
1635367    19253    19253    0
1635367    19479    19479    0
1635367    19647    19650    0
1635367    19640    19640    0
1635367    19577    19577    0
1635367    19222    19222    0
1635367    19578    19578    0
1635367    19528    19528    0
1635367    19638    19638    0
1635367    19600    19600    0
1635367    19250    19250    0
1635367    19590    19590    0
1635367    19561    19561    0
1635367    19688    19688    0
1635367    19624    19624    0
1635367    19204    19204    0
1635367    19269    19269    0
1635367    19613    19613    0
1635367    19542    19542    0
1635367    19519    19519    0
1635367    19526    19526    0
1635367    19556    19556    0
1635367    19673    19673    0
1635367    19424    19424    0
1635367    19564    19564    0
1635367    19463    19463    0
1635367    19631    19631    0
1635367    19417    19417    0
1635367    19682    19682    0
1635367    19675    19675    0
1635367    19500    19500    0
1635367    19625    19625    0
1635367    19485    19485    0
1635367    19192    19192    0
1635367    19215    19215    0
1635367    19662    19662    0
1635367    19501    19505    0
1635367    19493    19493    0
1642544    20063    20075    1
1642544    19634    19634    0
1642544    19837    19837    0
1642544    19830    19830    0
1642544    20003    20003    0
1642544    19855    19855    0
1642544    20124    20124    0
1642544    19697    19697    0
1642544    19998    19998    0
1642544    19816    19816    0
1642544    19642    19642    0
1642544    19828    19828    0
1642544    19824    19824    0
1642544    19640    19640    0
1642544    19813    19813    0
1642544    19823    19823    0
1642544    19607    19612    0
1642544    20105    20110    0
1642544    20013    20013    0
1642544    20111    20111    0
1642544    19830    19830    0
1642544    19977    19977    0
1642544    19823    19823    0
1642544    20100    20103    0
1642544    19760    19760    0
1642544    20045    20045    0
1642544    19824    19824    0
1642544    19797    19802    0
1642544    19893    19893    0
end

* get data in shape
format %td  date*
gen obs = _n

* reshape to long; first obs is start of window, second is end of window
expand 2
bysort obs: gen thedate = cond(id3==1, datein - 3, datein) if _n == 1
bysort obs: replace thedate = cond(id3==1, dateout + 30, dateout) if _n == 2
format %td thedate

* define and inout increment for events windows, but ignore id3 == 0 windows
bysort obs (thedate): gen inout = cond(_n==1,1,-1)
replace inout = 0 if id3 == 0

* order events by date; for multiple events on the same day, put the start
* of the window first and the end of the window last
gsort mrn thedate -inout 
by mrn : gen inoutsum = sum(inout)
by mrn: gen window = sum(_n == 1 | inoutsum[_n-1] == 0)

* Identify windows of special hospitalization
gsort mrn window thedate -inout 
by mrn window: gen special = id3[1]
by mrn window: gen specialobs = obs[1] if special[1]

* restore data to original format
bysort mrn obs (thedate): keep if _n == 1
drop inout* window thedate

* list special hospitalization
gen minus3  = datein - 3
gen plus30 = dateout + 30
format %td minus3 plus30
list if id3 == 1, noobs sepby(mrn)

* list not special hospitalization that fall within a special
list mrn datein dateout specialobs if id3 == 0 & special, noobs sepby(mrn)

* list not special hospitalization that do not fall within a special
list mrn datein dateout specialobs if id3 == 0 & !special, noobs sepby(mrn)

Announcement