Merging databases

Jeanine OConnell

Join Date: May 2020

Posts: 3
#1

Merging databases

18 Aug 2020, 23:18

Hi there, I really appreciate any help you can offer me.

Basically I have a relational database issue and need to merge a few csv files together in Stata but I can't seem to figure out how.

VISIT_DATA
Visit_ID
Patient_ID

SERVICE_DATA
Visit_ID
Visit_Service_ID

DISPENSATION_DATA
Visit_ID
Visit_DIspensation_ID

PATIENT_DATA
Patient_ID
Visit_ID

I started by merging PATIENT_DATA to VISIT_DATA

merge 1:m PATIENT_ID using VISIT_DATA

then PATIENT_DATA&VISIT_DATA to DISPENSATION DATA

merge 1:m VISIT_ID using DISPENSATION_DATA

So I have a database that has PATIENT DATA, VISIT_DATA, and DISPENSATION_DATA linked.

I need to also link SERVICE_DATA.

However, since for every patient there are many visits, and many services, and many dispensations - the 1:m data linkage won't work.

Any suggestions?

Thanks for any guidance you can offer.

Jeanine
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#2

19 Aug 2020, 00:11

There are three possible situations here:

1. All that matters is the Visit_ID linkage and every dispensation should be paired with every service of the same visit. In that case -joinby- will do this. Or alternatively it might make sense to -reshape- the data you've merged so far to wide layout so you just have one observation per visit, with all of the dispensations as separate variables, and then you can link that 1:m with the service data using the Visit_ID link variable. (These two approaches actually give you the same logical result, but -joinby- leaves you with a fully long data set, whereas the -reshape- approach leaves you with a half-long half-wide layout. The choice between these approaches really depends on which layout will be more convenient for whatever else you plan to do with the data.)

2. There really is only a single combination of patient ID, visit ID and visit_dispensation_ID that goes with any particular service. Then there must be some other variable in the data that will link these. Think about what information you would use to do this if you were doing it by hand. You would have to look at something in those data sets to pick out which service_ID to link with any given Visit_Dispensation_ID. Whatever you would look at, that variable or those variables would then have to be included in the -merge- key alongside the VISIT_ID. If it isn't clear to you what that (those) variable(s) might be, you might ask a colleague who has worked with this data previously to point it (them) out to you.

3. As in 2, except there really are no other variables that pick out what goes with what. In that case the problem cannot be solved with the data at hand.
Comment
Jeanine OConnell

Join Date: May 2020

Posts: 3
#3

23 Aug 2020, 22:45

This is great Clyde. Thank you.
I am going with option #1. But I do have a couple more questions.
When I tried to use joinby with my 3 merged files
PATIENT DATA, VISIT_DATA, and DISPENSATION_DATA
then used joinby to add in the SERVICE_DATA. All of the visits that didn't have a SERVICE ID OR DISPENSATION ID were dropped. Maybe there is a work around for this but I can't find it.

I need to know if a visit was cancelled or there was a no show. So I need to capture those visits with a VISIT_ID but no SERVICE_ID or DISPENSATION ID.

That leaves me with the reshape solution. However, when I try to use reshape on the merged database (with many patients, many visits, and many dispensations) I get the following code and message.

reshape wide DISPENSATION1_ID , i(PATIENT_ID) j(VISIT_ID)
variable VISIT_ID takes on too many values

Any thoughts? Thanks for your help!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#4

24 Aug 2020, 12:02

When I tried to use joinby with my 3 merged files
PATIENT DATA, VISIT_DATA, and DISPENSATION_DATA
then used joinby to add in the SERVICE_DATA. All of the visits that didn't have a SERVICE ID OR DISPENSATION ID were dropped. Maybe there is a work around for this but I can't find it.

There is a workaround. Read -help joinby- with special attention to the -unmatched()- option.

However, when I try to use reshape on the merged database (with many patients, many visits, and many dispensations) I get the following code and message.

The message means exactly what it says. You apparently have a massive data set at this point. So if you are going to proceed with the -reshape-, you will have to split the data set into two or more pieces, each containing all the observations for some subset of the VISIT_IDs. Reshape those each separately, and then append the resulting data sets together. I cannot tell you how many pieces you will have to break this into. So there may be some trial and error involved before you get it so each piece has sufficiently few VISIT_IDs for -reshape- to be able to digest it.
Comment

Announcement

Comment

Comment

Comment